For the past 15 years, I've wanted to create a programming language. It is easy enough to do so, if one just wants to create something. But I've wanted to create something that improves in some substantial way on what already exists, and stands the test of time.

10 years ago, I created Flow. Flow is not widely known, but it's the asynchronous architecture that underpins Bitvise SSH Server and SSH Client. It's a message-passing architecture based on small, Erlang-like processes, implemented using fibers in C++. One of the main purposes of Flow was to make it easier to reason about a complex, asynchronous application, and I do believe it succeeded to an extent. Its grander motivation was to make it trivial to write complex applications that scale effortlessly to any number of parallel processors.

I believe that Flow would meet this second goal, but its ability to do so is yet untested. We do not actually need the type of scaling Flow would provide in our software. We use Flow inside traditional thread-based parallelization which works better with the architecture of our SSH Server. I have since written software such as a web server with an integrated database, and that doesn't need scaling on a component granularity either. While Flow has free scaling, it requires a shift in the programmer's perspective. Traditional thread-based approaches require no shift, and tend to work well enough.

I originally wanted to turn Flow into a language. I realize, now, that I could; but this language would have problems inherent in all languages that try to impose some way of thought on the programmer. By imposing this way of thought, the language alienates people to whom it is foreign. Secondly, the intentional restrictions of the language make it difficult to implement designs that people actually want to use. The language then has a choice: either condemn and not help with these patterns, so the language becomes obscure; or smuggle violations of its principal design through a backdoor.

And now, as prime example of backdoor smuggling of design violations, I present:


Haskell is known as a pure functional programming language which is actually so well designed that it can run nearly as fast as C++. Or so we're told.

To achieve C-like speeds in Haskell, you have to brutalize the beautiful language by treating it, well, like C.

Exhibit one: Jan Stolarek's article where he presents the following algorithm that takes a list of machine-wide integers, and computes the sum of squares of odd numbers in it:
sumSqrL :: [Int] -> Int
sumSqrL = sum . map (^2) . filter odd

What a piece of beautiful code. But according to Jan's measurements, it performs 8 times slower than C. He fixes this by reimplementing the algorithm as follows:
sumSqrPOp :: U.Vector Int -> Int
sumSqrPOp vec = runST $ do
let add a x = do
let !(I# v#) = x
odd# = v# `andI#` 1#
return $ a + I# (odd# *# v# *# v#)
foldM` add 0 vec -- replace ` with ' here

This is not only practically unreadable; it is practically imperative. It looks like it would look in C++, except that C++ is actually designed for this, and doesn't make it look so ugly.

If this is not sufficiently convincing, consider the problem of efficient string concatenation in Haskell. Of course, Haskell can append to a string:
xs `append` x = xs ++ [x]

The only problem is that, when repeated, this has quadratic performance. A string, you see, is represented as a list. So each time you append a single character, Haskell creates a new list; or if not that, at least traverses the whole existing list; and then appends the additional character.

I suggest exploring the Haskell solutions proposed in the above Stack Overflow link. Let me know if you think any of them are clear. In Haskell, string concatenation is an advanced topic.

Managed languages

The main shortcoming of managed languages; with prime examples .NET and Java; is that they require management infrastructure, which can't be written in the managed language. You can't write a garbage collector in a language that assumes there is a garbage collector, and in which you can't access memory.

The way I see it, managed languages are the wrong approach to solving the problem of reliability in software. Imagine there is a surgeon. As part of their job, the surgeon needs to cut people. The surgeon's hand is steady, but when cutting freehand with a scalpel, the surgeon sometimes makes mistakes.

To reduce the number of scalpel mistakes, we invent cutting templates. In the types of surgeries that are most common, the surgeon can now use a cutting template instead of using the scalpel freehand. This improves the safety of most surgeries, and is generally seen as a good thing.

C is like a surgeon without a cutting template. C++ is like a surgeon who has the choice of a cutting template, or a scalpel. Managed languages are like surgeons being treated like they're inept. You only get cutting templates; the scalpel is taken away. If the problem doesn't fit the templates – well – you can try to mangle it until it does.

Managed languages solve memory safety issues that were a problem in the 1990s and 00s, but are mostly gone in modern C++. What managed languages do not do is protect against incompetent programmers, who remain free to implement basic vulnerabilities to SQL injection and cross-site forgeries. Managed languages do not allow you to rely on programmers whom you couldn't trust with C++.


So what's wrong with C++?

Not really that much. This:

// Signed -> Signed

template <class OutType, class InType,
std::numeric_limits<InType >::is_signed &&
std::numeric_limits<OutType>::is_signed &&
sizeof(InType) > sizeof(OutType)
), int> = 0>
inline OutType NumCast(InType value)
// Input type is larger
EnsureThrow(value >= std::numeric_limits<OutType>::min());
EnsureThrow(value <= std::numeric_limits<OutType>::max());
return (OutType) value;

template <class OutType, class InType,
std::numeric_limits<InType >::is_signed &&
std::numeric_limits<OutType>::is_signed &&
sizeof(InType) <= sizeof(OutType)
), int> = 0>
inline OutType NumCast(InType value)
// Input type is NOT larger
return (OutType) value;

The above are two out of eight template functions that use SFINAE to implement NumCast - a mechanism we use to ensure at run-time that the value of one type of integer can be safely converted to another.

I show this to emphasize the clunkiness of C++ metaprogramming. The entire language is not so much designed, as it is undesigned. Each individual step in the language's evolution was carefully chosen. But from a bird's eye view, C++ ends up looking like a hack, built upon a hack, built upon a hack. These hacks mostly fit well together; there is actual beauty in their outcome, and satisfaction in using the language. But the above is not an example of intuitive metaprogramming.

Go? Rust?

I like Rust. Rust is smart. It has the obvious feature of rigid single-reference memory safety. I suspect this may be an "overarching idea" problem, and may require smuggling of design violations by the back door.

"Go" seems like a language you design if you want everything simple, and have high developer turnover. It's a systems language for when your developers are from JavaScript, and don't want to learn C++.

Where to go?

The kind of language I think is worthy of creating:
  • Can express any type of program. Not just a garbage collected program, or just a user-mode program; but also a kernel-mode program, a boot loader, or an embedded real-time program.
  • Does not impose limiting paradigms which then need to be broken through a backdoor. No functional Fata Morgana, no object-oriented Fata Morgana, no message-passing Fata Morgana. The language should allow these concepts to be expressed, but not foist them.
  • Has regular structure and strong built-in metaprogramming. This is a lesson that can be learned from LISP (hint hint hint). The main problem with LISP is all the parentheses.
It might not surprise that I have in mind a design for just such a language. :) In a future post, I would like to introduce Seed – a list-based, self-building front-end language, with a simple but powerful basic structure, from which a versatile platform can be built for any purpose. It will first be intended to target the LLVM intermediate representation; but there's no reason it could not also emit .NET or Java bytecode, platform-specific machine code, HTML, PDF, or text files.

Seed is also intended to not have subtly incompatible implementations, which continue to be an issue with C++. The idea is for most of the language to be defined in the language itself. Implementations are then much more likely to be compatible, as they use the same language definition.

And if you don't like the language? Change it. Most of the language will be defined in the language itself, so its definition is under your control.