Everyone now knows how the many-core processor era is upon us, and how this challenges existing methods of software development because current mainstream development methodologies are inappropriate for a new environment in which most software will need to be parallel if it is to take advantage of many cores.
The problem is not that existing languages make parallel programming impossible. It is that they don't make it simple.
This is similar to the problem of manual memory management in languages like C and C++. It takes years of discipline before you learn to write programs that will not leak memory or misuse it in some way - and even when you do, memory management still takes a considerable portion of the time you spend programming. Contrast that with platforms like Java and .NET which cut the red tape and allow you to simply specify what needs to be done, and the platform takes care of the memory management.
It is also similar to the problem of security in language like C and C++. Like memory management, C doesn't make it impossible to write secure code without buffer overruns. It just doesn't make it simple. If you're using C, you're going to shoot yourself in the foot and sooner or later have some kind of buffer overrun or loose pointer problem that's going to be a security hole. If you're using C++, it takes years of discipline before you learn to write programs in ways that avoid exposing yourself to security risks. Contrast that with platforms like Java and .NET where all code is implicitly secure and verified against any kind of buffer overrun or dangling pointer. You can still commit security mistakes in less obvious ways, but the lion share of potential vulnerability surface is eliminated.
Java and .NET make it possible for non-expert programmers - which means most - to write code that can actually be used. If these programmers were to use C or C++, their software would be riddled with memory leaks and security holes, rendering the software unstable and impossible to use. But since they write their code for .NET and Java, their programs might be inefficient and not as sleek as they might have been if they were written by experts; but the programs will work, and they might actually be used safely in practice.
The problem with parallel programming is that we have no mainstream, widely-accepted platform that makes parallel programming simple, like .NET and Java make it simple to use memory and avoid dangling pointers and buffer overruns.
The major paradigm we have for parallel programming today is the multithreaded, shared-memory locking scheme. As above, this paradigm doesn't make parallel programming impossible, it just makes it damn difficult to do well. You might have already read somewhere that, even if you are an expert, the question is not whether you will encounter race conditions and deadlocks; it is when. It takes years of discipline before you learn to write programs to avoid these problems, and even then thinking about this takes an inordinate amount of your time, and you can still easily shoot yourself in the foot.
I spent the last year or so thinking about what a good solution to the parallel programming problem might be. In the end, I discovered that the original designers of Unix got it right to begin with. But let me first tell you about the other things I considered.
I built up my skills programming on Windows, which uses the multithreaded, shared-memory, locking approach to parallel programming I criticized above. I was looking at how simple it is to program concurrent software in SQL with transactions, and I was wondering how the concept of transactions might be used in software. I found that this concept is called a Software Transactional Memory; I wrote my own C++ implementation of it and found that other implementations already exist in .NET and in Haskell. Software Transactional Memories sound good in principle. Instead of surrounding access to shared memory with manual locking, which leads to race conditions if you lock inappropriately and to deadlocks if you lock in a conflicting order, in STM, all access to shared memory takes place within a transaction. When you're in a transaction, you always see memory in a consistent state. If another thread needs to modify the memory in the meanwhile, the transaction mechanism makes sure that you either can still see the previous state, or it aborts your transaction so that you can retry it, this time hopefully getting a consistent state.
The problem with STM is that, no matter how you approach it, you cannot avoid the possibility that any transaction might need to be aborted and retried - either to avoid deadlock, or to avoid working with inconsistent state. Aborting and retrying, though, is a much easier proposition in functional languages like Haskell than in imperative languages like C++, C# and Java. The essential problem is that the language does not provide any means to separate code that can produce side effects (various kinds of I/O) from code that just manipulates memory. Code that just manipulates memory can be easily retried by resetting its state and letting it run again. But code that does I/O cannot be retried because it can result in incorrect behavior.
So I searched on, and I stumbled upon the concept of Flow-Based Programming. The idea of flow-based programming is that, instead of structuring the program into functions, methods and classes, you structure it into components. Components differ from functions and methods in that passing data to a component does not imply surrendering execution control to it. Instead, your component is free to continue its execution, and if it needs a result from another component, it will get it when the result is ready. In the mean time, the component can do other things.
Programs designed in a component-like manner are vastly simpler to parallelize than programs designed with classes and functions. I wrote a component execution controller in C++ that, altogether, consists of some 350 lines of code. Those 350 lines sum up the execution control for the whole application, regardless how many components it's made of. The components themselves are much simpler to write than functions and classes with multithreading support, because each component behaves as though it has the processor to itself. There are no conflicts with shared memory access; a component receives input and processes it locally without regard to any other components that might also be running. And the whole component design makes it possible to structure programs in an easy to understand way. There is a principle to how components are connected together, and this principle pervades the application. A program doesn't become unwieldy and grow out of control as it gets bigger. It just gets bigger, remaining manageable in the same way throughout.
The essential problem that makes parallel programming difficult is the pervasive use of the function call. The function call is a fundamentally single-threaded idea. It makes the execution of your code implicitly wait for execution of some other code, regardless of whether the two pieces of code actually need to wait for each other at all. Sometimes they do, such as when you need to receive a result from the function before you can continue. But much of the time, you should also be doing something else while waiting for the result, such as checking for user input and checking whether your program has been told to shut down. And in such cases, architecting your program based on function calls that won't return immediately is ruinous. It prevents you from handling events immediately that there's otherwise no reason you could not, making your program block unnecessarily; and it wastes processor time because you cannot do things concurrently that there's otherwise no reason you could not. A component-based design implicitly solves these problems like Java and .NET solve the memory management problem for you, resulting in cleaner code that's easier to write and easier to understand, which implies greater programmer effectiveness and higher security and reliability.
And at the end of it all, this is only a reinvention of the original Unix architecture. Originally, Unix contained no OS support for threads; it was considered unnecessary. In Unix, a process is much like a component. It is easy to create and destroy. It is single-threaded, making it easy to design and write, and it can communicate with other processes (components) through pipes (like stdin, stdout, stderr). Complex behavior arises out of the bundling together of multiple processes and connecting their inputs and outputs. So when you do something like "netstat -an | grep TCP.*22", you're effectively engaging in component programming - chaining together of components to achieve the desired result. The component design in Unix is kind of crude, what with the inflexible number of pipes and the need to manually serialize data as it's being passed from process to process, but it was a step in the right direction.
Microsoft's Singularity OS research project, incidentally, uses a single-threaded, lightweight-process, component-based model with no shared memory access as well. And previous research into reliable programming architectures like the Extremely Reliable Operating System (EROS), as well as Ericsson's Erlang, are based on similar principles.
Now, it's just a matter of putting this knowledge into a framework that everyone is able to use. I'm doing it with my component infrastructure for C++ and C# that I call Flow. It isn't yet ready for worldwide consumption, but it will be soon. Check back on this site in a while if you're interested in a sneak preview. :)
Showing 4 out of 4 comments, oldest first:
Comment on Oct 2, 2006 at 13:57 by boris kolar
I was quite surprised to learn that C++ metaprogramming is so powerful, you can actually do any computation you want at compile time. Actually, C++ is not the only language that can do this, D also lets you do amazing things. However, in C++ and D, metaprogramming looks really ugly and writing even most trivial metaprograms is a real pain. Yet I can see how metaprogramming will enable programmers to write powerful libraries, that will allow anyone to write cleaner code.
One interesting solution for parallelism is simply isolation (the "I" part of ACID transactions). You could define for a section of code, for example, that semantics should follow isolation principle: once you acquire a reference to an object, everything you do with that object has the same effect as if you had the only reference to that object. To resolve how changes in different isolated processes should combine, programmer could define priority functions, which is like a "compare" function, taking 2 changes and deciding which one should have priority (or, alternatively, taking 2 changes and returning a "combined" change). This would solve most of divide and conquer type of problems (which probably constitute a majority of problems that will benefit from concurrency). For example, in a 3D engine, you could ask every object to draw itself on a framebuffer, and supply "compare" function that would priorize objects that are more distant from camera. If you failed to provide "compare" function, prioritization could be deduced from order in source code (for example: default iteration order of "foreach" statement).
Another example, where metaprogramming would be usefull, is to force certain section of code to have "copy on write" semantics. So "BigInt x = 5; BigInt y = 6; y++; printf(x)" would print "5". Of course, you can do this now in C++, by overloading assignment or writing BigInt as a struct, but a metaprogramming facility that would do all that overloading and stuff would make such intentions much more readable and less prone to errors. Note that strict copy-on-write semantics everywhere would enable automatic parallelization, because there is no real "writing" anywhere.
The third example is automatic caching. You could, for example, write programs like "char[really_big] a; foreach (int i, char c; a) c = compute(i)" and the compiler would not actually allocate "char[really_big] a", but a cache of "compute(i)" values with semantics "compute(i) = i in cache ? return cached(i) : dynamic compute(i)".
To me, isolation and copy-on-write semantics and caching seem natural, so they should be set as default or at least easily enabled with metaprogramming directive. That way, algorithms could look cleaner and more readable. You can also add automatic logging (even without changing any code), make code reversible (automatically store "undo" information), persistence, allow serialization of threads,... Possibilities are endless.
Comment on Oct 2, 2006 at 14:05 by boris kolar
Comment on Nov 20, 2006 at 16:43 by Anonymous
Comment on Nov 21, 2006 at 20:38 by denisbider