Transactional Memory: The Future?

There’s a really nice introductory article on transactional memory in the ACM Queue. It gives a great overview of the advantages and challenges posed by this technique, and I highly recommend it. As multiple cores become more common, and the number of cores continues to increase, the regular inability of the average programmer to comprehend and code for high-concurrency will begin to put a major damper on productivity. If you thought software was buggy now, just wait! The only solution is to make higher-level concurrency concepts available to the mainstream.

Unfortunately, the mainstream toolsets and libraries are years behind the hardware at this point. Intel will be releasing a quad core processor sometime next year, yet the latest and shiniest versions of neither Java nor .NET have any real story to tell here beyond the existing coarse-grained manual locking that has been empirically shown to be so challenging.

Since the software and hardware supporting transactional memory are still so far down the pipe, I expect we might see a major shift towards functional programming with immutable data structures. If you squint, immutable data structures are essentially worst-case, lazy-versioned transactions: Reads are always guaranteed to be atomic because the data can never change, and every write causes a new data state to be created. Using that new data can only occur at explicitly coded junction points, where an executing thread must take the result from a parallel function and use it rather than its own data state. In the event of an error, any new objects can simply be tossed to the garbage collector, essentially using the call stack for automatic rollback of failed transactions.

Wes Moise has been stumbling on this stuff a lot over the last few years. He has a pretty good post discussing the advantages of functional programming in a modern project.

Finally, it will be interesting to watch the impending blowback from the performance nutjobs when this bombshell is dropped on them:

STM implementations can incur a 40-50 percent overhead compared with lock-based code on a single thread. Moreover, STM implementations incur additional overhead if they have to guarantee isolation between transactional and nontransactional code. Reducing this overhead is an active area of research. Like other forms of TM, STMs don’t have a satisfactory way of handling irrevocable actions such as I/O and system calls, nor can they execute arbitrary precompiled binaries within a transaction.

Remember when the garbage collector was introduced into mainstream programming, mostly thanks to Java? The naysayers cried out and tore at their hair, claiming that it would never perform because it was too slow and used too much memory. They claimed it wasn’t necessary because it just took a little effort to track your memory allocations correctly. They predicted the end of “real programmers”, and waxed nostalgic for the good old days, when men were men and twiddled bits in assembly.

(Microsoft was firmly in that camp at the time, and as I recall it was among their favorite FUDs to spread when trying to keep people from migrating away from Visual Basic and COM. It is a testament to their marketing savvy that those travesties survived for as long as they did in the face of Java - a clearly superior competitor. It took five years before Microsoft finally released .NET as a worthy competing platform!)

And then they’ll lose, because multi-threaded programming is just too hard, in the same way that that the increasing complexity of modern systems made it just too hard to manually track memory usage. Our time as human programmers is just too valuable to spend on such inane details. Sure, there might be a 50% penalty today, but it will get better. In the meantime, paying that penalty is the only realistic way to take advantage of that extra CPU in your dual core chip. More importantly, it is also the only way to be prepared for the 4-, 8-, 16-core chips coming in the future.

A 50% penalty is a wash with two CPUs. With a quad core chip, it turns into a 200% gain.