Nitrogen Narcosis - Part II: The Serious Drawbacks of Explicit Multi-Threading
In my last posting, I mentioned that explicit multi-threading has serious drawbacks. The first:
· Multi-threaded applications are more difficult to test than single-threaded applications and are hard to debug.
The main, underlying problem with multi-threading is non-determinism. Multiple threads running simultaneously don’t run in lockstep unless you explicitly synchronize them. However, you want to minimize synchronization because it has a negative impact on performance.
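To make "explicitly synchronize" concrete, here is a minimal sketch in Python using the standard library's `threading.Barrier`. The function name `run_in_lockstep` and its parameters are made up for illustration and have nothing to do with RapidMind. Each worker must wait at the barrier at the end of every phase, so no thread can start phase N+1 until all threads have finished phase N. That wait is exactly the synchronization cost you would like to minimize.

```python
import threading

def run_in_lockstep(num_workers=3, num_phases=4):
    """Run workers through phases, forcing lockstep with a barrier."""
    barrier = threading.Barrier(num_workers)
    log_lock = threading.Lock()
    log = []  # (phase, worker_id) entries, in the order they were recorded

    def worker(worker_id):
        for phase in range(num_phases):
            with log_lock:
                log.append((phase, worker_id))
            # Block here until every worker has finished this phase.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Within a phase, the order in which workers appear in the log can still differ from run to run; only the phase boundaries are guaranteed.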
Because these threads don’t run in lockstep, and because they can access data structures and devices simultaneously (for example, two threads writing to the same memory location), the result is a class of bugs, race conditions, that is very difficult to find, reproduce, and fix. Inserting synchronization constructs doesn’t always make things easier: mistakes in explicit synchronization are what lead to deadlocks, where two threads wait for each other and so neither ever continues.
The exact timing of all of this is determined by many factors, so you can never be sure that a well-tested product won’t break soon after you ship it because the timing has changed ever so slightly.
RapidMind solves this because threading, synchronization, and more are handled by the platform, so that your application is deadlock-free, race-condition-free, and deterministic!
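As an illustration of how much care manual synchronization takes, the sketch below shows one common discipline for avoiding the two-thread deadlock described above: always acquire locks in the same global order. It is written in Python with the standard `threading` module, and the `Account`, `transfer`, and `run_transfers` names are hypothetical, chosen only for this example. If `transfer` simply locked the source first and the destination second, two opposite transfers could each hold one lock and wait forever for the other.

```python
import threading

class Account:
    def __init__(self, ident, balance):
        self.ident = ident          # used only to define a global lock order
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Acquire both locks in a fixed order (lowest ident first), regardless
    # of transfer direction, so two threads can never wait on each other.
    first, second = (src, dst) if src.ident < dst.ident else (dst, src)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

def run_transfers(rounds=1000):
    a = Account(1, 100)
    b = Account(2, 100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    t1 = threading.Thread(target=a_to_b)
    t2 = threading.Thread(target=b_to_a)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

The ordering rule works only if every piece of code that takes these locks follows it, which is the kind of program-wide convention that is easy to break by accident.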
Another concern:
· Explicit multi-threading doesn’t scale well as the number of cores increases.
Multi-threading’s explicit “threads” suggest that you use task parallelism as a model. But task parallelism doesn’t scale well: if you have only K threads, where K is some constant, you will never get any additional speedup from more than K cores.
The threading model is built around the concept of a task, where every task has a separate sequence of control. RapidMind does not use task parallelism but uses data parallelism. Data parallelism is based on the fact that applications often operate on collections of data, and units of work are often associated with separate elements of such collections. The RapidMind platform uses an SPMD (Single Program, Multiple Data) stream programming model. Using an SPMD data-parallel model allows you to work with familiar concepts like functions and arrays while also directly expressing parallel algorithms.
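The shape of the data-parallel idea can be sketched in a few lines of Python. This is not the RapidMind API; it uses the standard library's `concurrent.futures`, and the names `scale_and_offset` and `data_parallel_map` are invented for the example. One function (the "single program") is applied independently to every element of a collection, and the number of workers is a parameter rather than something baked into the program's structure.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_and_offset(x):
    # The "single program": applied independently to each element.
    return 2.0 * x + 1.0

def data_parallel_map(kernel, data, workers):
    # The worker count is a tuning knob; the kernel and the data layout
    # stay the same whether there is 1 worker or many.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(kernel, data))

data = [float(i) for i in range(8)]
results = {w: data_parallel_map(scale_and_offset, data, w) for w in (1, 2, 8)}
```

Because the work is tied to the elements and not to a fixed set of K tasks, the available parallelism grows with the size of the data. Note that in standard CPython the global interpreter lock keeps threads from speeding up CPU-bound work like this, so the sketch shows only the structure of the model, not its performance.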
And a final issue:
· Explicit multi-threading can’t leverage the use of accelerators.
Accelerators are parallel machines, and multi-threading is a way to express parallelism in your code. However, the OS threading APIs only target the main CPU cores on which the OS and the applications themselves are running. Taking advantage of accelerators requires additional, accelerator-specific APIs. Furthermore, C, C++, Java, and almost all other programming languages in common use assume a shared-memory programming model, where all memory is equally accessible to all computational devices. This isn’t the case for GPUs, for example: the GPU has its own separate memory, and data must be explicitly transferred between main memory and GPU memory.
RapidMind automatically manages synchronization between the host and the accelerator, so that you don’t have to.
