30,000 Bees and the Multi-Core Adrenaline Rush

I recently installed 30,000 bees in two hives, about 3 pounds of bees per hive. I wasn't stung once.


A week later I went back to check the hives. One colony was a little smaller, not quite 3 pounds, so I wanted to make sure the queen was alive and well and reigning over her brood. I took apart the hive boxes, found the queen on one frame, and, again, I wasn't stung. Such friendly bees.

But it’s intimidating when you first open the box and see a bee-covered frame.

You take a deep breath and gather your courage, because bees can smell the adrenaline coursing through your veins. I was reading about an informal study recently, in which some 200 embedded developers explained why they are slow to adopt embedded multi-core technology. (See "Survey measures readiness to adopt multicore technology.") To them, it must feel like 60 pounds of bees have swarmed their desks.

The problem looks overwhelming, but the reigning messages are wrong. We have the thread-locking bee and the single-processor-bias bee (that one took a lot of royal jelly to rear into a queen), and I've just spotted the familiar lack-of-determinism bee. This is a messy hive.

A New Hive and a New Reign

RapidMind Fixes the Single Processor Bias Problem

The RapidMind parallel programming model is portable to a wide range of parallel hardware architectures, including vector and stream machines, such as GPUs, as well as distributed-memory machines, such as the Cell BE. The system provides a strong execution and data abstraction that is simultaneously modular, portable, and efficient.

The RapidMind platform provides a set of backends, each of which manages the execution of RapidMind programs on a particular processor. The platform manages communication and data flow between the host processor and the target device(s), handling memory transfers and load balancing so that you are free to focus on high-level programming. The dynamic runtime compiler and processor support modules compile and optimize RapidMind programs for the specific processor in use.

  • The GLSL backend executes RapidMind programs on Graphics Processing Units (GPUs).
  • The Cell BE backend executes RapidMind programs on the Cell Broadband Engine.
  • The x86 backend executes RapidMind programs on AMD and Intel processors.
  • The Debug backend executes RapidMind programs on the host processor, compiling programs with a C++ compiler.

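
To make the backend idea concrete, here is a minimal Python sketch of runtime backend selection. Everything in it (the `Backend` class, `select_backend`, the `available()` check, and the pretend GPU) is hypothetical and is not RapidMind's API. RapidMind's real backends compile programs for their processor, whereas this toy simply applies a per-element function. The point it illustrates is that the application code stays the same whichever backend the runtime picks.

```python
from abc import ABC, abstractmethod


# Hypothetical sketch: these class and method names are illustrative
# and are NOT RapidMind's actual API.
class Backend(ABC):
    name = "abstract"

    @abstractmethod
    def available(self) -> bool:
        """Return True if this backend's processor is present."""

    def execute(self, program, data):
        """Run `program` (a per-element function) over `data`."""
        return [program(x) for x in data]


class GpuBackend(Backend):
    name = "glsl"

    def __init__(self, gpu_present: bool):
        self.gpu_present = gpu_present

    def available(self) -> bool:
        return self.gpu_present


class DebugBackend(Backend):
    name = "debug"

    def available(self) -> bool:
        return True  # the host processor is always there


def select_backend(backends):
    """Return the first available backend, in priority order."""
    for backend in backends:
        if backend.available():
            return backend
    raise RuntimeError("no backend available")


# The application code is identical whichever backend is chosen.
backends = [GpuBackend(gpu_present=False), DebugBackend()]
chosen = select_backend(backends)
result = chosen.execute(lambda x: 2 * x + 1, [1, 2, 3])
print(chosen.name, result)  # debug [3, 5, 7]
```

Listing the backends in priority order keeps the fallback policy in one place: on a machine with a GPU, the same call would return the GPU backend without any change to the application.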
RapidMind Has Advanced Debug Support

The Debug backend executes RapidMind Programs on the host processor, compiling them with a C++ compiler. Because debug information is generated for the compiled programs, you can debug them line by line in a debugger or IDE, setting breakpoints, inspecting values, and stepping through code.

The RapidMind Inspector, an optional package available in RapidMind Multi-Core Platform Tools, lets you view how data in a RapidMind-enabled application is modified as the application executes. Its graphical views present not just a Program's values at a given iteration but also the entire data bound to those values, so you can inspect the contents of an array bound to an input value or to an output value. The Inspector also lets you control the execution of the Program so that you can watch how the data changes from one iteration to the next.

RapidMind Programs Are Deadlock-Free and Deterministic

The main, underlying problem with multi-threading is non-determinism. Multiple threads running simultaneously do not run in lockstep unless you explicitly synchronize them. However, you want to minimize synchronization because it has a negative impact on performance.

Because these threads do not run in lockstep, and because they can access data structures and devices simultaneously (for example, two threads writing to the same memory location), they produce a class of bugs that is very difficult to find, reproduce, and fix. Inserting synchronization constructs does not always make things easier, either: mistakes in explicit synchronization are what lead to deadlocks, where two threads each wait for the other and thus never continue.
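
Both failure modes are easy to reproduce in a few lines. This Python sketch uses only the standard `threading` module and nothing RapidMind-specific. It shows a lost-update race on a shared counter and its lock-based fix, plus one common deadlock-avoidance technique: acquiring locks in a single global order. The `Counter`, `Account`, `transfer`, and `many_transfers` names are illustrative.

```python
import threading

N_THREADS = 4
N_INCREMENTS = 100_000


class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def racy_increment(self):
        # Read and write are separate steps: another thread may run in
        # between, so one of the two updates can be lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # Holding the lock makes the read-modify-write indivisible with
        # respect to other threads that take the same lock.
        with self.lock:
            self.value += 1


def count(method_name):
    counter = Counter()
    step = getattr(counter, method_name)

    def worker():
        for _ in range(N_INCREMENTS):
            step()

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Locking src, then dst, can deadlock: thread 1 runs transfer(x, y)
    # while thread 2 runs transfer(y, x), and each holds the lock the
    # other needs. Taking both locks in one global order (by id()) makes
    # that wait cycle impossible.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def many_transfers(src, dst, n):
    for _ in range(n):
        transfer(src, dst, 1)


expected = N_THREADS * N_INCREMENTS
racy_total = count("racy_increment")  # may fall short; varies run to run
safe_total = count("safe_increment")  # always equals expected
print(f"racy {racy_total}, safe {safe_total}, expected {expected}")

x, y = Account(1000), Account(1000)
t1 = threading.Thread(target=many_transfers, args=(x, y, 10_000))
t2 = threading.Thread(target=many_transfers, args=(y, x, 10_000))
t1.start()
t2.start()
t1.join()
t2.join()
print(x.balance, y.balance)  # 1000 1000
```

The racy total depends on exactly when the interpreter switches threads, so it can differ between runs, or even happen to match on a lucky run. That is the reproducibility problem described above.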

Thread timing depends on many factors, so you can never be sure that your well-tested product won't break the moment it ships because the timing has changed ever so slightly. RapidMind addresses this by having the platform handle threading, synchronization, and more, so that your application is deadlock-free, race-condition-free, and deterministic. Programs written with our platform cannot suffer from deadlock, read-write hazards, or synchronization errors. The platform uses a bulk synchronous model that supports a conceptual single thread of control, making debugging straightforward. At the same time, the structure of the language makes parallelism explicit, encouraging the development and use of efficient and scalable parallel algorithms.
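
The determinism claim rests on a structural idea that can be sketched without RapidMind at all. This is a generic Python illustration, not RapidMind code, and `kernel` and `run` are hypothetical names. Each step applies a pure per-element function that reads only the previous step's array and writes only its own output slot, and all results are collected before the next step begins. Because no worker writes shared state, the output is identical whether one thread or eight do the work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def kernel(prev, i):
    """Hypothetical per-element kernel: average of element i and its neighbours.

    It reads only `prev` and returns a value; it never writes shared state.
    """
    left = prev[i - 1] if i > 0 else prev[i]
    right = prev[i + 1] if i < len(prev) - 1 else prev[i]
    return (left + prev[i] + right) / 3


def run(data, steps, workers):
    current = list(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Every element of the new array depends only on the previous
            # array. pool.map yields results in index order no matter which
            # worker finishes first, and list() waits for all of them, which
            # acts as the barrier between steps.
            current = list(pool.map(partial(kernel, current), range(len(current))))
    return current


data = [0.0] * 8 + [9.0] + [0.0] * 7
results = {w: run(data, steps=5, workers=w) for w in (1, 2, 4, 8)}
print(all(r == results[1] for r in results.values()))  # True
```

The price of this structure is that a kernel cannot update neighbouring elements in place. The payoff is that the program can be reasoned about, and debugged, as if it ran on a single thread.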

RapidMind Allows for Performance Tuning

You can trace important performance events by using the RapidMind platform performance log. This log contains messages generated whenever the platform notices something relevant to performance, such as the inability to perform a certain optimization or a feature that is being used inefficiently. Most messages are generated during runtime compilation of the program. However, some important messages (for example, transfers to host memory) are generated when preparing to execute a compiled program. The performance log offers several levels of verbosity. See also "How we get improved performance [using RapidMind] on a single core."
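
As a rough picture of how a leveled log behaves, here is a hypothetical Python sketch. The `PerfLog` class, the numeric levels, and both messages are invented for illustration and do not reflect RapidMind's actual log format or API. Each message carries a verbosity level and the phase that produced it (runtime compilation or preparation for execution). Raising the verbosity admits more messages.

```python
from dataclasses import dataclass, field


@dataclass
class PerfMessage:
    level: int  # 1 = most important; higher numbers are more verbose
    phase: str  # "compile" (runtime compilation) or "execute" (preparing to run)
    text: str


@dataclass
class PerfLog:
    verbosity: int
    messages: list = field(default_factory=list)

    def note(self, level, phase, text):
        # Record a message only if it is within the configured verbosity.
        if level <= self.verbosity:
            self.messages.append(PerfMessage(level, phase, text))


def sample_run(log):
    # Invented example messages, one from each phase.
    log.note(2, "compile", "optimization skipped: loop trip count unknown")
    log.note(1, "execute", "output array transferred to host memory")
    return log


quiet = sample_run(PerfLog(verbosity=1))
chatty = sample_run(PerfLog(verbosity=2))
for m in quiet.messages:
    print(f"[{m.phase}] {m.text}")  # [execute] output array transferred to host memory
```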

So here are the makings of a fully functioning hive. The bees know what they're supposed to do because the right framework has been set. Your work is done here.
