Teraflops, Petaflops, and Turning Hours into Minutes, and Minutes into Seconds
Thursday, June 19th, 2008
Posted By Dr. Michael McCool
It’s been a busy couple of weeks. First, we just came back from SIFMA, where we demonstrated an approximately 55x speedup on an important (and non-trivial) financial option pricing algorithm using AMD hardware. (Since we’re using RapidMind, we can also run the same code on all our other hardware targets.) This coincided with an announcement of our support for AMD’s FireStream product line.
Second, the ISC conference (http://www.supercomp.de/isc08/content/) is going on right now and there have been a number of new hardware announcements. AMD announced a new FireStream card (FireStream 9250), and NVIDIA also announced a new line of GPUs (Tesla 10P). Both of these products are capable of teraflop performance, which is of course great news for people using RapidMind. At the other end of the scale, the largest Cell BE installation in the world, the LANL Roadrunner (http://www.lanl.gov/roadrunner/), broke the petaflop barrier using an actual benchmark, Linpack. Since RapidMind also targets the Cell BE, this is also good news, as it demonstrates clearly the power of this architecture and its ability to scale in large installations.
Getting back to what we did at SIFMA: we demonstrated a 55x speedup on something called a binomial option pricer. I will talk about this at greater length in an upcoming article, but will mention two interesting points here.
First, the binomial pricer, unlike the Monte Carlo pricers in our previous results, is very memory-intensive. It’s similar to iterated convolution and explicit PDE solvers. Since we also demonstrated great scalability on Barcelona processors and on the FireStream, this application shows how RapidMind can be used to tackle memory-bound applications as well as compute-bound applications such as Monte Carlo.
Second, option pricing is considered a “fundamental primitive” in computational finance. In particular, risk evaluation requires a large number of such pricing evaluations and is an important workload used in day-to-day practice by many financial institutions. The speedup factor we have demonstrated has the potential to reduce such calculations from hours to minutes. As in the other application areas that we target, this can potentially transform the workflow practices where these computations are used. If a computation takes hours, you basically have to run it in batch mode, possibly overnight. If it takes minutes, you can run it many times during the day, as part of an interactive workflow, and use up-to-date inputs, enabling completely new ways to do business.
This kind of transformation can create incredible new opportunities for our clients. And as demonstrated by the recent hardware announcements I’ve noted above, multi-core and in particular heterogeneous core processors and their deployments are in a definite growth phase, so even better results can be expected in the near future.
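For readers unfamiliar with binomial pricers, here is a minimal sketch in plain Python of one common textbook variant, the Cox-Ross-Rubinstein tree for a European option. It is not the code we demonstrated at SIFMA and it does not use the RapidMind API. The function name, parameter names, and the choice of a European payoff are all assumptions made for illustration. What it is meant to show is why the method is memory-intensive: each backward step reads two neighbouring values from the previous level of the tree and writes one, much like one pass of a two-tap convolution, and the whole array is swept once per time step.

```python
import math


def binomial_price(spot, strike, rate, vol, maturity, steps, is_call=True):
    """Price a European option on a Cox-Ross-Rubinstein binomial tree.

    Illustrative sketch only; names and parameters are hypothetical.
    """
    dt = maturity / steps
    up = math.exp(vol * math.sqrt(dt))      # up-move factor per step
    down = 1.0 / up                         # down-move factor per step
    discount = math.exp(-rate * dt)         # one-step discount factor
    # Risk-neutral probability of an up move.
    p_up = (math.exp(rate * dt) - down) / (up - down)

    # Payoffs at expiry; index j is the number of up moves taken.
    values = []
    for j in range(steps + 1):
        price_at_expiry = spot * (up ** j) * (down ** (steps - j))
        if is_call:
            values.append(max(price_at_expiry - strike, 0.0))
        else:
            values.append(max(strike - price_at_expiry, 0.0))

    # Backward induction: each level combines adjacent pairs of the
    # level after it. This two-neighbour sweep over the array is the
    # memory-bound part of the computation.
    for level in range(steps, 0, -1):
        values = [
            discount * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
            for j in range(level)
        ]
    return values[0]


if __name__ == "__main__":
    # Hypothetical inputs: at-the-money one-year call, 5% rate, 20% volatility.
    print(binomial_price(100.0, 100.0, 0.05, 0.2, 1.0, steps=500))
```

With the hypothetical inputs above, the result should land close to the Black-Scholes value of roughly 10.45. A risk evaluation would call a routine like this a very large number of times with different inputs, which is why a speedup on the single pricer translates directly into a shorter overall run.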




