How FAH works: Folding on streaming processors (GPUs and PS3)

One of the critical issues in computer science right now is the limit on how fast a single CPU can calculate.  While Moore’s law is still going strong in a literal sense (i.e. the number of transistors which one can put on a chip is doubling every 2 years or so), this doubling of transistors is no longer leading to a doubling in CPU speed, as it did over the last few decades.  Well, at least not for typical programs (e.g. Microsoft Word).  Getting big speed increases now requires a major change in the programming paradigm.  One key change is the emergence of "streaming processors."  GPUs and the Cell processor in the PS3 are both examples of streaming processors.

What makes streaming processors potentially much faster than regular CPUs is how they handle computation vs. memory access.  Normal CPUs spend lots and lots of transistors on cache (local memory on the CPU chip that helps keep the CPU fed with data and instructions).  Streaming processors instead spend those transistors on additional computing elements (e.g. floating point units).  By doing so, they can perform many FLoating point OPerations per Second (FLOPS) in an optimal situation, although getting one’s code to behave optimally is not easy.  Typically this means balancing FLOPS with memory access to make sure that there’s always data available for calculation.  This has been the primary challenge in our GPU and PS3 codes, and is something which we have, for the most part, figured out for a significant subset of the calculations we run on FAH.
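To make the balance between FLOPS and memory access concrete, here is a minimal sketch of a roofline-style estimate.  This is a general back-of-the-envelope model, not something taken from the FAH codes, and the hardware figures (PEAK_GFLOPS, BANDWIDTH_GB_S) are hypothetical numbers chosen only for illustration.  The idea is that a calculation can run no faster than the chip can compute, and no faster than memory can deliver data for it to compute on.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style estimate of achievable throughput.

    The result is capped either by the processor's peak compute rate or by
    how fast memory can supply data (bandwidth times the number of floating
    point operations performed per byte fetched), whichever is lower.
    """
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical hardware figures, for illustration only.
PEAK_GFLOPS = 200.0      # assumed peak compute rate, in GFLOPS
BANDWIDTH_GB_S = 25.0    # assumed memory bandwidth, in GB/s

if __name__ == "__main__":
    # Sweep over a few assumed "FLOPs per byte" values for a calculation.
    for intensity in (0.25, 1.0, 8.0, 32.0):
        rate = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
        limit = "memory" if rate < PEAK_GFLOPS else "compute"
        print(f"{intensity:6.2f} FLOPs/byte: {rate:6.1f} GFLOPS ({limit}-bound)")
```

Under these assumed numbers, a calculation that does only a little arithmetic per byte fetched leaves most of the floating point units idle, while one that reuses each byte many times can approach the peak rate.  That is roughly what "behaving optimally" means on a streaming processor.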

These advances have led to our GPU and PS3 clients.  The family history of all of this starts with the GPU core.  This GPU code was then brought over to the PS3 and enhanced.  We are working to bring some of those scientific enhancements back to the GPU code.  This is all pretty bleeding edge, but so was distributed computing in 2000 when we started.  Our expectation is that, given how modern processors are developing, in 8-10 years streaming processors will be much more standard and will be a major way in which FAH works.

Finally, it’s also interesting to think of how CPUs may themselves turn into streaming processors.  As CPUs add more cores, they start to take on, in a limited way, functionality similar to that of streaming processors.  Perhaps more interesting are some of the new chips rumored to be in development at Intel and AMD/ATI.  Intel’s 80-core chip is very interesting and something which our code would likely run well on.  Also, the fusion of AMD’s CPUs with ATI’s GPUs could be very exciting, potentially bringing the best of both worlds.  We’re looking forward to these and lots of other emerging technologies.  FAH is running very fast now (over a petaflop, i.e. 1,000,000,000,000,000 floating point operations per second!) and we look forward to continuing to push the frontiers.