This post may get a little technical, but I wanted to start a new series of posts describing the inner workings of Folding@home. To say that FAH is complex is in many ways an understatement. On the surface, FAH looks a lot like other distributed computing projects: lots of work units (WUs) go out to client machines, get calculated, and come back. Under the hood, however, there are a lot of differences in how FAH works.
One of the principal challenges in FAH is that we’re trying to use lots of processors to speed up a calculation that many would have thought was intrinsically serial, i.e. one that could only be done by a single very, very, very fast processor. The reason is that we are studying how proteins change in time, and it’s hard to parallelize the 23rd step if you haven’t completed the 22nd step, and so on.
However, over the years we have been developing ways to get around this issue. In the last few years, we have made significant progress with a method called "Markov State Models", or MSMs for short. MSMs in a sense allow us to parallelize these seemingly intrinsically serial tasks. The way this works is that we build a kinetic model of the process, dividing the possible dynamics into a series of states (groups of related protein conformations) and rates of transition between those states. The rates are what get calculated in FAH WUs. Once we have all this data, we run some fairly sophisticated Bayesian machine learning methods to identify a reasonable set of states and then to calculate the rates between them.
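To make the MSM idea a bit more concrete, here is a toy sketch (this is not FAH's actual code; the three states and all the transition probabilities are made-up numbers for illustration) of the core trick: once short, independently computed simulations have given us transition probabilities between states, we can chain the resulting matrix together to predict long-time behavior, rather than running one very long serial simulation.

```python
import numpy as np

# Hypothetical 3-state MSM. The states stand in for groups of related
# protein conformations (say: unfolded, intermediate, folded), and each
# entry T[i, j] is the probability of moving from state i to state j
# over one lag time. These numbers are invented for this sketch.
T = np.array([
    [0.90, 0.09, 0.01],   # from "unfolded"
    [0.10, 0.80, 0.10],   # from "intermediate"
    [0.01, 0.04, 0.95],   # from "folded"
])

# Each row is a probability distribution, so it must sum to 1.
assert np.allclose(T.sum(axis=1), 1.0)

# Start with all of the population in the unfolded state and propagate
# the model forward: n lag times into the future is just T applied n times.
p = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    p = p @ T

# The long-time (equilibrium) populations are the left eigenvector of T
# with eigenvalue 1, normalized to sum to 1.
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

print("population after 50 lag times:", p)
print("equilibrium populations:      ", pi)
```

The point of the sketch is that building `T` is embarrassingly parallel: each row only needs short trajectories started in that state, which is exactly the kind of work that can be farmed out as independent WUs.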
We have had several recent advances in MSM methodology, and those papers are on our papers website. We have also had several MSM applications, including studies of protein folding, lipid vesicle fusion, and abeta aggregation (Alzheimer’s Disease simulations). While we will continue to improve our MSM methodology, we are very excited about the potential applications and have strong efforts going in both areas.