Why is this approach particularly useful?

This approach can be powerful because it is very amenable to distributed computing and also uses the available computational resources more efficiently. A protein spends most of its folding time “stuck” in an energetically favorable state, and the transitions between such states, which are the processes of greatest interest, take place only rarely. As a result, any straightforward simulation of protein folding wastes valuable time generating data with little information content. With adaptive sampling, however, the model can identify when a simulation is stuck and then start new simulations from potentially more fruitful areas, avoiding the wasteful re-exploration of areas that are already well understood.
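The adaptive sampling idea described above can be illustrated with a toy sketch. This is not our production code or a real MSM: the "protein" is a hypothetical random walker on a line of discrete states that usually stays put, and all names (`step`, `run_trajectory`, `pick_restart_states`, `adaptive_sampling`, `p_move`) are made up for illustration. Each round runs several short trajectories, counts how often each state has been visited, and restarts the next round from the least-visited states found so far.

```python
import random
from collections import Counter


def step(state, n_states, rng, p_move=0.05):
    """Toy dynamics: mostly stay 'stuck', occasionally hop to a neighboring state."""
    if rng.random() < p_move:
        state += rng.choice((-1, 1))
    # Keep the walker inside the allowed range of states.
    return min(max(state, 0), n_states - 1)


def run_trajectory(start, length, n_states, rng):
    """Run one short simulation and return the list of visited states."""
    traj = [start]
    for _ in range(length):
        traj.append(step(traj[-1], n_states, rng))
    return traj


def pick_restart_states(counts, n_restarts):
    """Choose the least-visited states seen so far (ties broken by state index)."""
    return sorted(counts, key=lambda s: (counts[s], s))[:n_restarts]


def adaptive_sampling(n_rounds, n_parallel, length, n_states, seed=0):
    """Alternate between short parallel runs and re-seeding from poorly sampled states."""
    rng = random.Random(seed)
    counts = Counter()
    starts = [0] * n_parallel  # every first-round run starts in state 0
    for _ in range(n_rounds):
        for start in starts:
            counts.update(run_trajectory(start, length, n_states, rng))
        chosen = pick_restart_states(counts, n_parallel)
        # Reuse restart states if fewer states have been found than runs requested.
        starts = [chosen[i % len(chosen)] for i in range(n_parallel)]
    return counts


if __name__ == "__main__":
    visits = adaptive_sampling(n_rounds=10, n_parallel=8, length=200, n_states=30)
    print("states discovered:", len(visits))
```

In a real setting the states would come from clustering simulation data into an MSM rather than being given in advance, and the restart rule could weigh statistical uncertainty rather than raw visit counts. The sketch only shows the loop structure: simulate briefly, look at what has been sampled, and spend the next batch of computing time where the data are thinnest.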

In a recent paper, we compared MSMs to more traditional simulation methods. We took some very long folding trajectories from the Anton supercomputer and compared them to an MSM built from the same folding data. Although our MSM “chops up” the simulation into a bunch of short trajectories, it was able to reproduce the Anton simulations very well. Moreover, we found that the MSM approach revealed a new insight into the folding process (a new folding pathway) that was missing in Anton’s more traditional approach.