Statistical model selection for Markov models of biomolecular dynamics.

R. T. McGibbon, C. R. Schwantes, and V. S. Pande. J. Phys. Chem. B 118, 6475-6481 (2014)

Markov state models provide a powerful framework for analyzing biomolecular conformational dynamics in terms of metastable states and the transition rates between them. By describing the long-time-scale dynamics of large molecular dynamics simulations with a master equation, these models give a description that is both quantitative and comprehensible, and they have been used successfully to study protein folding, protein conformational change, and protein-ligand binding. However, to achieve satisfactory performance, existing methodologies often require expert intervention when defining the model's discrete state space. Standard model selection methodologies focus on minimizing systematic bias and disregard statistical error. We show that by considering the states' conditional distribution over conformations, both sources of error can be balanced evenhandedly. We applied techniques that account for both systematic bias and statistical error to two 100 μs molecular dynamics trajectories of the Fip35 WW domain. The results agree with existing techniques based on the self-consistency of the model's relaxation time scales, and they are more suitable in regimes where those time-scale-based techniques encourage overfitting. By removing the need for expert tuning, these methods should reduce modeling bias and lower the barriers to entry in Markov state model construction.
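The abstract gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It estimates a Markov state model's transition matrix from a discretized trajectory. It then scores the model with a likelihood in conformation space. For that likelihood, we assume (hypothetically) that each state's conditional distribution over conformations is uniform over the state's volume. Dividing by the volume puts models built on different state spaces on a common footing, and the Bayesian information criterion (BIC) penalty trades goodness of fit against the number of estimated transition probabilities. The helpers `discretize`, `estimate_transition_matrix`, `log_likelihood`, `bic`, and `score_state_space`, and the 1-D binning used as a stand-in for clustering, are illustrative inventions.

```python
import math
from bisect import bisect_right


def discretize(xs, edges):
    """Hypothetical toy stand-in for clustering: map each 1-D coordinate x
    to the index k of the bin [edges[k], edges[k+1]) that contains it."""
    n_bins = len(edges) - 1
    return [min(max(bisect_right(edges, x) - 1, 0), n_bins - 1) for x in xs]


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Maximum-likelihood (non-reversible) estimate: row-normalized transition counts."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1.0
    T = []
    for row in counts:
        total = sum(row)
        # A state that is never left during the trajectory keeps an all-zero row.
        T.append([c / total if total > 0 else 0.0 for c in row])
    return T


def log_likelihood(dtraj, T, volumes, lag=1):
    """Log-likelihood in conformation space, ASSUMING each state's conditional
    distribution over conformations is uniform on a region of size volumes[j]:
        p(x_{t+lag} | x_t) = T[s_t][s_{t+lag}] / volumes[s_{t+lag}].
    math.log raises ValueError if an evaluated transition has probability 0."""
    ll = 0.0
    for t in range(len(dtraj) - lag):
        i, j = dtraj[t], dtraj[t + lag]
        ll += math.log(T[i][j]) - math.log(volumes[j])
    return ll


def bic(ll, n_states, n_transitions):
    """Bayesian information criterion (lower is better).
    A row-stochastic n x n matrix has n * (n - 1) free parameters."""
    k = n_states * (n_states - 1)
    return -2.0 * ll + k * math.log(n_transitions)


def score_state_space(xs, edges, lag=1):
    """Build a toy MSM on the given bins and return (log-likelihood, BIC)."""
    dtraj = discretize(xs, edges)
    n_states = len(edges) - 1
    volumes = [edges[k + 1] - edges[k] for k in range(n_states)]
    T = estimate_transition_matrix(dtraj, n_states, lag)
    ll = log_likelihood(dtraj, T, volumes, lag)
    return ll, bic(ll, n_states, len(dtraj) - lag)
```

For example, `score_state_space(xs, [0.0, 0.5, 1.0])` and `score_state_space(xs, [0.0, 0.25, 0.5, 0.75, 1.0])` score a two-state model and a four-state model of the same trajectory `xs`, so their BIC values can be compared directly. Without the volume term, the two likelihoods would be probabilities of different discrete sequences and could not be compared. Scoring held-out data (cross-validation) instead of applying a BIC penalty is another option, but unseen transitions then need a nonzero probability.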