A minimum variance clustering approach produces robust and interpretable coarse-grained models.
J Chem Theory Comput. 2017 Dec 18.
Authors: Husic BE, McKiernan KA, Wayment-Steele HK, Sultan MM, Pande VS
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics datasets, such as protein folding simulations, due to their straightforward construction and statistical rigor. Coarse-graining MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here, we present the minimum variance clustering approach (MVCA) for coarse-graining an MSM into a macrostate model. The method utilizes agglomerative clustering with Ward’s minimum variance objective function, and the similarity of the microstate dynamics is determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust to long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a dataset containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
PMID: 29253336 [PubMed – as supplied by publisher]
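
The coarse-graining step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `coarse_grain`, the toy transition matrix, and the choice to feed the square root of the Jensen-Shannon divergence (which is what SciPy's `jensenshannon` returns) into Ward linkage are all assumptions made here for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import jensenshannon, pdist


def coarse_grain(transition_matrix, n_macrostates):
    """Assign each microstate (row) to a macrostate label.

    Hypothetical helper: compares rows of a row-stochastic transition
    matrix pairwise and merges them with Ward's linkage.
    """
    T = np.asarray(transition_matrix, dtype=float)
    # Pairwise Jensen-Shannon distance between rows, in condensed form.
    # SciPy returns the square root of the divergence (an assumption of
    # this sketch; the paper's exact distance definition may differ).
    distances = pdist(T, metric=jensenshannon)
    # Agglomerative clustering with Ward's minimum variance criterion.
    tree = linkage(distances, method="ward")
    # Cut the tree so that at most n_macrostates clusters remain.
    return fcluster(tree, t=n_macrostates, criterion="maxclust")


# Toy 4-microstate MSM: states 0 and 1 have nearly identical outgoing
# probabilities, as do states 2 and 3.
T = np.array([
    [0.50, 0.48, 0.01, 0.01],
    [0.48, 0.50, 0.01, 0.01],
    [0.01, 0.01, 0.50, 0.48],
    [0.01, 0.01, 0.48, 0.50],
])

labels = coarse_grain(T, n_macrostates=2)
```

In this toy case microstates 0 and 1 end up in one macrostate and microstates 2 and 3 in the other. The same function could in principle be applied to a distance matrix between whole models rather than rows of one model, which is the second use the abstract mentions, though how the authors compare entire MSMs is not specified here.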