Here are our peer-reviewed results from Folding@home. For summaries of these methods and papers, as well as the scientific background behind Folding@home, please see the Folding@home article on Wikipedia.
Note that it can take quite a while to go from a result to a published peer review article (often as much as a year). These papers represent our progress to date that’s publicly available, with lots more on the way.
The distribution rules for published papers vary by the publication in which the paper appears. Due to these rules, a public web-source of each paper may not be immediately available. If full version is not linked below or available elsewhere on the Internet (Google Scholar can be helpful for this), most, if not all of these publications are freely available at a local municipal or collegial library. Note these articles are written for fellow scientists, so the contents are fairly technical.
SARS-CoV-2 Nsp16 activation mechanism and a cryptic pocket with pan-coronavirus antiviral potential
As we all know, coronaviruses are a major threat to human health. In the past two decades these viruses have caused three epidemics: SARS in 2004, MERS in 2012, and now, COVID-19 has caused a global pandemic. The painfully obvious questions to address are 1) how do we prevent coronavirus-caused diseases? and 2) how do we treat people who get sick with these diseases? Fortunately, the newly approved vaccines are a great step on question 1. However, the second question is an open question.
Like all viruses, coronaviruses employ molecular machines, called proteins, to carry out a cycle of infecting host (human) cells and replicating. If you can disable the function of these proteins, the viruses can’t continue this cycle! Therefore, figuring out how these proteins operate and devising ways to “turn off” these proteins is a core focus when developing a drug meant to treat someone who is infected.
In our most recent work, we use computer simulations to understand the moving parts of an essential SARS-CoV-2 protein. While we can usually figure out the general shape (i.e. 3D structure) of a protein with various experiments (including shooting x-rays at the protein), even the most powerful microscopes in the world cannot show us how these proteins move! Computer simulations show us how a protein actually moves, which can both inform us how they function and show us new ways to disable them.
The coronavirus protein we study in this work is called Nsp16. It helps the virus evade our immune response, so if we could disable Nsp16, then our immune system would more easily recognize the virus and knock it out. People have previously tried to disable Nsp16 by wedging a drug into its “active site” (i.e. the region where it performs a reaction to disguise viral RNA from the human immune system). In the case of Nsp16, the active site consists of two pockets, one that binds RNA and one that binds another molecule called SAM. However, there is a similar human protein (CMTr1) that has a similar looking active site, so targeting the coronavirus protein without disabling the human protein is challenging.
One important difference distinguishing human relatives of Nsp16 from the coronavirus Nsp16 protein is that the coronavirus protein needs to be attached to another viral protein, Nsp10, to carry out its function.
In this work, we first show how Nsp10 binding to Nsp16 activates Nsp16, and then, by comparing motions of Nsp16 and its human homolog (CMTr1) we identify a way to disable Nsp16 without disabling the human protein.
First, we showed how Nsp10 binding to Nsp16 activates Nsp16 to be able to function. Specifically, when Nsp10 attaches to Nsp16, Nsp16’s 3D structure changes so that its active site is more open (shown in the movie below). With a more open active site Nsp16 is able to bind to RNA and SAM, which are both required to enable the RNA-disguising reaction!
Next, we find a way to target coronavirus Nsp16 without targeting an important human relative, the protein CMTr1. Specifically, we find that some of Nsp16’s motions lead to the formation of a ‘cryptic’ pocket in a region of the protein near the active site, but human CMTr1 does not form this pocket! Importantly, when this pocket is open, the active site is closed, meaning Nsp16 likely cannot bind to the necessary molecules that it needs to disguise its genome! Therefore, if someone can develop a small molecule to stick into and wedge open this pocket, it would likely disable the protein. And, we find that Nsp16 proteins of other coronaviruses also have this pocket, so targeting this pocket could knockout other coronaviruses (like SARS, MERS, etc.) Fortunately, this would likely not affect human CMTr1 since it does not form this pocket.
Citizen Scientists Create an Exascale Computer to Combat COVID-19
The SARS-CoV-2/COVID-19 pandemic continues to threaten global health and socioeconomic stability. Experiments have revealed snapshots of many of the viral components but remain blind to moving parts of these molecular machines. To capture these essential processes, over a million citizen scientists have banded together through the Folding@home distributed computing project to create the world’s first Exascale computer and simulate protein dynamics. An unprecedented 0.1 seconds of simulation of the viral proteome reveal how the spike complex uses conformational masking to evade an immune response, conformational changes implicated in the function of other viral proteins, and ‘cryptic’ pockets that are absent in experimental snapshots. These structures and mechanistic insights present new targets for the design of therapeutics. This living document will be updated as we perform further analysis and make the data publicly accessible.
Conformational distributions of isolated myosin motor domains encode their mechanochemical properties.
Elife. 2020 May 29;9:
Authors: Porter JR, Meller A, Zimmerman MI, Greenberg MJ, Bowman GR
Myosin motor domains perform an extraordinary diversity of biological functions despite sharing a common mechanochemical cycle. Motors are adapted to their function, in part, by tuning the thermodynamics and kinetics of steps in this cycle. However, it remains unclear how sequence encodes these differences, since biochemically distinct motors often have nearly indistinguishable crystal structures. We hypothesized that sequences produce distinct biochemical phenotypes by modulating the relative probabilities of an ensemble of conformations primed for different functional roles. To test this hypothesis, we modeled the distribution of conformations for 12 myosin motor domains by building Markov state models (MSMs) from an unprecedented two milliseconds of all-atom, explicit-solvent molecular dynamics simulations. Comparing motors reveals shifts in the balance between nucleotide-favorable and nucleotide-unfavorable P-loop conformations that predict experimentally measured duty ratios and ADP release rates better than sequence or individual structures. This result demonstrates the power of an ensemble perspective for interrogating sequence-function relationships.
PMID: 32479265 [PubMed – as supplied by publisher]
Investigating Cryptic Binding Sites by Molecular Dynamics Simulations.
Acc Chem Res. 2020 Mar 05;:
Authors: Kuzmanic A, Bowman GR, Juarez-Jimenez J, Michel J, Gervasio FL
ConspectusThis Account highlights recent advances and discusses major challenges in investigations of cryptic (hidden) binding sites by molecular simulations. Cryptic binding sites are not visible in protein targets crystallized without a ligand and only become visible crystallographically upon binding events. These sites have been shown to be druggable and might provide a rare opportunity to target difficult proteins. However, due to their hidden nature, they are difficult to find through experimental screening. Computational methods based on atomistic molecular simulations remain one of the best approaches to identify and characterize cryptic binding sites. However, not all methods are equally efficient. Some are more apt at quickly probing protein dynamics but do not provide thermodynamic or druggability information, while others that are able to provide such data are demanding in terms of time and resources. Here, we review the recent contributions of mixed-solvent simulations, metadynamics, Markov state models, and other enhanced sampling methods to the field of cryptic site identification and characterization. We discuss how these methods were able to provide precious information on the nature of the site opening mechanisms, to predict previously unknown sites which were used to design new ligands, and to compute the free energy landscapes and kinetics associated with the opening of the sites and the binding of the ligands. We highlight the potential and the importance of such predictions in drug discovery, especially for difficult (“undruggable”) targets. We also discuss the major challenges in the field and their possible solutions.
PMID: 32134250 [PubMed – as supplied by publisher]
Microcanonical coarse-graining of the kinetic Ising model.
J Chem Phys. 2020 Feb 28;152(8):084104
Authors: Sigg D, Voelz VA, Carnevale V
We propose a scheme for coarse-graining the dynamics of the 2-D kinetic Ising model onto the microcanonical ensemble. At subcritical temperatures, 2-D and higher-dimensional Ising lattices possess two basins of attraction separated by a free energy barrier. Projecting onto the microcanonical ensemble has the advantage that the dependence of the crossing rate constant on environmental conditions can be obtained from a single Monte Carlo trajectory. Using various numerical methods, we computed the forward rate constants of coarse-grained representations of the Ising model and compared them with the true value obtained from brute force simulation. While coarse-graining preserves detailed balance, the computed rate constants for barrier heights between 5 kT and 9 kT were consistently 50% larger than the true value. Markovianity testing revealed loss of dynamical memory, which we propose accounts for coarse-graining error. Committor analysis did not support the alternative hypothesis that microcanonical projection is incompatible with an optimal reaction coordinate. The correct crossing rate constant was obtained by spectrally decomposing the diffusion coefficient near the free energy barrier and selecting the slowest (reactive) component. The spectral method also yielded the correct rate constant in the 3-D Ising lattice, where coarse-graining error was 6% and memory effects were diminished. We conclude that microcanonical coarse-graining supplemented by spectral analysis of short-term barrier fluctuations provides a comprehensive kinetic description of barrier crossing in a non-inertial continuous-time jump process.
Assessing the accuracy of octanol-water partition coefficient predictions in the SAMPL6 Part II log P Challenge.
J Comput Aided Mol Des. 2020 Feb 27;:
Authors: Işık M, Bergazin TD, Fox T, Rizzi A, Chodera JD, Mobley DL
The SAMPL Challenges aim to focus the biomolecular and physical modeling community on issues that limit the accuracy of predictive modeling of protein-ligand binding for rational drug design. In the SAMPL5 log D Challenge, designed to benchmark the accuracy of methods for predicting drug-like small molecule transfer free energies from aqueous to nonpolar phases, participants found it difficult to make accurate predictions due to the complexity of protonation state issues. In the SAMPL6 log P Challenge, we asked participants to make blind predictions of the octanol-water partition coefficients of neutral species of 11 compounds and assessed how well these methods performed absent the complication of protonation state effects. This challenge builds on the SAMPL6 p[Formula: see text] Challenge, which asked participants to predict p[Formula: see text] values of a superset of the compounds considered in this log P challenge. Blind prediction sets of 91 prediction methods were collected from 27 research groups, spanning a variety of quantum mechanics (QM) or molecular mechanics (MM)-based physical methods, knowledge-based empirical methods, and mixed approaches. There was a 50% increase in the number of participating groups and a 20% increase in the number of submissions compared to the SAMPL5 log D Challenge. Overall, the accuracy of octanol-water log P predictions in SAMPL6 Challenge was higher than cyclohexane-water log D predictions in SAMPL5, likely because modeling only the neutral species was necessary for log P and several categories of method benefited from the vast amounts of experimental octanol-water log P data. There were many highly accurate methods: 10 diverse methods achieved RMSE less than 0.5 log P units. These included QM-based methods, empirical methods, and mixed methods with physical modeling supported with empirical corrections. A comparison of physical modeling methods showed that QM-based methods outperformed MM-based methods. The average RMSE of the most accurate five MM-based, QM-based, empirical, and mixed approach methods based on RMSE were 0.92 ± 0.13, 0.48 ± 0.06, 0.47 ± 0.05, and 0.50 ± 0.06, respectively.
PMID: 32107702 [PubMed – as supplied by publisher]
The pKa is the standard measure used to describe the aqueous proton affinity of a compound, indicating the proton concentration (pH) at which two protonation states (e.g. A- and AH) have equal free energy. However, compounds can have additional protonation states (e.g. AH2+), and may assume multiple tautomeric forms, with the protons in different positions (microstates). Macroscopic pKas give the pH where the molecule changes its total number of protons, while microscopic pKas identify the tautomeric states involved. As tautomers have the same number of protons, the free energy difference between them and their relative probability is pH independent so there is no pKa connecting them. The question arises: What is the best way to describe protonation equilibria of a complex molecule in any pH range? Knowing the number of protons and the relative free energy of all microstates at a single pH, ∆G°, provides all the information needed to determine the free energy, and thus the probability of each microstate at each pH. Microstate probabilities as a function of pH generate titration curves that highlight the low energy, observable microstates, which can then be compared with experiment. A network description connecting microstates as nodes makes it straightforward to test thermodynamic consistency of microstate free energies. The utility of this analysis is illustrated by a description of one molecule from the SAMPL6 Blind pKa Prediction Challenge. Analysis of microstate ∆G°s also makes a more compact way to archive and compare the pH dependent behavior of compounds with multiple protonatable sites.
PMID: 32052350 [PubMed – as supplied by publisher]
The SAMPL6 SAMPLing challenge: assessing the reliability and efficiency of binding free energy calculations.
J Comput Aided Mol Des. 2020 Jan 27;:
Authors: Rizzi A, Jensen T, Slochower DR, Aldeghi M, Gapsys V, Ntekoumes D, Bosisio S, Papadourakis M, Henriksen NM, de Groot BL, Cournia Z, Dickson A, Michel J, Gilson MK, Shirts MR, Mobley DL, Chodera JD
Approaches for computing small molecule binding free energies based on molecular simulations are now regularly being employed by academic and industry practitioners to study receptor-ligand systems and prioritize the synthesis of small molecules for ligand design. Given the variety of methods and implementations available, it is natural to ask how the convergence rates and final predictions of these methods compare. In this study, we describe the concept and results for the SAMPL6 SAMPLing challenge, the first challenge from the SAMPL series focusing on the assessment of convergence properties and reproducibility of binding free energy methodologies. We provided parameter files, partial charges, and multiple initial geometries for two octa-acid (OA) and one cucurbituril (CB8) host-guest systems. Participants submitted binding free energy predictions as a function of the number of force and energy evaluations for seven different alchemical and physical-pathway (i.e., potential of mean force and weighted ensemble of trajectories) methodologies implemented with the GROMACS, AMBER, NAMD, or OpenMM simulation engines. To rank the methods, we developed an efficiency statistic based on bias and variance of the free energy estimates. For the two small OA binders, the free energy estimates computed with alchemical and potential of mean force approaches show relatively similar variance and bias as a function of the number of energy/force evaluations, with the attach-pull-release (APR), GROMACS expanded ensemble, and NAMD double decoupling submissions obtaining the greatest efficiency. The differences between the methods increase when analyzing the CB8-quinine system, where both the guest size and correlation times for system dynamics are greater. For this system, nonequilibrium switching (GROMACS/NS-DS/SB) obtained the overall highest efficiency. Surprisingly, the results suggest that specifying force field parameters and partial charges is insufficient to generally ensure reproducibility, and we observe differences between seemingly converged predictions ranging approximately from 0.3 to 1.0 kcal/mol, even with almost identical simulations parameters and system setup (e.g., Lennard-Jones cutoff, ionic composition). Further work will be required to completely identify the exact source of these discrepancies. Among the conclusions emerging from the data, we found that Hamiltonian replica exchange-while displaying very small variance-can be affected by a slowly-decaying bias that depends on the initial population of the replicas, that bidirectional estimators are significantly more efficient than unidirectional estimators for nonequilibrium free energy calculations for systems considered, and that the Berendsen barostat introduces non-negligible artifacts in expanded ensemble simulations.
PMID: 31984465 [PubMed – as supplied by publisher]
Spatial and temporal alterations in protein structure by EGF regulate cryptic cysteine oxidation.
Sci Signal. 2020 Jan 21;13(615):
Authors: Behring JB, van der Post S, Mooradian AD, Egan MJ, Zimmerman MI, Clements JL, Bowman GR, Held JM
Stimulation of plasma membrane receptor tyrosine kinases (RTKs), such as the epidermal growth factor receptor (EGFR), locally increases the abundance of reactive oxygen species (ROS). These ROS then oxidize cysteine residues in proteins to potentiate downstream signaling. Spatial confinement of ROS is an important regulatory mechanism of redox signaling that enables the stimulation of different RTKs to oxidize distinct sets of downstream proteins. To uncover additional mechanisms that specify cysteines that are redox regulated by EGF stimulation, we performed time-resolved quantification of the EGF-dependent oxidation of 4200 cysteine sites in A431 cells. Fifty-one percent of cysteines were statistically significantly oxidized by EGF stimulation. Furthermore, EGF induced three distinct spatiotemporal patterns of cysteine oxidation in functionally organized protein networks, consistent with the spatial confinement model. Unexpectedly, protein crystal structure analysis and molecular dynamics simulations indicated widespread redox regulation of cryptic cysteine residues that are solvent exposed only upon changes in protein conformation. Phosphorylation and increased flux of nucleotide substrates served as two distinct modes by which EGF specified the cryptic cysteine residues that became solvent exposed and redox regulated. Because proteins that are structurally regulated by different RTKs or cellular perturbations are largely unique, these findings suggest that solvent exposure and redox regulation of cryptic cysteine residues contextually delineate redox signaling networks.
Adaptive Markov state model estimation using short reseeding trajectories.
J Chem Phys. 2020 Jan 14;152(2):024103
Authors: Wan H, Voelz VA
In the last decade, advances in molecular dynamics (MD) and Markov State Model (MSM) methodologies have made possible accurate and efficient estimation of kinetic rates and reactive pathways for complex biomolecular dynamics occurring on slow time scales. A promising approach to enhanced sampling of MSMs is to use “adaptive” methods, in which new MD trajectories are “seeded” preferentially from previously identified states. Here, we investigate the performance of various MSM estimators applied to reseeding trajectory data, for both a simple 1D free energy landscape and mini-protein folding MSMs of WW domain and NTL9(1-39). Our results reveal the practical challenges of reseeding simulations and suggest a simple way to reweight seeding trajectory data to better estimate both thermodynamic and kinetic quantities.
Reconciling simulated ensembles of apomyoglobin with experimental HDX data using Bayesian inference and multi-ensemble Markov State Models.
J Chem Theory Comput. 2020 Jan 09;:
Authors: Wan H, Ge Y, Razavi A, Voelz VA
Hydrogen/deuterium exchange (HDX) is a powerful technique to investigate protein conformational dynamics at amino acid resolution. Because HDX provides a measurement of solvent exposure of backbone hydrogens, ensemble-averaged over potentially slow kinetic processes, it has been challenging to use HDX protection factors to refine structural ensembles obtained from molecular dynamics simulations. This entails two dual challenges: (1) identifying structural observables that best correlate with backbone amide protection from exchange, and (2) restraining these observables in molecular simulations to model ensembles consistent with experimental measurements. Here, we make significant progress on both fronts. First, we describe an improved predictor of HDX protection factors from structural observables in simulated ensembles, parameterized from ultra-long molecular dynamics simulation trajectory data, with a Bayesian inference approach used to retain the full posterior distribution of model parameters. We next present a new method for obtaining simulated ensembles in agreement with experimental HDX protection factors, in which molecular simulations are performed at various temperatures and restraint biases, and used to construct multi-ensemble Markov State Models (MSMs). Finally, the BICePs algorithm (Bayesian Inference of Conformational Populations) is then used with our HDX protection factor predictor to infer which thermodynamic ensemble agrees best with experiment, and estimate populations of each conformational state in the MSM. To illustrate the approach, we use a combination of HDX protection factor restraints and chemical shift restraints to model the conformational ensemble of apomyoglobin at pH 6. The resulting ensemble agrees well with experiment, and gives insight into the all-atom structure of disordered helices F and H in the absence of heme.
PMID: 31917926 [PubMed – as supplied by publisher]
The Cap-Snatching SFTSV Endonuclease Domain Is an Antiviral Target.
Cell Rep. 2020 Jan 07;30(1):153-163.e5
Authors: Wang W, Shin WJ, Zhang B, Choi Y, Yoo JS, Zimmerman MI, Frederick TE, Bowman GR, Gross ML, Leung DW, Jung JU, Amarasinghe GK
Severe fever with thrombocytopenia syndrome virus (SFTSV) is a tick-borne virus with 12%-30% case mortality rates and is related to the Heartland virus (HRTV) identified in the United States. Together, SFTSV and HRTV are emerging segmented, negative-sense RNA viral (sNSV) pathogens with potential global health impact. Here, we characterize the amino-terminal cap-snatching endonuclease domain of SFTSV polymerase (L) and solve a 2.4-Å X-ray crystal structure. While the overall structure is similar to those of other cap-snatching sNSV endonucleases, differences near the C terminus of the SFTSV endonuclease suggest divergence in regulation. Influenza virus endonuclease inhibitors, including the US Food and Drug Administration (FDA) approved Baloxavir (BXA), inhibit the endonuclease activity in in vitro enzymatic assays and in cell-based studies. BXA displays potent activity with a half maximal inhibitory concentration (IC50) of ∼100 nM in enzyme inhibition and an EC50 value of ∼250 nM against SFTSV and HRTV in plaque assays. Together, our data support sNSV endonucleases as an antiviral target.
Partition coefficients describe the equilibrium partitioning of a single, defined charge state of a solute between two liquid phases in contact, typically a neutral solute. Octanol-water partition coefficients ([Formula: see text]), or their logarithms (log P), are frequently used as a measure of lipophilicity in drug discovery. The partition coefficient is a physicochemical property that captures the thermodynamics of relative solvation between aqueous and nonpolar phases, and therefore provides an excellent test for physics-based computational models that predict properties of pharmaceutical relevance such as protein-ligand binding affinities or hydration/solvation free energies. The SAMPL6 Part II octanol-water partition coefficient prediction challenge used a subset of kinase inhibitor fragment-like compounds from the SAMPL6 [Formula: see text] prediction challenge in a blind experimental benchmark. Following experimental data collection, the partition coefficient dataset was kept blinded until all predictions were collected from participating computational chemistry groups. A total of 91 submissions were received from 27 participating research groups. This paper presents the octanol-water log P dataset for this SAMPL6 Part II partition coefficient challenge, which consisted of 11 compounds (six 4-aminoquinazolines, two benzimidazole, one pyrazolo[3,4-d]pyrimidine, one pyridine, one 2-oxoquinoline substructure containing compounds) with log P values in the range of 1.95-4.09. We describe the potentiometric log P measurement protocol used to collect this dataset using a Sirius T3, discuss the limitations of this experimental approach, and share suggestions for future log P data collection efforts for the evaluation of computational methods.
PMID: 31858363 [PubMed – as supplied by publisher]
A Small-Molecule Pan-Id Antagonist Inhibits Pathologic Ocular Neovascularization.
Cell Rep. 2019 Oct 01;29(1):62-75.e7
Authors: Wojnarowicz PM, Lima E Silva R, Ohnaka M, Lee SB, Chin Y, Kulukian A, Chang SH, Desai B, Garcia Escolano M, Shah R, Garcia-Cao M, Xu S, Kadam R, Goldgur Y, Miller MA, Ouerfelli O, Yang G, Arakawa T, Albanese SK, Garland WA, Stoller G, Chaudhary J, Norton L, Soni RK, Philip J, Hendrickson RC, Iavarone A, Dannenberg AJ, Chodera JD, Pavletich N, Lasorella A, Campochiaro PA, Benezra R
Id helix-loop-helix (HLH) proteins (Id1-4) bind E protein bHLH transcription factors, preventing them from forming active transcription complexes that drive changes in cell states. Id proteins are primarily expressed during development to inhibit differentiation, but they become re-expressed in adult tissues in diseases of the vasculature and cancer. We show that the genetic loss of Id1/Id3 reduces ocular neovascularization in mouse models of wet age-related macular degeneration (AMD) and retinopathy of prematurity (ROP). An in silico screen identifies AGX51, a small-molecule Id antagonist. AGX51 inhibits the Id1-E47 interaction, leading to ubiquitin-mediated degradation of Ids, cell growth arrest, and reduced viability. AGX51 is well-tolerated in mice and phenocopies the genetic loss of Id expression in AMD and ROP models by inhibiting retinal neovascularization. Thus, AGX51 is a first-in-class compound that antagonizes an interaction formerly considered undruggable and that may have utility in the management of multiple diseases.
Ancestral reconstruction reveals mechanisms of ERK regulatory evolution.
Elife. 2019 Aug 13;8:
Authors: Sang D, Pinglay S, Wiewiora RP, Selvan ME, Lou HJ, Chodera JD, Turk BE, Gümüş ZH, Holt LJ
Protein kinases are crucial to coordinate cellular decisions and therefore their activities are strictly regulated. Previously we used ancestral reconstruction to determine how CMGC group kinase specificity evolved (Howard et al., 2014). In the present study, we reconstructed ancestral kinases to study the evolution of regulation, from the inferred ancestor of CDKs and MAPKs, to modern ERKs. Kinases switched from high to low autophosphorylation activity at the transition to the inferred ancestor of ERKs 1 and 2. Two synergistic amino acid changes were sufficient to induce this change: shortening of the β3-αC loop and mutation of the gatekeeper residue. Restoring these two mutations to their inferred ancestral state led to a loss of dependence of modern ERKs 1 and 2 on the upstream activating kinase MEK in human cells. Our results shed light on the evolutionary mechanisms that led to the tight regulation of a kinase that is central in development and disease.
Small-molecule targeting of MUSASHI RNA-binding activity in acute myeloid leukemia.
Nat Commun. 2019 Jun 19;10(1):2691
Authors: Minuesa G, Albanese SK, Xie W, Kazansky Y, Worroll D, Chow A, Schurer A, Park SM, Rotsides CZ, Taggart J, Rizzi A, Naden LN, Chou T, Gourkanti S, Cappel D, Passarelli MC, Fairchild L, Adura C, Glickman JF, Schulman J, Famulare C, Patel M, Eibl JK, Ross GM, Bhattacharya S, Tan DS, Leslie CS, Beuming T, Patel DJ, Goldgur Y, Chodera JD, Kharas MG
The MUSASHI (MSI) family of RNA binding proteins (MSI1 and MSI2) contribute to a wide spectrum of cancers including acute myeloid leukemia. We find that the small molecule Ro 08-2750 (Ro) binds directly and selectively to MSI2 and competes for its RNA binding in biochemical assays. Ro treatment in mouse and human myeloid leukemia cells results in an increase in differentiation and apoptosis, inhibition of known MSI-targets, and a shared global gene expression signature similar to shRNA depletion of MSI2. Ro demonstrates in vivo inhibition of c-MYC and reduces disease burden in a murine AML leukemia model. Thus, we identify a small molecule that targets MSI’s oncogenic activity. Our study provides a framework for targeting RNA binding proteins in cancer.
The dynamic conformational landscape of the protein methyltransferase SETD8.
Elife. 2019 May 13;8:
Authors: Chen S, Wiewiora RP, Meng F, Babault N, Ma A, Yu W, Qian K, Hu H, Zou H, Wang J, Fan S, Blum G, Pittella-Silva F, Beauchamp KA, Tempel W, Jiang H, Chen K, Skene RJ, Zheng YG, Brown PJ, Jin J, Luo C, Chodera JD, Luo M
Elucidating the conformational heterogeneity of proteins is essential for understanding protein function and developing exogenous ligands. With the rapid development of experimental and computational methods, it is of great interest to integrate these approaches to illuminate the conformational landscapes of target proteins. SETD8 is a protein lysine methyltransferase (PKMT), which functions in vivo via the methylation of histone and nonhistone targets. Utilizing covalent inhibitors and depleting native ligands to trap hidden conformational states, we obtained diverse X-ray structures of SETD8. These structures were used to seed distributed atomistic molecular dynamics simulations that generated a total of six milliseconds of trajectory data. Markov state models, built via an automated machine learning approach and corroborated experimentally, reveal how slow conformational motions and conformational states are relevant to catalysis. These findings provide molecular insight on enzymatic catalysis and allosteric mechanisms of a PKMT via its detailed conformational landscape.
PMID: 31081496 [PubMed – as supplied by publisher]
Authors: Wissler HL, Ehlerding EB, Lyu Z, Zhao Y, Zhang S, Eshraghi A, Buuh ZY, McGuth JC, Guan Y, Engle JW, Bartlett SJ, Voelz VA, Cai W, Wang RE
The rapid ascension of immune checkpoint blockade treatments has placed an emphasis on the need for viable, robust, and noninvasive imaging methods for immune checkpoint proteins, which could be of diagnostic value. Immunoconjugate-based positron emission tomography (immuno-PET) allows for sensitive and quantitative imaging of target levels and has promising potential for the noninvasive evaluation of immune checkpoint proteins. However, the advancement of immuno-PET is currently limited by available imaging tools, which heavily rely on full-length IgGs with Fc-mediated effects and are heterogeneous mixtures upon random conjugation with chelators for imaging. Herein, we have developed a site-specific αPD-L1 Fab conjugate with the chelator NOTA, enabling radiolabeling for PET imaging, using the amber suppression-mediated genetic incorporation of unnatural amino acid (UAA), p-azidophenylalanine. This Fab conjugate is homogeneous and demonstrated tight binding towards the PD-L1 antigen in vitro. The radiolabeled version, 64Cu-NOTA-αPD-L1, has been employed in PET imaging to allow for effective visualization and mapping of the biodistribution of PD-L1 in two normal mouse models, including the capturing of different PD-L1 expression levels in the spleens of the different mouse types. Follow-up in vivo blocking studies and ex vivo fluorescent staining further validated specific tissue uptakes of the imaging agent. This approach illustrates the utility of UAA-based site-specific Fab conjugation as a general strategy for making sensitive PET imaging probes, which could facilitate the elucidation of the roles of a wide variety of immune checkpoint proteins in immunotherapy.
PMID: 30875232 [PubMed – as supplied by publisher]
Cooperative Changes in Solvent Exposure Identify Cryptic Pockets, Switches, and Allosteric Coupling.
Biophys J. 2019 Jan 25;:
Authors: Porter JR, Moeder KE, Sibbald CA, Zimmerman MI, Hart KM, Greenberg MJ, Bowman GR
Proteins are dynamic molecules that undergo conformational changes to a broad spectrum of different excited states. Unfortunately, the small populations of these states make it difficult to determine their structures or functional implications. Computer simulations are an increasingly powerful means to identify and characterize functionally relevant excited states. However, this advance has uncovered a further challenge: it can be extremely difficult to identify the most salient features of large simulation data sets. We reasoned that many functionally relevant conformational changes are likely to involve large, cooperative changes to the surfaces that are available to interact with potential binding partners. To examine this hypothesis, we introduce a method that returns a prioritized list of potentially functional conformational changes by segmenting protein structures into clusters of residues that undergo cooperative changes in their solvent exposure, along with the hierarchy of interactions between these groups. We term these groups exposons to distinguish them from other types of clusters that arise in this analysis and others. We demonstrate, using three different model systems, that this method identifies experimentally validated and functionally relevant conformational changes, including conformational switches, allosteric coupling, and cryptic pockets. Our results suggest that key functional sites are hubs in the network of exposons. As a further test of the predictive power of this approach, we apply it to discover cryptic allosteric sites in two different β-lactamase enzymes that are widespread sources of antibiotic resistance. Experimental tests confirm our predictions for both systems. Importantly, we provide the first evidence, to our knowledge, for a cryptic allosteric site in CTX-M-9 β-lactamase. Experimentally testing this prediction did not require any mutations and revealed that this site exerts the most potent allosteric control over activity of any pockets found in β-lactamases to date. Discovery of a similar pocket that was previously overlooked in the well-studied TEM-1 β-lactamase demonstrates the utility of exposons.
PMID: 30744991 [PubMed – as supplied by publisher]
Peptoids are peptidomimetics of interest in the fields of drug development and biomaterials. However, obtaining stable secondary structures is challenging and designing these requires effective control of the peptoid tertiary amide cis/trans equilibrium. Herein, we report new fluorine containing aromatic monomers that can control peptoid conformation. Specifically, we demonstrate that a fluoro-pyridine group can be used to circumvent the need for monomer chirality to control the cis/trans equilibrium. We also show that incorporation of a trifluoro-methyl group (NCF3Rpe) rather than a methyl group (NRpe) at the α-carbon of a monomer gives rise to a 5-fold increase in cis-isomer preference.
PMID: 30739443 [PubMed – as supplied by publisher]
Authors: Hanson SM, Georghiou G, Thakur MK, Miller WT, Rest JS, Chodera JD, Seeliger MA
ATP-competitive kinase inhibitors often bind several kinases due to the high conservation of the ATP binding pocket. Through clustering analysis of a large kinome profiling dataset, we found a cluster of eight promiscuous kinases that on average bind more than five times more kinase inhibitors than the other 398 kinases in the dataset. To understand the structural basis of promiscuous inhibitor binding, we determined the co-crystal structure of the receptor tyrosine kinase DDR1 with the type I inhibitors dasatinib and VX-680. Surprisingly, we find that DDR1 binds these type I inhibitors in an inactive conformation typically reserved for type II inhibitors. Our computational and biochemical studies show that DDR1 is unusually stable in this inactive conformation, giving a mechanistic explanation for inhibitor promiscuity. This phenotypic clustering analysis provides a strategy to obtain functional insights not available by sequence comparison alone.
PMID: 30612951 [PubMed – as supplied by publisher]
Advanced Methods for Accessing Protein Shape-Shifting Present New Therapeutic Opportunities.
Trends Biochem Sci. 2018 Dec 13;:
Authors: Knoverek CR, Amarasinghe GK, Bowman GR
A protein is a dynamic shape-shifter whose function is determined by the set of structures it adopts. Unfortunately, atomically detailed structures are only available for a few conformations of any given protein, and these structures have limited explanatory and predictive power. Here, we provide a brief historical perspective on protein dynamics and introduce recent advances in computational and experimental methods that are providing unprecedented access to protein shape-shifting. Next, we focus on how these tools are revealing the mechanism of allosteric communication and features like cryptic pockets; both of which present new therapeutic opportunities. A major theme is the importance of considering the relative probabilities of different structures and the control one can exert over protein function by modulating this balance.
PMID: 30555007 [PubMed – as supplied by publisher]
Atomistic structural ensemble refinement reveals non-native structure stabilizes a sub-millisecond folding intermediate of CheY.
Sci Rep. 2017 03 08;7:44116
Authors: Shi J, Nobrega RP, Schwantes C, Kathuria SV, Bilsel O, Matthews CR, Lane TJ, Pande VS
The dynamics of globular proteins can be described in terms of transitions between a folded native state and less-populated intermediates, or excited states, which can play critical roles in both protein folding and function. Excited states are by definition transient species, and therefore are difficult to characterize using current experimental techniques. Here, we report an atomistic model of the excited state ensemble of a stabilized mutant of an extensively studied flavodoxin fold protein CheY. We employed a hybrid simulation and experimental approach in which an aggregate 42 milliseconds of all-atom molecular dynamics were used as an informative prior for the structure of the excited state ensemble. This prior was then refined against small-angle X-ray scattering (SAXS) data employing an established method (EROS). The most striking feature of the resulting excited state ensemble was an unstructured N-terminus stabilized by non-native contacts in a conformation that is topologically simpler than the native state. Using these results, we then predict incisive single molecule FRET experiments as a means of model validation. This study demonstrates the paradigm of uniting simulation and experiment in a statistical model to study the structure of protein excited states and rationally design validating experiments.
Solving the RNA design problem with reinforcement learning.
PLoS Comput Biol. 2018 06;14(6):e1006176
Authors: Eastman P, Shi J, Ramsundar B, Pande VS
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
Overview of the SAMPL6 host-guest binding affinity prediction challenge.
J Comput Aided Mol Des. 2018 Nov 10;:
Authors: Rizzi A, Murkli S, McNeill JN, Yao W, Sullivan M, Gilson MK, Chiu MW, Isaacs L, Gibb BC, Mobley DL, Chodera JD
Accurately predicting the binding affinities of small organic molecules to biological macromolecules can greatly accelerate drug discovery by reducing the number of compounds that must be synthesized to realize desired potency and selectivity goals. Unfortunately, the process of assessing the accuracy of current computational approaches to affinity prediction against binding data to biological macromolecules is frustrated by several challenges, such as slow conformational dynamics, multiple titratable groups, and the lack of high-quality blinded datasets. Over the last several SAMPL blind challenge exercises, host-guest systems have emerged as a practical and effective way to circumvent these challenges in assessing the predictive performance of current-generation quantitative modeling tools, while still providing systems capable of possessing tight binding affinities. Here, we present an overview of the SAMPL6 host-guest binding affinity prediction challenge, which featured three supramolecular hosts: octa-acid (OA), the closely related tetra-endo-methyl-octa-acid (TEMOA), and cucurbituril (CB8), along with 21 small organic guest molecules. A total of 119 entries were received from ten participating groups employing a variety of methods that spanned from electronic structure and movable type calculations in implicit solvent to alchemical and potential of mean force strategies using empirical force fields with explicit solvent models. While empirical models tended to obtain better performance than first-principle methods, it was not possible to identify a single approach that consistently provided superior results across all host-guest systems and statistical metrics. Moreover, the accuracy of the methodologies generally displayed a substantial dependence on the system considered, emphasizing the need for host diversity in blind evaluations. Several entries exploited previous experimental measurements of similar host-guest systems in an effort to improve their physical-based predictions via some manner of rudimentary machine learning; while this strategy succeeded in reducing systematic errors, it did not correspond to an improvement in statistical correlation. Comparison to previous rounds of the host-guest binding free energy challenge highlights an overall improvement in the correlation obtained by the affinity predictions for OA and TEMOA systems, but a surprising lack of improvement regarding root mean square error over the past several challenge rounds. The data suggests that further refinement of force field parameters, as well as improved treatment of chemical effects (e.g., buffer salt conditions, protonation states), may be required to further enhance predictive accuracy.
PMID: 30415285 [PubMed – as supplied by publisher]
pKa measurements for the SAMPL6 prediction challenge for a set of kinase inhibitor-like fragments.
J Comput Aided Mol Des. 2018 Nov 07;:
Authors: Işık M, Levorse D, Rustenburg AS, Ndukwe IE, Wang H, Wang X, Reibarkh M, Martin GE, Makarov AA, Mobley DL, Rhodes T, Chodera JD
Determining the net charge and protonation states populated by a small molecule in an environment of interest or the cost of altering those protonation states upon transfer to another environment is a prerequisite for predicting its physicochemical and pharmaceutical properties. The environment of interest can be aqueous, an organic solvent, a protein binding site, or a lipid bilayer. Predicting the protonation state of a small molecule is essential to predicting its interactions with biological macromolecules using computational models. Incorrectly modeling the dominant protonation state, shifts in dominant protonation state, or the population of significant mixtures of protonation states can lead to large modeling errors that degrade the accuracy of physical modeling. Low accuracy hinders the use of physical modeling approaches for molecular design. For small molecules, the acid dissociation constant (pKa) is the primary quantity needed to determine the ionic states populated by a molecule in an aqueous solution at a given pH. As a part of SAMPL6 community challenge, we organized a blind pKa prediction component to assess the accuracy with which contemporary pKa prediction methods can predict this quantity, with the ultimate aim of assessing the expected impact on modeling errors this would induce. While a multitude of approaches for predicting pKa values currently exist, predicting the pKas of drug-like molecules can be difficult due to challenging properties such as multiple titratable sites, heterocycles, and tautomerization. For this challenge, we focused on set of 24 small molecules selected to resemble selective kinase inhibitors-an important class of therapeutics replete with titratable moieties. Using a Sirius T3 instrument that performs automated acid-base titrations, we used UV absorbance-based pKa measurements to construct a high-quality experimental reference dataset of macroscopic pKas for the evaluation of computational pKa prediction methodologies that was utilized in the SAMPL6 pKa challenge. For several compounds in which the microscopic protonation states associated with macroscopic pKas were ambiguous, we performed follow-up NMR experiments to disambiguate the microstates involved in the transition. This dataset provides a useful standard benchmark dataset for the evaluation of pKa prediction methodologies on kinase inhibitor-like compounds.
PMID: 30406372 [PubMed – as supplied by publisher]
Quantifying Configuration-Sampling Error in Langevin Simulations of Complex Molecular Systems.
Entropy (Basel). 2018 May;20(5):
Authors: Fass J, Sivak DA, Crooks GE, Beauchamp KA, Leimkuhler B, Chodera JD
While Langevin integrators are popular in the study of equilibrium properties of complex systems, it is challenging to estimate the timestep-induced discretization error: the degree to which the sampled phase-space or configuration-space probability density departs from the desired target density due to the use of a finite integration timestep. Sivak et al., introduced a convenient approach to approximating a natural measure of error between the sampled density and the target equilibrium density, the Kullback-Leibler (KL) divergence, in phase space, but did not specifically address the issue of configuration-space properties, which are much more commonly of interest in molecular simulations. Here, we introduce a variant of this near-equilibrium estimator capable of measuring the error in the configuration-space marginal density, validating it against a complex but exact nested Monte Carlo estimator to show that it reproduces the KL divergence with high fidelity. To illustrate its utility, we employ this new near-equilibrium estimator to assess a claim that a recently proposed Langevin integrator introduces extremely small configuration-space density errors up to the stability limit at no extra computational expense. Finally, we show how this approach to quantifying sampling bias can be applied to a wide variety of stochastic integrators by following a straightforward procedure to compute the appropriate shadow work, and describe how it can be extended to quantify the error in arbitrary marginal or conditional distributions of interest.
Traditional approaches to specifying a molecular mechanics force field encode all the information needed to assign force field parameters to a given molecule into a discrete set of atom types. This is equivalent to a representation consisting of a molecular graph comprising a set of vertices, which represent atoms labeled by atom type, and unlabeled edges, which represent chemical bonds. Bond stretch, angle bend, and dihedral parameters are then assigned by looking up bonded pairs, triplets, and quartets of atom types in parameter tables to assign valence terms and using the atom types themselves to assign nonbonded parameters. This approach, which we call indirect chemical perception because it operates on the intermediate graph of atom-typed nodes, creates a number of technical problems. For example, atom types must be sufficiently complex to encode all necessary information about the molecular environment, making it difficult to extend force fields encoded this way. Atom typing also results in a proliferation of redundant parameters applied to chemically equivalent classes of valence terms, needlessly increasing force field complexity. Here, we describe a new approach to assigning force field parameters via direct chemical perception. Rather than working through the intermediary of the atom-typed graph, direct chemical perception operates directly on the unmodified chemical graph of the molecule to assign parameters. In particular, parameters are assigned to each type of force field term (e.g., bond stretch, angle bend, torsion, and Lennard-Jones) based on standard chemical substructure queries implemented via the industry-standard SMARTS chemical perception language, using SMIRKS extensions that permit labeling of specific atoms within a chemical pattern. We use this to implement a new force field format, called the SMIRKS Native Open Force Field (SMIRNOFF) format. We demonstrate the power and generality of this approach using examples of specific molecules that pose problems for indirect chemical perception and construct and validate a minimalist yet very general force field, SMIRNOFF99Frosst. We find that a parameter definition file only ∼300 lines long provides coverage of all but <0.02% of a 5 million molecule drug-like test set. Despite its simplicity, the accuracy of SMIRNOFF99Frosst for small molecule hydration free energies and selected properties of pure organic liquids is similar to that of the General Amber Force Field, whose specification requires thousands of parameters. This force field provides a starting point for further optimization and refitting work to follow.
PMID: 30351006 [PubMed – as supplied by publisher]
Simulation of spontaneous G protein activation reveals a new intermediate driving GDP unbinding.
Elife. 2018 Oct 05;7:
Authors: Sun X, Singh S, Blumer K, Bowman GR
Activation of heterotrimeric G proteins is a key step in many signaling cascades. However, a complete mechanism for this process, which requires allosteric communication between binding sites that are ~30 Å apart, remains elusive. We construct an atomically-detailed model of G protein activation by combining three powerful computational methods: metadynamics, Markov state models (MSMs), and CARDS analysis of correlated motions. We uncover a mechanism that is consistent with a wide variety of structural and biochemical data. Surprisingly, the rate-limiting step for GDP release correlates with tilting rather than translation of the GPCR-binding helix 5. β-Strands 1-3 and helix 1 emerge as hubs in the allosteric network that links conformational changes in the GPCR-binding site to disordering of the distal nucleotide-binding site and consequent GDP release. Our approach and insights provide foundations for understanding disease-implicated G protein mutants, illuminating slow events in allosteric networks, and examining unbinding processes with slow off-rates.
PMID: 30289386 [PubMed – as supplied by publisher]
Simulations of the regulatory ACT domain of human phenylalanine hydroxylase unveil its mechanism of phenylalanine binding.
J Biol Chem. 2018 Oct 04;:
Authors: Ge Y, Borne E, Stewart S, Hansen MR, Arturo EC, Jaffe EK, Voelz VA
Phenylalanine hydroxylase (PAH) regulates phenylalanine (Phe) levels in mammals to prevent neurotoxicity resulting from high Phe concentrations as observed in genetic disorders leading to hyperphenylalaninemia and phenylketonuria. PAH senses elevated Phe concentrations by transient allosteric Phe binding to a protein-protein interface between ACT domains of different subunits in a PAH tetramer. This interface is present in an activated PAH tetramer (A-PAH) and absent in a resting-state PAH tetramer (RS-PAH). To investigate this allosteric sensing mechanism, here we used the GROMACS molecular dynamics simulation suite on the Folding@home computing platform to perform extensive molecular simulations and Markov state model (MSM) analysis of Phe binding to ACT domain dimers. These simulations strongly implicated a conformational selection mechanism for Phe association with ACT domain dimers and revealed protein motions that act as a gating mechanism for Phe binding. The MSMs also illuminate a highly mobile hairpin loop, consistent with experimental findings also presented here that the PAH variant L72W does not shift the PAH structural equilibrium toward the activated state. Finally, simulations of ACT domain monomers are presented, in which spontaneous transitions between resting-state and activated conformations are observed, also consistent with a mechanism of conformational selection. These mechanistic details provide detailed insight into the regulation of PAH activation and provide testable hypotheses for the development of new allosteric effectors to correct structural and functional defects in PAH.
PMID: 30287685 [PubMed – as supplied by publisher]
Isothermal titration calorimetry (ITC) is the only technique able to determine both the enthalpy and entropy of noncovalent association in a single experiment. The standard data analysis method based on nonlinear regression, however, provides unrealistically small uncertainty estimates due to its neglect of dominant sources of error. Here, we present a Bayesian framework for sampling from the posterior distribution of all thermodynamic parameters and other quantities of interest from one or more ITC experiments, allowing uncertainties and correlations to be quantitatively assessed. For a series of ITC measurements on metal:chelator and protein:ligand systems, the Bayesian approach yields uncertainties which represent the variability from experiment to experiment more accurately than the standard data analysis. In some datasets, the median enthalpy of binding is shifted by as much as 1.5 kcal/mol. A Python implementation suitable for analysis of data generated by MicroCal instruments (and adaptable to other calorimeters) is freely available online.
Kinases play a critical role in cellular signaling and are dysregulated in a number of diseases, such as cancer, diabetes, and neurodegeneration. Therapeutics targeting kinases currently account for roughly 50% of cancer drug discovery efforts. The ability to explore human kinase biochemistry and biophysics in the laboratory is essential to designing selective inhibitors and studying drug resistance. Bacterial expression systems are superior to insect or mammalian cells in terms of simplicity and cost effectiveness but have historically struggled with human kinase expression. Following the discovery that phosphatase coexpression produced high yields of Src and Abl kinase domains in bacteria, we have generated a library of 52 His-tagged human kinase domain constructs that express above 2 μg/mL of culture in an automated bacterial expression system utilizing phosphatase coexpression (YopH for Tyr kinases and lambda for Ser/Thr kinases). Here, we report a structural bioinformatics approach to identifying kinase domain constructs previously expressed in bacteria and likely to express well in our protocol, experiments demonstrating our simple construct selection strategy selects constructs with good expression yields in a test of 84 potential kinase domain boundaries for Abl, and yields from a high-throughput expression screen of 96 human kinase constructs. Using a fluorescence-based thermostability assay and a fluorescent ATP-competitive inhibitor, we show that the highest-expressing kinases are folded and have well-formed ATP binding sites. We also demonstrate that these constructs can enable characterization of clinical mutations by expressing a panel of 48 Src and 46 Abl mutations. The wild-type kinase construct library is available publicly via Addgene.
Elucidating the inhibition of peptidoglycan biosynthesis in Staphylococcus aureus by albocycline, a macrolactone isolated from Streptomyces maizeus.
Bioorg Med Chem. 2018 Jul 23;26(12):3453-3460
Authors: Liang H, Zhou G, Ge Y, D’Ambrosio EA, Eidem TM, Blanchard C, Shehatou C, Chatare VK, Dunman PM, Valentine AM, Voelz VA, Grimes CL, Andrade RB
Antibiotic resistance is a serious threat to global public health, and methicillin-resistant Staphylococcus aureus (MRSA) is a poignant example. The macrolactone natural product albocycline, derived from various Streptomyces strains, was recently identified as a promising antibiotic candidate for the treatment of both MRSA and vancomycin-resistant S. aureus (VRSA), which is another clinically relevant and antibiotic resistant strain. Moreover, it was hypothesized that albocycline’s antimicrobial activity was derived from the inhibition of peptidoglycan (i.e., bacterial cell wall) biosynthesis. Herein, preliminary mechanistic studies are performed to test the hypothesis that albocycline inhibits MurA, the enzyme that catalyzes the first step of peptidoglycan biosynthesis, using a combination of biological assays alongside molecular modeling and simulation studies. Computational modeling suggests albocycline exists as two conformations in solution, and computational docking of these conformations to an ensemble of simulated receptor structures correctly predicted preferential binding to S. aureus MurA-the enzyme that catalyzes the first step of peptidoglycan biosynthesis-over Escherichia coli (E. coli) MurA. Albocycline isolated from the producing organism (Streptomyces maizeus) weakly inhibited S. aureus MurA (IC50 of 480 μM) but did not inhibit E. coli MurA. The antimicrobial activity of albocycline against resistant S. aureus strains was superior to that of vancomycin, preferentially inhibiting Gram-positive organisms. Albocycline was not toxic to human HepG2 cells in MTT assays. While these studies demonstrate that albocycline is a promising lead candidate against resistant S. aureus, taken together they suggest that MurA is not the primary target, and further work is necessary to identify the major biological target.
Acquired resistance to IDH inhibition through trans or cis dimer-interface mutations.
Nature. 2018 Jul;559(7712):125-129
Authors: Intlekofer AM, Shih AH, Wang B, Nazir A, Rustenburg AS, Albanese SK, Patel M, Famulare C, Correa FM, Takemoto N, Durani V, Liu H, Taylor J, Farnoud N, Papaemmanuil E, Cross JR, Tallman MS, Arcila ME, Roshal M, Petsko GA, Wu B, Choe S, Konteatis ZD, Biller SA, Chodera JD, Thompson CB, Levine RL, Stein EM
Somatic mutations in the isocitrate dehydrogenase 2 gene (IDH2) contribute to the pathogenesis of acute myeloid leukaemia (AML) through the production of the oncometabolite 2-hydroxyglutarate (2HG)1-8. Enasidenib (AG-221) is an allosteric inhibitor that binds to the IDH2 dimer interface and blocks the production of 2HG by IDH2 mutants9,10. In a phase I/II clinical trial, enasidenib inhibited the production of 2HG and induced clinical responses in relapsed or refractory IDH2-mutant AML11. Here we describe two patients with IDH2-mutant AML who had a clinical response to enasidenib followed by clinical resistance, disease progression, and a recurrent increase in circulating levels of 2HG. We show that therapeutic resistance is associated with the emergence of second-site IDH2 mutations in trans, such that the resistance mutations occurred in the IDH2 allele without the neomorphic R140Q mutation. The in trans mutations occurred at glutamine 316 (Q316E) and isoleucine 319 (I319M), which are at the interface where enasidenib binds to the IDH2 dimer. The expression of either of these mutant disease alleles alone did not induce the production of 2HG; however, the expression of the Q316E or I319M mutation together with the R140Q mutation in trans allowed 2HG production that was resistant to inhibition by enasidenib. Biochemical studies predicted that resistance to allosteric IDH inhibitors could also occur via IDH dimer-interface mutations in cis, which was confirmed in a patient with acquired resistance to the IDH1 inhibitor ivosidenib (AG-120). Our observations uncover a mechanism of acquired resistance to a targeted therapy and underscore the importance of 2HG production in the pathogenesis of IDH-mutant malignancies.
Authors: Hernández CX, Wayment-Steele HK, Sultan MM, Husic BE, Pande VS
Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are most salient. While recent work from our group and others has demonstrated the utility of time-lagged covariate models to study such systems, linearity assumptions can limit the compression of inherently nonlinear dynamics into just a few characteristic components. Recent work in the field of deep learning has led to the development of the variational autoencoder (VAE), which is able to compress complex datasets into simpler manifolds. We present the use of a time-lagged VAE, or variational dynamics encoder (VDE), to reduce complex, nonlinear processes to a single embedding with high fidelity to the underlying dynamics. We demonstrate how the VDE is able to capture nontrivial dynamics in a variety of examples, including Brownian dynamics and atomistic protein folding. Additionally, we demonstrate a method for analyzing the VDE model, inspired by saliency mapping, to determine what features are selected by the VDE model to describe dynamics. The VDE presents an important step in applying techniques from deep learning to more accurately model and interpret complex biophysics.
Biomolecular simulations are typically performed in an aqueous environment where the number of ions remains fixed for the duration of the simulation, generally with either a minimally neutralizing ion environment or a number of salt pairs intended to match the macroscopic salt concentration. In contrast, real biomolecules experience local ion environments where the salt concentration is dynamic and may differ from bulk. The degree of salt concentration variability and average deviation from the macroscopic concentration remains, as yet, unknown. Here, we describe the theory and implementation of a Monte Carlo osmostat that can be added to explicit solvent molecular dynamics or Monte Carlo simulations to sample from a semigrand canonical ensemble in which the number of salt pairs fluctuates dynamically during the simulation. The osmostat reproduces the correct equilibrium statistics for a simulation volume that can exchange ions with a large reservoir at a defined macroscopic salt concentration. To achieve useful Monte Carlo acceptance rates, the method makes use of nonequilibrium candidate Monte Carlo (NCMC) moves in which monovalent ions and water molecules are alchemically transmuted using short nonequilibrium trajectories, with a modified Metropolis-Hastings criterion ensuring correct equilibrium statistics for an ( Δμ, N, p, T) ensemble to achieve a ∼1046× boost in acceptance rates. We demonstrate how typical protein (DHFR and the tyrosine kinase Src) and nucleic acid (Drew-Dickerson B-DNA dodecamer) systems exhibit salt concentration distributions that significantly differ from fixed-salt bulk simulations and display fluctuations that are on the same order of magnitude as the average.
Accurately predicting protein-ligand binding affinities and binding modes is a major goal in computational chemistry, but even the prediction of ligand binding modes in proteins poses major challenges. Here, we focus on solving the binding mode prediction problem for rigid fragments. That is, we focus on computing the dominant placement, conformation, and orientations of a relatively rigid, fragment-like ligand in a receptor, and the populations of the multiple binding modes which may be relevant. This problem is important in its own right, but is even more timely given the recent success of alchemical free energy calculations. Alchemical calculations are increasingly used to predict binding free energies of ligands to receptors. However, the accuracy of these calculations is dependent on proper sampling of the relevant ligand binding modes. Unfortunately, ligand binding modes may often be uncertain, hard to predict, and/or slow to interconvert on simulation time scales, so proper sampling with current techniques can require prohibitively long simulations. We need new methods which dramatically improve sampling of ligand binding modes. Here, we develop and apply a nonequilibrium candidate Monte Carlo (NCMC) method to improve sampling of ligand binding modes. In this technique, the ligand is rotated and subsequently allowed to relax in its new position through alchemical perturbation before accepting or rejecting the rotation and relaxation as a nonequilibrium Monte Carlo move. When applied to a T4 lysozyme model binding system, this NCMC method shows over 2 orders of magnitude improvement in binding mode sampling efficiency compared to a brute force molecular dynamics simulation. This is a first step toward applying this methodology to pharmaceutically relevant binding of fragments and, eventually, drug-like molecules. We are making this approach available via our new Binding modes of ligands using enhanced sampling (BLUES) package which is freely available on GitHub.
Model Selection Using BICePs: A Bayesian Approach for Force Field Validation and Parameterization.
J Phys Chem B. 2018 May 31;122(21):5610-5622
Authors: Ge Y, Voelz VA
The Bayesian Inference of Conformational Populations (BICePs) algorithm reconciles theoretical predictions of conformational state populations with sparse and/or noisy experimental measurements. Among its key advantages is its ability to perform objective model selection through a quantity we call the BICePs score, which reflects the integrated posterior evidence in favor of a given model, computed through free energy estimation methods. Here, we explore how the BICePs score can be used for force field validation and parametrization. Using a 2D lattice protein as a toy model, we demonstrate that BICePs is able to select the correct value of an interaction energy parameter given ensemble-averaged experimental distance measurements. We show that if conformational states are sufficiently fine-grained, the results are robust to experimental noise and measurement sparsity. Using these insights, we apply BICePs to perform force field evaluations for all-atom simulations of designed β-hairpin peptides against experimental NMR chemical shift measurements. These tests suggest that BICePs scores can be used for model selection in the context of all-atom simulations. We expect this approach to be particularly useful for the computational foldamer design as a tool for improving general-purpose force fields given sparse experimental measurements.
Authors: Shamay Y, Shah J, Işık M, Mizrachi A, Leibold J, Tschaharganeh DF, Roxbury D, Budhathoki-Uprety J, Nawaly K, Sugarman JL, Baut E, Neiman MR, Dacek M, Ganesh KS, Johnson DC, Sridharan R, Chu KL, Rajasekhar VK, Lowe SW, Chodera JD, Heller DA
Development of targeted nanoparticle drug carriers often requires complex synthetic schemes involving both supramolecular self-assembly and chemical modification. These processes are generally difficult to predict, execute, and control. We describe herein a targeted drug delivery system that is accurately and quantitatively predicted to self-assemble into nanoparticles based on the molecular structures of precursor molecules, which are the drugs themselves. The drugs assemble with the aid of sulfated indocyanines into particles with ultrahigh drug loadings of up to 90%. We devised quantitative structure-nanoparticle assembly prediction (QSNAP) models to identify and validate electrotopological molecular descriptors as highly predictive indicators of nano-assembly and nanoparticle size. The resulting nanoparticles selectively targeted kinase inhibitors to caveolin-1-expressing human colon cancer and autochthonous liver cancer models to yield striking therapeutic effects while avoiding pERK inhibition in healthy skin. This finding enables the computational design of nanomedicines based on quantitative models for drug payload selection.
Transferable neural networks for enhanced sampling of protein dynamics.
J Chem Theory Comput. 2018 Mar 12;:
Authors: Sultan MM, Wayment-Steele HK, Pande VS
Variational auto-encoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single non-linear embedding. In this work, we illustrate how this non-linear latent embedding can be used as a collective variable for enhanced sampling, and present a simple modification that allows us to rapidly perform sampling in multiple related systems. We first demonstrate our method is able to describe the effects of force field changes in capped alanine dipeptide after learning a model using AMBER99. We further provide a simple extension to variational dynamics encoders that allows the model to be trained in a more efficient manner on larger systems by encoding the outputs of a linear transformation using time-structure based independent component analysis (tICA). Using this technique, we show how such a model trained for one protein, the WW domain, can efficiently be transferred to perform enhanced sampling on a related mutant protein, the GTT mutation. This method shows promise for its ability to rapidly sample related systems using a single transferable collective variable, enabling us to probe the effects of variation in increasingly large systems of biophysical interest.
PMID: 29529369 [PubMed – as supplied by publisher]
Electron Cryo-microscopy Structure of Ebola Virus Nucleoprotein Reveals a Mechanism for Nucleocapsid-like Assembly.
Cell. 2018 Feb 22;172(5):966-978.e12
Authors: Su Z, Wu C, Shi L, Luthra P, Pintilie GD, Johnson B, Porter JR, Ge P, Chen M, Liu G, Frederick TE, Binning JM, Bowman GR, Zhou ZH, Basler CF, Gross ML, Leung DW, Chiu W, Amarasinghe GK
Ebola virus nucleoprotein (eNP) assembles into higher-ordered structures that form the viral nucleocapsid (NC) and serve as the scaffold for viral RNA synthesis. However, molecular insights into the NC assembly process are lacking. Using a hybrid approach, we characterized the NC-like assembly of eNP, identified novel regulatory elements, and described how these elements impact function. We generated a three-dimensional structure of the eNP NC-like assembly at 5.8 Å using electron cryo-microscopy and identified a new regulatory role for eNP helices α22-α23. Biochemical, biophysical, and mutational analyses revealed that inter-eNP contacts within α22-α23 are critical for viral NC assembly and regulate viral RNA synthesis. These observations suggest that the N terminus and α22-α23 of eNP function as context-dependent regulatory modules (CDRMs). Our current study provides a framework for a structural mechanism for NC-like assembly and a new therapeutic target.
Many eukaryotic protein kinases are activated by phosphorylation on a specific conserved residue in the regulatory activation loop, a post-translational modification thought to stabilize the active DFG-In state of the catalytic domain. Here we use a battery of spectroscopic methods that track different catalytic elements of the kinase domain to show that the ~100 fold activation of the mitotic kinase Aurora A (AurA) by phosphorylation occurs without a population shift from the DFG-Out to the DFG-In state, and that the activation loop of the activated kinase remains highly dynamic. Instead, molecular dynamics simulations and electron paramagnetic resonance experiments show that phosphorylation triggers a switch within the DFG-In subpopulation from an autoinhibited DFG-In substate to an active DFG-In substate, leading to catalytic activation. This mechanism raises new questions about the functional role of the DFG-Out state in protein kinases.
Theoretical restrictions on longest implicit time scales in Markov state models of biomolecular dynamics.
J Chem Phys. 2018 Jan 28;148(4):044111
Authors: Sinitskiy AV, Pande VS
Markov state models (MSMs) have been widely used to analyze computer simulations of various biomolecular systems. They can capture conformational transitions much slower than an average or maximal length of a single molecular dynamics (MD) trajectory from the set of trajectories used to build the MSM. A rule of thumb claiming that the slowest implicit time scale captured by an MSM should be comparable by the order of magnitude to the aggregate duration of all MD trajectories used to build this MSM has been known in the field. However, this rule has never been formally proved. In this work, we present analytical results for the slowest time scale in several types of MSMs, supporting the above rule. We conclude that the slowest implicit time scale equals the product of the aggregate sampling and four factors that quantify: (1) how much statistics on the conformational transitions corresponding to the longest implicit time scale is available, (2) how good the sampling of the destination Markov state is, (3) the gain in statistics from using a sliding window for counting transitions between Markov states, and (4) a bias in the estimate of the implicit time scale arising from finite sampling of the conformational transitions. We demonstrate that in many practically important cases all these four factors are on the order of unity, and we analyze possible scenarios that could lead to their significant deviation from unity. Overall, we provide for the first time analytical results on the slowest time scales captured by MSMs. These results can guide further practical applications of MSMs to biomolecular dynamics and allow for higher computational efficiency of simulations.
Bryostatin 1 (henceforth bryostatin) is in clinical trials for the treatment of Alzheimer’s disease and for HIV/AIDS eradication. It is also a preclinical lead for cancer immunotherapy and other therapeutic indications. Yet nothing is known about the conformation of bryostatin bound to its protein kinase C (PKC) target in a membrane microenvironment. As a result, efforts to design more efficacious, better tolerated, or more synthetically accessible ligands have been limited to structures that do not include PKC or membrane effects known to influence PKC-ligand binding. This problem extends more generally to many membrane-associated proteins in the human proteome. Here, we use rotational-echo double-resonance (REDOR) solid-state NMR to determine the conformations of PKC modulators bound to the PKCδ-C1b domain in the presence of phospholipid vesicles. The conformationally limited PKC modulator phorbol diacetate (PDAc) is used as an initial test substrate. While unanticipated partitioning of PDAc between an immobilized protein-bound state and a mobile state in the phospholipid assembly was observed, a single conformation in the bound state was identified. In striking contrast, a bryostatin analogue (bryolog) was found to exist exclusively in a protein-bound state, but adopts a distribution of conformations as defined by three independent distance measurements. The detection of multiple PKCδ-C1b-bound bryolog conformers in a functionally relevant phospholipid complex reveals the inherent dynamic nature of cellular systems that is not captured with single-conformation static structures. These results indicate that binding, selectivity, and function of PKC modulators, as well as the design of new modulators, are best addressed using a dynamic multistate model, an analysis potentially applicable to other membrane-associated proteins.
Warfarin traps human vitamin K epoxide reductase in an intermediate state during electron transfer.
Nat Struct Mol Biol. 2017 01;24(1):69-76
Authors: Shen G, Cui W, Zhang H, Zhou F, Huang W, Liu Q, Yang Y, Li S, Bowman GR, Sadler JE, Gross ML, Li W
Although warfarin is the most widely used anticoagulant worldwide, the mechanism by which warfarin inhibits its target, human vitamin K epoxide reductase (hVKOR), remains unclear. Here we show that warfarin blocks a dynamic electron-transfer process in hVKOR. A major fraction of cellular hVKOR is in an intermediate redox state containing a Cys51-Cys132 disulfide, a characteristic accommodated by a four-transmembrane-helix structure of hVKOR. Warfarin selectively inhibits this major cellular form of hVKOR, whereas disruption of the Cys51-Cys132 disulfide impairs warfarin binding and causes warfarin resistance. Relying on binding interactions identified by cysteine alkylation footprinting and mass spectrometry coupled with mutagenesis analysis, we conducted structure simulations, which revealed a closed warfarin-binding pocket stabilized by the Cys51-Cys132 linkage. Understanding the selective warfarin inhibition of a specific redox state of hVKOR should enable the rational design of drugs that exploit the redox chemistry and associated conformational changes in hVKOR.
Predicting resistance of clinical Abl mutations to targeted kinase inhibitors using alchemical free-energy calculations.
Commun Biol. 2018;1:
Authors: Hauser K, Negron C, Albanese SK, Ray S, Steinbrecher T, Abel R, Chodera JD, Wang L
The therapeutic effect of targeted kinase inhibitors can be significantly reduced by intrinsic or acquired resistance mutations that modulate the affinity of the drug for the kinase. In cancer, the majority of missense mutations are rare, making it difficult to predict their impact on inhibitor affinity. This complicates the practice of precision medicine, pairing of patients with clinical trials, and development of next-generation inhibitors. Here, we examine the potential for alchemical free-energy calculations to predict how kinase mutations modulate inhibitor affinities to Abl, a major target in chronic myelogenous leukemia (CML). We find these calculations can achieve useful accuracy in predicting resistance for a set of eight FDA-approved kinase inhibitors across 144 clinically-identified point mutations, achieving a root mean square error in binding free energy changes of 184.108.40.206 kcal/mol (95% confidence interval) and correctly classifying mutations as resistant or susceptible with 888293% accuracy. Since these calculations are fast on modern GPUs, this benchmark establishes the potential for physical modeling to collaboratively support the rapid assessment and anticipation of the potential for patient mutations to affect drug potency in clinical applications.
Markov state models (MSMs) are a powerful framework for analyzing dynamical systems, such as molecular dynamics (MD) simulations, that have gained widespread use over the past several decades. This review offers a complete picture of the MSM field to date, presented for a general audience as a timeline of key developments in the field. We sequentially address early studies that motivated the method, canonical papers that established the use of MSMs for MD analysis, and subsequent advances in software and analysis protocols. The derivation of a variational principle for MSMs in 2013 signified a turning point from expertise-driving MSM building to a systematic, objective protocol. The variational approach, combined with best practices for model selection and open-source software, enabled a wide range of MSM analysis for applications such as protein folding and allostery, ligand binding, and protein-protein association. To conclude, the current frontiers of methods development are highlighted, as well as exciting applications in experimental design and drug discovery.
PMID: 29323881 [PubMed – as supplied by publisher]
Mechanistic Basis for ATP-Dependent Inhibition of Glutamine Synthetase by Tabtoxinine-β-lactam.
Biochemistry. 2018 01 09;57(1):117-135
Authors: Patrick GJ, Fang L, Schaefer J, Singh S, Bowman GR, Wencewicz TA
Tabtoxinine-β-lactam (TβL), also known as wildfire toxin, is a time- and ATP-dependent inhibitor of glutamine synthetase produced by plant pathogenic strains of Pseudomonas syringae. Here we demonstrate that recombinant glutamine synthetase from Escherichia coli phosphorylates the C3-hydroxyl group of the TβL 3-(S)-hydroxy-β-lactam (3-HβL) warhead. Phosphorylation of TβL generates a stable, noncovalent enzyme-ADP-inhibitor complex that resembles the glutamine synthetase tetrahedral transition state. The TβL β-lactam ring remains intact during enzyme inhibition, making TβL mechanistically distinct from traditional β-lactam antibiotics such as penicillin. Our findings could enable the design of new 3-HβL transition state inhibitors targeting enzymes in the ATP-dependent carboxylate-amine ligase superfamily with broad therapeutic potential in many disease areas.
Prediction of New Stabilizing Mutations Based on Mechanistic Insights from Markov State Models.
ACS Cent Sci. 2017 Dec 27;3(12):1311-1321
Authors: Zimmerman MI, Hart KM, Sibbald CA, Frederick TE, Jimah JR, Knoverek CR, Tolia NH, Bowman GR
Protein stabilization is fundamental to enzyme function and evolution, yet understanding the determinants of a protein’s stability remains a challenge. This is largely due to a shortage of atomically detailed models for the ensemble of relevant protein conformations and their relative populations. For example, the M182T substitution in TEM β-lactamase, an enzyme that confers antibiotic resistance to bacteria, is stabilizing but the precise mechanism remains unclear. Here, we employ Markov state models (MSMs) to uncover how M182T shifts the distribution of different structures that TEM adopts. We find that M182T stabilizes a helix that is a key component of a domain interface. We then predict the effects of other mutations, including a novel stabilizing mutation, and experimentally test our predictions using a combination of stability measurements, crystallography, NMR, and in vivo measurements of bacterial fitness. We expect our insights and methodology to provide a valuable foundation for protein design.
A minimum variance clustering approach produces robust and interpretable coarse-grained models.
J Chem Theory Comput. 2017 Dec 18;:
Authors: Husic BE, McKiernan KA, Wayment-Steele HK, Sultan MM, Pande VS
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics datasets, such as protein folding simulations, due to their straightforward construction and statistical rigor. Coarse-graining MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here, we present the minimum variance clustering approach (MVCA) for coarse-graining a MSM into a macrostate model. The method utilizes agglomerative clustering with Ward’s minimum variance objective function, and the similarity of the microstate dynamics are determined using the Jensen-Shannon divergence between the corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system, and is robust to long-duration statistical artifacts. MVCA is then applied to two protein folding simulations of the same protein in different force fields to demonstrate that a different number of macrostates is appropriate for each model, revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can be used to analyze a dataset containing many MSMs from simulations in different force fields by aggregating them into groups and quantifying their dynamical similarity in the context of force field parameter choices. The minimum variance clustering approach with the Jensen-Shannon divergence provides a powerful tool to group dynamics by similarity, both among model states and among dynamical models themselves.
PMID: 29253336 [PubMed – as supplied by publisher]
Designing small molecules to target cryptic pockets yields both positive and negative allosteric modulators.
PLoS One. 2017;12(6):e0178678
Authors: Hart KM, Moeder KE, Ho CMW, Zimmerman MI, Frederick TE, Bowman GR
Allosteric drugs, which bind to proteins in regions other than their main ligand-binding or active sites, make it possible to target proteins considered “undruggable” and to develop new therapies that circumvent existing resistance. Despite growing interest in allosteric drug discovery, rational design is limited by a lack of sufficient structural information about alternative binding sites in proteins. Previously, we used Markov State Models (MSMs) to identify such “cryptic pockets,” and here we describe a method for identifying compounds that bind in these cryptic pockets and modulate enzyme activity. Experimental tests validate our approach by revealing both an inhibitor and two activators of TEM β-lactamase (TEM). To identify hits, a library of compounds is first virtually screened against either the crystal structure of a known cryptic pocket or an ensemble of structures containing the same cryptic pocket that is extracted from an MSM. Hit compounds are then screened experimentally and characterized kinetically in individual assays. We identify three hits, one inhibitor and two activators, demonstrating that screening for binding to allosteric sites can result in both positive and negative modulation. The hit compounds have modest effects on TEM activity, but all have higher affinities than previously identified inhibitors, which bind the same cryptic pocket but were found, by chance, via a computational screen targeting the active site. Site-directed mutagenesis of key contact residues predicted by the docking models is used to confirm that the compounds bind in the cryptic pocket as intended. Because hit compounds are identified from docking against both the crystal structure and structures from the MSM, this platform should prove suitable for many proteins, particularly targets whose crystal structures lack obvious druggable pockets, and for identifying both inhibitory and activating small-molecule modulators.
Control of porphyrin interactions via structural changes of a peptoid scaffold.
Org Biomol Chem. 2017 Nov 22;15(45):9670-9679
Authors: Yang W, Kang B, Voelz VA, Seo J
Nature utilizes optimally organized pigments in light-harvesting complexes. To mimic the natural photosynthetic proteins, effective control over inter-pigment interactions is necessary to attain the desired photophysical properties. Previously, we developed porphyrin-peptoid conjugates (PPCamide) and displayed two porphyrins at defined positions on an α-helical peptoid using a flexible n-butyl linker. Herein, we synthesized new porphyrin-peptoid conjugates (PPCC-C), where porphyrins are conjugated through a rigid C-C linkage to the helical peptoid via the Suzuki-Miyaura cross-coupling reaction. With PPCC-C, we studied the effects of backbone conformation, inter-porphyrin distance, and the linker flexibility on porphyrin interactions. When the rigid C-C linkage was used, conformational homogeneity of the PPC increased, providing more effective intramolecular excitonic couplings between the porphyrins; however, the intermolecular porphyrin J-aggregation decreased. In PPCC-C with a nonameric peptoid backbone, the formation of a threaded loop conformation was observed, which could be switched back to a helical conformation by N-terminal acetylation or by the addition of a protic solvent. This threaded loop-to-helix conversion restored the intramolecular porphyrin interactions. Our results suggest that PPCs represent an excellent system for control over porphyrin interactions and therefore are useful as a model system to elucidate pigment interactions in nature or as a molecular construct with switchable photophysical properties.
RXRA regulates transcription as part of a heterodimer with 14 other nuclear receptors, including the peroxisome proliferator-activated receptors (PPARs). Analysis from TCGA raised the possibility that hyperactive PPAR signaling, either due to PPAR gamma gene amplification or RXRA hot-spot mutation (S427F/Y) drives 20-25% of human bladder cancers. Here, we characterize mutant RXRA, demonstrating it induces enhancer/promoter activity in the context of RXRA/PPAR heterodimers in human bladder cancer cells. Structure-function studies indicate that the RXRA substitution allosterically regulates the PPAR AF2 domain via an aromatic interaction with the terminal tyrosine found in PPARs. In mouse urothelial organoids, PPAR agonism is sufficient to drive growth-factor-independent growth in the context of concurrent tumor suppressor loss. Similarly, mutant RXRA stimulates growth-factor-independent growth of Trp53/Kdm6a null bladder organoids. Mutant RXRA-driven growth of urothelium is reversible by PPAR inhibition, supporting PPARs as targetable drivers of bladder cancer.
Millisecond dynamics of BTK reveal kinome-wide conformational plasticity within the apo kinase domain.
Sci Rep. 2017 Nov 15;7(1):15604
Authors: Sultan MM, Denny RA, Unwalla R, Lovering F, Pande VS
Bruton tyrosine kinase (BTK) is a key enzyme in B-cell development whose improper regulation causes severe immunodeficiency diseases. Design of selective BTK therapeutics would benefit from improved, in-silico structural modeling of the kinase’s solution ensemble. However, this remains challenging due to the immense computational cost of sampling events on biological timescales. In this work, we combine multi-millisecond molecular dynamics (MD) simulations with Markov state models (MSMs) to report on the thermodynamics, kinetics, and accessible states of BTK’s kinase domain. Our conformational landscape links the active state to several inactive states, connected via a structurally diverse intermediate. Our calculations predict a kinome-wide conformational plasticity, and indicate the presence of several new potentially druggable BTK states. We further find that the population of these states and the kinetics of their inter-conversion are modulated by protonation of an aspartate residue, establishing the power of MD & MSMs in predicting effects of chemical perturbations.
Note: MSM lag time cannot be used for variational model selection.
J Chem Phys. 2017 Nov 07;147(17):176101
Authors: Husic BE, Pande VS
The variational principle for conformational dynamics has enabled the systematic construction of Markov state models through the optimization of hyperparameters by approximating the transfer operator. In this note, we discuss why the lag time of the operator being approximated must be held constant in the variational approach.
Molecular dynamics simulations reveal ligand-controlled positioning of a peripheral protein complex in membranes.
Nat Commun. 2017 02 23;8(1):6
Authors: Ryckbosch SM, Wender PA, Pande VS
Bryostatin is in clinical trials for Alzheimer’s disease, cancer, and HIV/AIDS eradication. It binds to protein kinase C competitively with diacylglycerol, the endogenous protein kinase C regulator, and plant-derived phorbol esters, but each ligand induces different activities. Determination of the structural origin for these differing activities by X-ray analysis has not succeeded due to difficulties in co-crystallizing protein kinase C with relevant ligands. More importantly, static, crystal-lattice bound complexes do not address the influence of the membrane on the structure and dynamics of membrane-associated proteins. To address this general problem, we performed long-timescale (400-500 µs aggregate) all-atom molecular dynamics simulations of protein kinase C-ligand-membrane complexes and observed that different protein kinase C activators differentially position the complex in the membrane due in part to their differing interactions with waters at the membrane inner leaf. These new findings enable new strategies for the design of simpler, more effective protein kinase C analogs and could also prove relevant to other peripheral protein complexes.Natural supplies of bryostatin, a compound in clinical trials for Alzheimer’s disease, cancer, and HIV, are scarce. Here, the authors perform molecular dynamics simulations to understand how bryostatin interacts with membrane-bound protein kinase C, offering insights for the design of bryostatin analogs.
Endogenous retinoid X receptor ligands in mouse hematopoietic cells.
Sci Signal. 2017 Oct 31;10(503):
Authors: Niu H, Fujiwara H, di Martino O, Hadwiger G, Frederick TE, Menéndez-Gutiérrez MP, Ricote M, Bowman GR, Welch JS
The retinoid X receptor α (RXRA) has been implicated in diverse hematological processes. To identify natural ligands of RXRA that are present in hematopoietic cells, we adapted an upstream activation sequence-green fluorescent protein (UAS-GFP) reporter mouse to detect natural RXRA ligands in vivo. We observed reporter activity in diverse types of hematopoietic cells in vivo. Reporter activity increased during granulocyte colony-stimulating factor (G-CSF)-induced granulopoiesis and after phenylhydrazine (PHZ)-induced anemia, suggesting the presence of dynamically regulated natural RXRA ligands in hematopoietic cells. Mouse plasma activated Gal4-UAS reporter cells in vitro, and plasma from mice treated with G-CSF or PHZ recapitulated the patterns of reporter activation that we observed in vivo. Plasma from mice with dietary vitamin A deficiency only mildly reduced RXRA reporter activity, whereas plasma from mice on a fatty acid restriction diet reduced reporter activity, implicating fatty acids as plasma RXRA ligands. Through differential extraction coupled with mass spectrometry, we identified the long-chain fatty acid C24:5 as a natural RXRA ligand that was greatly increased in abundance in response to hematopoietic stress. Together, these data suggest that natural RXRA ligands are present and dynamically increased in abundance in mouse hematopoietic cells in vivo.
Solution-Phase Conformation and Dynamics of Conjugated Isoindigo-Based Donor-Acceptor Polymer Single Chains.
J Phys Chem Lett. 2017 Oct 25;:
Authors: Lee FL, Barati Farimani A, Gu KL, Yan H, Toney MF, Bao Z, Pande VS
Conjugated polymers are the key material in thin film organic optoelectronic devices due to the versatility of these molecules combined with their semiconducting properties. A molecular scale understanding of conjugated polymers is important to the optimization of the thin film morphology. In this work, we examine the solution-phase behavior of conjugated isoindigo-based donor-acceptor polymer single chains of various chain lengths using atomistic molecular dynamics (MD) simulations. Our simulations elucidate the transition from a rod-like to a coil-like conformation from an analysis of normal modes and persistence length. In addition, we find another transition based on the solvent environment, contrasting the coil-like conformation in a good solvent with a globule-like conformation in a poor solvent. Overall, our results provide valuable insights into the transition between conformational regimes for conjugated polymers as a function of both the chain length and the solvent environment, which will help to accurately parameterize higher level models.
PMID: 29065685 [PubMed – as supplied by publisher]
Simulated Dynamics of Glycans on Ligand-Binding Domain of NMDA Receptors Reveals Strong Dynamic Coupling between Glycans and Protein Core.
J Chem Theory Comput. 2017 Oct 11;:
Authors: Sinitskiy AV, Pande VS
N-methyl-D-aspartate (NMDA) receptors, key neuronal receptors playing the central role in learning and memory, are heavily glycosylated in vivo. Astonishingly little is known about the structure, dynamics and physiological relevance of glycans attached to them. We recently demonstrated that certain glycans on the ligand binding domain (LBD) of NMDA receptors (NMDARs) can serve as intramolecular potentiators, changing EC50 of NMDAR co-agonists. In this work, we use molecular dynamics trajectories, in aggregate 86.5 μs long, of the glycosylated LBD of the GluN1 subunit of the NMDAR to investigate the behavior of glycans on NMDARs. Though all glycans in our simulations were structurally the same (Man5), the dynamics of glycans at different locations on NMDARs was surprisingly different. The slowest-timescale motions that we detected in various glycans in some cases corresponded to a flipping of parts of glycans relative to each other, while in other cases reduced to a head-to-tail bending of a glycan. We predict that timescales of conformational changes in glycans on the GluN1 LBD of NMDARs range from nanoseconds to at least hundreds of microseconds. Some of the conformational changes in the glycans correlate with the physiologically important clamshell-like opening and closing of the GluN1 LBD domain. Thus, glycans are an integral part of NMDARs, and computational models of NMDARs should include glycans to faithfully represent the structure and the dynamics of these receptors.
PMID: 29019687 [PubMed – as supplied by publisher]
Transfer Learning from Markov Models Leads to Efficient Sampling of Related Systems.
J Phys Chem B. 2017 Sep 22;:
Authors: Sultan MM, Pande VS
We recently showed that the time-structure based independent component analysis method from Markov state model literature provided a set of variationally optimal slow collective variables for Metadynamics (tICA-Metadynamics). In this paper, we extend the methodology towards efficient sampling of related mutants by borrowing ideas from transfer learning methods in machine learning. Our method explicitly assumes that a similar set of slow modes and metastable states are found in both the wild type (base line) and its mutants. Under this assumption, we describe a few simple techniques using sequence mapping for transferring the slow modes and structural information contained in the wild type simulation to a mutant model for performing enhanced sampling. The resulting simulations can then be reweighted onto the full-phase space using Multi-state Bennett Acceptance Ratio, allowing for thermodynamic comparison against the wild type. We first benchmark our methodology by re-capturing alanine dipeptide dynamics across a range of different atomistic force fields, including the polarizable Amoeba force field, after learning a set of slow modes using Amber ff99sb-ILDN. We next extend the method by including structural data from the wild type simulation and apply the technique to recapturing the affects of the GTT mutation on the FIP35 WW domain.
PMID: 28938073 [PubMed – as supplied by publisher]
Modeling the mechanism of CLN025 beta-hairpin formation.
J Chem Phys. 2017 Sep 14;147(10):104107
Authors: McKiernan KA, Husic BE, Pande VS
Beta-hairpins are substructures found in proteins that can lend insight into more complex systems. Furthermore, the folding of beta-hairpins is a valuable test case for benchmarking experimental and theoretical methods. Here, we simulate the folding of CLN025, a miniprotein with a beta-hairpin structure, at its experimental melting temperature using a range of state-of-the-art protein force fields. We construct Markov state models in order to examine the thermodynamics, kinetics, mechanism, and rate-determining step of folding. Mechanistically, we find the folding process is rate-limited by the formation of the turn region hydrogen bonds, which occurs following the downhill hydrophobic collapse of the extended denatured protein. These results are presented in the context of established and contradictory theories of the beta-hairpin folding process. Furthermore, our analysis suggests that the AMBER-FB15 force field, at this temperature, best describes the characteristics of the full experimental CLN025 conformational ensemble, while the AMBER ff99SB-ILDN and CHARMM22* force fields display a tendency to overstabilize the native state.
Bridging Microscopic and Macroscopic Mechanisms of p53-MDM2 Binding with Kinetic Network Models.
Biophys J. 2017 Aug 22;113(4):785-793
Authors: Zhou G, Pantelopulos GA, Mukherjee S, Voelz VA
Under normal cellular conditions, the tumor suppressor protein p53 is kept at low levels in part due to ubiquitination by MDM2, a process initiated by binding of MDM2 to the intrinsically disordered transactivation domain (TAD) of p53. Many experimental and simulation studies suggest that disordered domains such as p53 TAD bind their targets nonspecifically before folding to a tightly associated conformation, but the microscopic details are unclear. Toward a detailed prediction of binding mechanisms, pathways, and rates, we have performed large-scale unbiased all-atom simulations of p53-MDM2 binding. Markov state models (MSMs) constructed from the trajectory data predict p53 TAD binding pathways and on-rates in good agreement with experiment. The MSM reveals that two key bound intermediates, each with a nonnative arrangement of hydrophobic residues in the MDM2 binding cleft, control the overall on-rate. Using microscopic rate information from the MSM, we parameterize a simple four-state kinetic model to 1) determine that induced-fit pathways dominate the binding flux over a large range of concentrations, and 2) predict how modulation of residual p53 helicity affects binding, in good agreement with experiment. These results suggest new ways in which microscopic models of peptide binding, coupled with simple few-state binding flux models, can be used to understand biological function in physiological contexts.
Identification of simple reaction coordinates from complex dynamics.
J Chem Phys. 2017 01 28;146(4):044109
Authors: McGibbon RT, Husic BE, Pande VS
Reaction coordinates are widely used throughout chemical physics to model and understand complex chemical transformations. We introduce a definition of the natural reaction coordinate, suitable for condensed phase and biomolecular systems, as a maximally predictive one-dimensional projection. We then show that this criterion is uniquely satisfied by a dominant eigenfunction of an integral operator associated with the ensemble dynamics. We present a new sparse estimator for these eigenfunctions which can search through a large candidate pool of structural order parameters and build simple, interpretable approximations that employ only a small number of these order parameters. Example applications with a small molecule’s rotational dynamics and simulations of protein conformational change and folding show that this approach can filter through statistical noise to identify simple reaction coordinates from complex dynamics.
Molecular simulations and free-energy calculations suggest conformation-dependent anion binding to a cytoplasmic site as a mechanism for Na+/K+-ATPase ion selectivity.
J Biol Chem. 2017 07 28;292(30):12412-12423
Authors: Razavi AM, Delemotte L, Berlin JR, Carnevale V, Voelz VA
Na+/K+-ATPase transports Na+ and K+ ions across the cell membrane via an ion-binding site becoming alternatively accessible to the intra- and extracellular milieu by conformational transitions that confer marked changes in ion-binding stoichiometry and selectivity. To probe the mechanism of these changes, we used molecular simulation and free-energy perturbation approaches to identify probable protonation states of Na+- and K+-coordinating residues in E1P and E2P conformations of Na+/K+-ATPase. Analysis of these simulations revealed a molecular mechanism responsible for the change in protonation state: the conformation-dependent binding of an anion (a chloride ion in our simulations) to a previously unrecognized cytoplasmic site in the loop between transmembrane helices 8 and 9, which influences the electrostatic potential of the crucial Na+-coordinating residue Asp926 This mechanistic model is consistent with experimental observations and provides a molecular-level picture of how E1P to E2P enzyme conformational transitions are coupled to changes in ion-binding stoichiometry and selectivity.
Discovery of novel brain permeable and G protein-biased beta-1 adrenergic receptor partial agonists for the treatment of neurocognitive disorders.
PLoS One. 2017;12(7):e0180319
Authors: Yi B, Jahangir A, Evans AK, Briggs D, Ravina K, Ernest J, Farimani AB, Sun W, Rajadas J, Green M, Feinberg EN, Pande VS, Shamloo M
The beta-1 adrenergic receptor (ADRB1) is a promising therapeutic target intrinsically involved in the cognitive deficits and pathological features associated with Alzheimer’s disease (AD). Evidence indicates that ADRB1 plays an important role in regulating neuroinflammatory processes, and activation of ADRB1 may produce neuroprotective effects in neuroinflammatory diseases. Novel small molecule modulators of ADRB1, engineered to be highly brain permeable and functionally selective for the G protein with partial agonistic activity, could have tremendous value both as pharmacological tools and potential lead molecules for further preclinical development. The present study describes our ongoing efforts toward the discovery of functionally selective partial agonists of ADRB1 that have potential therapeutic value for AD and neuroinflammatory disorders, which has led to the identification of the molecule STD-101-D1. As a functionally selective agonist of ADRB1, STD-101-D1 produces partial agonistic activity on G protein signaling with an EC50 value in the low nanomolar range, but engages very little beta-arrestin recruitment compared to the unbiased agonist isoproterenol. STD-101-D1 also inhibits the tumor necrosis factor α (TNFα) response induced by lipopolysaccharide (LPS) both in vitro and in vivo, and shows high brain penetration. Other than the therapeutic role, this newly identified, functionally selective, partial agonist of ADRB1 is an invaluable research tool to study mechanisms of G protein-coupled receptor signal transduction.
OpenMM 7: Rapid development of high performance algorithms for molecular dynamics.
PLoS Comput Biol. 2017 Jul 26;13(7):e1005659
Authors: Eastman P, Swails J, Chodera JD, McGibbon RT, Zhao Y, Beauchamp KA, Wang LP, Simmonett AC, Harrigan MP, Stern CD, Wiewiora RP, Brooks BR, Pande VS
OpenMM is a molecular dynamics simulation toolkit with a unique focus on extensibility. It allows users to easily add new features, including forces with novel functional forms, new integration algorithms, and new simulation protocols. Those features automatically work on all supported hardware types (including both CPUs and GPUs) and perform well on all of them. In many cases they require minimal coding, just a mathematical description of the desired function. They also require no modification to OpenMM itself and can be distributed independently of OpenMM. This makes it an ideal tool for researchers developing new simulation methods, and also allows those new methods to be immediately available to the larger community.
PMID: 28746339 [PubMed – as supplied by publisher]
Computational and Experimental Evaluation of Designed β-Cap Hairpins Using Molecular Simulations and Kinetic Network Models.
J Chem Inf Model. 2017 07 24;57(7):1609-1620
Authors: Ge Y, Kier BL, Andersen NH, Voelz VA
Molecular simulation has been used to model the detailed folding properties of peptides, yet prospective computational peptide design by such approaches remains challenging and nontrivial. To test the accuracy of simulation-based hairpin design, we characterized the folding properties of a series of so-called β-cap hairpin peptides designed to mimic a conserved hairpin of LapD, a bacterial intracellular signaling protein, both experimentally by NMR spectroscopy and computationally by implicit-solvent replica-exchange molecular dynamics using three different AMBER force fields (ff96, ff99sb-ildn, and ff99sb-ildn-NMR). A unique challenge presented by these designs is the presence of both a terminal Trp-Trp capping motif and a conserved GWxQ motif in the hairpin turn required for binding to LapG. Consistent with previous studies, we found AMBER ff96 to be the most accurate when used with the OBC GBSA implicit solvent model, despite its known bias toward β-sheet conformations when used in explicit-solvent simulations. To gain microscopic insight into the folding landscape of the hairpin designs, we additionally performed parallel simulations on the Folding@home distributed computing platform using AMBER ff99sb-ildn-NMR with TIP3P explicit solvent. Markov state models (MSMs) built from trajectory data reveal a number of non-native interactions between Trp and other amino acid side chains, creating potential problems in achieving well-folded hairpin structures in solution.
Authors: Ramsundar B, Liu B, Wu Z, Verras A, Tudor M, Sheridan RP, Pande VS
Multitask deep learning has emerged as a powerful tool for computational drug discovery. However, despite a number of preliminary studies, multitask deep networks have yet to be widely deployed in the pharmaceutical and biotech industries. This lack of acceptance stems from both software difficulties and from lack of understanding of the robustness of multitask deep networks. Our work aims to resolve both of these barriers to adoption. We introduce a high-quality open-source implementation of multitask deep networks as part of the DeepChem open-source platform. Our implementation enables simple python scripts to construct, fit, and evaluate sophisticated deep models. We use our implementation to analyze the performance of multitask deep networks and related deep models on four collections of pharmaceutical data (three of which have not previously been analyzed in the literature). We split these datasets into train/valid/test using time and neighbor-split to test multitask deep-learning performance under challenging conditions. Our results demonstrate that multitask deep networks are surprisingly robust and can offer strong improvement over random forests. Our analysis and open-source implementation in DeepChem provide an argument that multitask deep-networks are ready for widespread use in commercial drug discovery.
PMID: 28692267 [PubMed – as supplied by publisher]
Precisely tuneable energy transfer system using peptoid helix-based molecular scaffold.
Sci Rep. 2017 Jul 06;7(1):4786
Authors: Kang B, Yang W, Lee S, Mukherjee S, Forstater J, Kim H, Goh B, Kim TY, Voelz VA, Pang Y, Seo J
The energy flow during natural photosynthesis is controlled by maintaining the spatial arrangement of pigments, employing helices as scaffolds. In this study, we have developed porphyrin-peptoid (pigment-helix) conjugates (PPCs) that can modulate the donor-acceptor energy transfer efficiency with exceptional precision by controlling the relative distance and orientation of the two pigments. Five donor-acceptor molecular dyads were constructed using zinc porphyrin and free base porphyrin (Zn(i + 2)-Zn(i + 6)), and highly efficient energy transfer was demonstrated with estimated efficiencies ranging from 92% to 96% measured by static fluorescence emission in CH2Cl2 and from 96.3% to 97.6% using femtosecond transient absorption measurements in toluene, depending on the relative spatial arrangement of the donor-acceptor pairs. Our results suggest that the remarkable precision and tunability exhibited by nature can be achieved by mimicking the design principles of natural photosynthetic proteins.
MSMBuilder: Statistical Models for Biomolecular Dynamics.
Biophys J. 2017 Jan 10;112(1):10-15
Authors: Harrigan MP, Sultan MM, Hernández CX, Husic BE, Eastman P, Schwantes CR, Beauchamp KA, McGibbon RT, Pande VS
MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov state models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models and time-structure based independent component analysis. MSMBuilder boasts an easy to use command-line interface, as well as clear and consistent abstractions through its Python application programming interface. MSMBuilder was developed with careful consideration for compatibility with the broader machine learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics, but is just as applicable to other computational or experimental time-series measurements.
The oral microbiome is a dynamic environment inhabited by both commensals and pathogens. Among these is Streptococcus mutans, the causative agent of dental caries, the most prevalent childhood disease. Carolacton has remarkably specific activity against S. mutans, causing acid-mediated cell death during biofilm formation; however, its complex structure limits its utility. Herein, we report the diverted total synthesis and biological evaluation of a rationally designed library of simplified analogs that unveiled three unique biofilm phenotypes further validating the role of natural product synthesis in the discovery of new biological phenomena.
L-2-Hydroxyglutarate production arises from noncanonical enzyme function at acidic pH.
Nat Chem Biol. 2017 05;13(5):494-500
Authors: Intlekofer AM, Wang B, Liu H, Shah H, Carmona-Fontaine C, Rustenburg AS, Salah S, Gunner MR, Chodera JD, Cross JR, Thompson CB
The metabolite 2-hydroxyglutarate (2HG) can be produced as either a D-R- or L-S- enantiomer, each of which inhibits α-ketoglutarate (αKG)-dependent enzymes involved in diverse biologic processes. Oncogenic mutations in isocitrate dehydrogenase (IDH) produce D-2HG, which causes a pathologic blockade in cell differentiation. On the other hand, oxygen limitation leads to accumulation of L-2HG, which can facilitate physiologic adaptation to hypoxic stress in both normal and malignant cells. Here we demonstrate that purified lactate dehydrogenase (LDH) and malate dehydrogenase (MDH) catalyze stereospecific production of L-2HG via ‘promiscuous’ reduction of the alternative substrate αKG. Acidic pH enhances production of L-2HG by promoting a protonated form of αKG that binds to a key residue in the substrate-binding pocket of LDHA. Acid-enhanced production of L-2HG leads to stabilization of hypoxia-inducible factor 1 alpha (HIF-1α) in normoxia. These findings offer insights into mechanisms whereby microenvironmental factors influence production of metabolites that alter cell fate and function.
Solvation free energies can now be calculated precisely from molecular simulations, providing a valuable test of the energy functions underlying these simulations. Here, we briefly review “alchemical” approaches for calculating the solvation free energies of small, neutral organic molecules from molecular simulations, and illustrate by applying them to calculate aqueous solvation free energies (hydration free energies). These approaches use a non-physical pathway to compute free energy differences from a simulation or set of simulations and appear to be a particularly robust and general-purpose approach for this task. We also present an update (version 0.5) to our FreeSolv database of experimental and calculated hydration free energies of neutral compounds and provide input files in formats for several simulation packages. This revision to FreeSolv provides calculated values generated with a single protocol and software version, rather than the heterogeneous protocols used in the prior version of the database. We also further update the database to provide calculated enthalpies and entropies of hydration and some experimental enthalpies and entropies, as well as electrostatic and nonpolar components of solvation free energies.
The catalytic activity of many protein kinases is controlled by conformational changes of a conserved Asp-Phe-Gly (DFG) motif. We used an infrared probe to track the DFG motif of the mitotic kinase Aurora A (AurA) and found that allosteric activation by the spindle-associated protein Tpx2 involves an equilibrium shift toward the active DFG-in state. Förster resonance energy transfer experiments show that the activation loop undergoes a nanometer-scale movement that is tightly coupled to the DFG equilibrium. Tpx2 further activates AurA by stabilizing a water-mediated allosteric network that links the C-helix to the active site through an unusual polar residue in the regulatory spine. The polar spine residue and water network of AurA are essential for phosphorylation-driven activation, but an alternative form of the water network found in related kinases can support Tpx2-driven activation, suggesting that variations in the water-mediated hydrogen bond network mediate regulatory diversification in protein kinases.
Quantifying Allosteric Communication via Both Concerted Structural Changes and Conformational Disorder with CARDS.
J Chem Theory Comput. 2017 Apr 11;13(4):1509-1517
Authors: Singh S, Bowman GR
Allosteric (i.e., long-range) communication within proteins is crucial for many biological processes, such as the activation of signaling cascades in response to specific stimuli. However, the physical basis for this communication remains unclear. Existing computational methods for identifying allostery focus on the role of concerted structural changes, but recent experimental work demonstrates that disorder is also an important factor. Here, we introduce the Correlation of All Rotameric and Dynamical States (CARDS) framework for quantifying correlations between both the structure and disorder of different regions of a protein. To quantify disorder, we draw inspiration from methods for quantifying “dynamic heterogeneity” from chemical physics to classify segments of a dihedral’s time evolution as being in either ordered or disordered regimes. To demonstrate the utility of this approach, we apply CARDS to the Catabolite Activator Protein (CAP), a transcriptional activator that is regulated by Cyclic Adenosine MonoPhosphate (cAMP) binding. We find that CARDS captures allosteric communication between the two cAMP-Binding Domains (CBDs). Importantly, CARDS reveals that this coupling is dominated by disorder-mediated correlations, consistent with NMR experiments that establish allosteric coupling between the CBDs occurs without a concerted structural change. CARDS also recapitulates an enhanced role for disorder in the communication between the DNA-Binding Domains (DBDs) and CBDs in the S62F variant of CAP. Finally, we demonstrate that using CARDS to find communication hotspots identifies regions of CAP that are in allosteric communication without foreknowledge of their identities. Therefore, we expect CARDS to be of great utility for both understanding and predicting allostery.
tICA-Metadynamics: Accelerating Metadynamics by using kinetically selected collective variables.
J Chem Theory Comput. 2017 Apr 06;:
Authors: M Sultan M, Pande VS
Metadynamics is a powerful enhanced molecular dynamics sampling method that accelerates simulations by adding history-dependent multi-dimensional Gaussians along selective collective variables (CVs). In practice, choosing a small number of slow CVs remains challenging due to the inherent high dimensionality of biophysical systems. Here we show that time-structure based independent component analysis (tICA), a recent advance in Markov state model literature, can be used to identify a set of variationally optimal slow coordinates for use as CVs for Metadynamics. We show that linear and non-linear tICA-Metadynamics can complement existing MD studies by explicitly sampling the system’s slowest modes, and can even drive transitions along the slowest modes even when no such transitions are observed in unbiased simulations.
PMID: 28383914 [PubMed – as supplied by publisher]
Markov modeling reveals novel intracellular modulation of the human TREK-2 selectivity filter.
Sci Rep. 2017 Apr 04;7(1):632
Authors: Harrigan MP, McKiernan KA, Shanmugasundaram V, Denny RA, Pande VS
Two-pore domain potassium (K2P) channel ion conductance is regulated by diverse stimuli that directly or indirectly gate the channel selectivity filter (SF). Recent crystal structures for the TREK-2 member of the K2P family reveal distinct “up” and “down” states assumed during activation via mechanical stretch. We performed 195 μs of all-atom, unbiased molecular dynamics simulations of the TREK-2 channel to probe how membrane stretch regulates the SF gate. Markov modeling reveals a novel “pinched” SF configuration that stretch activation rapidly destabilizes. Free-energy barrier heights calculated for critical steps in the conduction pathway indicate that this pinched state impairs ion conduction. Our simulations predict that this low-conductance state is accessed exclusively in the compressed, “down” conformation in which the intracellular helix arrangement allosterically pinches the SF. By explicitly relating structure to function, we contribute a critical piece of understanding to the evolving K2P puzzle.
Computationally Discovered Potentiating Role of Glycans on NMDA Receptors.
Sci Rep. 2017 Apr 05;7:44578
Authors: Sinitskiy AV, Stanley NH, Hackos DH, Hanson JE, Sellers BD, Pande VS
N-methyl-D-aspartate receptors (NMDARs) are glycoproteins in the brain central to learning and memory. The effects of glycosylation on the structure and dynamics of NMDARs are largely unknown. In this work, we use extensive molecular dynamics simulations of GluN1 and GluN2B ligand binding domains (LBDs) of NMDARs to investigate these effects. Our simulations predict that intra-domain interactions involving the glycan attached to residue GluN1-N440 stabilize closed-clamshell conformations of the GluN1 LBD. The glycan on GluN2B-N688 shows a similar, though weaker, effect. Based on these results, and assuming the transferability of the results of LBD simulations to the full receptor, we predict that glycans at GluN1-N440 might play a potentiator role in NMDARs. To validate this prediction, we perform electrophysiological analysis of full-length NMDARs with a glycosylation-preventing GluN1-N440Q mutation, and demonstrate an increase in the glycine EC50 value. Overall, our results suggest an intramolecular potentiating role of glycans on NMDA receptors.
Propagation of the Allosteric Modulation Induced by Sodium in the δ-Opioid Receptor.
Chemistry. 2017 Apr 03;23(19):4615-4624
Authors: Sun X, Laroche G, Wang X, Ågren H, Bowman GR, Giguère PM, Tu Y
Allosteric sodium in the helix bundle of a G protein-coupled receptor (GPCR) can modulate the receptor activation on the intracellular side. This phenomenon has confounded the GPCR community for decades. In this work, we present a theoretical model that reveals the mechanism of the allosteric modulation induced by sodium in the δ-opioid receptor. We found that the allosteric sodium ion exploits a distinct conformation of the key residue Trp2746.48 to propagate the modulation to helices 5 and 6, which further transmits along the helices and regulates their positions on the intracellular side. This mechanism is supported by subsequent functional assays. Remarkably, our results highlight the contrast between the allosteric effects towards two GPCR partners, the G protein and β-arrestin, as indicated by the fact that the allosteric modulation initiated by the sodium ion significantly affects the β-arrestin recruitment, while it alters the G protein signaling only moderately. We believe that the mechanism revealed in this work can be used to explain allosteric effects initiated by sodium in other GPCRs since the allosteric sodium is highly conserved across GPCRs.
A Maximum-Caliber Approach to Predicting Perturbed Folding Kinetics Due to Mutations.
J Chem Theory Comput. 2016 Dec 13;12(12):5768-5776
Authors: Wan H, Zhou G, Voelz VA
We present a maximum-caliber method for inferring transition rates of a Markov state model (MSM) with perturbed equilibrium populations given estimates of state populations and rates for an unperturbed MSM. It is similar in spirit to previous approaches, but given the inclusion of prior information, it is more robust and simple to implement. We examine its performance in simple biased diffusion models of kinetics and then apply the method to predicting changes in folding rates for several highly nontrivial protein folding systems for which non-native interactions play a significant role, including (1) tryptophan variants of the GB1 hairpin, (2) salt-bridge mutations of the Fs peptide helix, and (3) MSMs built from ultralong folding trajectories of FiP35 and GTT variants of the WW domain. In all cases, the method correctly predicts changes in folding rates, suggesting the wide applicability of maximum-caliber approaches to efficiently predict how mutations perturb protein conformational dynamics.
Authors: Brosey CA, Ho C, Long WZ, Singh S, Burnett K, Hura GL, Nix JC, Bowman GR, Ellenberger T, Tainer JA
Apoptosis-inducing factor (AIF) is critical for mitochondrial respiratory complex biogenesis and for mediating necroptotic parthanatos; these functions are seemingly regulated by enigmatic allosteric switching driven by NADH charge-transfer complex (CTC) formation. Here, we define molecular pathways linking AIF’s active site to allosteric switching regions by characterizing dimer-permissive mutants using small-angle X-ray scattering (SAXS) and crystallography and by probing AIF-CTC communication networks using molecular dynamics simulations. Collective results identify two pathways propagating allostery from the CTC active site: (1) active-site H454 links to S480 of AIF’s central β-strand to modulate a hydrophobic border at the dimerization interface, and (2) an interaction network links AIF’s FAD cofactor, central β-strand, and Cβ-clasp whereby R529 reorientation initiates C-loop release during CTC formation. This knowledge of AIF allostery and its flavoswitch mechanism provides a foundation for biologically understanding and biomedically controlling its participation in mitochondrial homeostasis and cell death.
Training and Validation of a Liquid-Crystalline Phospholipid Bilayer Force Field.
J Chem Theory Comput. 2016 Dec 13;12(12):5960-5967
Authors: McKiernan KA, Wang LP, Pande VS
We present a united-atom model (gb-fb15) for the molecular dynamics simulation of hydrated liquid-crystalline dipalmitoylphosphatidylcholine (DPPC) phospholipid bilayers. This model was constructed through the parameter-space minimization of a regularized least-squares objective function via the ForceBalance method. The objective function was computed using a training set of experimental bilayer area per lipid and deuterium order parameter. This model was validated by comparison to experimental volume per lipid, X-ray scattering form factor, thermal area expansivity, area compressibility modulus, and lipid lateral diffusion coefficient. These comparisons demonstrate that gb-fb15 is robust to temperature variation and an improvement over the original model for both the training and validation properties.
Small molecule distribution coefficients between immiscible nonaqueuous and aqueous phases-such as cyclohexane and water-measure the degree to which small molecules prefer one phase over another at a given pH. As distribution coefficients capture both thermodynamic effects (the free energy of transfer between phases) and chemical effects (protonation state and tautomer effects in aqueous solution), they provide an exacting test of the thermodynamic and chemical accuracy of physical models without the long correlation times inherent to the prediction of more complex properties of relevance to drug discovery, such as protein-ligand binding affinities. For the SAMPL5 challenge, we carried out a blind prediction exercise in which participants were tasked with the prediction of distribution coefficients to assess its potential as a new route for the evaluation and systematic improvement of predictive physical models. These measurements are typically performed for octanol-water, but we opted to utilize cyclohexane for the nonpolar phase. Cyclohexane was suggested to avoid issues with the high water content and persistent heterogeneous structure of water-saturated octanol phases, since it has greatly reduced water content and a homogeneous liquid structure. Using a modified shake-flask LC-MS/MS protocol, we collected cyclohexane/water distribution coefficients for a set of 53 druglike compounds at pH 7.4. These measurements were used as the basis for the SAMPL5 Distribution Coefficient Challenge, where 18 research groups predicted these measurements before the experimental values reported here were released. In this work, we describe the experimental protocol we utilized for measurement of cyclohexane-water distribution coefficients, report the measured data, propose a new bootstrap-based data analysis procedure to incorporate multiple sources of experimental error, and provide insights to help guide future iterations of this valuable exercise in predictive modeling.
Structure and Dynamics of PD-L1 and an Ultra-High-Affinity PD-1 Receptor Mutant.
Structure. 2016 Oct 04;24(10):1719-1728
Authors: Pascolutti R, Sun X, Kao J, Maute RL, Ring AM, Bowman GR, Kruse AC
The immune checkpoint receptor PD-1 and its ligand, PD-L1, have emerged as key regulators of anti-tumor immunity in humans. Recently, we reported an ultra-high-affinity PD-1 mutant, termed high-affinity consensus (HAC) PD-1, which shows superior therapeutic efficacy in mice compared with antibodies. However, the molecular details underlying the action of this agent remain incompletely understood, and a molecular view of PD-1/PD-L1 interactions in general is only beginning to emerge. Here, we report the structure of HAC PD-1 in complex with PD-L1, showing that it binds PD-L1 using a unique set of polar interactions. Biophysical studies and long-timescale molecular dynamics experiments reveal the mechanisms by which ten point mutations confer a 35,000-fold enhancement in binding affinity, and offer atomic-scale views of the role of conformational dynamics in PD-1/PD-L1 interactions. Finally, we show that the HAC PD-1 exhibits pH-dependent affinity, with pseudo-irreversible binding in a low pH setting akin to the tumor microenvironment.
Modelling proteins’ hidden conformations to predict antibiotic resistance.
Nat Commun. 2016 10 06;7:12965
Authors: Hart KM, Ho CM, Dutta S, Gross ML, Bowman GR
TEM β-lactamase confers bacteria with resistance to many antibiotics and rapidly evolves activity against new drugs. However, functional changes are not easily explained by differences in crystal structures. We employ Markov state models to identify hidden conformations and explore their role in determining TEM’s specificity. We integrate these models with existing drug-design tools to create a new technique, called Boltzmann docking, which better predicts TEM specificity by accounting for conformational heterogeneity. Using our MSMs, we identify hidden states whose populations correlate with activity against cefotaxime. To experimentally detect our predicted hidden states, we use rapid mass spectrometric footprinting and confirm our models’ prediction that increased cefotaxime activity correlates with reduced Ω-loop flexibility. Finally, we design novel variants to stabilize the hidden cefotaximase states, and find their populations predict activity against cefotaxime in vitro and in vivo. Therefore, we expect this framework to have numerous applications in drug and protein design.
These data may explain the high incidence of mTOR mutations observed in clear cell kidney cancer, where VHL loss and HIF activation is pathognomonic. Our study provides mechanistic and therapeutic insights concerning mTOR mutations in human diseases.
Genomic studies have linked mTORC1 pathway–activating mutations with exceptional response to treatment with allosteric inhibitors of mTORC1 called rapalogs. Rapalogs are approved for selected cancer types, including kidney and breast cancers. Here, we used sequencing data from 22 human kidney cancer cases to identify the activating mechanisms conferred by mTOR mutations observed in human cancers and advance precision therapeutics. mTOR mutations that clustered in focal adhesion kinase targeting domain (FAT) and kinase domains enhanced mTORC1 kinase activity, decreased nutrient reliance, and increased cell size. We identified 3 distinct mechanisms of hyperactivation, including reduced binding to DEP domain–containing MTOR-interacting protein (DEPTOR), resistance to regulatory associated protein of mTOR–mediated (RAPTOR-mediated) suppression, and altered kinase kinetics. Of the 28 mTOR double mutants, activating mutations could be divided into 6 complementation groups, resulting in synergistic Rag- and Ras homolog enriched in brain–independent (RHEB-independent) mTORC1 activation. mTOR mutants were resistant to DNA damage–inducible transcript 1–mediated (REDD1-mediated) inhibition, confirming that activating mutations can bypass the negative feedback pathway formed between HIF1 and mTORC1 in the absence of von Hippel–Lindau (VHL) tumor suppressor expression. Moreover, VHL-deficient cells that expressed activating mTOR mutants grew tumors that were sensitive to rapamycin treatment.
Mechanistically distinct cancer-associated mTOR activation clusters predict sensitivity to rapamycin.
J Clin Invest. 2016 09 01;126(9):3526-40
Authors: Xu J, Pham CG, Albanese SK, Dong Y, Oyama T, Lee CH, Rodrik-Outmezguine V, Yao Z, Han S, Chen D, Parton DL, Chodera JD, Rosen N, Cheng EH, Hsieh JJ
Genomic studies have linked mTORC1 pathway-activating mutations with exceptional response to treatment with allosteric inhibitors of mTORC1 called rapalogs. Rapalogs are approved for selected cancer types, including kidney and breast cancers. Here, we used sequencing data from 22 human kidney cancer cases to identify the activating mechanisms conferred by mTOR mutations observed in human cancers and advance precision therapeutics. mTOR mutations that clustered in focal adhesion kinase targeting domain (FAT) and kinase domains enhanced mTORC1 kinase activity, decreased nutrient reliance, and increased cell size. We identified 3 distinct mechanisms of hyperactivation, including reduced binding to DEP domain-containing MTOR-interacting protein (DEPTOR), resistance to regulatory associated protein of mTOR-mediated (RAPTOR-mediated) suppression, and altered kinase kinetics. Of the 28 mTOR double mutants, activating mutations could be divided into 6 complementation groups, resulting in synergistic Rag- and Ras homolog enriched in brain-independent (RHEB-independent) mTORC1 activation. mTOR mutants were resistant to DNA damage-inducible transcript 1-mediated (REDD1-mediated) inhibition, confirming that activating mutations can bypass the negative feedback pathway formed between HIF1 and mTORC1 in the absence of von Hippel-Lindau (VHL) tumor suppressor expression. Moreover, VHL-deficient cells that expressed activating mTOR mutants grew tumors that were sensitive to rapamycin treatment. These data may explain the high incidence of mTOR mutations observed in clear cell kidney cancer, where VHL loss and HIF activation is pathognomonic. Our study provides mechanistic and therapeutic insights concerning mTOR mutations in human diseases.
Markov models of the apo-MDM2 lid region reveal diffuse yet two-state binding dynamics and receptor poses for computational docking.
Sci Rep. 2016 08 19;6:31631
Authors: Mukherjee S, Pantelopulos GA, Voelz VA
MDM2 is a negative regulator of p53 activity and an important target for cancer therapeutics. The N-terminal lid region of MDM2 modulates interactions with p53 via competition for its binding cleft, exchanging slowly between docked and undocked conformations in the absence of p53. To better understand these dynamics, we constructed Markov State Models (MSMs) from large collections of unbiased simulation trajectories of apo-MDM2, and find strong evidence for diffuse, yet two-state folding and binding of the N-terminal region to the p53 receptor site. The MSM also identifies holo-like receptor conformations highly suitable for computational docking, despite initiating trajectories from closed-cleft receptor structures unsuitable for docking. Fixed-anchor docking studies using a test set of high-affinity small molecules and peptides show simulated receptor ensembles achieve docking successes comparable to cross-docking studies using crystal structures of receptors bound by alternative ligands. For p53, the best-scoring receptor structures have the N-terminal region lid region bound in a helical conformation mimicking the bound structure of p53, suggesting lid region association induces receptor conformations suitable for binding. These results suggest that MD + MSM approaches can sample binding-competent receptor conformations suitable for computational peptidomimetic design, and that inclusion of disordered regions may be essential to capturing the correct receptor dynamics.
Our results suggest that the combination of MSM with TPT provides an effective framework to represent conformational transitions in complex biomolecular systems.
Nonreceptor tyrosine kinases of the Src family are large multidomain allosteric proteins that are crucial to cellular signaling pathways. In a previous study, we generated a Markov state model (MSM) to simulate the activation of c-Src catalytic domain, used as a prototypical tyrosine kinase. The long-time kinetics of transition predicted by the MSM was in agreement with experimental observations. In the present study, we apply the framework of transition path theory (TPT) to the previously constructed MSM to characterize the main features of the activation pathway. The analysis indicates that the activating transition, in which the activation loop first opens up followed by an inward rotation of the αC-helix, takes place via a dense set of intermediate microstates distributed within a fairly broad “transition tube” in a multidimensional conformational subspace connecting the two end-point conformations. Multiple microstates with negligible equilibrium probabilities carry a large transition flux associated with the activating transition, which explains why extensive conformational sampling is necessary to accurately determine the kinetics of activation.
We envision how these diverse experimental and computational tools can work together through formation of a “computational beamline” that will allow key functional features to be identified in IDP structural ensembles.
The traditional structure-function paradigm has provided significant insights for well-folded proteins in which structures can be easily and rapidly revealed by X-ray crystallography beamlines. However, approximately one-third of the human proteome is comprised of intrinsically disordered proteins and regions (IDPs/IDRs) that do not adopt a dominant well-folded structure, and therefore remain “unseen” by traditional structural biology methods. This Perspective considers the challenges raised by the “Dark Proteome”, in which determining the diverse conformational substates of IDPs in their free states, in encounter complexes of bound states, and in complexes retaining significant disorder requires an unprecedented level of integration of multiple and complementary solution-based experiments that are analyzed with state-of-the art molecular simulation, Bayesian probabilistic models, and high-throughput computation.
The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences-from a single sequence to an entire superfamily-and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics-such as Markov state models (MSMs)-which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family, using all available kinase catalytic domain structures from any organism as structural templates. Ensembler is free and open source software licensed under the GNU General Public License (GPL) v2. It is compatible with Linux and OS X. The latest release can be installed via the conda package manager, and the latest source can be downloaded from https://github.com/choderalab/ensembler.
After reanalyzing simulations of NuG2-a designed mutant of protein G-generated by Lindorff-Larsen et al. with time structure-based independent components analysis and Markov state models as well as performing 1.5 ms of additional sampling on Folding@home, we found an intermediate with a register-shift in one of the β-sheets that was visited along a minor folding pathway. The minor folding pathway was initiated by the register-shifted sheet, which is composed of solely nonnative contacts, suggesting that for some peptides, nonnative contacts can lead to productive folding events.
To confirm this experimentally, we suggest a mutational strategy for stabilizing the register shift, as well as an infrared experiment that could observe the nonnative folding nucleus.
Finally, our model predicts a novel binding interface that is well-populated in the Ca(2+)-bound regime and, thus, a candidate for pharmacological intervention.
Calmodulin (CaM) is a ubiquitous Ca(2+) sensor and a crucial signalling hub in many pathways aberrantly activated in disease. However, the mechanistic basis of its ability to bind diverse signalling molecules including G-protein-coupled receptors, ion channels and kinases remains poorly understood. Here we harness the high resolution of molecular dynamics simulations and the analytical power of Markov state models to dissect the molecular underpinnings of CaM binding diversity. Our computational model indicates that in the absence of Ca(2+), sub-states in the folded ensemble of CaM’s C-terminal domain present chemically and sterically distinct topologies that may facilitate conformational selection. Furthermore, we find that local unfolding is off-pathway for the exchange process relevant for peptide binding, in contrast to prior hypotheses that unfolding might account for binding diversity.Finally, our model predicts a novel binding interface that is well-populated in the Ca(2+)-bound regime and, thus, a candidate for pharmacological intervention.
Salt-bridge interactions play an important role in stabilizing many protein structures, and have been shown to be designable features for protein design. In this work, we study the effects of non-native salt bridges on the folding of a soluble alanine-based peptide (Fs peptide) using extensive all-atom molecular dynamics simulations performed on the Folding@home distributed computing platform. Using Markov State Models, we show how non-native salt-bridges affect the folding kinetics of Fs peptide by perturbing specific conformational states. Furthermore, we present methods for the automatic detection and analysis of such states. These results provide insight into helix foldingmechanisms and useful information to guide simulation-based computational protein design.
Insights into Peptoid Helix Folding Cooperativity from an Improved Backbone Potential.
J Phys Chem B. 2015 Dec 17;119(50):15407-17
Authors: Mukherjee S, Zhou G, Michel C, Voelz VA
Peptoids (N-substituted oligoglycines) are biomimetic polymers that can fold into a variety of unique structural scaffolds. Peptoid helices, which result from the incorporation of bulky chiral side chains, are a key peptoid structural motif whose formation has not yet been accurately modeled in molecular simulations. Here, we report that a simple modification of the backbone φ-angle potential in GAFF is able to produce well-folded cis-amide helices of (S)-N-(1-phenylethyl)glycine (Nspe), consistent with experiment. We validate our results against both QM calculations and NMR experiments. For this latter task, we make quantitative comparisons to sparse NOE data using the Bayesian Inference of Conformational Populations (BICePs) algorithm, a method we have recently developed for this purpose. We then performed extensive REMD simulations of Nspe oligomers as a function of chain length and temperature to probe the molecular forces driving cooperative helix formation. Analysis of simulation data by Lifson-Roig helix-coil theory show that the modified potential predicts much more cooperative folding for Nspe helices. Unlike peptides, per-residue entropy changes for helix nucleation and extension are mostly positive, suggesting that steric bulk provides the main driving force for folding. We expect these results to inform future work aimed at predicting and designing peptoid peptidomimetics and tertiary assemblies of peptoid helices.
FAST Conformational Searches by Balancing Exploration/Exploitation Trade-Offs.
J Chem Theory Comput. 2015 Dec 08;11(12):5747-57
Authors: Zimmerman MI, Bowman GR
Molecular dynamics simulations are a powerful means of understanding conformational changes. However, it is still difficult to simulate biologically relevant time scales without the use of specialized supercomputers. Here, we introduce a goal-oriented sampling method, called fluctuation amplification of specific traits (FAST), for extending the capabilities of commodity hardware. This algorithm rapidly searches conformational space for structures with desired properties by balancing trade-offs between focused searches around promising solutions (exploitation) and trying novel solutions (exploration). FAST was inspired by the hypothesis that many physical properties have an overall gradient in conformational space, akin to the energetic gradients that are known to guide proteins to their folded states. For example, we expect that transitioning from a conformation with a small solvent-accessible surface area to one with a large surface area will require passing through a series of conformations with steadily increasing surface areas. We demonstrate that such gradients are common through retrospective analysis of existing Markov state models (MSMs). Then we design the FAST algorithm to exploit these gradients to find structures with desired properties by (1) recognizing and amplifying structural fluctuations along gradients that optimize a selected physical property whenever possible, (2) overcoming barriers that interrupt these overall gradients, and (3) rerouting to discover alternative paths when faced with insurmountable barriers. To test FAST, we compare its performance to other methods for three common types of problems: (1) identifying unexpected binding pockets, (2) discovering the preferred paths between specific structures, and (3) folding proteins. Our conservative estimate is that FAST outperforms conventional simulations and an adaptive sampling algorithm by at least an order of magnitude. Furthermore, FAST yields both the proper thermodynamics and kinetics, allowing for a direct connection with kinetic experiments that is impossible with many other advanced sampling algorithms because they provide only thermodynamic information. Therefore, we expect FAST to be of great utility for a wide range of applications.
Biophys J. 2015 Oct 20;109(8):1528-32. doi: 10.1016/j.bpj.2015.08.015.
McGibbon RT, Beauchamp KA, Harrigan MP, Klein C, Swails JM, Hernández CX, Schwantes CR, Wang LP, Lane TJ, Pande VS
As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide variety of commonly used formats. It provides a large number of trajectory analysis capabilities including minimal root-mean-square-deviation calculations, secondary structure assignment, and the extraction of common order parameters. The package has a strong focus on interoperability with the wider scientific Python ecosystem, bridging the gap between MD data and the rapidly growing collection of industry-standard statistical analysis and visualization tools in Python. MDTraj is a powerful and user-friendly software package that simplifies the analysis of MD data and connects these datasets with the modern interactive data science software ecosystem in Python.
Proc Natl Acad Sci U S A. 2015 Aug 18;112(33):10377-82. doi: 10.1073/pnas.1501804112. Epub 2015 Aug 3.
Weber JK, Shukla D, Pande VS
Life is fundamentally a nonequilibrium phenomenon. At the expense of dissipated energy, living things perform irreversible processes that allow them to propagate and reproduce. Within cells, evolution has designed nanoscale machines to do meaningful work with energy harnessed from a continuous flux of heat and particles. As dictated by the Second Law of Thermodynamics and its fluctuation theorem corollaries, irreversibility in nonequilibrium processes can be quantified in terms of how much entropy such dynamics produce. In this work, we seek to address a fundamental question linking biology and nonequilibrium physics: can the evolved dissipative pathways that facilitate biomolecular function be identified by their extent of entropy production in general relaxation processes? We here synthesize massive molecular dynamics simulations, Markov state models (MSMs), and nonequilibrium statistical mechanical theory to probe dissipation in two key classes of signaling proteins: kinases and G-protein-coupled receptors (GPCRs). Applying machinery from large deviation theory, we use MSMs constructed from protein simulations to generate dynamics conforming to positive levels of entropy production. We note the emergence of an array of peaks in the dynamical response (transient analogs of phase transitions) that draw the proteins between distinct levels of dissipation, and we see that the binding of ATP and agonist molecules modifies the observed dissipative landscapes. Overall, we find that dissipation is tightly coupled to activation in these signaling systems: dominant entropy-producing trajectories become localized near important barriers along known biological activation pathways. We go on to classify an array of equilibrium and nonequilibrium molecular switches that harmonize to promote functional dynamics.
Continuous-time Markov processes over finite state-spaces are widely used to model dynamical processes in many fields of natural and social science. Here, we introduce a maximum likelihood estimator for constructing such models from data observed at a finite time interval. This estimator is dramatically more efficient than prior approaches, enables the calculation of deterministic confidence intervals in all model parameters, and can easily enforce important physical constraints on the models such as detailed balance. We demonstrate and discuss the advantages of these models over existing discrete-time Markov models for the analysis of molecular dynamics simulations.
J Chem Phys. 2015 Jul 7;143(1):014504. doi: 10.1063/1.4923338.
Qi R, Wang LP, Wang Q, Pande VS, Ren P
We report the development of a united AMOEBA (uAMOEBA) polarizable water model, which is computationally 3-5 times more efficient than the three-site AMOEBA03 model in molecular dynamics simulations while providing comparable accuracy for gas-phase and liquid properties. In this coarse-grained polarizable water model, both electrostatic (permanent and induced) and van der Waals representations have been reduced to a single site located at the oxygen atom. The permanent charge distribution is described via the molecular dipole and quadrupole moments and the many-body polarization via an isotropic molecular polarizability, all located at the oxygen center. Similarly, a single van der Waals interaction site is used for each water molecule. Hydrogen atoms are retained only for the purpose of defining local frames for the molecular multipole moments and intramolecular vibrational modes. The parameters have been derived based on a combination of ab initio quantum mechanical and experimental data set containing gas-phase cluster structures and energies, and liquid thermodynamic properties. For validation, additional properties including dimer interaction energy, liquid structures, self-diffusion coefficient, and shear viscosity have been evaluated. The results demonstrate good transferability from the gas to the liquid phase over a wide range of temperatures, and from nonpolar to polar environments, due to the presence of molecular polarizability. The water coordination, hydrogen-bonding structure, and dynamic properties given by uAMOEBA are similar to those derived from the all-atom AMOEBA03 model and experiments. Thus, the current model is an accurate and efficient alternative for modeling water.
Comput Sci Eng. 2015 Jul 1;12(4):34-39.
Eastman P, Pande VS
The wide diversity of computer architectures today requires a new approach to software development. OpenMM is a framework for molecular mechanics simulations, allowing a single program to run efficiently on a variety of hardware platforms.
Nat Commun. 2015 Jun 15;6:7283. doi: 10.1038/ncomms8283.
Vanatta DK, Shukla D, Lawrenz M, Pande VS
Recent successes in simulating protein structure and folding dynamics have demonstrated the power of molecular dynamics to predict the long timescale behaviour of proteins. Here, we extend and improve these methods to predict molecular switches that characterize conformational change pathways between the active and inactive state of nitrogen regulatory protein C (NtrC). By employing unbiased Markov state model-based molecular dynamics simulations, we construct a dynamic picture of the activation pathways of this key bacterial signalling protein that is consistent with experimental observations and predicts new mutants that could be used for validation of the mechanism. Moreover, these results suggest a novel mechanistic paradigm for conformational switching.
Methods Enzymol. 2015;557:551-72. doi: 10.1016/bs.mie.2014.12.007. Epub 2015 Mar 24.
Shukla D, Lawrenz M, Pande VS.
G-protein-coupled receptors (GPCRs) are a versatile family of membrane-bound signaling proteins. Despite the recent successes in obtaining crystal structures of GPCRs, much needs to be learned about the conformational changes associated with their activation. Furthermore, the mechanism by which ligands modulate the activation of GPCRs has remained elusive. Molecular simulations provide a way of obtaining detailed an atomistic description of GPCR activation dynamics. However, simulating GPCR activation is challenging due to the long timescales involved and the associated challenge of gaining insights from the “Big” simulation datasets. Here, we demonstrate how cloud-computing approaches have been used to tackle these challenges and obtain insights into the activation mechanism of GPCRs. In particular, we review the use of Markov state model (MSM)-based sampling algorithms for sampling milliseconds of dynamics of a major drug target, the G-protein-coupled receptor β2-AR. MSMs of agonist and inverse agonist-bound β2-AR reveal multiple activation pathways and how ligands function via modulation of the ensemble of activation pathways. We target this ensemble of conformations with computer-aided drug design approaches, with the goal of designing drugs that interact more closely with diverse receptor states, for overall increased efficacy and specificity. We conclude by discussing how cloud-based approaches present a powerful and broadly available tool for studying the complex biological systems routinely.
J Chem Phys. 2015 Mar 28;142(12):124105. doi: 10.1063/1.4916292. McGibbon RT, Pande VS
Markov state models are a widely used method for approximating the eigenspectrum of the molecular dynamics propagator, yielding insight into the long-timescale statistical kinetics and slow dynamical modes of biomolecular systems. However, the lack of a unified theoretical framework for choosing between alternative models has hampered progress, especially for non-experts applying these methods to novel biological systems. Here, we consider cross-validation with a new objective function for estimators of these slow dynamical modes, a generalized matrix Rayleigh quotient (GMRQ), which measures the ability of a rank-m projection operator to capture the slow subspace of the system. It is shown that a variational theorem bounds the GMRQ from above by the sum of the first m eigenvalues of the system’s propagator, but that this bound can be violated when the requisite matrix elements are estimated subject to statistical uncertainty. This overfitting can be detected and avoided through cross-validation. These result make it possible to construct Markov state models for protein dynamics in a way that appropriately captures the tradeoff between systematic and statistical errors.
J Phys Chem B. 2015 Jul 23;119(29):9423-37. doi: 10.1021/jp510896n. Epub 2015 Feb 26.Laury ML1, Wang LP2, Pande VS2, Head-Gordon T3, Ponder JW1.
A set of improved parameters for the AMOEBA polarizable atomic multipole water model is developed. An automated procedure, ForceBalance, is used to adjust model parameters to enforce agreement with ab initio-derived results for water clusters and experimental data for a variety of liquid phase properties across a broad temperature range. The values reported here for the new AMOEBA14 water model represent a substantial improvement over the previous AMOEBA03 model. The AMOEBA14 model accurately predicts the temperature of maximum density and qualitatively matches the experimental density curve across temperatures from 249 to 373 K. Excellent agreement is observed for the AMOEBA14 model in comparison to experimental properties as a function of temperature, including the second virial coefficient, enthalpy of vaporization, isothermal compressibility, thermal expansion coefficient, and dielectric constant. The viscosity, self-diffusion constant, and surface tension are also well reproduced. In comparison to high-level ab initio results for clusters of 2-20 water molecules, the AMOEBA14 model yields results similar to AMOEBA03 and the direct polarization iAMOEBA models. With advances in computing power, calibration data, and optimization techniques, we recommend the use of the AMOEBA14 water model for future studies employing a polarizable water model.
Acc Chem Res. 2015 Feb 17;48(2):414-22 Authors: Shukla D, Hernández CX, Weber JK, Pande VS.
Protein function is inextricably linked to protein dynamics. As we move from a static structural picture to a dynamic ensemble view of protein structure and function, novel computational paradigms are required for observing and understanding conformational dynamics of proteins and its functional implications. In principle, molecular dynamics simulations can provide the time evolution of atomistic models of proteins, but the long time scales associated with functional dynamics make it difficult to observe rare dynamical transitions. The issue of extracting essential functional components of protein dynamics from noisy simulation data presents another set of challenges in obtaining an unbiased understanding of protein motions. Therefore, a methodology that provides a statistical framework for efficient sampling and a human-readable view of the key aspects of functional dynamics from data analysis is required. The Markov state model (MSM), which has recently become popular worldwide for studying protein dynamics, is an example of such a framework.
We describe an innovative protocol for ab initio prediction of ligand crystallographic binding poses and highly effective analysis of large datasets generated for protein-ligand dynamics. We include a procedure for setup and performance of distributed molecular dynamics simulations on cloud computing architectures, a model for efficient analysis of simulation data, and a metric for evaluation of model convergence. We give accurate binding pose predictions for five ligands ranging in affinity from 7 nM to > 200 μM for the immunophilin protein FKBP12, for expedited results in cases where experimental structures are difficult to produce. Our approach goes beyond single, low energy ligand poses to give quantitative kinetic information that can inform protein engineering and ligand design.
Kinetic network models of tryptophan mutations in β-hairpins reveal the importance of non-native interactions.
J Chem Theory Comput. 2015 Jun 09;11(6):2801-12
Authors: Razavi AM, Voelz VA
We present an analysis of the most extensive explicit-solvent simulations of β-hairpins to date (9.4 ms in aggregate), with the aim of probing the effects of tryptophan mutations on folding. From molecular simulations of GB1 hairpin, trpzip4, trpzip5, and trpzip6 performed on Folding@home, Markov State Models (MSMs) were constructed using a unified set of metastable states, enabling objective comparison of folding mechanisms. MSM models display quantitative agreement with experimental structural observables and folding kinetics, and predict multimodal kinetics due to specific non-native kinetic traps, which be identified as on- or off-pathway from the network topology. We quantify kinetic frustration by several means, including the s-ensemble method to evaluate glasslike behavior. Free-energy profiles and transition state movement clearly show stabilization of non-native states as Trp mutations are introduced. Remarkably, we find that “β-capped” sequences (trpzip4 and trpzip5) are able to overcome this frustration and remain cooperative two-state folders with a large time-scale gap. These results suggest that, while β-capping motifs are robust, fold stabilization by tryptophan generally may require overcoming significant non-native kinetic traps, perhaps explaining their under-representation in natural proteins.
Surprisal Metrics for Quantifying Perturbed Conformational Dynamics in Markov State Models.
J Chem Theory Comput. 2014 Dec 09;10(12):5716-28
Authors: Voelz VA, Elman B, Razavi AM, Zhou G
Markov state models (MSMs), which model conformational dynamics as a network of transitions between metastable states, have been increasingly used to model the thermodynamics and kinetics of biomolecules. In considering perturbations to molecular dynamics induced by sequence mutations, chemical modifications, or changes in external conditions, it is important to assess how transition rates change, independent of changes in metastable state definitions. Here, we present a surprisal metric to quantify the difference in metastable state transitions for two closely related MSMs, taking into account the statistical uncertainty in observed transition counts. We show that the surprisal is a relative entropy metric closely related to the Jensen-Shannon divergence between two MSMs, which can be used to identify conformational states most affected by perturbations. As examples, we apply the surprisal metric to a two-dimensional lattice model of a protein hairpin with mutations to hydrophobic residues, all-atom simulations of the Fs peptide α-helix with a salt-bridge mutation, and a comparison of protein G β-hairpin with its trpzip4 variant. Moreover, we show that surprisal-based adaptive sampling is an efficient strategy to reduce the statistical uncertainty in the Jensen-Shannon divergence, which could be a useful strategy for molecular simulation-based ab initio design.
D. Shukla, Y. L. Meng, B. Roux, and V. S. Pande. Nat Commun 5 (2014)
Unregulated activation of Src kinases leads to aberrant signalling, uncontrolled growth and differentiation of cancerous cells. Reaching a complete mechanistic understanding of large-scale conformational transformations underlying the activation of kinases could greatly help in the development of therapeutic drugs for the treatment of these pathologies. In principle, the nature of conformational transition could be modelled in silico via atomistic molecular dynamics simulations, although this is very challenging because of the long activation timescales. Here we employ a computational paradigm that couples transition pathway techniques and Markov state model-based massively distributed simulations for mapping the conformational landscape of c-src tyrosine kinase. The computations provide the thermodynamics and kinetics of kinase activation for the first time, and help identify key structural intermediates. Furthermore, the presence of a novel allosteric site in an intermediate state of c-src that could be potentially used for drug design is predicted.
R. T. McGibbon, C. R. Schwantes, and V. S. Pande. J Phys Chem B. 118 6475-81 (2014)
Markov state models provide a powerful framework for the analysis of biomolecular conformation dynamics in terms of their metastable states and transition rates. These models provide both a quantitative and comprehensible description of the long-time scale dynamics of large molecular dynamics with a Master equation and have been successfully used to study protein folding, protein conformational change, and protein-ligand binding. However, to achieve satisfactory performance, existing methodologies often require expert intervention when defining the model’s discrete state space. While standard model selection methodologies focus on the minimization of systematic bias and disregard statistical error, we show that by consideration of the states’ conditional distribution over conformations, both sources of error can be balanced evenhandedly. Application of techniques that consider both systematic bias and statistical error on two 100 μs molecular dynamics trajectories of the Fip35 WW domain shows agreement with existing techniques based on self-consistency of the model’s relaxation time scales with more suitable results in regimes in which those time scale-based techniques encourage overfitting. By removing the need for expert tuning, these methods should reduce modeling bias and lower the barriers to entry in Markov state model construction.
J. K. Weber, R. L. Jack, C. R. Schwantes, and V. S. Pande. Biophys J. 107(4) 974-82 (2014).
Developing an understanding of protein misfolding processes presents a crucial challenge for unlocking the mysteries of human disease. In this article, we present our observations of β-sheet-rich misfolded states on a number of protein dynamical landscapes investigated through molecular dynamics simulation and Markov state models. We employ a nonequilibrium statistical mechanical theory to identify the glassy states in a protein’s dynamics, and we discuss the nonnative, β-sheet-rich states that play a distinct role in the slowest dynamics within seven protein folding systems. We highlight the fundamental similarity between these states and the amyloid structures responsible for many neurodegenerative diseases, and we discuss potential consequences for mechanisms of protein aggregation and intermolecular amyloid formation.
C. R. Baiz, Y.-S. Lin, C. S. Peng, K. A. Beauchamp, V. A. Voelz, V. S. Pande, and A. Tokmakoff. Biophysical Journal, 106 1359-1370 (2014)
The folding mechanism of the N-terminal domain of ribosomal protein L9 (NTL91–39) is studied using temperature-jump (T-jump) amide I′ two-dimensional infrared (2D IR) spectroscopy in combination with spectral simulations based on a Markov state model (MSM) built from millisecond-long molecular dynamics trajectories. The results provide evidence for a compact well-structured folded state and a heterogeneous fast-exchanging denatured state ensemble exhibiting residual secondary structure. The folding rate of 26.4 μs−1 (at 80°C), extracted from the T-jump response of NTL91–39, compares favorably with the 18 μs−1 obtained from the MSM. Structural decomposition of the MSM and analysis along the folding coordinate indicates that helix-formation nucleates the global folding. Simulated difference spectra, corresponding to the global folding transition of the MSM, are in qualitative agreement with measured T-jump 2D IR spectra. The experiments demonstrate the use of T-jump 2D IR spectroscopy as a valuable tool for studying protein folding, with direct connections to simulations. The results suggest that in addition to predicting the correct native structure and folding time constant, molecular dynamics simulations carried out with modern force fields provide an accurate description of folding mechanisms in small proteins.
K. J. Kohlhoff*, D. Shukla*, M. Lawrenz*, G. R. Bowman, D. E. Konerding, D. Belov, R, B. Altman, and V. S. Pande. Nature Chemistry 6 15-21 (2014)
We applied methods developed and honed on Folding@home to Google Exacycle (Google’s massively parallel cloud resource — a lot like running Folding@home behind their firewall). The resources that Google donated to PG/Folding@home was pretty massive, allowing us to tackle a really significant and challenging problem in biology and drug design.
Specifically, we were able to study the protein dynamics of B2AR, a G-Protein Coupled Receptor (GPCR) important in many medical and biological processes (especially asthma and heart disease). What’s even more exciting to us is that this opens the door for FAH and our methods to be much more broadly applied. In fact, this is just the first of several papers in the pipeline using FAH and FAH-like methods to tackle challenging biomedical problems.
For more information, you can also check out Stanford’s news’ coverage on this.
R. T. McGibbon and V. S. Pande. J. Chem. Theory Comput., DOI: 10.1021/ct400132h (2013)
Building Markov State Models (MSMs) is at the heart of how FAH can take advantage of hundreds of thousands of processors in a very efficient way. This paper describes a new scheme for how to more accurately build MSMs, especially for areas in conformational change, relevant for cancer and other areas in drug design.
J. K. Weber, R. Jack, and V. S. Pande. JACS, in press (2013)
With all of our work in protein folding, we can step back and ask questions of what’s common, especially for protein misfolding. This paper is a start in this direction, using a new method to probe our models. This is just the tip of the iceberg as in later papers (under review), we will show how this approach can give insight into protein misfolding, giving new hypotheses for how protein misfolding (relevant in many diseases, such as Alzheimer’s and Parkinson’s) could arise.
The folding times accessible by simulation have increased exponentially over the past decade. Shown are all protein folding simulations conducted using unbiased, all-atom MD in empirical force-fields reported in the literature. Some folding times for the same protein differ, due to various mutations. FAH results are in blue, results from Shaw’s Anton supercomputer are in red.
This a review of protein folding achievement from Folding@home and other researchers. Our findings demonstrate that Folding@home is capable of simulating large, complex, and slow-folding proteins, beyond the capabilities of other systems, including the specialized hardware in the supercomputer from David Shaw’s DESRES group.
Quantitatively accurate all-atom molecular dynamics (MD) simulations of protein folding have long been considered a holy grail of computational biology. Due to the large system sizes and long timescales involved, such a pursuit was for many years computationally intractable. Further, sufficiently accurate forcefields needed to be developed in order to realistically model folding. This decade, however, saw the first reports of folding simulations describing kinetics on the order of milliseconds, placing many proteins firmly within reach of these methods. Progress in sampling and forcefield accuracy, however, presents a new challenge: how to turn huge MD datasets into scientific understanding. Here, we review recent progress in MD simulation techniques and show how the vast datasets generated by such techniques present new challenges for analysis. We critically discuss the state of the art, including reaction coordinate and Markov state model (MSM) methods, and provide a perspective for the future.
Eastman P, Friedrichs MS, Chodera JD, Radmer RJ, Bruns CM, Ku JP, Beauchamp KA, Lane TJ, Wang LP, Shukla D, Tye T, Houston M, Stich T, Klein C, Shirts MR, Pande VS.
Journal of Chemical Theory and Computation (Jan 2013)
This paper discusses OpenMM 4, powerful and adaptable library for molecular dynamics, is one of the key components behind our current GPU FahCores. A more recent release, OpenMM 5, powers the upcoming FahCore 17 (Zeta). We are looking forward to OpenMM 5.1, which should be a big release from the user’s perspective and offers a lot of exciting scientific features.
OpenMM is a software toolkit for performing molecular simulations on a range of high performance computing architectures. It is based on a layered architecture: the lower layers function as a reusable library that can be invoked by any application, while the upper layers form a complete environment for running molecular simulations. The library API hides all hardware-specific dependencies and optimizations from the users and developers of simulation programs: they can be run without modification on any hardware on which the API has been implemented. The current implementations of OpenMM include support for graphics processing units using the OpenCL and CUDA frameworks. In addition, OpenMM was designed to be extensible, so new hardware architectures can be accommodated and new functionality (e.g., energy terms and integrators) can be easily added.
Markov state models (MSMs) for the study of biomolecule folding simulations have emerged as a powerful tool for computational study of folding dynamics. MSMExplorer is a visualization application purpose-built to visualize these MSMs with an aim to increase the efficacy and reach of MSM science.
In this paper we have initiated an study to build Markov state models for molecular dynamical systems with solvent degrees of freedom. The methods we described should also be broadly applicable to a wide range of biomolecular simulation analyses.
Markov state models have been widely used to study conformational changes of biological macromolecules. These models are built from short timescale simulations and then propagated to extract long timescale dynamics. However, the solvent information in molecular simulations are often ignored in current methods, because of the large number of solvent molecules in a system and the indistinguishability of solvent molecules upon their exchange.
We present a solvent signature that compactly summarizes the solvent distribution in the high-dimensional data, and then define a distance metric between different configurations using this signature. We next incorporate the solvent information into the construction of Markov state models and present a fast geometric clustering algorithm which combines both the solute-based and solvent-based distances.
This details some new results from Folding@home on Alzheimer’s Disease.
Amyloid beta (Aβ) peptide plays an important role in Alzheimer’s disease. A number of mutations in the Aβ sequence lead to familial Alzheimer’s disease, congophilic amyloid angiopathy, or hereditary cerebral hemorrhage with amyloid. Using molecular dynamics simulations of ∼200 μs for each system, we characterize and contrast the consequences of four pathogenic mutations (Italian, Dutch, Arctic, and Iowa) for the structural ensemble of the Aβ monomer. The four familial mutations are found to have distinct consequences for the monomer structure.
Ring AM, Lin JX, Feng D, Mitra S, Rickert M, Bowman GR, Pande VS, Li P, Moraga I, Spolski R, Ozkan E, Leonard WJ, Garcia KC.
Natural Immunology (Dec 2012)
This is a continuation of paper #100, “Exploiting a natural conformational switch to engineer an interleukin-2 ‘superkine’”. Our results provide new insights for the development of new specific therapeutics based on interleukin.
Interleukin 15 (IL-15) and IL-2 have distinct immunological functions even though both signal through the receptor subunit IL-2Rβ and the common γ-chain (γ(c)). Here we found that in the structure of the IL-15-IL-15Rα-IL-2Rβ-γ(c) quaternary complex, IL-15 binds to IL-2Rβ and γ(c) in a heterodimer nearly indistinguishable from that of the IL-2-IL-2Rα-IL-2Rβ-γ(c) complex, despite their different receptor-binding chemistries. IL-15Rα substantially increased the affinity of IL-15 for IL-2Rβ, and this allostery was required for IL-15 trans signaling. Consistent with their identical IL-2Rβ-γ(c) dimer geometries, IL-2 and IL-15 showed similar signaling properties in lymphocytes, with any differences resulting from disparate receptor affinities. Thus, IL-15 and IL-2 induced similar signals, and the cytokine specificity of IL-2Rα versus IL-15Rα determined cellular responsiveness. Our results provide new insights for the development of specific immunotherapeutics based on IL-15 or IL-2.
Walker JR, Novick PA, Parsons WH, McGregor M, Zablocki J, Pande VS, Du Bois J.
Proceedings of the National Academy of Sciences, USA (Dec 2012)
Here we use molecular dynamics and chemical informatics methods to better understand sodium channels. This is important as it has applications to developing pain medicines, whose functionality relies on interacting with these channels.
Human nociceptive voltage-gated sodium channel (Na(v)1.7), a target of significant interest for the development of antinociceptive agents, is blocked by low nanomolar concentrations of (-)-tetrodotoxin(TTX) but not (+)-saxitoxin (STX) and (+)-gonyautoxin-III (GTX-III). These findings question the long-accepted view that the 1.7 isoform is both tetrodotoxin- and saxitoxin-sensitive and identify the outer pore region of the channel as a possible target for the design of Na(v)1.7-selective inhibitors. Single- and double-point amino acid mutagenesis studies along with whole-cell electrophysiology recordings establish two domain III residues (T1398 and I1399), which occur as methionine and aspartate in other Na(v) isoforms, as critical determinants of STX and gonyautoxin-III binding affinity. An advanced homology model of the Na(v) pore region is used to provide a structural rationalization for these surprising results.
Proceedings of the National Academy of Sciences, USA (Oct 2012)
Markov State Models (MSMs) have been very helpful in analyzing protein folding. Here they yield new insight into simplified models of protein folding.
Markov state models constructed from molecular dynamics simulations have recently shown success at modeling protein folding kinetics. Here we introduce two methods, flux PCCA+ (FPCCA+) and sliding constraint rate estimation (SCRE), that allow accurate rate models from protein folding simulations. We apply these techniques to fourteen massive simulation datasets generated by Anton and Folding@home. Our protocol quantitatively identifies the suitability of describing each system using two-state kinetics and predicts experimentally detectable deviations from two-state behavior. An analysis of the villin headpiece and FiP35 WW domain detects multiple native substates that are consistent with experimental data. Applying the same protocol to GTT, NTL9, and protein G suggests that some beta containing proteins can form long-lived native-like states with small register shifts. Even the simplest protein systems show folding and functional dynamics involving three or more states.
Voelz VA, Jäger M, Yao S, Chen Y, Zhu L, Waldauer SA, Bowman GR, Friedrichs M, Bakajin O, Lapidus LJ, Weiss S, Pande VS.
Journal of the American Chemical Society (Aug 2012)
This is a major, major paper on protein folding. Acyl-CoA is the current benchmark for the slowest folding protein (approximetely 10 milliseconds) and the most complex (contains 86 amino acids). We worked closely with experimentalists to test and connect our work to their experiments.
Protein folding is a fundamental process in biology, key to understanding many human diseases. Experimentally, proteins often appear to fold via simple two- or three-state mechanisms involving mainly native-state interactions, yet recent network models built from atomistic simulations of small proteins suggest the existence of many possible metastable states and folding pathways. We reconcile these two pictures in a combined experimental and simulation study of acyl-coenzyme A binding protein (ACBP), a two-state folder (folding time ~10 ms) exhibiting residual unfolded-state structure, and a putative early folding intermediate. Using single-molecule FRET in conjunction with side-chain mutagenesis, we first demonstrate that the denatured state of ACBP at near-zero denaturant is unusually compact and enriched in long-range structure that can be perturbed by discrete hydrophobic core mutations. We then employ ultrafast laminar-flow mixing experiments to study the folding kinetics of ACBP on the microsecond time scale. These studies, along with Trp-Cys quenching measurements of unfolded-state dynamics, suggest that unfolded-state structure forms on a surprisingly slow (~100 μs) time scale, and that sequence mutations strikingly perturb both time-resolved and equilibrium smFRET measurements in a similar way. A Markov state model (MSM) of the ACBP folding reaction, constructed from over 30 ms of molecular dynamics trajectory data, predicts a complex network of metastable stables, residual unfolded-state structure, and kinetics consistent with experiment but no well-defined intermediate preceding the main folding barrier. Taken together, these experimental and simulation results suggest that the previously characterized fast kinetic phase is not due to formation of a barrier-limited intermediate but rather to a more heterogeneous and slow acquisition of unfolded-state structure.
A. M. Levin, D. L. Bates, A. M. Ring, J. T. Lin, L. Su, C. Krieg, G. R. Bowman, P. Novick, V. S. Pande, H. E. Kohrt, O. Boyman, C. G. Fathman, K. C. Garcia
Nature, volume 484, pages 529-33 (2012)
It has long been known that a protein called IL-2 can help stimulate an immune response to fight AIDS or cancer, so in theory giving people with diseases like immune deficiencies IL-2 could be tremendously helpful. In practice, however, giving them IL-2 often leads to severe heart and lung problems. To find a better solution, collaborators at Stanford designed a variant of IL-2 that can stimulate an immune response without causing any side effects. However, they couldn’t understand how it worked because the two proteins had almost identical structures! Using Folding@home, we showed that IL-2 is a relatively floppy protein while our collaborators’ variant is locked into a structure that is poised to stimulate an immune response.
The immunostimulatory cytokine interleukin-2 (IL-2) is a growth factor for a wide range of leukocytes, including T cells and natural killer (NK) cells. Considerable effort has been invested in using IL-2 as a therapeutic agent for a variety of immune disorders ranging from AIDS to cancer. However, adverse effects have limited its use in the clinic. On activated T cells, IL-2 signals through a quaternary ‘high affinity’ receptor complex consisting of IL-2, IL-2Rα (termed CD25), IL-2Rβ and IL-2Rγ. Naive T cells express only a low density of IL-2Rβ and IL-2Rγ, and are therefore relatively insensitive to IL-2, but acquire sensitivity after CD25 expression, which captures the cytokine and presents it to IL-2Rβ and IL-2Rγ. Here, using in vitro evolution, we eliminated the functional requirement of IL-2 for CD25 expression by engineering an IL-2 ‘superkine’ (also called super-2) with increased binding affinity for IL-2Rβ. Crystal structures of the IL-2 superkine in free and receptor-bound forms showed that the evolved mutations are principally in the core of the cytokine, and molecular dynamics simulations indicated that the evolved mutations stabilized IL-2, reducing the flexibility of a helix in the IL-2Rβ binding site, into an optimized receptor-binding conformation resembling that when bound to CD25. The evolved mutations in the IL-2 superkine recapitulated the functional role of CD25 by eliciting potent phosphorylation of STAT5 and vigorous proliferation of T cells irrespective of CD25 expression. Compared to IL-2, the IL-2 superkine induced superior expansion of cytotoxic T cells, leading to improved antitumour responses in vivo, and elicited proportionally less expansion of T regulatory cells and reduced pulmonary oedema. Collectively, we show that in vitro evolution has mimicked the functional role of CD25 in enhancing IL-2 potency and regulating target cell specificity, which has implications for immunotherapy.
The aggregation of amyloid beta (Aβ) peptides plays an important role in the development of Alzheimer’s disease. Despite extensive effort, it has been difficult to characterize the secondary and tertiary structure of the Aβ monomer, the starting point for aggregation, due to its hydrophobicity and high aggregation propensity. Here, we employ extensive molecular dynamics simulations with atomistic protein and water models to determine structural ensembles for Aβ(42), Aβ(40), and Aβ(42)-E22K (the Italian mutant) monomers in solution. Sampling of a total of >700 microseconds in all-atom detail with explicit solvent enables us to observe the effects of peptide length and a pathogenic mutation on the disordered Aβ monomer structural ensemble. Aβ(42) and Aβ(40) have crudely similar characteristics but reducing the peptide length from 42 to 40 residues reduces β-hairpin formation near the C-terminus. The pathogenic Italian E22K mutation induces helix formation in the region of residues 20-24. This structural alteration may increase helix-helix interactions between monomers, resulting in altered mechanism and kinetics of Aβ oligomerization.
A simple model is presented that describes general features of protein folding, in good agreement with experimental results and detailed all-atom simulations. Starting from microscopic physics, and with no free parameters, this model predicts that protein folding occurs remarkably quickly because native-like states are kinetic hubs. A hub-like network arises naturally out of microscopic physical concerns, specifically the kinetic longevity of native contacts during a search of globular conformations. The model predicts folding times scaling as τ(f) ∼ e(ξN) in the number of residues, but because the model shows ξ is small, the folding times are much faster than Levinthal’s approximation. Importantly, the folding time scale is found to be small due to the topology and structure of the network. We show explicitly how our model agrees with generic experimental features of the folding process, including the scaling of τ(f) with N, two-state thermodynamics, a sharp peak in C(V), and native-state fluctuations.
Paul A. Novick, Dahabada H. Lopes, Kim M. Branson, Alexandra Esteras-Chopo, Isabella A. Graef, Gal Bitan, and Vijay S. Pande Journal of Medicinal Chemistry (2012)
These results have been a long time in coming and in many ways represents a major achievement for Folding@home (FAH) in general, demonstrating that the approach we started 10 years ago can make significant steps forward in our long term goals.
Specifically, our long term goals have been to 1) develop new methods to tackle the computational challenges of simulating protein folding; 2) apply these methods to gain new insights into protein folding; 3) use these methods and new insights to simulate Aß protein misfolding, a key process in the toxicity of Alzheimer’s Disease (AD); and finally 4) to use those simulations to develop new small molecule drug candidates for AD. In the early years of FAH, we concentrated on the first two goals above. In the last 5-7 years, we have worked to accomplish the third goal. I’m now very excited to report our progress on the last goal –– using FAH for the development of new therapeutic strategies for AD.
In a paper just published in the Journal of Medicinal Chemistry, we report on tests of predictions from earlier Folding@home simulations, and how these predictions have led to a new strategy to fight Alzheimer’s Disease. While this is not a cure, it is a major step towards our final goal, some light at the end of the tunnel.
The next steps, now underway in our lab, are to take this lead compound and help push it towards a viable drug. It’s too early to report on our preliminary results there (I like to only talk publicly about work after it’s passed through peer review), I’m very excited that the directions set out in this paper do appear to be bearing fruit in terms of a viable drug (not just a drug candidate).
Drug design studies targeting one of the primary toxic agents in Alzheimer’s disease, soluble oligomers of amyloid β-protein (Aβ), have been complicated by the rapid, heterogeneous aggregation of Aβ and the resulting difficulty to structurally characterize the peptide. To address this, we have developed [Nle35, d-
Pro37]Aβ42, a substituted peptide inspired from molecular dynamics simulations which forms structures stable enough to be analyzed by NMR. We report herein that [Nle35, d-Pro37]Aβ42 stabilizes the trimer and prevents mature fibril and β-sheet formation. Further, [Nle35, d-Pro37]Aβ42 interacts with WT Aβ42 and reduces aggregation levels and fibril formation in mixtures. Using ligand-based drug design based on [Nle35, d-Pro37]Aβ42, a lead compound was identified with effects on inhibition similar to the peptide. The ability of [Nle35, d-Pro37]Aβ42 and the compound to inhibit the aggregation of Aβ42 provides a novel tool to study the structure of Aβ oligomers. More broadly, our data demonstrate how molecular dynamics simulation can guide experiment for further research into AD.
P. Novick, J. Rajadas, C.W. Liu, N. W. Kelley, M. Inayathullah, and V. S. Pande.
PLoS ONE (2011)
It is believed that Alzheimer’s Disease results from the misfolding of the Abeta peptide. Understanding how Abeta misfolds could give us some key insights into how to cure Alzheimer’s Disease. This paper experimentally tests a key prediction made in an earlier paper (paper #58: “Simulating oligomerization at experimental concentrations and long timescales: A Markov state model approach” by Nicholas W. Kelley, V. Vishal, Grant A. Krafft, and Vijay S. Pande. J. Chem. Phys. 129, 214707 (2008); DOI:10.1063/1.3010881). In this paper, we show experimentally that there appears to be a beta turn in the Abeta as predicted. This leads to a very stable form of misfolded Abeta which could be used as a starting point for a new Alzheimer’s therapy. We are heavily pursuing this research direction at the moment.
Enhanced production of a 42-residue beta amyloid peptide (Aβ42) in affected parts of the brain has been suggested to be the main causative factor for the development of Alzheimer’s Disease (AD). The severity of the disease depends not only on the amount of the peptide but also its conformational transition leading to the formation of oligomeric amyloid-derived diffusible ligands (ADDLs) in the brain of AD patients. Despite being significant to the understanding of AD mechanism, no atomic-resolution structures are available for these species due to the evanescent nature of ADDLs that hinders most structural biophysical investigations. Based on our molecular modeling and computational studies, we have designed Met35Nle and G37p mutations in the Aβ42 peptide (Aβ42Nle35p37) that appear to organize Aβ42 into stable oligomers. 2D NMR on the Aβ42Nle35p37 peptide revealed the occurrence of two β-turns in the V24-N27 and V36-V39 stretches that could be the possible cause for the oligomer stability. We did not observe corresponding NOEs for the V24-N27 turn in the Aβ21–43Nle35p37 fragment suggesting the need for the longer length amyloid peptide to form the stable oligomer promoting conformation. Because of the presence of two turns in the mutant peptide which were absent in solid state NMR structures for the fibrils, we propose, fibril formation might be hindered. The biophysical information obtained in this work could aid in the development of structural models for toxic oligomer formation that could facilitate the development of therapeutic approaches to AD.
S. Pronk, P. Larson, I. Pouya, G. Bowman, I. Haque, K. Beaucamp, B. Hess, V. S. Pande, P. M. Kasson, E. Lindahl.
Supercomputing 2011, submitted (2011)
Biomolecular simulation is a core application on supercomputers, but it is exceptionally difficult to achieve the strong scaling necessary to reach biologically relevant timescales. Here, we present a new paradigm for parallel adaptive molecular dynamics and a publicly available implementation: Copernicus. This framework combines performance-leading molecular dynamics parallelized on three levels (SIMD, threads, and message-passing) with kinetic clustering, statistical model building and real-time result monitoring.
Copernicus enables execution as single parallel jobs with automatic resource allocation. Even for a small protein such as villin (9,864 atoms), Copernicus exhibits near-linear strong scaling from 1 to 5,376 AMD cores. Starting from extended chains we observe structures 0.6Å from the native state within 30h, and achieve sufficient sampling to predict the native state without a priori knowledge after 80-90h. To match Copernicus’ efficiency, a classical simulation would have to exceed 50μs per day, currently infeasible even with custom hardware designed for simulations.
Proceedings of the National Academy of Sciences, USA, in press (2011)
We are constantly honing our methods to improve Folding@home’s ability to predict the behavior of proteins. This paper demonstrates the current state of the art in terms of both sampling and analysis. When compared to detailed TTET experiments, we show that our methods can piece out even fairly detailed aspects of folding. However, we also see the ways in which our models are not perfect, suggesting how we can improve our methods even further.
As the fastest folding protein, the villin headpiece (HP35) serves as an important bridge between simulation and experimental studies of protein folding. Despite the simplicity of this system, experiments continue to reveal a number of surprises, including structure in the unfolded state and complex equilibrium dynamics near the native state. Using 2.5 ms of molecular dynamics and Markov state models, we connect to current experimental results in three ways. First, we present and validate a novel method for the quantitative prediction of triplet–triplet energy transfer experiments. Second, we construct a many-state model for HP35 that is consistent with previous experiments. Finally, we predict contact-formation time traces for all 1,225 possible triplet–triplet energy transfer experiments on HP35.
A common theme of studies using molecular simulation is a necessary compromise between computational efficiency and resolution of the force field that is used. Significant efforts have been directed at combining multiple levels of granularity within a single simulation in order to maintain the efficiency of coarse-grained models, while using finer resolution in regions where such details are expected to play an important role. A specific example of this paradigm is the development of hybrid solvent models, which explicitly sample the solvent degrees of freedom within a specified domain while utilizing a continuum description elsewhere. Unfortunately, these models are complicated by the presence of structural artifacts at or near the explicit/implicit boundary. The presence of these artifacts significantly complicates the use of such models, both undermining the accuracy obtained and necessitating the parameterization of effective potentials to counteract the artificial interactions. In this work, we introduce a novel hybrid solvent model that employs a smoothly decoupled particle interface (SDPI), a switching region that gradually transitions from fully interacting particles to a continuum solvent. The resulting SDPI model allows for the use of an implicit solvent model based on a simple theory that needs to only reproduce the behavior of bulk solvent rather than the more complex features of local interactions. In this study, the SDPI model is tested on spherical hybrid domains using a coarse-grained representation of water that includes only Lennard-Jones interactions. The results demonstrate that this model is capable of reproducing solvent configurations absent of boundary artifacts, as if they were taken from full explicit simulations.
Current Opinion in Structural Biology 21 4-11 (2011)
Protein folding is an important problem in structural biology with significant medical implications, particularly for misfolding disorders like Alzheimer’s disease. Solving the folding problem will ultimately require a combination of theory and experiment, with theoretical models providing a comprehensive view of folding and experiments grounding these models in reality. Here we review progress towards this goal over the past decade, with an emphasis on recent theoretical advances that are empowering chemically detailed models of folding and the new results these technologies are providing. In particular, we discuss new insights made possible by Markov state models (MSMs), including the role of non-native contacts and the hub-like character of protein folded states.
Journal of the American Chemical Society 133 664-667 (2011)
Protein folding is a classic grand challenge that is relevant to numerous human diseases, such as protein misfolding diseases like Alzheimer’s disease. Solving the folding problem will ultimately require a combination of theory, simulation, and experiment, with theory and simulation providing an atomically detailed picture of both the thermodynamics and kinetics of folding and experimental tests grounding these models in reality. However, theory and simulation generally fall orders of magnitude short of biologically relevant time scales. Here we report significant progress toward closing this gap: an atomistic model of the folding of an 80-residue fragment of the λ repressor protein with explicit solvent that captures dynamics on a 10 milliseconds time scale. In addition, we provide a number of predictions that warrant further experimental investigation. For example, our model’s native state is a kinetic hub, and biexponential kinetics arises from the presence of many free-energy basins separated by barriers of different heights rather than a single low barrier along one reaction coordinate (the previously proposed incipient downhill folding scenario).
As nascent proteins are synthesized by the ribosome, they depart via an exit tunnel running through the center of the large subunit. The exit tunnel likely plays an important part in various aspects of translation. Although water plays a key role in many bio-molecular processes, the nature of water confined to the exit tunnel has remained unknown. Furthermore, solvent in biological cavities has traditionally been characterized as either a continuous dielectric fluid, or a discrete tightly bound molecule. Using atomistic molecular dynamics simulations, we predict that the thermodynamic and kinetic properties of water confined within the ribosome exit tunnel are quite different from this simple two-state model. We find that the tunnel creates a complex microenvironment for the solvent resulting in perturbed rotational dynamics and heterogeneous dielectric behavior. This gives rise to a very rugged solvation landscape and significantly retarded solvent diffusion. We discuss how this non-bulk-like solvent is likely to affect important biophysical processes such as sequence dependent stalling, co-translational folding, and antibiotic binding. We conclude with a discussion of the general applicability of these results to other biological cavities.
GPU computing has great potential, but consumer GPUs lack some key features that are present in CPUs, especially error checking RAM. Do memory issues on GPUs cause problems? We used a small fraction of the large number of GPUs on Folding@home to test this. Understanding these issues were key to reliable large-scale GPU computing in Folding@home.
Graphics processing units (GPUs) are gaining widespread use in computational chemistry and other scientific simulation contexts because of their huge performance advantages relative to conventional CPUs. However, the reliability
of GPUs in error-intolerant applications is largely unproven. In particular, a lack of error checking and correcting (ECC) capability in the memory subsystems of graphics cards has been cited as a hindrance to the acceptance of GPUs as
high performance coprocessors, but the impact of this design has not been previously quantified. In this article we present MemtestG80, our software for assessing memory error rates on NVIDIA G80 and GT200-architecture-based graphics cards. Furthermore, we present the results of a large-scale assessment of GPU error rate, conducted by running MemtestG80 on over 20,000 hosts on the Folding@home distributed computing network. Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our survey over cards on Folding@home finds that, in their installed environments, two-thirds of tested GPUs exhibit a detectable, pattern-sensitive rate of memory soft errors. We demonstrate that these errors persist after controlling for overclocking and environmental proxies for temperature, but depend strongly on board architecture.
Simulating protein folding has been a challenging problem for decades due to the long timescales involved (compared with what is possible to simulate) and the challenges of gaining insight from the complex nature of the resulting simulation data. Markov State Models (MSMs) present a means to tackle both of these challenges, yielding simulations on experimentally relevant timescales, statistical significance, and coarse grained representations that are readily humanly understandable. Here, we review this method with the intended audience of non-experts, in order to introduce the method to a broader audience. We review the motivations, methods, and caveats of MSMs, as well as some recent highlights of applications of the method. We conclude by discussing how this approach is part of a paradigm shift in how one uses simulations, away from anecdotal single-trajectory approaches to a more comprehensive statistical approach.
Proceedings of the National Academy of Sciences, USA 107 10890-10895 (2010)
Understanding molecular kinetics, and particularly protein folding, is a classic grand challenge in molecular biophysics. Network models, such as Markov state models (MSMs), are one potential solution to this problem. MSMs have recently yielded quantitative agreement with experimentally derived structures and folding rates for specific systems, leaving them positioned to potentially provide a deeper understanding of molecular kinetics that can lead to experimentally testable hypotheses. Here we use existing MSMs for the villin headpiece and NTL9, which were constructed from atomistic simulations, to accomplish this goal. In addition, we provide simpler, humanly comprehensible networks that capture the essence of molecular kinetics and reproduce qualitative phenomena like the apparent two-state folding often seen in experiments. Together, these models show that protein dynamics are dominated by stochastic jumps between numerous metastable states and that proteins have heterogeneous unfolded states (many unfolded basins that interconvert more rapidly with the native state than with one another) yet often still appear two-state. Most importantly, we find that protein native states are hubs that can be reached quickly from any other state. However, metastability and a web of nonnative states slow the average folding rate. Experimental tests for these findings and their implications for other fields, like protein design, are also discussed.
Journal of Computational and Theoretical Chemistry 6 787–794 (2010)
Computer simulations can complement experiments by providing insight into molecular kinetics with atomic resolution. Unfortunately, even the most powerful supercomputers can only simulate small systems for short time scales, leaving modeling of most biologically relevant systems and time scales intractable. In this work, however, we show that molecular simulations driven by adaptive sampling of networks called Markov State Models (MSMs) can yield tremendous time and resource savings, allowing previously intractable calculations to be performed on a routine basis on existing hardware. We also introduce a distance metric (based on the relative entropy) for comparing MSMs. We primarily employ this metric to judge the convergence of various sampling schemes but it could also be employed to assess the effects of perturbations to a system (e.g., determining how changing the temperature or making a mutation changes a system’s dynamics).
Computing in Science and Engineering 12 34-39 (2010)
OpenMM is the key library which powers GPU computing in Folding@home. This paper discusses some key aspects of how OpenMM works.
The wide diversity of computer architectures today requires a new approach to software development. OpenMM is an abstraction layer for molecular mechanics simulations, allowing a single program to run efficiently on a variety of hardware platforms.
J. Ponder, C. Wu, V. S. Pande, I. Haque, R. A. Distasio Jr., D. Lambrecht, M. Head-Gordon, G. Clark, M. Johnson, and T. Head-Gordon.
J. Phys. Chem. B., 114 2549-2564 (2010)
Molecular force fields have been approaching a generational transition over the past several years, moving away from well-established and well-tuned, but intrinsically limited, fixed point charge models toward more intricate and expensive polarizable models that should allow more accurate description of molecular properties. The recently introduced AMOEBA force field is a leading publicly available example of this next generation of theoretical model, but to date, it has only received relatively limited validation, which we address here. We show that the AMOEBA force field is in fact a significant improvement over fixed charge models for small molecule structural and thermodynamic observables in particular, although further fine-tuning is necessary to describe solvation free energies of drug-like small molecules, dynamical properties away from ambient conditions, and possible improvements in aromatic interactions. State of the art electronic structure calculations reveal generally very good agreement with AMOEBA for demanding problems such as relative conformational energies of the alanine tetrapeptide and isomers of water sulfate complexes. AMOEBA is shown to be especially successful on protein−ligand binding and computational X-ray crystallography where polarization and accurate electrostatics are critical.
V. Voelz, V. Singh, W. J. Wedemeyer, L. Lapidus, and V. S. Pande.
Journal of the ACS, 132 4702–4709 (2010)
While several experimental techniques now exist for characterizing protein unfolded states, all-atom simulation of unfolded states has been challenging due to the long time scales and conformational sampling required. We address this problem by using a combination of accelerated calculations on graphics processor units and distributed computing to simulate tens of thousands of molecular dynamics trajectories each up to 10 μs (for a total aggregate simulation time of 127 ms). We used this approach in conjunction with Trp-Cys contact quenching experiments to characterize the unfolded structure and dynamics of protein L. We employed a polymer theory method to make quantitative comparisons between high-temperature simulated and chemically denatured experimental ensembles and find that reaction-limited quenching rates calculated from simulation agree remarkably well with experiment. In both experiment and simulation, we find that unfolded-state intramolecular diffusion rates are very slow compared to highly denatured chains and that a single-residue mutation can significantly alter unfolded-state dynamics and structure. This work suggests a view of the unfolded state in which surprisingly low diffusion rates could limit folding and opens the door for all-atom molecular simulation to be a useful predictive tool for characterizing protein unfolded states along with experiments that directly measure intramolecular diffusion.
Molecular kinetics underlies all biological phenomena and, like many other biological processes, may best be understood in terms of networks. These networks, called Markov state models (MSMs), are typically built from physical simulations. Thus, they are capable of quantitative prediction of experiments and can also provide an intuition for complex conformational changes. Their primary application has been to protein folding; however, these technologies and the insights they yield are transferable. For example, MSMs have already proved useful in understanding human diseases, such as protein misfolding and aggregation in Alzheimer’s disease.
Peter M. Kasson, Erik Lindahl, and Vijay S. Pande.
Journal of the American Chemical Society (2011) Epub 25 Feb.
Building on our previous simulations of membrane fusion, we have used the power of Folding@home to systematically analyze the fusion reaction between two vesicles and the molecular nature of water in this reaction. For purposes of computational tractability, many approaches neglect the detailed structure of water for large membrane simulations. In this case, we show that this detailed structure affects both the thermodynamics and the dynamics of the fusion reaction. These results have important implications for how we should perform vesicle fusion simulations; they also give a new example of structured water between two flexible hydrophilic interfaces. This water structure may be important in a number of cell-cell interactions.
Membrane interfaces are critical to many cellular functions, yet the vast array of molecular components involved make the fundamental physics of interaction difficult to define. Water has been shown to play an important role in the dynamics of small biological systems, for example when trapped in hydrophobic regions, but the molecular details of water have generally been thought dispensable when considering large membrane interfaces. Nevertheless, spectroscopic data indicate that water has distinct, ordered behavior near membrane surfaces. While coarse-grained simulations have achieved success recently in aiding understanding the dynamics of membrane assemblies, it is natural to ask, does the missing chemical nature of water play an important role? We have therefore performed atomic-resolution simulations of vesicle fusion to understand the role of chemical detail, particularly the molecular structure of water, in membrane fusion and at membrane interfaces more generally. These membrane interfaces present a form of hydrophilic confinement, yielding surprising, non-bulk-like water behavior.
AJ DePaul, EJ Thompson, SS Patel, K Haldeman & EJ Sorin. Nucleic Acids Research (2010) 38, 4856-4867 (Featured Cover Article)
We continue to study a small but ubiquitous RNA structural motif known as the GNRA tetraloop. This structure plays a role in the formation of larger RNA’s and is also of great interest due to its statistical overabundance in RNA structure. Our study demonstrates the highly flexible and dynamic properties of this structure, and also highlights the ability of this sequence to take on a number of non-native configurations in order to interact with adjacent RNA strands, suggesting that conformational entropy acts to stabilize this loop when not in its native conformation.
Conformational equilibrium within the ubiquitous GNRA tetraloop motif was simulated at the ensemble level, including 10,000 independent all atom molecular dynamics trajectories totaling over 110 microseconds of simulation time. This robust sampling reveals a highly dynamic structure comprised of 15 conformational microstates. We assemble a Markov model that includes transitions ranging from the nanosecond to microsecond timescales and is dominated by six key loop conformations that contribute to fluctuations around the native state. Mining of the Protein Data Bank provides an abundance of structures in which GNRA tetraloops participate in tertiary contact formation. Most predominantly observed in the experimental data are interactions of the native loop structure within the minor groove of adjacent helical regions. Additionally, a second trend is observed in which the tetraloop assumes non-native conformations while participating in multiple tertiary contacts, in some cases involving multiple possible loop conformations. This tetraloop flexibility can act to counterbalance the energetic penalty associated with assuming non-native loop structures in forming tertiary contacts. The GNRA motif has thus evolved not only to readily participate in simple tertiary interactions involving native loop structure, but also to easily adapt tetraloop secondary conformation in order to participate in larger, more complex tertiary interactions.
EJ Thompson, AJ DePaul, SS Patel & EJ Sorin. PLoS ONE (2010) 5, e10056
Our assessment of biophysical force fields as applied to helical peptides and proteins continues with the comparison of “next generation” AMBER-03 and AMBER-99SB to our previous results, particularly with respect to our AMBER variant, AMBER-99phi. Here we also incorporate simulations of a flexible and largely helical protein in order to assess the ability of these molecular models to adequately stabilize such structures.
Multiple variants of the AMBER all-atom force field were quantitatively evaluated with respect to their ability to accurately characterize helix-coil equilibria in explicit solvent simulations. Using a global distributed computing network, absolute conformational convergence was achieved for large ensembles of the capped A21 and Fs helical peptides. Further assessment of these AMBER variants was conducted via simulations of a flexible 164-residue five-helix-bundle protein, apolipophorin-III, on the 100 ns timescale. Of the contemporary potentials that had not been assessed previously, the AMBER-99SB force field showed significant helix-destabilizing tendencies, with beta bridge formation occurring in helical peptides, and unfolding of apolipophorin-III occurring on the tens of nanoseconds timescale. The AMBER-03 force field, while showing adequate helical propensities for both peptides and stabilizing apolipophorin-III, (i) predicts an unexpected decrease in helicity with ALA to ARG+ substitution, (ii) lacks experimentally observed 3-10 helical content, and (iii) deviates strongly from average apolipophorin-III NMR structural properties. As is observed for AMBER-99SB, AMBER-03 significantly overweighs the contribution of extended and polyproline backbone configurations to the conformational equilibrium. In contrast, the AMBER-99phi force field, which was previously shown to best reproduce experimental measurements of the helix-coil transition in model helical peptides, adequately stabilizes apolipophorin-III and yields both an average gyration radius and polar solvent exposed surface area that are in excellent agreement with the NMR ensemble.
Recent work from detailed simulations of protein folding resulting from Folding@home have suggested some surprises and radical changes in how one conceptualizes protein folding kinetics. One of the more unusual aspects found in these simulations is the role of the native state as a kinetic hub (see paper #74). Here, we propose a new theory of protein folding that uses structural information in its kinetic equations and gives a much richer picture than previous theories. One key result is a prediction for what would cause the native state to be a kinetic hub and when one would see this effect (and in particular why it was not seen in simpler simulation studies previously).
We present a simple model of protein folding dynamics that captures key qualitative elements recently seen in all-atom simulations. The goals of this theory are to serve as a simple formalism for gaining deeper insight into the physical properties seen in detailed simulations as well as to serve as a model to easily compare why these simulations suggest a different kinetic mechanism than previous simple models. Specifically, we find that non-native contacts play a key role in determining the mechanism, which can shift dramatically as the energetic strength of non-native interactions is changed. For protein-like non-native interactions, our model finds that the native state is a kinetic hub, connecting the strength of relevant interactions directly to the nature of folding kinetics.
G. R. Bowman and V. S. Pande. Proceedings of the National Academy of Sciences, USA (2010).
By analyzing recent results from Folding@home, we have found a set of general properties emerging regarding how proteins fold. In particular, one of them comes as a surprise compared to previous models: the native state is a kinetic hub. This has implications for how we think about protein folding in general as well as applications of protein folding in biology and disease.
Understanding molecular kinetics, and particularly protein folding, is a classic grand challenge in molecular biophysics. Network models, such as Markov state models (MSMs), are one potential solution to this problem. MSMs have recently yielded quantitative agreement with experimentally derived structures and folding rates for specific systems, leaving them positioned to potentially provide a deeper understanding of molecular kinetics that can lead to experimentally testable hypotheses. Here we use existing MSMs for the villin headpiece and NTL9, which were constructed from atomistic simulations, to accomplish this goal. In addition, we provide simpler, humanly comprehensible networks that capture the essence of molecular kinetics and reproduce qualitative phenomena like the apparent two-state folding often seen in experiments. Together, these models show that protein dynamics are dominated by stochastic jumps between numerous metastable states and that proteins have heterogeneous unfolded states (many unfolded basins that interconvert more rapidly with the native state than with one another) yet often still appear two-state. Most importantly, we find that protein native states are hubs that can be reached quickly from any other state. However, metastability and a web of nonnative states slow the average folding rate. Experimental tests for these findings and their implications for other fields, like protein design, are also discussed.
P. M. Kasson, E. Lindahl, and V. S. Pande. PLoS Computational Biology 6(6): e1000829 (2010).
Membrane fusion is a common underlying process critical to neurotransmitter release, cellular trafficking, and infection by many viruses. Proteins have been identified that catalyze fusion, and mutations to these proteins have yielded important information on how fusion occurs. However, the precise mechanism by which membrane fusion begins is the subject of active investigation. We have used atomic-resolution simulations to model the process of vesicle fusion and to identify a transition state for the formation of an initial fusion stalk. Doing so required substantial technical advances in combining high-performance simulation and distributed computing to analyze the transition state of a complex reaction in a large system. The transition state we identify in our simulations involves specific structural changes by a few lipid molecules. We also simulate fusion peptides from influenza hemagglutinin and show that they promote the same structural changes as are required for fusion in our model. We therefore hypothesize that these changes to individual lipid molecules may explain a portion of the catalytic activity of fusion proteins such as influenza hemagglutinin.
Membrane fusion is essential to both cellular vesicle trafficking and infection by enveloped viruses. While the fusion protein assemblies that catalyze fusion are readily identifiable, the specific activities of the proteins involved and nature of the membrane changes they induce remain unknown. Here, we use many atomic-resolution simulations of vesicle fusion to examine the molecular mechanisms for fusion in detail. We employ committor analysis for these million-atom vesicle fusion simulations to identify a transition state for fusion stalk formation. In our simulations, this transition state occurs when the bulk properties of each lipid bilayer remain in a lamellar state but a few hydrophobic tails bulge into the hydrophilic interface layer and make contact to nucleate a stalk. Additional simulations of influenza fusion peptides in lipid bilayers show that the peptides promote similar local protrusion of lipid tails. Comparing these two sets of simulations, we obtain a common set of structural changes between the transition state for stalk formation and the local environment of peptides known to catalyze fusion. Our results thus suggest that the specific molecular properties of individual lipids are highly important to vesicle fusion and yield an explicit structural model that could help explain the mechanism of catalysis by fusion proteins.
Voelz VA, Bowman GR, Beauchamp K, Pande VS. Journal of the American Chemical Society (2010)
Simulating protein folding on the millisecond timescale has been a major challenge for many years. When we started Folding@home, our first goal was to break the microsecond barrier. This barrier is 1000x fold harder and represents a major step forward in molecular simulation. Specifically, in a recent paper (http://pubs.acs.org/doi/abs/10.1021/ja9090353), Folding@home researchers Vincent Voelz, Greg Bowman, Kyle Beauchamp, and Vijay Pande have broken this barrier. The movie below is of one of the trajectories that folded (i.e. started unfolded and ended up in the folded state). From simulations like these, we have found some new surprises in how proteins fold. Please see the paper (url above) for more details.
Why is this important? This is important since protein misfolding occurs on long timescales and this first simulation on the millisecond simulation for protein folding means we have demonstrated our new Markov State Model (MSM) technology can successfully simulate long timescales. It make sense to go after protein folding first, since there is a wealth of experimental data for us to test our simulations. While this paper on protein folding has just come out, we have already been using this MSM technology to study protein misfolding in Alzheimer’s Disease, following up from our 2008 paper. While our previous paper (#58 below) was able to get to long enough timescales to see small molecular weight oligomers, this new methodology gives us hope to push further with our simulations of Alzheimer’s, making more direct connections to larger, more complex Abeta oligomers than we were previously able to do.
This is a pretty exciting moment for us in terms of what we can now do with simulations, and we’re looking forward to new applications of this technology.
To date, the slowest-folding proteins folded ab initio by all-atom molecular dynamics simulations have had folding times in the range of nanoseconds to microseconds. We report simulations of several folding trajectories of NTL9(1−39), a protein which has a folding time of 1.5 ms. Distributed molecular dynamics simulations in implicit solvent on GPU processors were used to generate ensembles of trajectories out to 40 μs for several temperatures and starting states. At a temperature less than the melting point of the force field, we observe a small number of productive folding events, consistent with predictions from a model of parallel uncoupled two-state simulations. The posterior distribution of the folding rate predicted from the data agrees well with the experimental folding rate (640/s). Markov State Models (MSMs) built from the data show a gap in the implied time scales indicative of two-state folding and heterogeneous pathways connecting diffuse mesoscopic substates. Structural analysis of the 14 out of 2000 macrostates transited by the top 10 folding pathways reveals that native-like pairing between strands 1 and 2 only occurs for macrostates with pfold > 0.5, suggesting β12 hairpin formation may be rate-limiting. We believe that using simulation data such as these to seed adaptive resampling simulations will be a promising new method for achieving statistically converged descriptions of folding landscapes at longer time scales than ever before.
S. Bacallado, J. Chodera, and V. Pande. Journal of Chemical Physics 131, 045106 (2009).
Markov State Models (MSMs) are one of the most common ways to analyze Folding@Home simulations. This paper introduces a new validation method, which could play an important role in automating their construction
Discrete-space Markov models are a convenient way of describing the
kinetics of biomolecules. The most common strategies used to validate
these models employ statistics from simulation data, such as the
eigenvalue spectrum of the inferred rate matrix, which are often
associated with large uncertainties. Here, we propose a Bayesian
approach, which makes it possible to differentiate between models at a
fixed lag time making use of short trajectories. The hierarchical
definition of the models allows one to compare instances with any
number of states. We apply a conjugate prior for reversible Markov
chains, which was recently introduced in the statistics literature.
The method is tested in two different systems, a Monte Carlo dynamics
simulation of a two-dimensional model system a
P. M. Kasson, D. L. Ensign, and V. S. Pande. Journal of the American Chemical Society (2009). Published online 2009 July 28.
The influenza virus infects people and animals by binding to complex sugar molecules on the surface of the respiratory tract. Bird viruses bind most strongly to bird cell-surface sugars and human viruses bind most strongly to human cell-surface sugars. As the recent swine-origin influenza virus has demonstrated, there is considerable overlap between the binding ability of human and pig viruses to cells of the other host. Changes to this binding affinity are one key component for viruses to make a jump between species, and it is difficult to predict the necessary mutations ahead of time. We would like to predict high-risk mutations to enable better surveillance and early control of potential inter-species transmission events. This work represents a first step in that direction, as we examine mutations to H5N1 avian influenza that alter ligand binding. We use Folding@home as a powerful computational screen to evaluate mutations that will eventually require experimental testing to verify.
Influenza virus attaches to and infects target cells via binding of cell-surface glycans by the viral hemagglutinin. This binding specificity is considered a major reason why avian influenza is typically poorly transmitted between humans, while swine influenza is better transmitted due to glycan similarity between the human and swine upper respiratory tract. Predicting mutations that control glycan binding is thus important to continued surveillance against new pandemic influenza strains. We have designed a molecular-dynamics approach for scoring potential mutants with predictive power for both receptor-binding-domain and allosteric mutations similar to those identified from clinical isolates of avian influenza. We have performed thousands of simulations of 17 different hemagglutinin mutants totaling >1 ms in length and employ a Bayesian model to rank mutations that disrupt the stability of the hemagglutinin−ligand complex. Based on our simulations, we predict a significantly increased koff for seven of these mutants. This means of using molecular dynamics analysis to make experimentally verifiable predictions offers a potentially general method to identify ligand-binding mutants, particularly allosteric ones. Our analysis of ligand dissociation provides a means to evaluate mutants prior to experimental mutagenesis and testing and constitutes an important step toward understanding the determinants of ligand binding by H5N1 influenza.
G. R. Bowman and V. S. Pande. PLoS ONE 4(6): e5840 (2009)
Here we continue our efforts to use methods developed in the folding mechanism community to both better understand and improve structure prediction. Our previous work demonstrated that Rosetta’s coarse-grained potentials may actually impede accurate structure prediction at full-atom resolution. Based on this work we postulated that it may be time to work completely at full-atom resolution but that doing so may require more careful attention to the kinetics of convergence.
G. R. Bowman, X. Huang, and V. S. Pande. Methods 49(2): 197-201 (2009)
Part of understanding a molecule’s conformational dynamics is mapping out the dominant metastable, or long lived, states that it occupies. Once identified, the rates for transitioning between these states may then be determined in order to create a complete model of the system’s conformational dynamics. Here we describe the use of the MSMBuilder package (now available at http://simtk.org/home/msmbuilder/) to build Markov State Models (MSMs) to identify the metastable states from Generalized Ensemble (GE) simulations, as well as other simulation datasets. Besides building MSMs, the code also includes tools for model evaluation and visualization.
A. Beberg and V. S. Pande. ipdps, pp.1-8, IEEE International Symposium on Parallel & Distributed Processing (2009)
Accurate simulation of biophysical processes requires vast computing resources. Folding@home is a distributed computing system first released in 2000 to provide such resources needed to simulate protein folding and other biomolecular phenomena. Now operating in the range of 5 PetaFLOPS sustained, it provides more computing power than can typically be gathered and operated locally due to cost, physical space, and electrical/cooling load. This paper describes the architecture and operation of Folding@home, along with some lessons learned over the lifetime of the project.
V. A. Voelz, E. Luttmann, G. R. Bowman, and V.S. Pande. International Journal of Molecular Sciences 10(3): 1013 (2009)
Recently a temperature-jump FTIR study of a designed three-stranded sheet showing a fast relaxation time of ~140 ± 20 ns was published. We performed massively parallel molecular dynamics simulations in explicit solvent to probe the structural events involved in this relaxation. While our simulations produce similar relaxation rates, the structural ensemble is broad. We observe the formation of turn structure, but only very weak interaction in the strand regions, which is consistent with the lack of strong backbone-backbone NOEs in previous structural NMR studies. These results suggest that either DPDP-II folds at time scales longer than 240 ns, or that DPDP-II is not a well-defined three-stranded β-sheet. This work also provides an opportunity to compare the performance of several popular force field models against one another.
Vincent A. Voelz, Paula Petrone and Vijay S. Pande. Pacific Symposium on Biocomputing, 14:340-352 (2009)
We present a new multiscale method that combines all-atom molecular dy-
namics with coarse-grained sampling, towards the aim of bridging two levels of
physiology: the atomic scale of protein side chains and small molecules, and the
huge scale of macromolecular complexes like the ribosome. Our approach uses
all-atom simulations of peptide (or other ligand) fragments to calculate local
3D spatial potentials of mean force (PMF). The individual fragment PMFs are
then used as a potential for a coarse-grained chain representation of the entire
molecule. Conformational space and sequence space are sampled efficiently us-
ing generalized ensemble Monte Carlo. Here, we apply this method to the study
of nascent polypeptides inside the cavity of the ribosome exit tunnel. We show
how the method can be used to explore the accessible conformational and se-
quence space of nascent polypeptide chains near the ribosome peptidyl transfer
center (PTC), with the eventual aim of understanding the basis of speciﬁcity
for co-translational regulation. The method has many potential applications
to predicting binding speciﬁcity and design, and is sufficiently general to allow even greater separation of scales in future work.
D. L. Ensign and V. S. Pande. Biophysical Journal 96 L53-55 (2009)
We describe molecular dynamics simulations resulting in the folding the Fip35 Hpin1 WW domain. The simulations were run on a distributed set of graphics processors, which are capable of providing up to two orders of magnitude faster computation than conventional processors. Using the Folding@home distributed computing system, we generated thousands of independent trajectories in an implicit solvent model, totaling over 2.73 ms of simulations. A small number of these trajectories folded; the folding proceeded along several distinct routes and the system folded into two distinct three-stranded beta-sheet conformations, showing that the folding mechanism of this system is distinctly heterogeneous.
M. S. Friedrichs, P. Eastman, V. Vaidyanathan, M. Houston, S. LeGrand, A. L. Beberg, D. L. Ensign, C. M. Bruns, V. S. Pande. J. Comp. Chem., (2009)
SUMMARY. This paper describes the code behind the Folding@home GPU clients, detailing how they work, how we achieved such a significant speed up on GPUs, and other implementation details.
ABSTRACT. We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core.
Nicholas W. Kelley, Xuhui Huang, Stephen Tam, Christoph Spiess, Judith Frydman and Vijay S. Pande. Journal of Molecular Biology (2009)
SUMMARY. The aggregation of the Huntingtin (Htt) protein has been implicated as the cause of Huntington’s disease. However, how this aggregation occurs and why can happen so quickly is still largely unknown. Inspired by recent experimental results in the Frydman lab at Stanford, we have investigated a new possible mechanism for the aggregation of the Huntingtin (Htt) protein, with implications for better understanding how the Htt protein aggregates.
Peter M. Kasson and Vijay S. Pande. Pacific Symposium on Biocomputing 14:492-503(2009).
The influenza hemagglutinin protein performs several important
functions, including attaching the virus to cells it will infect and
releasing the viral genome into the interior of the cell. Most
protective antibodies against influenza also bind to the hemagglutinin
protein. We wish to understand how mutations to hemagglutinin affect
viral function, including what keeps avian influenza (“bird flu”) from
being readily transmissible between humans. In this paper, we have
applied a technique from information theory known as mutual
information to genetic sequence data to predict important mutation
sites on the hemagglutinin protein. In follow-up work, we are
combining this technique with other methods to refine these
predictions and test some of them using Folding@home.
Influenza hemagglutinin mediates both cell-surface binding and cell
entry by the virus. Mutations to hemagglutinin are thus critical in
determining host species specificity and viral infectivity. Previous
approaches have primarily considered point mutations and sequence
conservation; here we develop a complementary approach using mutual
information to examine concerted mutations. For hemagglutinin,
several overlapping selective pressures can cause such concerted
mutations, including the host immune response, ligand recognition and
host specificity, and functional requirements for pH-induced
activation and membrane fusion. Using sequence mutual information as
a metric, we extracted clusters of concerted mutation sites and
analyzed them in the context of crystallographic data. Comparison of
influenza isolates from two subtypes—human H3N2 strains and human and
avian H5N1 strains—yielded substantial differences in spatial
localization of the clustered residues. We hypothesize that the
clusters on the globular head of H3N2 hemagglutinin may relate to
antibody recognition (as many protective antibodies are known to bind
in that region), while the clusters in common to H3N2 and H5N1
hemagglutinin may indicate shared functional roles. We propose that
these shared sites may be particularly fruitful for mutagenesis
studies in understanding the infectivity of this common human
pathogen. The combination of sequence mutual information and
structural analysis thus helps generate novel functional hypotheses
that would not be apparent via either method alone.
Edgar Luttmann, Daniel L. Ensign, Vishal Vaidyanathan, Mike Houston, Noam Rimon, Jeppe Øland, Guha Jayachandran, Mark Friedrichs, Vijay S. Pande. Journal of Computational Chemistry (2009)
SUMMARY. In this paper, we detail how we were able to get great speed increases for Folding@home (and actually certain molecular dynamics calculations in general) on the PS3. This is our first paper using the PS3, laying out the “how does it work,” with a follow up paper in the works describing the results obtained in FAH from PS3 clients. It is also worth noting that this paper is a collaboration between FAH team members (Luttmann, Ensign, Vaidyanathan, Houston [now at AMD], Jayachandran, Friedrichs, and Pande) with developers at Sony (Rimon and Øland and their coworkers).
ABSTRACT. Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3′s Cell processor over more traditional processors.
Paula M. Petrone, Christopher D. Snow, Del Lucent, and Vijay S. Pande
Proceedings of the National Academy of Sciences, USA 2008
SUMMARY The ribosome is a fascinating molecular machine, responsible for the synthesis of proteins. For this reason it is of fundamental importance to protein folding (as the last step in the central dogma of biology) as well as to human health (since the ribosome is the target of a very large fraction of antibiotics). One of the questions revolving around ribosome function is why is there a large tunnel inside the ribosome, through which proteins exit after being synthesized. In this paper, we used “bigWU” classic clients (clients which allow larger systems to run) since the ribosome is so huge that it would not run on regular classic clients. The primary goal of this paper was to analyze the surface of the ribosome tunnel. Understanding the nature of this surface would be useful for both understanding the fundamental nature of protein synthesis as well as how key antibiotics interact with the ribosome. An interesting related discovery was the identification of a potential “ribosome gate” which can open and close selectively, based on what is interacting with the gate. This suggests novel hypotheses for several aspects of ribosome function as well as interesting new directions for work on studying the ribosome and for new routes for antibiotics.
ABSTRACT The ribosome is a large complex catalyst responsible for the synthesis of new proteins, an essential function for life. New proteins emerge from the ribosome through an exit tunnel as nascent polypeptide chains. Recent findings indicate that tunnel interactions with the nascent polypeptide chain might be relevant for the regulation of translation. However, the specific ribosomal structural features that mediate this process are unknown. Performing molecular dynamics simulations, we are studying the interactions between components of the ribosome exit tunnel and different chemical probes (specifically different amino acid side chains or monovalent inorganic ions). Our free-energy maps describe the physicochemical environment of the tunnel, revealing binding crevices and free-energy barriers for single amino acids and ions. Our simulations indicate that transport out of the tunnel could be different for diverse amino acid species. In addition, our results predict a notable protein–RNA interaction between a flexible 23S rRNA tetraloop (gate) and ribosomal protein L39 (latch) that could potentially obstruct the tunnel’s exit. By relating our simulation data to earlier biochemical studies, we propose that ribosomal features at the exit of the tunnel can play a role in the regulation of nascent chain exit and ion flux. Moreover, our free-energy maps may provide a context for interpreting sequence-dependent nascent chain phenomenology.
Nicholas W. Kelley, V. Vishal, Grant A. Krafft, and Vijay S. Pande
J. Chem. Phys. 129, 214707 (2008); DOI:10.1063/1.3010881
SUMMARY. Abeta misfolding and aggregation is believed to be the cause of Alzheimer’s Disease. Simulations, like Folding@home, are a natural way to understand this process. However, there are several key challenges for simulating the key step — oligomerization.
This work represents a new way to simulate Abeta oligomerization, with a key advance of being able to simulate experimentally relevant timescales and concentrations, using a novel method. We use this new method and the power provided by Folding@home donors to simulate oligomerization in all-atom detail. This has lead to specific predictions about the process, which we are now testing experimentally. In many ways, this paper is the “tip of the iceberg” for the Folding@home activities in AD, with a lot more interesting results to come, especially in terms of experimental tests of our predictions and interesting new possibilities for new drugs and AD therapeutics.
This work ran exclusively on classic clients. For the follow up simulations, we are using a mixture of GPU, SMP, and classic clients. Due to the large number of classic clients, they allow us to calculations not possible on the other platforms. However, the raw speed (but smaller number) of the GPU and SMP clients allow us to get a good rough idea quickly, refining later with classic clients.
ABSTRACT. Here, we present a novel computational approach for describing the formation of oligomeric assemblies at experimental concentrations and timescales. We propose an extension to the Markovian state model approach, where one includes low concentration oligomeric states analytically. This allows simulation on long timescales (seconds timescale) and at arbitrarily low concentrations (e.g., the micromolar concentrations found in experiments), while still using an all-atom model for protein and solvent. As a proof of concept, we apply this methodology to the oligomerization of an Abeta peptide fragment (Abeta 21–43). Abeta oligomers are now widely recognized as the primary neurotoxic structures leading to Alzheimer’s disease. Our computational methods predict that Abeta trimers form at micromolar concentrations in 10 ms, while tetramers form 1000 times more slowly. Moreover, the simulation results predict specific intermonomer contacts present in the oligomer ensemble as well as putative structures for small molecular weight oligomers. Based on our simulations and statistical models, we propose a novel mutation to stabilize the trimeric form of Abeta in an experimentally verifiable manner.
Gregory R. Bowman, Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas J. Guibas, and Vijay S. Pande
J. Am. Chem. Soc., 130 (30), 9676–9678, 2008
ABSTRACT. Hairpins are a ubiquitous secondary structure motif in RNA molecules. Despite their simple structure, there is some debate over whether they fold in a two-state or multi-state manner. We have studied the folding of a small tetraloop hairpin using a serial version of replica exchange molecular dynamics on a distributed computing environment. On the basis of these simulations, we have identified a number of intermediates that are consistent with experimental results. We also find that folding is not simply the reverse of high-temperature unfolding and suggest that this may be a general feature of biomolecular folding.
ABSTRACT.We have implemented the serial replica exchange method (SREM) and simulated tempering (ST) enhanced sampling algorithms in a global distributed computing environment. Here we examine the helix-coil transition of a 21 residue alpha-helical peptide in explicit solvent. For ST, we demonstrate the efficacy of a new method for determining initial weights allowing the system to perform a random walk in temperature space based on short trial simulations. These weights are updated throughout the production simulation by an adaptive weighting method. We give a detailed comparison of SREM, ST, as well as standard MD and find that SREM and ST give equivalent results in reasonable agreement with experimental data. In addition, we find that both enhanced sampling methods are much more efficient than standard MD simulations. The melting temperature of the Fs peptide with the AMBER99phi potential was calculated to be about 310 K, which is in reasonable agreement with the experimental value of 334 K. We also discuss other temperature dependent properties of the helix-coil transition. Although ST has certain advantages over SREM, both SREM and ST are shown to be powerful methods via distributed computing and will be applied extensively in future studies of complex bimolecular systems.
Nina Singhal Heinrichs and Vijay S. Pande. Journal of Chemical Physics (2007)
SUMMARY. This paper lays out how one can revamp FAH calculations to make them considerably more efficient, perhaps by as much as 1000x reduction in the needed computer time. The basic idea is that we use FAH to build a model of the problem in question (a so-called Markovian state model or MSM) and then use the MSM to predict experimental quantities. When using an MSM to make predictions, the question is usually have we done enough computation to make a sufficiently good (precise) prediction. By calculating the uncertainty (precision) on the fly, we can now send FAH clients to the parts of the problem which are uncertainty limiting. We show that this approach can be considerably more efficiently (1000x) than just running with even sampling. This approach is being incorporated into the FAH server code. One exciting ramification of this work is that while MSM’s were originally formulated as a means to use a large distributed cluster (like Folding@home with 300,000 processors) to try to reproduce what a single, hypothetical machine which is 300,000x faster (which doesn’t exist) could do. However, even if that 300,000x faster machine did exist, we show that our approach would be more efficient than a single, long trajectory, suggesting that MSM-based methods should be useful for a very broad set of computer hardware, not just distributed computing platforms.
D. Ensign, P. M. Kasson, and V. S. Pande. Journal of Molecular Biology (2007)
SUMMARY: This paper describes the first set of results generated using the SMP clients. The main advantage of using SMP for these sorts of calculations is that the amount of computation that one client can do is several times larger than the traditional clients. This means that our simulations can get many times longer that before; in fact, this has allowed us to generate several hundred folding trajectories of the fastest-folding protein known, the HP35-NleNle variant of the villin headpiece subdomain. In this paper, because our simulation time scales compare well to the 700-nanosecond experimental folding time of this protein, AND we’ve generated enough trajectories to get good statistics, we can shed some light on the experimental results. To summarize the result, the first helix of the protein was thought to be highly structured in the unfolded state of the protein; we’ve suggested that structure in this part of the molecule is not enough to lead to fast folding, and that longer time scales than the 700-ns mark may be present in this system.
Check out the movie: it shows some simulation we did for this work, although watching one trajectory is emphatically NOT statistically significant! Some more visualizations of villin from our earlier work can be found on this page.
We have also made the raw data available to researchers on a SimTk.org page. This site includes the raw data, as well as scripts to automate the process and a VMD plugin to allow for browsing of the data. Please contact email@example.com if you need help with doing this.
ABSTRACT: We have performed molecular dynamics simulations on a set of nine unfolded conformations of the fastest-folding protein yet discovered, a variant of the villin headpiece subdomain (HP-35 NleNle). The simulations were generated using a new distributed computing method, yielding hundreds of trajectories each on a time scale comparable to the experimental folding time, despite the large (10,000 atom) size of the simulation system. This strategy eliminates the need to assume a two-state kinetic model or to build a Markov state model. The relaxation to the folded state at 300 K from the unfolded configurations (generated by simulation at 373 K) was monitored by a method intended to reflect the experimental observable (quenching of tryptophan by histidine). We also monitored the relaxation to the native state by directly comparing structural snapshots with the native state. The rate of relaxation to the native state and the number of resolvable kinetic time scales both depend upon starting structure. Moreover, starting structures with folding rates most similar to experiment show some native-like structure in the N-terminal helix (helix 1) and the phenylalanine residues constituting the hydrophobic core, suggesting that these elements may exist in the experimentally relevant unfolded state. Our large-scale simulation data reveal kinetic complexity not resolved in the experimental data. Based on these findings, we propose additional experiments to further probe the kinetics of villin folding.
P. M. Kasson and V. S. Pande. PLoS Computational Biology (2007)
SUMMARY: Here, we use molecular molecular-dynamics simulations of lipid vesicle fusion under different lipid compositions to generate a more detailed explanation for how composition controls membrane fusion. We predict that lipid composition affects both the initial process of forming a contact stalk between two vesicles and the formation of a metastable hemifused intermediate. These two roles act in concert to change both the rate of fusion and the level of detectable fusion intermediates. We also present initial results on fusion of vesicles at different membrane curvatures. Recent experimental results suggest that the creation of highly curved membranes is important to fusion of synaptic vesicles. Our simulations cover a curvature regime similar to these experimental systems. In combination with previous results, we predict that the effect of lipid composition on fusion is general across different membrane curvatures, but that the rate of fusion is controlled by both composition and curvature.
ABSTRACT: Membrane fusion is critical to biological processes such as viral infection, endocrine hormone secretion, and neurotransmission, yet the precise mechanistic details of the fusion process remain unknown. Current experimental and computational model systems approximate the complex physiological membrane environment for fusion using one or a few protein and lipid species. Here, we report results of a computational model system for fusion in which the ratio of lipid components was systematically varied, using thousands of simulations of up to a microsecond in length to predict the effects of lipid composition on both fusion kinetics and mechanism. In our simulations, increased phosphatidylcholine content in vesicles causes increased activation energies for formation of the initial stalk-like intermediate for fusion and of hemifusion intermediates, in accordance with previous continuum-mechanics theoretical treatments. We also use our large simulation dataset to quantitatively compare the mechanism by which vesicles fuse at different lipid compositions, showing a significant difference in fusion kinetics and mechanism at different compositions simulated. As physiological membranes have different compositions in the inner and outer leaflets, we examine the effect of such asymmetry, as well as the effect of membrane curvature on fusion. These predicted effects of lipid composition on fusion mechanism both underscore the way in which experimental model system construction may affect the observed mechanism of fusion and illustrate a potential mechanism for cellular regulation of the fusion process by altering membrane composition.
P. M. Kasson, A. Zomorodian, S. Park, N. Singhal, L. J. Guibas, and V. S. Pande. Bioinformatics (2007)
SUMMARY: One challenge in analyzing membrane fusion pathways is simply characterizing the structural intermediates involved. This paper describes use of methods from computational topology and geometry to better measure changes in vesicle structure relevant to fusion.
MOTIVATION: Membrane fusion constitutes a key stage in cellular processes such as synaptic neurotransmission and infection by enveloped viruses. Current experimental assays for fusion have thus far been unable to resolve early fusion events in fine structural detail. We have previously used molecular dynamics simulations to develop mechanistic models of fusion by small lipid vesicles. Here, we introduce a novel structural measurement of vesicle topology and fusion geometry: persistent voids.
RESULTS: Persistent voids calculations enable systematic measurement of structural changes in vesicle fusion by assessing fusion stalk widths. They also constitute a generally applicable technique for assessing lipid topological change. We use persistent voids to compute dynamic relationships between hemifusion neck widening and formation of a full fusion pore in our simulation data. We predict that a tightly coordinated process of hemifusion neck expansion and pore formation is responsible for the rapid vesicle fusion mechanism, while isolated enlargement of the hemifusion diaphragm leads to the formation of a metastable hemifused intermediate. These findings suggest that rapid fusion between small vesicles proceeds via a small hemifusion diaphragm rather than a fully expanded one.
SUMMARY: When proteins fold inside a cell, they are frequently subjected to various amounts of spatial confinement. Specifically, misfolded or unfolded proteins can be encapsulated inside a helper molecule called a chaperonin. These chaperonins are involved with helping proteins fold inside cells. Here we investigate how confinement affects protein folding using a simple model: a fast folding mini-protein confined to a nanopore. We find that if we confine the protein, but allow the surrounding water molecules to pass freely in and out of the nanopore, the protein is more likely to reach the folded state. On the other hand, if we make the nanopore water-tight, we find that the protein is less likely to fold. Specifically it is pushed into a small non-native globule. This suggests that when thinking of folding inside a confined space (like a chaperonin) it is important to remember both protein and water are confined, and this confined water can have an affect on protein folding.
ABSTRACT: Although most experimental and theoretical studies of protein folding involve proteins in vitro, the effects of spatial confinement may complicate protein folding in vivo. In this study, we examine the folding dynamics of villin (a small fast folding protein) with explicit solvent confined to an inert nanopore. We have calculated the probability of folding before unfolding (P fold) under various confinement regimes. Using P fold correlation techniques, we observed two competing effects. Confining protein alone promotes folding by destabilizing the unfolded state. In contrast, confining both protein and solvent gives rise to a solvent-mediated effect that destabilizes the native state. When both protein and solvent are confined we see unfolding to a compact unfolded state different from the unfolded state seen in bulk. Thus, we demonstrate that the confinement of solvent has a significant impact on protein kinetics and thermodynamics. We conclude with a discussion of the implications of these results for folding in confined environments such as the chaperonin cavity in vivo.
J. Chodera, N. Singhal, V. S. Pande, K. Dill, and W. Swope. Journal of Chemical Physics, (2007)
SUMMARY: In order to break up calculations to run on Folding@home and then repiece them together in order to act like a single, very, very, very fast computer, we need special algorithms. We are constantly trying to improve our methods in these directions and this paper represents our latest state of the art in this direction.
ABSTRACT: To meet the challenge of modeling the conformational dynamics of biological macromolecules over long timescales, much recent effort has been devoted to constructing stochastic kinetic models, often in the form of discrete-state Markov models, from short molecular dynamics simulations. To construct useful models that faithfully represent dynamics at the timescales of interest, it is necessary to decompose configuration space into a set of kinetically metastable states. Previous attempts to define these states have relied upon either prior knowledge of the slow degrees of freedom or on the application of conformational clustering techniques which assume that conformationally distinct clusters are also kinetically distinct. Here, we present a first version of an automatic algorithm for the discovery of kinetically metastable states that is generally applicable to solvated macromolecules. Given molecular dynamics trajectories initiated from a well-defined starting distribution, the algorithm discovers long-lived, kinetically metastable states through successive iterations of partitioning and aggregating conformation space into kinetically related regions. We apply this method to three peptides in explicit solvent terminally blocked alanine, the engineered 12-residue beta-hairpin trpzip2, and the 21-residue helical Fs peptide to assess its ability to generate physically meaningful states and faithful kinetic models.
SUMMARY: Storage@home is a distributed storage infrastructure
developed to solve the problem of backing up and sharing
petabytes of scientific results using a distributed model of
volunteer managed hosts. Data is maintained by a
mixture of replication and monitoring, with repairs done
Erich Elsen, Mike Houston, V. Vishal, Eric Darve, Pat Hanrahan, and Vijay Pande. Proceedings of the 2006 ACM/IEEE conference on Supercomputing (2006).
SUMMARY. This paper details our first efforts with GPU’s for molecular dynamics. This work lead to the GPU1 FAH core. We have other papers in the works describing the successor to the GPU1 core as well as the PS3 core.
ABSTRACT. Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this poster we show how graphics processors can be used for N-body simulations to obtain large improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs 25x an Intel Pentium 4, and 2x specialized hardware such as GRAPE-6A, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@home.
SUMMARY: We have been applying Folding@home to study the nature of key proteins involved in how flu (the influenza virus) gains access into host cells. This paper reflects our first work in this direction.
ABSTRACT: Massively parallel all-atom, explicit solvent molecular dynamics simulations were used to explore the formation and existence of local structure in two small alpha-helical proteins, the villin headpiece and the helical fragment B of protein A. We report on the existence of transient helices and combinations of helices in the unfolded ensemble, and on the order of formation of helices, which appears to largely agree with previous experimental results. Transient local structure is observed even in the absence of overall native structure. We also calculate sets of residue-residue pairs that are statistically predictive of the formation of given local structures in our simulations.
Some more visualizations of villin from our earlier work can be found on this page.
C. Snow and V. S. Pande. Biophysical Journal, (2006)
ABSTRACT: Using distributed molecular dynamics simulations we located 4 distinct folding transitions for a 39 residue beta-beta-alpha-beta protein fold. We introduce and sequentially determine the transmission probability, Ptrans, of 500 conformations along each free energy barrier at room temperature, and determined which conformations were transition state ensemble members (Ptrans ≈ 0.5). We ran similar simulations at 82°C, determined the change in Ptrans with temperature for all 2,000 conformations, and observed Hammond behavior directly using Ptrans correlation. The polymer temperature increase only slightly perturbed the transition probabilities. We propose that diffusion along Ptrans may provide the configurational diffusion rate at the top of the barrier. Specifically, given a transition state conformation x0 with estimated Ptrans = 0.5, we selected a large set of subsequent conformations from independent trajectories, each exactly a small time δt after x0 (250ps). Then we calculated Ptrans for each of the new trial conformations. The P(Ptrans|δt=250ps) distribution reflects diffusion along an ideal kinetic reaction coordinate. This approach provides a novel perspective on the nature of a protein folding transition, and provides a framework for quantitative study of activated relaxation kinetics.
Guha Jayachandran, M. R. Shirts, S. Park, and V. S. Pande. Journal of Chemical Physics, (2006)
ABSTRACT: We present a technique for biomolecular free energy calculations that exploits highly parallelized sampling to significantly reduce the time to results. The technique combines free energies for multiple, nonoverlapping configurational macrostates and is naturally suited to distributed computing. We describe a methodology that uses this technique with docking, molecular dynamics, and free energy perturbation to compute absolute free energies of binding quickly compared to previous methods. The method does not require a priori knowledge of the binding pose as long as the docking technique used can generate reasonable binding modes. We demonstrate the method on the protein FKBP12 and eight of its inhibitors.
Guha Jayachandran, V. Vishal, and V. S. Pande. Journal of Chemical Physics (2006)
SUMMARY: We have developed a new method which greatly extends Folding@home’s ability to simulate long timescales. This new method (MSM) will be applied to essentially all new Folding@home projects. This paper demonstrates MSM’s applied to a challenging target — the villin headpiece.
ABSTRACT: We report on the use of large-scale distributed computing simulation and novel analysis techniques for examining the dynamics of a small protein. Matters addressed include folding rate, very long timescale kinetics, ensemble properties, and interaction with water. The target system for the study, the villin headpiece, has been of great interest to experimentalists and theorists both. Sampling totaled nearly 500 of the most extensive published to date for a system of villin’s size in explicit solvent with all atom detail and was in the form of tens of thousands of independent molecular dynamics trajectories, each several tens of nanoseconds in length. We report on kinetics sensitivity analyses that, using a set of short simulations, probed the role of water in villin’s folding and sensitivity to the simulation’s electrostatics treatment. By constructing Markovian state models from the collected data, we were able to propagate dynamics to times far beyond those directly simulated and to rapidly compute mean first passage times, long time kinetics (tens of microseconds), and evolution of ensemble property distributions over long times, otherwise currently impossible. We also tested our MSM by using it to predict the structure of villin de novo.
P. Kasson, N. Kelley, N. Singhal, M. Vrjlic, A. Brunger, and V. S. Pande. Proceedings of the National Academy of Sciences, USA
SUMMARY: These first results describe work we’ve been doing to study membrane fusion, the process by which two lipid membranes become one. This process is critical to proper functioning of the cell and also phenomena such as neurotransmission and infection by many viruses. We are seeking to understand how membrane fusion works so that we can eventually manipulate it. We hope such an understanding will lead to the development of new and more effective drugs to combat viral infection and treat neurologic diseases.
ABSTRACT: Lipid membrane fusion is critical to cellular transport and signaling processes such as constitutive secretion, neurotransmitter release, and infection by enveloped viruses. Here, we introduce a powerful computational methodology for simulating membrane fusion from a starting configuration designed to approximate activated prefusion assemblies from neuronal and viral fusion, producing results on a time scale and degree of mechanistic detail not previously possible to our knowledge. We use an approach to the long time scale simulation of fusion by constructing a Markovian state model with large-scale distributed computing, yielding an understanding of fusion mechanisms on time scales previously impossible to simulate to our knowledge. Our simulation data suggest a branched pathway for fusion, in which a common stalk-like intermediate can either rapidly form a fusion pore or remain in a metastable hemifused state that slowly forms fully fused vesicles. This branched reaction pathway provides a mechanistic explanation both for the biphasic fusion kinetics and the stable hemifused intermediates previously observed experimentally. Our distributed computing and Markovian state model approaches provide sufficient sampling to detect rare transitions, a systematic process for analyzing reaction pathways, and the ability to develop quantitative approximations of reaction kinetics for fusion.
SUMMARY: The ability to quantitatively predict electric fields in proteins has remained a great challenge. In this paper, we combine new experimental methods with new theoretical methods made possible by Folding@home distributed computing to greatly push the boundary of what one could previously predict. In particular, we see that a single structure is insufficient to make accurate predictions, suggesting that the ensemble approaches inherent to Folding@home may be important in predicting electrostatics in proteins.
ABSTRACT: The electric fields produced in folded proteins influence nearly every aspect of protein function. We present a vibrational spectroscopy technique that measures changes in electric field at a specific site of a protein as shifts in frequency (Stark shifts) of a calibrated nitrile vibration. A nitrile-containing inhibitor is used to deliver a unique probe vibration to the active site of human aldose reductase, and the response of the nitrile stretch frequency is measured for a series of mutations in the enzyme active site. These shifts yield quantitative information on electric fields that can be directly compared with electrostatics calculations. We show that extensive molecular dynamics simulations and ensemble averaging are required to reproduce the observed changes in field.
L.T. Chong, W. C. Swope, J. W. Pitera, and V. S. Pande. Journal of Molecular Biology (2006)
SUMMARY: Roughly half of all known cancers involve a mutation in a single protein: p53. P53 serves to protect us from getting cancer; when p53 fails, one often gets cancer. We have developed a new method for predicting how mutations in p53, a protein central to cancer, would impact p53. This new method is naturally suited for distributed computing and can predict several mutations found to date.
ABSTRACT: We have developed a novel computational alanine scanning approach that involves analysis of ensemble unfolding kinetics at high temperature to identify residues that are critical for the stability of a given protein. This approach has been applied to dimerization of the oligomerization domain (residues 326-355) of tumor suppressor p53. As validated by experimental results, our approach has reasonable success in identifying deleterious mutations, including mutations that have been linked to cancer. We discuss a method for determining the effect of mutations on the location of the dimerization transition state.
S. Park and V. S. Pande. Journal of Chemical Physics (2006)
SUMMARY: Markov State Models (MSM’s) have become a major part of how Folding@home calculations are performed. In particular, the MSM technique is at the heart of how one can divide complex calculations like protein folding or lipid vesicle dynamics on 10,000 to 100,000 CPU’s — i.e. how distributed computing can tackle complex problems. This paper presents a new way to test the validity of MSM’s generated to make sure that the models are suitable and self-consistent.
ABSTRACT: Markov state models are kinetic models built from the dynamics of molecular simulation trajectories by grouping similar configurations into states and examining the transition probabilities between states. Here we present a procedure for validating the underlying Markov assumption in Markov state models based on information theory using Shannon’s entropy. This entropy method is applied to a simple system and is compared with the previous eigenvalue method. The entropy method also provides a way to identify states that are least Markovian, which can then be divided into finer states to improve the model.
Young Min Rhee and Vijay S. Pande. Chemical Physics (2006)
SUMMARY: How important are local chemical features of proteins during the folding process? We assess protein folding models with varying degrees of chemical detail to gain an understanding of how they perform relative to some of today’s most sophisticated models.
ABSTRACT: Is an all-atom representation for protein and solvent necessary for simulating protein folding kinetics or can simpler models reproduce the results of more complex models? This question is relevant not just for simulation methodology, but also for the general understanding of the chemical details relevant for protein dynamics. With recent advances in computational methodology, it is now possible to simulate the folding kinetics of small proteins in all-atom detail. Therefore, with both detailed and simplified models of folding in hand, the outstanding questions are what the differences in these models are for the description of protein folding dynamics, and how we can quantitatively compare the folding mechanisms found in the models. To address the outstanding problem of how to determine the differences between folding mechanism in a sensitive and quantitative manner, we suggest a new method to quantify the non-linear correlation in folding commitment probability (Pfold) values. We use this method to probe the differences between a wide range of models for folding simulations, ranging from coarse grained Go models to all-atom models with implicit or explicit solvation. While the differences between less-detailed models (Go and implicit solvation models) and explicit solvation models are large, the differences within various explicit solvation models appear to be small, suggesting that the discrete nature of water may play a role in folding kinetics.
ABSTRACT: In striking contrast to simple polymer physics theory, which does not account for solvent effects, we find that physical confinement of solvated biopolymers decreases solvent entropy, which in turn leads to a reduction in the organized structural content of the polymer. Since our theory is based on a fundamental property of water-protein statistical mechanics, we expect it to have broad implications in many biological and material science contexts.
Eric J. Sorin, Young Min Rhee, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2006)
SUMMARY: How complicated is a helix, and how is the complexity of helical structure affected by the solvent? Here we show, through a novel “computational hydrophobic titration” experiment, that many features of helices can be rationalized and/or explained by considering the interactions along the peptide-solvent interface.
TECHNICAL ABSTRACT: The 21-residue polyalanine-based Fs peptide was studied using thousands of long, explicit solvent, atomistic molecular dynamics simulations which reached equilibrium at the ensemble level. Peptide conformational preference as a function of hydrophobicity was examined using a spectrum of explicit solvent models, and the peptide length dependence of the hydrophilic and hydrophobic components of solvent-accessible surface area for several ideal conformational types was also considered. Our results demonstrate how the character of the solvation interface induces several conformational preferences, including a decrease in mean helical content with increased hydrophilicity, which occurs predominantly through reduced nucleation tendency and, to a lesser extent, destabilization of helical propagation. Interestingly, an opposing effect occurs through increased propensity for 310-helix conformations, as well as increased polyproline structure. Our observations provide a framework for understanding previous reports of conformational preferences in polyalanine-based peptides including (i) terminal 310-helix prominence, (ii) low p-helix propensity, (iii) increased polyproline conformations in short and unfolded peptides, and (iv) membrane helix stability in the presence and absence of water. These observations lend physical insight into the role of water in peptide conformational equilibria at the atomic level, and expand our view of the complexity of even the most “simple” of biopolymers. Whereas previous studies have focused predominantly on hydrophobic effects with respect to tertiary structure, this report highlights the need for consideration of such effects on the secondary structural level.
Paula Petrone and Vijay S. Pande. Biophysical Journal (2006)
SUMMARY: In allosteric regulation, protein activity is altered when ligand binding (or unbinding) causes changes in the protein conformation. Little is known about which aspects of the protein architecture are responsible for allosteric regulation, however most of these changes involve collective displacements of atoms (domain and hinge-bending motions) which are likely to occur in the microsecond timescale. Normal mode analysis (NMA) decouples the complex motions and fluctuations of proteins into a linear combination of orthogonal basis vectors, each representing an independent concerted harmonic motion with a characteristic frequency. In principle, it would be a natural basis in which to represent conformational change that involves collective motions of atoms. This paper addresses the limitations of NMA, namely how many normal modes are necessary to achieve a certain degree of accuracy in the representation.
TECHNICAL ABSTRACT: We suggest a simple method to assess how many normal modes are needed to map a conformational change. By projecting the conformational change onto a subspace of the normal mode vectors and, using RMSD as a test of accuracy, we find that the first 20 modes only contribute 50% or less of the total conformational change in four test cases (myosin, calmodulin, NtrC, and hemoglobin). In some allosteric systems, like the molecular switch NtrC, the conformational change is localized to a limited number of residues. We find that many more modes are necessary to accurately map this collective displacement. In addition, the normal mode spectra can provide useful information about the details of the conformational change, especially when comparing structures with different bound ligands, in this case, calmodulin. Indeed, this approach presents normal mode analysis as a useful basis in which to capture the mechanism of conformational change, and shows that the number of normal modes needed to capture the essential collective motions of atoms should be chosen according to the required accuracy.
Bojan Zagrovic, Guha Jayachandran, Ian S. Millett, Sebastian Doniach and Vijay S. Pande. Journal of Molecular Biology (2005)
SUMMARY: Direct comparisons are made between Folding@home simulations and experimental measurements (SAXS) to determine molecular size of helical peptides of varying length, revealing the compact nature of such helical peptides.
TECHNICAL ABSTRACT: Using synchrotron radiation and the small-angle X-ray scattering technique we have measured the radii of gyration of a series of alaninebased a-helix-forming peptides of the composition Ace-(AAKAA)n-GYNH2, nZ2-7, in aqueous solvent at 10C. In contrast to other techniques typically used to study a-helices in isolation (such as nuclear magnetic resonance and circular dichroism), small-angle X-ray scattering reports on the global structure of a molecule and, as such, provides complementary information to these other, more sequence-local measuring techniques. The radii of gyration that we measure are, except for the 12-mer, lower than the radii of gyration of ideal a-helices or helices with frayed ends of the equivalent sequence-length. For example, the measured radius of gyration of the 37-mer is 14.2 A , which is to be compared with the radius of gyration of an ideal 37-mer a-helix of 17.6 A . Attempts are made to analyze the origin of this discrepancy in terms of the analytical Zimm-Bragg-Nagai (ZBN) theory, as well as distributed computing explicit solvent molecular dynamics simulations using two variants of the AMBER force-field. The ZBN theory, which treats helices as cylinders connected by random walk segments, predicts markedly larger radii of gyration than those measured. This is true even when the persistence length of the random walk parts is taken to be extremely short (about one residue). Similarly, the molecular dynamics simulations, at the level of sampling available to us, give inaccurate values of the radii of gyration of the molecules (by overestimating them by around 25% for longer peptides) and/or their helical content. We conclude that even at the short sequences examined here (%37 amino acid residues), these a-helical peptides behave as fluctuating semi-broken rods rather than straight cylinders with frayed ends.
Nina Singhal and Vijay S. Pande. Journal of Chemical Physics (2005)
SUMMARY: We validate the new Markovian State Model (MSM) for describing protein dynamics, and show how to efficiently calculate how accurate these models are. We also describe how to start new FAH simulations to best improve the accuracy of the model.
TECHNICAL ABSTRACT: In previous work, we described a Markovian state model (MSM) for analyzing molecular-dynamics trajectories, which involved grouping conformations into states and estimating the transition probabilities between states. In this paper, we analyze the errors in this model caused by finite sampling. We give different methods with various approximations to determine the precision of the reported mean first passage times. These approximations are validated on an 87 state toy Markovian system. In addition, we propose an efficient and practical sampling algorithm that uses these error calculations to build a MSM that has the same precision in mean first passage time values but requires an order of magnitude fewer samples. We also show how these methods can be scaled to large systems using sparse matrix methods.
Hideaki Fujutani, Yoshiaki Tanida, Masakatsu Ito, Guha Jayachandran, Christopher D. Snow, Michael R. Shirts, Eric J. Sorin, and Vijay S. Pande Journal of Chemical Physics (2005)
SUMMARY: Drug design calculations are generally very difficult. Here we show that calculations made previously on the Folding@home network are possible on a much smaller supercomputer system without loss of numerical precision.
TECHNICAL ABSTRACT: Direct calculations of the absolute binding free energies for eight FKBP ligands were performed using the Fujitsu BioServer massively parallel computer. Using latest version of the general AMBER force field (GAFF) for ligand model parameters and the Bennett acceptance ratio for computing free energy differences, we obtained an excellent linear fit between the calculated and experimental binding free energies. The RMS error from a linear fit is 0.4 kcal/mol for eight ligand complexes. In comparison with a previous study of the binding energies of these same eight ligand complexes, these results suggest that the use of improved model parameters can lead to more predictive binding estimates, and that these estimates can be obtained with significantly less computer time than previously thought. These findings make such direct methods more attractive for use in rational drug design.
Sanghyun Park, Randall J. Radmer, Teri E. Klein, and Vijay S. Pande. Journal of Computational Chemistry (2005)
SUMMARY: Simulation of the collagen triple helix has been given less attention than more common protein “folds.” Here we present newly derived parameters for such simulations to gain better agreement with experimental data, and thereby offering insight into the stability of the triple helix structure.
TECHNICAL ABSTRACT: Recently, the importance of proline ring pucker conformations in collagen has been suggested in the context of hydroxylation of prolines. The previous molecular mechanics parameters for hydroxyproline, however, do not reproduce the correct pucker preference. We have developed a new set of parameters that reproduces the correct pucker preference. Our molecular dynamics simulations of proline and hydroxyproline monomers as well as collagen-like peptides, using the new parameters, support the theory that the role of hydroxylation in collagen is to stabilize the triple helix by adjusting to the right pucker conformation (and thus the right f angle) in the Y position.
Michael R. Shirts & Vijay S. Pande. Journal of Chemical Physics (2005)
SUMMARY: We test new methods for free energy calculations — relevant for our computational drug design methodology. We find that the BAR method we previously investigated is significantly better than methods commonly employed. We have already gotten a lot of positive feedback about this work from others in the field, as they have been starting to use the results of this work to improve their calculations as well.
TECHNICAL ABSTRACT: Recent work has demonstrated the Bennett acceptance ratio method is the best asymptotically unbiased method for determining the equilibrium free energy between two end states given work distributions collected from either equilibrium and non-equilibrium data. However, it is still not clear what the practical advantage of this acceptance ratio method is over other common methods in atomistic simulations. In this study, we first review theoretical estimates of the bias and variance of exponential averaging (EXP), thermodynamic integration (TI), and the Bennett acceptance ratios (BAR). In the process, we present a new simple scheme for computing the variance and bias of many estimators, and demonstrate the connections between BAR and the weighted histogram analysis method. Next, a series of analytically solvable toy problems is examined to shed more light on the relative performance in terms of the bias and efficiency of these three methods. Interestingly, it is impossible to conclusively identify a best method for calculating the free energy, as each of the three methods performs more efficiently than the others in at least one situation examined in these toy problems. Finally, sample problems of the insertion/deletion of both a Lennard-Jones particle and a much larger molecule in TIP3P water are examined by these three methods. In all tests of atomistic systems, free energies obtained with BAR have significantly lower bias and smaller variance than when using EXP or TI, especially when the overlap in phase space between end states is small. For example, BAR can extract as much information from multiple fast, far-from-equilibrium simulations as from fewer simulations near equilibrium, which EXP cannot. Although TI and sometimes even EXP can be somewhat more efficient in idealized toy problems, in the realistic atomistic situations tested in this paper, BAR is significantly more efficient than all other methods.
Michael R. Shirts & Vijay S. Pande. Journal of Chemical Physics (2005)
SUMMARY: This paper is a test of our methods for free energy calculation — critical to our computational drug design methodology. We achieve a higher level of accuracy and precision than before. Moreover, our recent research in computational efficiency of free energy methods allows us to perform simulations on a local cluster that previously required large scale distributed computing, performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.
TECHNICAL ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). In order to examine the accuracy of a range of common water models used for protein simulation for their solute/solvent properties, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from the OPLS-AA parameter set with the TIP3P, TIP4P, SPC, SPC/E, TIP3P-MOD, and TIP4P-Ew water models. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.06 kcal/mol, equivalent to that obtained in experimental hydration free energy measurements of the same molecules. We find that TIP3P-MOD, a model designed to give improved free energy of hydration for methane, gives uniformly the closest match to experiment; we also find that the ability to accurately model pure water properties does not necessarily predict ability to predict solute/solvent behavior. We also evaluate the free energies of a number of novel modifications of TIP3P designed as a proof of concept that it is possible to obtain much better solute/solvent free energetic behavior without substantially negatively affecting pure water properties. We decrease the average error to zero while reducing the rms error below that of any of the published water models, with measured liquid water properties remaining almost constant with respect to our perturbations. This demonstrates there is still both room for improvement within current fixed-charge biomolecular force fields and significant parameter flexibility to make these improvements. Recent research in computational efficiency of free energy methods allows us to perform simulations on a local cluster that previously required large scale distributed computing, performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.
Sidney Elmer, Sanghyun Park, & Vijay S. Pande. Journal of Chemical Physics (2005)
SUMMARY: Here, we lay out some of the first applications of a new method for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.
TECHNICAL ABSTRACT: In this article, we analyze the folding dynamics of an all-atom model of a polyphenylacetylene (pPA) 12-mer in explicit solvent for four common organic and aqueous solvents: acetonitrile,chloroform, methanol, and water. The solvent quality has a dramatic effect on the time scales in which pPA 12-mers fold. Acetonitrile was found to manifest ideal folding conditions as suggested by optimal folding times on the order of ~100-200 ns, depending on temperature. In contrast,
chloroform and water were observed to hinder the folding of the pPA 12-mer due to extreme solvation conditions relative to acetonitrile; chloroform denatures the oligomer, whereas water promotes aggregation and traps. The pPA 12-mer in a pure methanol solution folded in ~400 ns at 300 K, compared relative to the experimental 12-mer folding time of ~160 ns measured in a 1:1 v/v THF/methanol solution. Requisite in drawing the aforementioned conclusions, analysis techniques based on Markov state models are applied to multiple short independent trajectories to extrapolate the long-time scale dynamics of the 12-mer in each respective solvent. We review the theory of
Markov chains and derive a method to impose detailed balance on a transition probability matrix computed from simulation data.
Sidney Elmer, Sanghyun Park, & Vijay S. Pande. Journal of Chemical Physics (2005)
SUMMARY: Here, we lay out some new methodology for simulation for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.
TECHNICAL ABSTRACT: The structural landscape of poly-phenylacetylene (pPA), otherwise known as m-phenylene ethynylene oligomers, has been shown to consist of a very diverse set of conformations, including helices, turns, and knots. Defining a state space decomposition to classify these conformations into easily identifiable states is an important step in understanding the dynamics in relation to Markov state models. We define the state decomposition of pPA oligomers in terms of the sequence of discretized dihedral angles between adjacent phenyl rings along the oligomer backbone. Furthermore, we derive in mathematical detail an approach to further reduce the number of states by grouping symmetrically equivalent states into a single parent state. A more challenging problem requires a formal definition for knotted states in the structural landscape. Assuming that the oligomer chain can only cross the ideal helix path once, we propose a technique to define a knotted state derived from a helical state determined by the position along the helical nucleus where the chain crosses the ideal helix path. Several examples of helical states and knotted states from the pPA 12-mer illustrate the principles outlined in this article.
Bojan Zagrovic, Jan Lipfert, Eric J. Sorin, Ian S. Millett, Wilfred F. van Gunsteren, Sebastian Doniach & Vijay S. Pande. Proceedings of the National Academy of Sciences (2005)
SUMMARY: This study probes the structural character of a small peptide using experiment and simulation. It highlights the differences between global and local structural information, suggesting a new model for PPII conformational character, which is thought to be dominant in the unfolded state of proteins.
TECHNICAL ABSTRACT: Polyproline type II (PPII) helix has emerged recently as the dominant paradigm for describing the conformation of unfolded polypeptides. However, most experimental observables used to characterize unfolded proteins typically provide only short-range, sequence-local structural information that is both time- and ensemble- averaged, giving limited detail about the long-range structure of the chain. Here, we report a study of a long-range property: the radius of gyration of an alanine-based peptide, Ace-(diaminobutyric acid)2-(Ala)7-(ornithine)2-NH2. This molecule has previously been studied as a model for the unfolded state of proteins under folding conditions and is believed to adopt a PPII fold based on short-range techniques such as NMR and CD. By using synchrotron radiation and small-angle x-ray scattering, we have determined the radius of gyration of this peptide to be 7.4(+/-0.5), which is significantly less than the value expected from an ideal PPII helix in solution (13.1). To further study this contradiction, we have used molecular dynamics simulations using six variants of the AMBER force field and the GROMOS 53A6 force field. However, in all cases, the simulated ensembles underestimate the PPII content while overestimating the experimental radius of gyration. The conformational model that we propose, based on our small angle x-ray scattering results and what is known about this molecule from before, is that of a very flexible, fluctuating structure that on the level of individual residues explores a wide basin around the ideal PPII geometry but is never, or only rarely, in the ideal extended PPII helical conformation.
Christopher D. Snow, Eric J. Sorin, Young Min Rhee, and Vijay S. Pande. Annual Review of Biophysics & Biomolecular Structure (2005)
SUMMARY: Rather than reporting new data from the Folding@home project, this review article offers an in-depth look at the current state-of-the-art in simulation-based prediction. This includes work by our group and others in the field, including many computational models and methods of extracting information that can be directly compared to experiment.
TECHNICAL ABSTRACT: Simulation of protein folding has come a long way in five years. Notably, new quantitative comparisons with experiments for small, rapidly folding proteins have become possible. As the only way to validate simulation methodology, this achievement marks a significant advance. Here, we detail these recent achievements and ask whether simulations have indeed rendered quantitative predictions in several areas, including protein folding kinetics, thermodynamics, and physics-based methods for structure prediction. We conclude by looking to the future of such comparisons between simulations and experiments.
Eric J. Sorin and Vijay S. Pande. Journal of Computational Chemistry (2005)
SUMMARY: How do the results of peptide simulations change with slight variations to the models employed? Here we answer this question with respect to very local changes in the energetics of the polymer, demonstrating the sensitivity of simulated bulk (i.e. ensemble averaged) structural equilibrium on the parameters of the model.
TECHNICAL ABSTRACT: The kinetic and thermodynamic aspects of the helix-coil transition in polyalanine-based peptides have been studied at the ensemble level using a distributed computing network. This study builds on a previous report, which critically assessed the performance of several contemporary force fields in reproducing experimental measurements and elucidated the complex nature of helix-coil systems. Here we consider the effects of modifying backbone torsions and the scaling of noncovalent interactions. Although these elements determine the potential of mean force between atoms separated by three covalent bonds (and thus largely determine the local conformational distributions observed in simulation), we demonstrate that the interplay between these factors is both complex and force field dependent. We quantitatively assess the heliophilicity of several helix-stabilizing potentials as well as the changes in heliophilicity resulting from such modifications, which can “make or break” the accuracy of a given force field, and our findings suggests that future force field development may need to better consider effect that vary with peptide length. This report also serves as an example of the utility of distributed computing in analyzing and improving upon contemporary force fields at the level of absolute ensemble equilibrium, the next step in force field development.
Eric J. Sorin and Vijay S. Pande. Biophysical Journal (2005)
SUMMARY: How good are our models for folding? This question is important to address in order to understand the usefulness of our work, as well as the work of everyone in the atomistic simulation field in general. Here, we’ve done extremely extensive tests of models used in folding to show their strengths and weaknesses. Based on their weaknesses, we have proposed a new model which appears to have a much stronger agreement with experiment.
TECHNICAL ABSTRACT: The ensemble folding of two 21-residue a-helical peptides has been studied using all-atom simulations under several variants of the AMBER potential in explicit solvent using a global distributed computing network. Our extensive sampling, orders of magnitude greater than the experimental folding time, results in complete convergence to ensemble equilibrium. This allows for a quantitative assessment of these potentials, including a new variant of the AMBER-99 force field, denoted AMBER-99f, which shows improved agreement with experimental kinetic and thermodynamic measurements. From bulk analysis of the simulated AMBER-99f equilibrium, we find that the folding landscape is pseudo-two-state, with complexity arising from the broad, shallow character of the ‘native’ and ‘unfolded’ regions of the phase space. Each of these macrostates allows for configurational diffusion among a diverse ensemble of conformational microstates with greatly varying helical content and molecular size. Indeed, the observed structural dynamics are better represented as a conformational diffusion than as a simple exponential process, and equilibrium transition rates spanning several orders of magnitude are reported. After multiple nucleation steps, on average, helix formation proceeds via a kinetic “alignment” phase in which two or more short, low-entropy helical segments form a more ideal, single-helix structure.
Eric J. Sorin, Young Min Rhee, and Vijay S. Pande. Biophysical Journal (2005)
SUMMARY: While previous studies on the folding of nucleic acid hairpins have employed simplified models of either the nucleic acid or the solvent, this paper reports the first such study using an explicit treatment of the surrounding water and counterions. We show that accounting for water molecules in this manner is necessary to most accurately characterize the energetics of hairpin folding, whereas monovalent ions appear to play only a background role.
TECHNICAL ABSTRACT: Nucleic acid structure and dynamics are known to be closely coupled to local environmental conditions and, in particular, to the ionic character of the solvent. Here we consider what role the discrete properties of water and ions play in the collapse and folding of small nucleic acids. We study the folding of an experimentally well-characterized RNA hairpin-loop motif (sequence 5′-GGGC[GCAA]GCCU-3′) via ensemble molecular dynamics simulation and, with nearly 500 of aggregate simulation time using an explicit representation of the ionic solvent, report successful ensemble folding simulations, with a predicted folding time of 8.8(2.0)s, in agreement with experimental measurements of ~10s. Comparing our results to previous folding simulations using the GB/SA continuum solvent model shows that accounting for water-mediated interactions is necessary to accurately characterize the free energy surface and stochastic nature of folding. The formation of secondary structure appears to be more rapid than the fastest ionic degrees of freedom, and counterions do not participate discretely in observed folding events. We find that hydrophobic collapse follows a predominantly expulsive mechanism in which a diffusion-search of early structural compaction is followed by final formation of native structure that occurs in tandem with solvent evacuation.
Lillian T. Chong, Christopher D. Snow, Young Min Rhee, and Vijay S. Pande. Journal of Molecular Biology (2005)
SUMMARY: Roughly half of all known cancers result from mutations in p53. Our first work in the cancer area examines the tetramerization domain of p53. We predict how p53 folds and in doing so, we can predict which amino acid mutations would be relevant. When compared with experiments, our predictions have appeared to agree with experiment and give a new interpretation to existing data.
TECHNICAL ABSTRACT: Dimerization of the p53 oligomerization domain involves coupled folding and binding of monomers. To examine the dimerization, we have performed molecular dynamics (MD) simulations of dimer folding from the rate-limiting transition state ensemble (TSE). Among 799 putative transition state structures that were selected from a large ensemble of high-temperature unfolding trajectories, 129 were identified as members of the TSE via calculation of a 50% transmission coefficient from at least 20 room-temperature simulations. This study is the first to examine the refolding of a protein dimer using MD simulations in explicit water, revealing a folding nucleus for dimerization. Our atomistic simulations are consistent with experiment and offer insight that was previously unobtainable.
Nina Singhal, Christopher D. Snow, and Vijay S. Pande. Journal of Chemical Physics (2004)
SUMMARY: How can Folding@home use thousands to millions of CPUs to efficiently simulate long timescale biomolecular dynamics? This paper outlines the “Markovian State Model” method which is the foundation of how most new Folding@home calculations are performed. The MSM method allows for a very efficient use of uncoupled simulations, as one would easily get from distributed computing.
TECHNICAL ABSTRACT: We propose an efficient method for the prediction of protein folding rate constants and mechanisms. We use molecular dynamics simulation data to build Markovian state models (MSMs), discrete representations of the pathways sampled. Using these MSMs, we can quickly calculate the folding probability (Pfold) and mean first passage time of all the sampled points. In addition, we provide techniques for evaluating these values under perturbed conditions without expensive recomputations. To demonstrate this method on a challenging system, we apply these techniques to a two-dimensional model energy landscape and the folding of a tryptophan zipper beta hairpin.
Young Min Rhee, Eric J. Sorin, Guha Jayachandran, Erik Lindahl, & Vijay S Pande. Proceedings of the National Academy of Sciences (2004)
ABSTRACT: There are many unresolved questions regarding the role of water in protein folding. Does water merely induce hydrophobic forces, or does the discrete nature of water play a structural role in folding? Are the nonadditive aspects of water important in determining the folding mechanism? To help to address these questions, we have performed simulations of the folding of a model protein (BBA5) in explicit solvent. Starting 10,000 independent trajectories from a fully unfolded conformation, we have observed numerous folding events, making this work a comprehensive study of the kinetics of protein folding starting from the unfolded state and reaching the folded state and with an explicit solvation model and experimentally validated rates. Indeed, both the raw TIP3P folding rate (4.5 +/- 2.5s) and the diffusion-constant corrected rate (7.5 +/- 4.2s) are in strong agreement with the experimentally observed rate of 7.5 +/- 3.5s. To address the role of water in folding, the mechanism is compared with that predicted from implicit solvation simulations. An examination of solvent density near hydrophobic groups during folding suggests that in the case of BBA5, there are water-induced effects not captured by implicit solvation models, including signs of a concurrent mechanism of core collapse and desolvation.
Christopher D. Snow, Linlin Qiu, Deguo Du, Feng Gai, Stephen J. Hagen, & Vijay S Pande. Proceedings of the National Academy of Sciences (2004)
ABSTRACT: We studied the microsecond folding dynamics of three hairpins (Trp zippers 1-3, TZ1-TZ3) by using temperature-jump fluorescence and atomistic molecular dynamics in implicit solvent. In addition, we studied TZ2 by using time-resolved IR spectroscopy. By using distributed computing, we obtained an aggregate simulation time of 22 ms. The simulations included 150, 212, and 48 folding events at room temperature for TZ1, TZ2, and TZ3, respectively. The all-atom optimized potentials for liquid simulations (OPLSaa) potential set predicted TZ1 and TZ2 properties well; the estimated folding rates agreed with the experimentally determined folding rates and native conformations were the global potential-energy minimum. The simulations also predicted reasonable unfolding activation enthalpies. This work, directly comparing large simulated folding ensembles with multiple spectroscopic probes, revealed both the surprising predictive ability of current models as well as their shortcomings. Specifically, for TZ1-TZ3, OPLS for united atom models had a nonnative free-energy minimum, and the folding rate for OPLSaa TZ3 was sensitive to the initial conformation. Finally, we characterized the transition state; all TZs fold by means of similar, native-like transition-state conformations.
Eric J. Sorin, Bradley J. Nakatani, Young Min Rhee, Guha Jayachandran, V Vishal, & Vijay S Pande. Journal of Molecular Biology (2004)
ABSTRACT: Recent studies in protein folding suggest that native state topology plays a dominant role in determining the folding mechanism, yet an analogous statement has not been made for RNA, most likely due to the strong coupling between the ionic environment and conformational energetics that make RNA folding more complex than protein folding. Applying a distributed computing architecture to sample nearly 5000 complete tRNA folding events using a minimalist, atomistic model, we have characterized the role of native topology in tRNA folding dynamics: the simulated bulk folding behavior predicts well the experimentally observed folding mechanism. In contrast, single-molecule folding events display multiple discrete folding transitions and compose a largely diverse, heterogeneous dynamic ensemble. This both supports an emerging view of heterogeneous folding dynamics at the microscopic level and highlights the need for single-molecule experiments and both single-molecule and bulk simulations in interpreting bulk experimental measurements.
Bojan Zagrovic & Vijay S Pande. Nature Structural Biology (2003)
ABSTRACT: Recently, we have proposed that, on average, the structure of the unfolded state of small, mostly alpha-helical proteins may be similar to the native structure (the ‘mean-structure’ hypothesis). After examining thousands of simulations of both the folded and the unfolded states of five polypeptides in atomistic detail at room temperature, we report here a result that seems at odds with the mean-structure hypothesis. Specifically, the average inter-residue distances in the collapsed unfolded structures agree well with the statistics of the ideal random-flight chain with link length of 3.8 (the length of one amino acid). A possible resolution of this apparent contradiction is offered by the observation that the inter-residue distances in a typical alpha-helix over short stretches are close to the average distances in an ideal random-flight chain.
Michael R. Shirts, Eric Bair, Giles Hooker, and Vijay S Pande. Physical Review Letters (2003)
ABSTRACT: We present a maximum likelihood argument for the Bennett acceptance ratio method, and derive a simple formula for the variance of free energy estimates generated using this method. This derivation of the acceptance ratio method, using a form of logistic regression, a common statistical technique, allows us to shed additional light on the underlying physical and statistical properties of the method. For example, we demonstrate that the acceptance ratio method yields the lowest variance for any estimator of the free energy which is unbiased in the limit of large numbers of measurements.
Michael R. Shirts, Jed W. Pitera, William C. Swope, and Vijay S. Pande. Journal of Chemical Physics (2003)
ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). We use large-scale distributed computing to access sufficient computational resources to extensively sample molecular systems and thus reduce statistical uncertainty of measured free energies. In order to examine the accuracy of a range of common models used for protein simulation, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from recent versions of the OPLS-AA, CHARMM, and AMBER parameter sets in TIP3P water using thermodynamic integration. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02-0.05 kcal/mol, which are in general an order of magnitude smaller than those found in other studies. Notably, this level of precision is comparable to that obtained in experimental hydration free energy measurements of the same molecules. Root mean square differences from experiment over the set of molecules examined using AMBER-, CHARMM-, and OPLS-AA-derived parameters were 1.35 kcal/mol, 1.31 kcal/mol, and 0.85 kcal/mol, respectively. Under the simulation conditions used, these force fields tend to uniformly underestimate solubility of all the side chain analogs. The relative free energies of hydration between amino acid side chain analogs were closer to experiment but still exhibited significant deviations. Although extensive computational resources may be needed for large numbers of molecules, sufficient computational resources to calculate precise free energy calculations for small molecules are accessible to most researchers.
Bojan Zagrovic and Vijay S. Pande. Journal of Computational Chemistry (2003)
ABSTRACT: By using distributed computing techniques and a supercluster of more than 20,000 processors we simulated folding of a 20-residue Trp Cage miniprotein in atomistic detail with implicit GB/SA solvent at a variety of solvent viscosities (g). This allowed us to analyze the dependence of folding rates on viscosity. In particular, we focused on the low-viscosity regime (values below the viscosity of water). In accordance with Kramers’ theory, we observe approximately linear dependence of the folding rate on 1/g for values from 1-10^(-1) that of water viscosity. However, for the regime between 10^(-4) – 10^(-1) that of water viscosity we observe power-law dependence of the form k ~ g^(-1/5). These results suggest that estimating folding rates from molecular simulations run at low viscosity under the assumption of linear dependence of rate on inverse viscosity may lead to erroneous results.
Eric J. Sorin, Young Min Rhee, Bradley J. Nakatani & Vijay S. Pande. Biophysical Journal (2003)
ABSTRACT: The helical hairpin is one of the most ubiquitous and elementary secondary structural motifs in nucleic acids, capable of serving functional roles and participating in long-range tertiary contacts. Yet the self-assembly of these structures has not been well-characterized at the atomic level. With this in mind, the dynamics of nucleic acid hairpin formation and disruption have been studied using a novel computational tool: large-scale, parallel, atomistic molecular dynamics simulation employing an inhomogeneous distributed computer consisting of more than 40,000 processors. Using multiple methodologies, over 500 ms of atomistic simulation time has been collected for a large ensemble of hairpins (sequence 5′- GGGC[GCAA]GCCU-3′), allowing characterization of rare events not previously observable in simulation. From uncoupled ensemble dynamics simulations in unperturbed folding conditions, we report on 1), competing pathways between the folded and unfolded regions of the conformational space; 2), observed non-native stacking and basepairing traps; and 3), a helix unwinding-rewinding mode that is differentiated from the unfolding and folding dynamics. A heterogeneous transition state ensemble is characterized structurally through calculations of conformer-specific folding probabilities and a multiplexed replica exchange stochastic dynamics algorithm is used to derive an approximate folding landscape. A comparison between the observed folding mechanism and that of a peptide b-hairpin analog suggests that although native topology defines the character of the folding landscape, the statistical weighting of potential folding pathways is determined by the chemical nature of the polymer.
Young Min Rhee & Vijay S. Pande. Biophysical Journal (2003)
ABSTRACT: Simulating protein folding thermodynamics starting purely from a protein sequence is a grand challenge of computational biology. Here, we present an algorithm to calculate a canonical distribution from molecular dynamics simulation of protein folding. This algorithm is based on the replica exchange method where the kinetic trapping problem is overcome by exchanging noninteracting replicas simulated at different temperatures. Our algorithm uses multiplexed-replicas with a number of independent molecular dynamics runs at each temperature. Exchanges of configurations between these multiplexed-replicas are also tried, rendering the algorithm applicable to large-scale distributed computing (i.e., highly heterogeneous parallel computers with processors having different computational power). We demonstrate the enhanced sampling of this algorithm by simulating the folding thermodynamics of a 23 amino acid miniprotein. We show that better convergence is achieved compared to constant temperature molecular dynamics simulation, with an effcient scaling to large number of computer processors. Indeed, this enhanced sampling results in (to our knowledge) the first example of a replica exchange algorithm that samples a folded structure starting from a completely unfolded state.
Christopher D. Snow, Bojan Zagrovic, and Vijay S. Pande. Journal of the Americal Chemical Society (2002)
ABSTRACT: A number of rapidly folding proteins have been characterized in recent years.1 These small proteins can provide the first direct comparisons between simulated and experimental protein folding kinetics and pathways. Proteins have been characterized through thermodynamic sampling methods, unfolding simulations, and folding simulations using simple potentials. Here, as described recently, we use several thousand stochastic dynamics simulations in a generalized-Born implicit solvent (in atomic detail) to simulate the folding dynamics of the Trp cage mini-protein under experimental conditions (27 °C with full solvent viscosity,) 91 ps-1). The Folding@home distributed computing project was used to generate an aggregate simulation time of ~100 us (~250 CPU years). First we capture the rapid relaxation from an extended starting condition to a relaxed unfolded state ensemble of thousands of conformations. With continued simulation, a small fraction of these simulations reach the folded state. Furthermore, the topology of the collapsed unfolded state closely resembles the native state.
Christopher D. Snow, Houbi Ngyen, Vijay S. Pande, and Martin Gruebele. Nature (2002)
ABSTRACT: Protein folding is difficult to simulate with classical molecular dynamics. Secondary structure motifs such as -helices and -hairpins can form in 0.1-10 (ref. 1), whereas small proteins have been shown to fold completely in tens of microseconds. The longest folding simulation to date is a single 1- s simulation of the villin headpiece; however, such single runs may miss many features of the folding process as it is a heterogeneous reaction involving an ensemble of transition states. Here, we have used a distributed computing implementation to produce tens of thousands of 5-20-ns trajectories (700s) to simulate mutants of the designed mini-protein BBA5. The fast relaxation dynamics these predict were compared with the results of laser temperature-jump experiments. Our computational predictions are in excellent agreement with the experimentally determined mean folding times and equilibrium constants. The rapid folding of BBA5 is due to the swift formation of secondary structure. The convergence of experimentally and computationally accessible timescales will allow the comparison of absolute quantities characterizing in vitro and in silico (computed) protein folding.
Bojan Zagrovic, Christopher D. Snow, Siraj Khaliq, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2002)
ABSTRACT: The nature of the unfolded state plays a great role in our understanding of proteins. However, accurately studying the unfolded state with computer simulation is difficult, due to its complexity and the great deal of sampling required. Using a supercluster of over 10,000 processors we have performed close to 800 ms of molecular dynamics simulation in atomistic detail of the folded and unfolded states of three polypeptides from a range of structural classes: the all-alpha villin headpiece molecule, the beta hairpin tryptophan zipper, and a designed alpha-beta zinc finger mimic. A comparison between the folded and the unfolded ensembles reveals that, even though virtually none of the individual members of the unfolded ensemble exhibits native-like features, the mean unfolded structure (averaged over the entire unfolded ensemble) has a native-like geometry. This suggests several novel implications for protein folding and structure prediction as well as new interpretations for experiments which find structure in ensemble-averaged measurements.
Bojan Zagrovic, Christopher D. Snow, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2002)
ABSTRACT: By employing thousands of PCs and new worldwide-distributed computing techniques, we have simulated in atomistic detail the folding of a fastfolding 36-residue a-helical protein from the villin headpiece. The total simulated time exceeds 300 ms, orders of magnitude more than previous simulations of a molecule of this size. Starting from an extended state, we obtained an ensemble of folded structures, which is on average 1.7 and 1.9 away from the native state in Ca distance-based root-meansquare deviation (dRMS) and Cb dRMS sense, respectively. The folding mechanism of villin is most consistent with the hydrophobic collapse view of folding: the molecule collapses non-specifically very quickly (20 ns), which greatly reduces the size of the conformational space that needs to be explored in search of the native state. The conformational search in the collapsed state appears to be rate-limited by the formation of the aromatic core: in a significant fraction of our simulations, the C-terminal phenylalanine residue packs improperly with the rest of the hydrophobic core. We suggest that the breaking of this interaction may be the rate-determining step in the course of folding. On the basis of our simulations we estimate the folding rate of villin to be approximately 5 ms. By analyzing the average features of the folded ensemble obtained by simulation, we see that the mean folded structure is more similar to the native fold than any individual folded structure. This finding highlights the need for simulating ensembles of molecules and averaging the results in an experiment-like fashion if meaningful comparison between simulation and experiment is to be attempted. Moreover, our results demonstrate that (1) the computational methodology exists to simulate the multi-microsecond regime using distributed computing and (2) that potential sets used to describe interatomic interactions may be sufficiently accurate to reach the folded state, at least for small proteins. We conclude with a comparison between our results and current protein-folding theory.
Stefan M. Larson, Christopher D. Snow, Michael R. Shirts, and Vijay S. Pande. To appear in Computational Genomics, Richard Grant, editor, Horizon Press, (2002)
ABSTRACT: For decades, researchers have been applying computer simulation to address problems in biology. However, many of these grand challenges in computational biology, such as simulating how proteins fold, remained unsolved due to their great complexity. Indeed, even to simulate the fastest folding protein would require decades on the fastest modern CPUs. Here, we review novel methods to fundamentally speed such previously intractable problems using a new computational paradigm: distributed computing. By efficiently harnessing tens of thousands of computers throughout the world, we have been able to break previous computational barriers. However, distributed computing brings new challenges, such as how to efficiently divide a complex calculation of many PCs that are connected by relatively slow networking. Moreover, even if the challenge of accurately reproducing reality can be conquered, a new challenge emerges: how can we take the results of these simulations (typically tens to hundreds of gigabytes of raw data) and gain some insight into the questions at hand. This challenge of the analysis of the sea of data resulting from large-scale simulation will likely remain for decades to come.
Vijay Pande, et al. Peter Kollman Memorial Issue, Biopolymers (2002)
ABSTRACT: Atomistic simulations of protein folding have the potential to be a great complement to experimental studies, but have been severely limited by the time scales accessible with current computer hardware and algorithms. By employing a worldwide distributed computing network of tens of thousands of PCs and algorithms designed to efficiently utilize this new many-processor, highly heterogeneous, loosely coupled distributed computing paradigm, we have been able to simulate hundreds of microseconds of atomistic molecular dynamics. This has allowed us to directly simulate the folding mechanism and to accurately predict the folding rate of several fast-folding proteins and polymers, including a nonbiological helix, polypeptide a-helices, a b-hairpin, and a three-helix bundle protein from the villin headpiece. Our results demonstrate that one can reach the time scales needed to simulate fast folding using distributed computing, and that potential sets used to describe interatomic interactions are sufficiently accurate to reach the folded state with experimentally validated rates, at least for small proteins.
Bojan Zagrovic, Eric J. Sorin, and Vijay Pande, Journal of Molecular Biology (2001)
ABSTRACT: We have used distributed computing techniques and a supercluster of thousands of computer processors to study folding of the C-terminal b-hairpin from protein G in atomistic detail using the GB/SA implicit solvent model at 300 K. We have simulated a total of nearly 38 ms of folding time and obtained eight complete and independent folding trajectories. Starting from an extended state, we observe relaxation to an unfolded state characterized by non-specific, temporary hydrogen bonding. This is followed by the appearance of interactions between hydrophobic residues that stabilize a bent intermediate. Final formation of the complete hydrophobic core occurs cooperatively at the same time that the final hydrogen bonding pattern appears. The folded hairpin structures we observe all contain a closely packed hydrophobic core and proper b-sheet backbone dihedral angles, but they differ in backbone hydrogen bonding pattern. We show that this is consistent with the existing experimental data on the hairpin alone in solution. Our analysis also reveals short-lived semi-helical intermediates which denote a thermodynamic trap. Our results are consistent with a three-state mechanism with a single rate-limiting step in which a varying final hydrogen bond pattern is apparent, and semi-helical off-pathway intermediates may appear early in the folding process. We include details of the ensemble dynamics methodology and a discussion of our achievements using this new computational device for studying dynamics at the atomic level.
Michael R. Shirts and Vijay Pande, Physical Review Letters (2001)
ABSTRACT: A set of parallel replicas of a single simulation can be statistically coupled to closely approximate long trajectories. In many cases, this produces nearly linear speedup over a single simulation (M times faster with M simulations), rendering previously intractable problems within reach of large computer clusters. Interestingly, by varying the coupling of the parallel simulations, it is possible in some systems to obtain greater than linear speedup. The methods are generalizable to any search algorithm with long residence times in intermediate states.
Summary: Is distributed computing a fundamental advance or simply fashionable computing? In this brief letter, we show how distributed computing can be used to tackle problems which make even supercomputers quake. Indeed, we show how distributed computing has the ability to create a supercomputer thousands of times more powerful than any existing machine, due the large number of processors on the internet (hundreds of millions) and the relatively small number of computer processors in supercomputers (thousands).