The goal of precision medicine is to utilize our knowledge of the molecular causes of disease to better diagnose and treat patients. In the precision medicine framework, diseases are subdivided by their underlying etiologies, and treatment is based on a patient’s unique genetic background, unlike the current state of medicine where most patients with a given diagnosis are treated the same way. Precision medicine has the potential to substantially improve response rates to therapies and reduce unwanted side effects. There has been some early success in adopting precision medicine, as in precision oncology and pharmacogenomics.
However, significant hurdles remain to fully realizing the promise of precision medicine. Firstly, the number of variants that need to be annotated is daunting. To take just protein-coding missense variants as an example, there are almost 500,000 missense variants of unknown clinical significance in the ClinVar repository of genetic variants. Additionally, even when patients carry pathogenic variants, there are rarely targeted molecular therapies that address their particular disease process. Precision medicine requires an arsenal of drugs far larger than the current pool of FDA-approved therapies.
Computational modeling has the potential to bridge the gap between the small numbers of annotated variants and FDA-approved drugs and the needs of precision medicine. Experiments ranging from biochemical assays to animal models can provide insight into how a variant affects function at different scales. However, these experiments are typically low throughput and require substantial time and effort. Higher-throughput approaches, like deep mutational scanning, can exhaustively determine how mutations in a protein affect its function, but they are difficult and expensive to perform. Similarly, high-throughput screening with purified proteins, cell lines, or complex organoids can uncover new lead compounds. However, lead compounds revealed in these assays are often difficult to improve without a detailed understanding of how a compound binds its target. For these reasons, there has been great interest in using computational modeling to improve variant interpretation and drug discovery. In principle, computational modeling can be used to predict the functional impact of a very large number of variants or to screen large libraries of drug candidates.
To date, the field has focused primarily on solving structures of proteins and using these to understand sequence–function relationships and discover new drugs. Biophysicists have typically placed great weight on all-atom models of a protein structure, generated through X-ray crystallography or, increasingly, cryogenic electron microscopy. More recently, with the emergence of highly accurate predictive models of protein structure like AlphaFold, we now have access to reliable structures for nearly all human proteins. It has long been suggested that protein structure can inform which variants are likely to be pathogenic. After all, variants that fall in functionally relevant parts of a protein (e.g., an active site) may be more likely to have deleterious consequences. In theory, a single structure could aid in the interpretation of all missense variants that affect a given protein. Additionally, structure-based methods offer the tantalizing promise of rational drug design. By docking small molecules against experimental or predicted structures, it should be possible to identify novel drugs against targets identified in population genetic studies.
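As a concrete illustration of this workflow, the sketch below docks a single candidate compound against one receptor structure using the Python bindings for AutoDock Vina; the choice of Vina, the pre-prepared PDBQT file names, and the search-box coordinates are assumptions made purely for illustration.

```python
# Minimal single-structure docking sketch with the AutoDock Vina Python
# bindings. The receptor/ligand PDBQT files, box center, and box size are
# placeholders; prepare them with your tool of choice beforehand.
from vina import Vina

v = Vina(sf_name="vina")                 # default Vina scoring function
v.set_receptor("receptor.pdbqt")         # experimental or AlphaFold model, converted to PDBQT
v.set_ligand_from_file("ligand.pdbqt")   # one compound from a screening library

# Search box (in Angstroms) centered on the putative binding site.
v.compute_vina_maps(center=[10.0, 12.5, -3.0], box_size=[20.0, 20.0, 20.0])

v.dock(exhaustiveness=8, n_poses=10)     # sample and score binding poses
print(v.energies(n_poses=5))             # predicted binding energies (kcal/mol)
v.write_poses("docked_poses.pdbqt", n_poses=5, overwrite=True)
```

In a real virtual screen, a loop of this form would run over thousands to millions of compounds, with the top-scoring poses inspected or rescored before any experimental follow-up.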
However, single structures have substantial limitations that constrain their utility for variant interpretation and drug discovery. Combining machine learning with protein structure to predict variant pathogenicity has shown substantial promise, but even the best-performing models fail in many cases and usually do not distinguish between activating and inactivating mutations. An illustrative example comes from myosin motors, a class of proteins frequently mutated in human disease. To predict the effects of mutations based on a single structure, one typically uses heuristics, like assuming that mutations at nearby sites have similar effects. However, in many cases, myosin mutations cause opposite phenotypes (i.e., hypertrophic cardiomyopathy vs. dilated cardiomyopathy) but are found at neighboring residues, or even at the same residue. Hence, a structure can provide some clues as to how a variant will affect function, but the predictive power of this approach is limited.
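To make this limitation concrete, the sketch below implements the simplest version of the proximity heuristic: given a single structure and a mutated residue, it lists the residues whose Cα atoms fall within a fixed cutoff, under the naive assumption that mutations at those sites behave similarly. The PDB file name, residue number, cutoff, and the use of MDTraj are illustrative assumptions.

```python
# Naive "nearby residues have similar effects" heuristic on a single structure.
# File name, residue number, and cutoff are illustrative placeholders.
import mdtraj as md
import numpy as np

structure = md.load("myosin_model.pdb")        # any single-structure model
target_resseq = 403                            # mutated residue (PDB numbering)
cutoff_nm = 0.8                                # 8 Angstrom neighborhood

ca = structure.topology.select("name CA")
target_ca = structure.topology.select(f"name CA and resSeq {target_resseq}")[0]

# Distance from the mutated residue's C-alpha to every other C-alpha (nm).
pairs = np.array([[target_ca, a] for a in ca if a != target_ca])
dists = md.compute_distances(structure, pairs)[0]

neighbors = [structure.topology.atom(int(pairs[i, 1])).residue
             for i in np.where(dists < cutoff_nm)[0]]
print(f"Residues within {10 * cutoff_nm:.0f} A of residue {target_resseq}:")
print(", ".join(str(r) for r in neighbors))
```

In myosin, a neighborhood selected this way can contain residues whose mutations cause opposite phenotypes, which is precisely why the heuristic falls short.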
Similarly, rational drug design methods largely assume that proteins adopt a single structure, which is greatly limiting in several ways. Firstly, this assumption restricts drug design to inhibiting proteins by identifying compounds that bind key functional sites, thereby physically blocking the protein from performing functions like catalysis or binding other proteins. It is all but impossible to imagine designing a drug to enhance a desirable function if proteins are essentially rigid bodies. Moreover, many proteins must be written off as undruggable because their structures lack pockets where an inhibitor could bind tightly enough to serve as a valuable drug. Finally, current computational drug design methods struggle to quantitatively predict protein–ligand binding affinities, suggesting there is a fatal flaw in the single-structure assumption.
A long-standing hypothesis is that accounting for the entire ensemble of structures a protein adopts in solution would be vastly superior to assuming a single structure encodes all the relevant information. For example, sequence variation can produce distinct biochemical phenotypes by modulating the relative probabilities of an ensemble of conformations primed for different functional roles. A growing body of evidence supports this view. Variants that increase the probability of structures with a favorable alignment of catalytic residues lead to elevated catalytic efficiencies, for instance. Similarly, within protein families, differences in the distributions of conformations adopted in simulation can predict functional differences, even when crystal structures and phylogeny cannot. Thus, our ability to predict which patient missense variants are pathogenic is likely to improve when we explicitly consider protein ensembles.
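One way to operationalize this idea is sketched below: given simulation frames for two sequence variants, estimate for each the fraction of the ensemble in which a pair of catalytic residues is favorably aligned, here reduced to a simple heavy-atom distance criterion. The trajectory and topology file names, residue indices, and distance cutoff are assumptions for illustration.

```python
# Compare two variants by the probability that their catalytic residues are
# "aligned," approximated here by a closest-heavy-atom distance cutoff.
# Trajectory files, residue indices, and the cutoff are placeholders.
import mdtraj as md

def competent_fraction(traj_file, top_file, res_i=102, res_j=195, cutoff_nm=0.45):
    traj = md.load(traj_file, top=top_file)
    # Per-frame closest heavy-atom distance between the two catalytic residues
    # (res_i and res_j are 0-based residue indices in the topology).
    dists, _ = md.compute_contacts(traj, contacts=[[res_i, res_j]],
                                   scheme="closest-heavy")
    return float((dists[:, 0] < cutoff_nm).mean())

wt = competent_fraction("wildtype.xtc", "wildtype.pdb")
var = competent_fraction("variant.xtc", "variant.pdb")
print(f"P(catalytically competent): wild type {wt:.2f}, variant {var:.2f}")
```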
Our ability to discover new drugs will also greatly improve thanks to an ensemble perspective. A protein–ligand binding affinity is an ensemble measurement, reflecting contributions from each state in the protein’s structural ensemble and that of the ligand. Thus, by incorporating knowledge of proteins’ conformational ensembles, we may finally be able to develop universally accurate methods for predicting protein–ligand affinity. Additionally, we may be able to design specific allosteric modulators of proteins that are currently considered undruggable but may form cryptic pockets in their excited states.
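One simple way to see why: in a conformational-selection picture where the unbound protein populates states $i$ with probabilities $p_i$ and each state binds the ligand with its own association constant $K_{A,i}$, the observed association constant is the population-weighted sum

$$K_A^{\mathrm{obs}} = \sum_i p_i\,K_{A,i}, \qquad \Delta G_{\mathrm{bind}}^{\mathrm{obs}} = -RT\,\ln K_A^{\mathrm{obs}},$$

so anything that redistributes the populations $p_i$, whether a sequence variant or an allosteric ligand, shifts the apparent affinity even if no individual state’s binding changes.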
Computer simulations on Folding@home are providing a general and scalable means to take an ensemble perspective toward sequence–function relationships and drug discovery, thereby enabling a physics-based approach to precision medicine. In addition to facilitating traditional drug discovery approaches, this perspective is also opening novel opportunities. For example, cryptic pockets that are absent in structural snapshots of a protein but form due to protein dynamics are providing new targets for drug discovery.