Computational chemistry and machine learning continue to play an increasing role in drug discovery. One advance has been the increased accuracy with which we can predict the affinity of a small-molecule drug to its target, using knowledge of its bound pose. Driving these improvements have been advances in simulation-based free energy methods (1,2). Virtual screening methods to predict the bound poses of drugs can now be performed on extremely large scales; the efficiency of these methods for high-throughput screening make it possible to survey much larger chemical spaces that can be explored experimentally (3). Deep learning methods have driven further increases in the accuracy of these methods (4, 5, 6), and new machine learning algorithms (7, 8), along with active learning approaches (9) can help guide the search across large chemical spaces to find molecules that not only bind their targets, but are chemically synthesizable, bioavailable and non-toxic.

A key ingredient in all these efforts is our ability to computationally predict the lipophilicity of small molecules. As the name would imply, a lipophilic drug is one that has an affinity for oils and lipids rather than water. Most small-molecule drugs are somewhat lipophilic; otherwise they wouldn’t be able to partition into the lipids that make up cell membranes and get into cells. Proteins–the main target for most drugs–have binding sites that are usually a more greasy (hydrophobic) environment than the cytosol (about 70% water), which helps drugs bind.

As it turns out, it still remains challenging to predict lipophilicity from first principles. To address this problem, the NIH-funding SAMPL9 LogP Challenge has been organized. The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) series of challenges bring together many different groups to make blind computational predictions of quantities important in drug discovery, such as protein-ligand binding affinities. Assessments like this are critical for researchers to objectively evaluate the accuracy of their methods, and learn how to make further improvements.

In the recent SAMPL9 LogP challenge, research groups were asked to make blind predictions of the “log P” value for 16 drug-like molecules (10). What is logP? The “P” in “log P” refers to the partition coefficient , which is the ratio of concentrations for which of a solute L partitions into a nonpolar solvent versus a polar solvent (see Figure below), P = ([L] in nonpolar solvent)/(L] in polar solvent). The standard logP measurement used for most solutes is for partitioning for n-octanol and water, although the SAMPL9 LogP challenge was specifically for toluene vs. water.

The “log” in logP refers to the logarithm in base 10, or log₁₀. Consider that the average octanol/water logP value of drugs is around 3. That means that if you add equal parts water and octanol in a flask, added the drug, and shook it up, you would find 10³ = 1000 times more of the drug dissolved in the octanol.

In a recent preprint, Goold et al. describe the results of the Voelz lab’s participation in the SAMPL9 logP challenge, using an alchemical free energy method called expanded ensemble (11, 12) on Folding@home (projects 18465 and 18466). In this method, the solvation free energy of the drug is calculated by disappearing the nonbonded interactions of the solute. Folding@home is a crucial ingredient of this approach, which is ideal for distributed computing. Whereas other free energy methods require many replicas simulated in parallel, the expanded ensemble (EE) calculations can be performed using a single replica, which can be done using a single Folding@home client. By running many EE simulations in parallel, we can gain excellent statistics on how the logP estimate converges, and the estimated uncertainty of our prediction.

To read more about our results and how our predictions stacked up against other SAMPL9 LogP participants, you can read our preprint:

Goold S, Raddi RM, Voelz V. Expanded ensemble predictions of toluene–water partition coefficients in the SAMPL9 LogP challenge. ChemRxiv. 2024;

This work would not be possible without the participants of Folding@home, and we thank them for their support! In the future, we hope that methods like ours will continue to fuel larger efforts to improve drug discovery pipelines, and develop new therapeutics for human disease.

References

Wang, Lingle, Yujie Wu, Yuqing Deng, Byungchan Kim, Levi Pierce, Goran Krilov, Dmitry Lupyan, et al. “Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field.” Journal of the American Chemical Society 137, no. 7 (February 25, 2015): 2695–2703. https://doi.org/10.1021/ja512751q.
Zariquiey, Francesc Sabanes, Raimondas Galvelis, Emilio Gallicchio, John D. Chodera, Thomas E. Markland, and Gianni de Fabritiis. “Enhancing Protein-Ligand Binding Affinity Predictions Using Neural Network Potentials.” arXiv, January 29, 2024. http://arxiv.org/abs/2401.16062.
Gutkin, Evgeny, Filipp Gusev, Francesco Gentile, Fuqiang Ban, S. Benjamin Koby, Chamali Narangoda, Olexandr Isayev, Artem Cherkasov, and Maria G. Kurnikova. “In Silico Screening of LRRK2 WDR Domain Inhibitors Using Deep Docking and Free Energy Simulations.” Chemical Science 15, no. 23 (2024): 8800–8812. https://doi.org/10.1039/D3SC06880C.
McNutt, Andrew T., Paul Francoeur, Rishal Aggarwal, Tomohide Masuda, Rocco Meli, Matthew Ragoza, Jocelyn Sunseri, and David Ryan Koes. “GNINA 1.0: Molecular Docking with Deep Learning.” Journal of Cheminformatics 13, no. 1 (2021): 43. https://doi.org/10.1186/s13321-021-00522-2.
Corso, Gabriele, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. “DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking.” arXiv, February 11, 2023. http://arxiv.org/abs/2210.01776.
Abramson, Josh, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, et al. “Accurate Structure Prediction of Biomolecular Interactions with AlphaFold 3.” Nature 630, no. 8016 (June 13, 2024): 493–500. https://doi.org/10.1038/s41586-024-07487-w.
Morris, Aaron, William McCorkindale, The COVID Moonshot Consortium, Nir Drayman, John D. Chodera, Savaş Tay, Nir London, and Alpha A. Lee. “Discovery of SARS-CoV-2 Main Protease Inhibitors Using a Synthesis-Directed de Novo Design Model.” Chemical Communications 57, no. 48 (2021): 5909–12. https://doi.org/10.1039/D1CC00050K.
Salas-Estrada, Leslie, Davide Provasi, Xing Qiu, Husnu Ümit Kaniskan, Xi-Ping Huang, Jeffrey F. DiBerto, João Marcelo Lamim Ribeiro, Jian Jin, Bryan L. Roth, and Marta Filizola. “De Novo Design of κ-Opioid Receptor Antagonists Using a Generative Deep-Learning Framework.” Journal of Chemical Information and Modeling 63, no. 16 (August 28, 2023): 5056–65. https://doi.org/10.1021/acs.jcim.3c00651.
Thompson, James, W Patrick Walters, Jianwen A Feng, Nicolas A Pabon, Hongcheng Xu, Michael Maser, Brian B Goldman, Demetri Moustakas, Molly Schmidt, and Forrest York. “Optimizing Active Learning for Free Energy Calculations.” Artificial Intelligence in the Life Sciences 2 (December 2022): 100050. https://doi.org/10.1016/j.ailsci.2022.100050.
Zamora, William J., Antonio Viayna, Silvana Pinheiro, Carles Curutchet, Laia Bisbal, Rebeca Ruiz, Clara Ràfols, and F. Javier Luque. “Prediction of Toluene/Water Partition Coefficients in the SAMPL9 Blind Challenge: Assessment of Machine Learning and IEF-PCM/MST Continuum Solvation Models.” Physical Chemistry Chemical Physics 25, no. 27 (2023): 17952–65. https://doi.org/10.1039/D3CP01428B.
Zhang, Si, David F. Hahn, Michael R. Shirts, and Vincent A. Voelz. “Expanded Ensemble Methods Can Be Used to Accurately Predict Protein-Ligand Relative Binding Free Energies.” Journal of Chemical Theory and Computation 17, no. 10 (October 12, 2021): 6536–47. https://doi.org/10.1021/acs.jctc.1c00513.
Hurley, Matthew F. D., Robert M. Raddi, Jason G. Pattis, and Vincent A. Voelz. “Expanded Ensemble Predictions of Absolute Binding Free Energies in the SAMPL9 Host–Guest Challenge.” Physical Chemistry Chemical Physics 25, no. 47 (2023): 32393–406. https://doi.org/10.1039/D3CP02197A.