AI for Cryptic Pockets

Cryptic pockets are of significant interest for drug discovery. These pockets are closed in the experimentally derived snapshots that show what a protein typically looks like. However, in simulations, we can see these pockets spontaneously open and close due to protein dynamics.

Folding@home is an amazing tool for finding these pockets. We’ve found them in hundreds of proteins at this point and have experimentally confirmed the existence of some of the pockets we found most exciting.

All the simulation data we’ve generated presents a great opportunity for artificial intelligence (AI).

We previously published an algorithm called PocketMiner that learned to predict where cryptic pockets are likely to form in a single input structure after learning about pocket opening from all of our simulation data. This algorithm has been a powerful tool!

PocketMiner still has some limitations though. The way the algorithm is designed, each part of the protein is only really aware of other nearby parts of the protein. However, we know many proteins have a feature called allostery, wherein distant parts of the protein are somehow able to influence each other.
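To make that locality concrete, here is a toy sketch of how far information can travel when residues only exchange messages with nearby neighbors. The straight-line "protein", the 8 Å cutoff, and the helper names are illustrative assumptions, not details of PocketMiner itself; the four rounds mirror the four message passing layers in the published architecture.

```python
import math
from collections import deque

def neighbor_graph(coords, cutoff):
    """Connect residues whose C-alpha atoms lie within `cutoff` angstroms."""
    n = len(coords)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].append(j)
                graph[j].append(i)
    return graph

def reachable_within(graph, start, hops):
    """Residues that can influence `start` after `hops` rounds of message passing."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # messages cannot travel any further than this
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

# Toy chain (assumption): 30 residues on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(30)]
graph = neighbor_graph(coords, cutoff=8.0)  # hypothetical cutoff
visible = reachable_within(graph, start=0, hops=4)
print(sorted(visible))
```

In this toy chain, residue 0 never hears from the far end of the protein, no matter what happens there. Real proteins fold back on themselves, so the picture is less stark, but the reach of each residue is still bounded.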

Allostery is a strange phenomenon that doesn’t have good analogies. One of the best I can offer is to ask you to imagine that when you closed the front door of your house/apartment, a couple of windows elsewhere tended to pop open. That would raise lots of questions, from “how did that happen?” to “can I repurpose this to do useful things, like start dinner when I flop on my sofa?”

To capture allostery, we added a component called attention to PocketMiner. The attention mechanism adds a potential connection between every pair of parts of the protein. It doesn’t just connect everything up though. Instead, it learns which parts are likely to have meaningful connections and ignores the rest. That lets the algorithm find allostery and show us where it’s likely to occur.
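For readers who want to see the idea in code, below is a minimal sketch of single-head scaled dot-product self-attention over per-residue embeddings. It is a simplification under stated assumptions: a trained model learns separate query, key, and value projections, whereas this sketch uses the embeddings directly, and the four two-dimensional embeddings are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Scaled dot-product attention with identity projections (an assumption)."""
    d = len(embeddings[0])
    weights = []
    for q in embeddings:
        # Every residue scores every other residue, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in embeddings]
        weights.append(softmax(scores))
    # Each output embedding is a weighted average of all residue embeddings.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, embeddings)) for c in range(d)]
        for row in weights
    ]
    return weights, outputs

# Hypothetical embeddings for four residues.
embeddings = [
    [1.0, 0.0],  # residue 0
    [0.0, 1.0],  # residue 1
    [0.0, 1.2],  # residue 2
    [0.9, 0.1],  # residue 3: far away in sequence, but similar to residue 0
]
weights, outputs = self_attention(embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Residue 0 puts more weight on residue 3 than on residue 1, even though residue 1 is its sequence neighbor. The weights depend on what the embeddings contain, not on how close the residues are.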

In our new preprint, we show that attention-enabled PocketMiner (AE-PocketMiner) does two things. First, it improves how well we can predict cryptic pockets by accounting for long-range allosteric effects that the original algorithm was blind to. Second, it allows us to predict which parts of a protein are in allosteric communication with a predicted pocket.

Model architecture of AE-PocketMiner. The model converts the input protein structure into a graph-based representation with nodes corresponding to residues. Nodes have features like backbone dihedral angles, and edges have features encoding inter-residue relationships, like radial basis functions of Cα-Cα distances (see Methods). These features are processed through geometric vector perceptron layers followed by four message passing layers that update each residue’s embedding based on neighboring residues and connecting edges. To better capture both local and long-range residue interactions, a two-head attention block (gray dashed box) is introduced after the message passing and transformation layers. This block includes layer normalization and a residual connection that adds the original embeddings to the attended outputs. The combined embeddings are passed through a feedforward network (FFN), followed by a sigmoid activation function to predict per-residue cryptic pocket likelihood. An example prediction is shown (left middle), where residues are colored according to their predicted cryptic pocket probabilities (blue to red). The location of a known pocket is denoted with two black arrows (in the open state, the red residues in the Ω-loop and adjacent 238-loop shift to create the opening). The 15 residues that are predicted to have the strongest allosteric coupling to the cryptic pocket based on their attention scores are shown as spheres.
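The edge features mentioned in the caption, radial basis functions of Cα-Cα distances, can be sketched as follows. This is a generic Gaussian radial basis encoding; the 0 to 20 Å range, the 16 centers, and the width are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

def rbf_encode(distance, d_min=0.0, d_max=20.0, n_centers=16):
    """Expand one distance (in angstroms) into smooth, overlapping Gaussian bins."""
    step = (d_max - d_min) / (n_centers - 1)
    sigma = (d_max - d_min) / n_centers  # assumed width of each Gaussian
    centers = [d_min + k * step for k in range(n_centers)]
    return [math.exp(-((distance - c) / sigma) ** 2) for c in centers]

# Encode a 6.5 angstrom C-alpha to C-alpha distance.
features = rbf_encode(6.5)
print([round(f, 3) for f in features])
```

Instead of handing the network a single number, this gives it a short vector whose largest entries mark roughly how far apart two residues are, which tends to be easier for a model to work with than a raw distance.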

The insight into allostery opens up a range of possibilities. One big question with cryptic pockets is whether targeting them with a therapeutic is likely to do anything useful. The improved PocketMiner now tells us if a cryptic pocket is likely to have allosteric control over any parts of the protein that are important for its function. Pockets that have influence over functional sites are much more likely to be useful drug targets.
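As a rough illustration of that check, the sketch below ranks residues by how much attention they receive from a predicted pocket and tests whether any top-ranked residue belongs to a known functional site. The attention matrix, the residue indices, and the choice to average attention over the pocket residues are all made-up assumptions; the preprint describes how AE-PocketMiner actually scores coupling.

```python
def top_coupled_residues(attention, pocket, k):
    """Rank non-pocket residues by mean attention received from pocket residues."""
    pocket_set = set(pocket)
    scores = {
        j: sum(attention[i][j] for i in pocket) / len(pocket)
        for j in range(len(attention))
        if j not in pocket_set
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical attention weights for a five-residue toy protein
# (row i holds the weights residue i places on every residue).
attention = [
    [0.40, 0.30, 0.05, 0.05, 0.20],
    [0.30, 0.40, 0.05, 0.05, 0.20],
    [0.10, 0.10, 0.60, 0.15, 0.05],
    [0.10, 0.10, 0.20, 0.55, 0.05],
    [0.25, 0.25, 0.05, 0.05, 0.40],
]
pocket = [0, 1]        # residues predicted to form the cryptic pocket
functional_site = {4}  # residues known to matter for function (assumed)

coupled = top_coupled_residues(attention, pocket, k=2)
overlap = functional_site.intersection(coupled)
print(coupled, overlap)
```

A non-empty overlap would flag the pocket as one worth a closer look, since it suggests that binding there could influence the functional site.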

We can also go the other way, asking where else on a protein we could make changes that would control the probability that a cryptic pocket is open. This turns out to be very useful for designing experiments to help find compounds that target cryptic pockets and could eventually be developed into therapeutics.