Google-owned artificial intelligence lab DeepMind have tackled one of the principal challenges in molecular biology – the determination of protein structure. Best known for the AI programs that have defeated humans in chess, Go and StarCraft II, DeepMind have now developed a deep-learning program, called AlphaFold, that can determine the 3D shape of a protein from its amino acid sequence alone.
Proteins are large, complex molecules that underpin nearly all biological processes, from transporting oxygen to cells and detecting disease-causing pathogens to converting food into energy and producing more proteins. The information required for their synthesis is encoded in our DNA and is ultimately translated into long chains of amino acids, which twist, bend and curve to form a 3D structure. Each protein has a unique structure, which determines how the protein can ‘move’ and ‘change’ and, consequently, its function.
Therefore, predicting how these amino acid chains fold plays a significant part in understanding how the protein works. Unfortunately, while more than 200 million proteins have been discovered by scientists to date, less than 1% have had their structures solved. This immense challenge of creating a model of protein structure is illustrated by Levinthal’s paradox. This thought experiment postulates that if a protein were to adopt its correct shape by sequentially testing every possible conformation, it would take longer than the age of the known universe for the protein to reach its functional 3D structure, yet proteins fold spontaneously within milliseconds.
And this is not a new challenge: scientists have been working on the protein folding problem for decades. Experimental techniques such as X-ray crystallography, NMR spectroscopy and cryo-EM are used to solve protein structures in the lab, determining the position of each atom in the protein molecule in relation to all other atoms. However, these methods often require a long optimisation period and can be very expensive. Furthermore, some proteins prove exceptionally challenging for traditional methods – they can be difficult to purify in the lab, too unstable to work with or too flexible for the locations of some of their constituent atoms to be determined. Consequently, predicting the structure of a protein just from its amino acid sequence is a huge advancement in the field of structural biology.
To learn the principles that guide protein folding, the DeepMind team trained their algorithm on a public database containing 170,000 protein structures. Deep neural networks are employed to predict the properties of the protein from its amino acid sequence, namely the distances between pairs of amino acids and the angles between the chemical bonds that connect them. The deep-learning algorithms are combined with an ‘attention algorithm’, which imitates a common jigsaw-solving strategy by connecting small clusters of amino acids before finding ways to join the structures of the clusters together into a complete protein structure. Finally, this information is used to generate a model of what the protein should look like. The AlphaFold models are unprecedented in their accuracy, surpassing the achievements of any previous computationally generated protein models.
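To make the idea of distances between pairs of amino acids concrete, the sketch below computes such a distance table from 3D coordinates. It is not AlphaFold’s network, which predicts these values from sequence; it only shows the quantity being predicted. The function name and the toy coordinates are hypothetical, with one point standing in for each amino acid.

```python
import math


def pairwise_distances(coords):
    """Return an n x n table of distances between every pair of points.

    Each point stands in for one amino acid (for example its central
    carbon atom); entry [i][j] is the distance between residues i and j.
    """
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) for j in range(n)]
        for i in range(n)
    ]


# Made-up coordinates in angstroms for a three-residue fragment.
toy_fragment = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
]

distance_table = pairwise_distances(toy_fragment)
for row in distance_table:
    print(["{:.2f}".format(d) for d in row])
```

The table is symmetric with zeros on the diagonal, so a chain of n amino acids is described by n(n-1)/2 distinct distances.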
In 2020, AlphaFold demonstrated its abilities by winning the biennial global competition CASP (Critical Assessment of Protein Structure Prediction). Entrants predict the structures of around 100 proteins that have already been solved by traditional structural biology methods but have not yet been made publicly available. Accuracy in comparison to the experimentally solved structures is marked on a 100-point scale, with a score of 90 being regarded as comparable to experimental techniques. AlphaFold determined the structure of around two thirds of the analysed proteins with accuracy comparable to experimental methods, and across all analysed proteins it had a median score of 92.5 out of 100. While AlphaFold also won CASP in 2018, in 2020 it significantly outperformed the other teams on each structure.
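The 100-point scale used by CASP is the Global Distance Test (GDT_TS), which averages the percentage of amino acids whose predicted position lies within 1, 2, 4 and 8 angstroms of the experimental one. The sketch below is a simplified version that assumes the two structures have already been superimposed; the real metric also searches for the best superposition at each cutoff. The function name and coordinates are hypothetical.

```python
import math

# Distance cutoffs (in angstroms) averaged by the GDT_TS score.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def simplified_gdt_ts(predicted, experimental):
    """Score a prediction on a 0-100 scale.

    Simplification: assumes both structures are already superimposed,
    so no search over alignments is performed.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and equal in length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in GDT_TS_CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up four-residue example: predictions are off by 0.5, 1.5, 3 and 10 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

score = simplified_gdt_ts(predicted, experimental)
print(f"Simplified GDT_TS: {score:.2f}")
```

A perfect prediction scores 100, while a single badly placed amino acid lowers the score at every cutoff it misses, so a median of 92.5 implies that most residues sit within an angstrom or two of their experimental positions.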
In a blog post, following the publication of a Nature study, the DeepMind researchers wrote: “It’s exciting to see these early signs of progress in protein folding, demonstrating the utility of AI for scientific discovery. Even though there’s a lot more work to do before we’re able to have a quantifiable impact on treating diseases, managing waste, and more, we know the potential is enormous.”
Looking forward, the DeepMind team envisions a future in which their software could facilitate the design of new drugs and improve our understanding of diseases caused by protein mutations and protein misfolding, such as cancer, Alzheimer’s and Parkinson’s disease. Predicted protein structures could also be key in conceiving new therapies against human pathogens that rely on interactions between microbial and host proteins to establish infection. Indeed, DeepMind has already started collaborations with a number of scientific groups working on parasitic diseases like malaria, sleeping sickness and leishmaniasis. A deep understanding of how proteins fold could also assist in the design of novel proteins with functions that are beneficial for society. For example, biodegradable enzymes could be designed to help break down waste and pollutants such as plastic and oil in a more environmentally friendly way.
Ultimately, while AlphaFold has a long way to go before it can truly compete with conventional experimental techniques, this is a huge step forward in the field of structural biology. And the DeepMind team already have their eyes set on the next challenge – figuring out how proteins interact to form larger complexes and how proteins interact with other biological molecules, such as lipids and nucleic acids.