How a Scientist Taught Chemistry to AlphaFold AI

Artificial intelligence has changed the way science is done by allowing researchers to analyze the vast amounts of data generated by modern scientific instruments. It can find a needle in a million haystacks of information and, using deep learning, it can learn from the data itself. Artificial intelligence is accelerating progress in gene hunting, medicine, drug design and the creation of organic compounds.

Deep learning uses algorithms, often neural networks trained on large amounts of data, to extract information from new data. It is quite different from traditional computing with its step-by-step instructions. Instead, it learns from the data. Deep learning is far less transparent than traditional computer programming, and this leaves important questions: what has the system learned, and what does it know?

As a chemistry professor, I like to design tests that contain at least one difficult question that stretches students’ knowledge to determine whether they can combine different ideas and synthesize new ideas and concepts. We devised such a question for the poster child of AI advocates, AlphaFold, which has solved the protein folding problem.

Protein folding

Proteins are present in all living things. They provide cells with structure, catalyze reactions, transport small molecules, digest food, and do much more. They are made up of long chains of amino acids, like beads on a string. But for a protein to do its job in a cell, it must twist and bend into a complex three-dimensional structure, a process called protein folding. Misfolded proteins can lead to disease.

In his 1972 Nobel Prize in Chemistry acceptance speech, Christian Anfinsen postulated that it should be possible to calculate the 3D structure of a protein from the sequence of its building blocks, the amino acids.

Just as the letter order and spacing in this article give meaning and message, so does the arrangement of the amino acids determine the identity and shape of the protein, which leads to its function.

Because of the inherent flexibility of the amino acid building blocks, a typical protein can adopt an estimated 10 to the power of 300 different shapes. That’s a massive number, more than the number of atoms in the universe. Yet within a split second, each protein in an organism folds into its own very specific shape: the lowest-energy arrangement of all the chemical bonds that make up the protein. Change just one amino acid out of the hundreds normally found in a protein and it may misfold and no longer work.

AlphaFold

For 50 years, computer scientists have tried to solve the protein folding problem, with little success. Then in 2016 DeepMind, an AI subsidiary of Google’s parent company Alphabet, launched its AlphaFold program. It used the Protein Data Bank as its training set, which contains the experimentally determined structures of more than 150,000 proteins.

In less than five years AlphaFold had beaten the protein folding problem, or at least the most useful part of it, which is determining the structure of a protein from its amino acid sequence. AlphaFold doesn’t explain how proteins fold so quickly and precisely. It was a major win for artificial intelligence, because it not only earned huge scientific prestige, but was also a major scientific advance that could affect everyone’s life.

Today, thanks to programs like AlphaFold2 and RoseTTAFold, researchers like me can determine the 3D structure of a protein from the sequence of amino acids that make it up, at no cost, within an hour or two. Before AlphaFold2 we had to crystallize proteins and solve their structures using X-ray crystallography, a process that took months and cost tens of thousands of dollars per structure.

We now also have access to the AlphaFold Protein Structure Database, where DeepMind has deposited the 3D structures of nearly all proteins found in humans, mice, and more than 20 other species. So far they have solved more than a million structures and plan to add another 100 million this year alone. Knowledge of proteins has increased dramatically. The structure of half of all known proteins is likely to be documented by the end of 2022, among them many new unique structures associated with new useful functions.

Thinking like a chemist

AlphaFold2 was not designed to predict how proteins interact with each other, yet it has been able to model how individual proteins combine to form large complex units made up of multiple proteins. We had a tough question for AlphaFold: had its structural training set taught it some chemistry? Could it tell whether amino acids would react with each other, which is rare but important?

I am a computational chemist interested in fluorescent proteins. These proteins are found in hundreds of marine organisms such as jellyfish and corals. Their glow can be used to illuminate and study diseases.

There are 578 fluorescent proteins in the Protein Data Bank, of which 10 are “broken” and do not fluoresce. Proteins rarely attack themselves, a process called autocatalytic post-translational modification, and it is very difficult to predict which proteins will react with themselves and which will not.

Only a chemist with a great deal of knowledge of fluorescent proteins would be able to use an amino acid sequence to find the fluorescent proteins that have the right sequence to undergo the chemical transformations required to make them fluorescent. When we presented AlphaFold2 with the sequences of 44 fluorescent proteins not found in the Protein Data Bank, it folded the fixed fluorescent proteins differently from the broken ones.

The result amazed us: AlphaFold2 had learned some chemistry. It had figured out which amino acids in fluorescent proteins do the chemistry that makes them glow. We suspect that the Protein Data Bank training set and multiple sequence alignments enable AlphaFold2 to “think” like chemists and look for the amino acids required to react with each other to make the protein fluorescent.

A folding program that learns some chemistry from its training set also has broader implications. By asking the right questions, what else can be gained from other deep learning algorithms? Could facial recognition algorithms find hidden signs of disease? Could algorithms designed to predict spending patterns among consumers also find a propensity for petty theft or deception? And most importantly, is this ability, and similar leaps in ability in other artificial intelligence systems, desirable?

Marc Zimmer is Professor of Chemistry at Connecticut College.

This article is republished from The Conversation under a Creative Commons license. Read the original article.