AI

AI shown to dramatically speed up protein engineering

author
3 minutes, 45 seconds Read


Protein engineering scientists have been able to use machine learning to design more efficient proteins for performing biochemical tasks. This approach shortened a process that typically takes months to years of trial and error. 

The image at left depicts rat neurons expressing GCaMP protein in the low calcium condition, and at right, the same GCaMP expressing neurons in a high calcium condition.

The image at left depicts rat neurons expressing GCaMP protein in the low calcium condition, and at right, the same GCaMP expressing neurons in a high calcium condition. Image credit: UW Medicine

Researchers currently design proteins by introducing mutations into a protein’s amino acid sequence in the hope that those mutations will give the protein a desired property or function. They then must repeatedly test the resulting mutant proteins in the lab. 

“We were able to circumvent the trial-and-error approach by using machine learning algorithms to predict how different mutations would perform in the real world,” said Sarah Wait, first author of the paper describing the approach. The findings appear in the journal Nature Computational Science. 

Wait is a graduate student in the lab of Andre Berndt, assistant professor in the University of Washington Department of Bioengineering and the paper’s senior author. 

“The impact of this study will be huge,” Berndt said. “It’s going to shift protein engineering away from the hit-and-miss approaches and spur people to invest more into machine learning and other computational approaches.”

In the study, the researchers focused on a protein called GCaMP, which stands for Genetically Encoded Calcium Indicator Modified for Phototransduction. GCaMP — pronounced “G-camp” — fluoresces when it binds to calcium.

This property makes it ideal for studying nerve cell function, because when nerve cells fire, the calcium levels within them rise. When GCaMP is introduced into nerve cells, the cells fluoresce when they fire. This makes it possible to monitor and analyze neuronal activity in real time.

The researchers used a machine learning approach in which algorithms are trained on existing data to create statistical models that can then be used to analyze new data. 

In this case, the researchers trained three algorithms on the amino acid sequences of more than 1,000 versions of GCaMP whose properties were known. Some sequences were known to be more efficient and some less. This allowed the algorithms to develop statistical models that could identify sequences that were likely to be more efficient. 

Other than the sequences of the different versions of the protein, no other details, such as information about the structure of a protein, were provided to the algorithm.

The researchers combined the results of the three algorithms to increase the chances the results were reliable. 

“How each algorithm learned and formulated its prediction was different, so we figured if all three agreed on something there must be something to it,” Wait said.

Then the trained algorithms analyzed the sequences of 1,423 versions of GCaMP whose properties were unknown, and predicted which would likely be the most efficient.

“These were sequences we had never tested, and we didn’t have a clue whether any would be more effective,” Wait said.

The properties they sought in the candidate GCaMP proteins were the ability to fluoresce brightly when exposed to calcium but then quickly dim and be ready to fluoresce again. These properties are ideal because they allow GCamP to reveal the activity of rapidly firing neurons more accurately. 

“You want a bigger, brighter signal that quickly turns off,” Wait said.

The machine learning algorithms identified three promising versions of GCaMP. Subsequent laboratory testing found that all three were brighter and faster than any previously reported GCaMP proteins. One variant, called eGCAMP2+, was twice as bright as the versions currently considered state of the art. 

The approach can be used to study any protein, not just GCaMP, Wait noted. 

“It’s unbiased, meaning it doesn’t care what mutation dataset it sees, so researchers can use this platform to evaluate other proteins of interest,” she said.

In a matter of months, Berndt noted, the algorithms were able to discover significantly improved G-CaMP versions that had taken more than 20 years and a substantial investment in resources to develop. 

Source: University of Washington





Source link

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *