Gene therapy could cure genetic diseases, but it remains a challenge to package and deliver new genes to specific cells safely and effectively. Existing methods for designing one of the most widely used gene delivery vehicles, adeno-associated viruses (AAV), are often slow and inefficient.
Now, researchers at the Broad Institute of MIT and Harvard have developed a machine-learning method that promises to speed up the engineering of AAVs for gene therapy. The tool helps researchers design AAVs’ protein shells, called capsids, to have multiple desirable characteristics, such as the ability to deliver cargo to a specific organ but not others or to function in multiple species. Other methods only look for capsids that have one characteristic at a time.
The team used their method to design capsids for a commonly used type of AAV called AAV9 that targeted the liver more efficiently and could be easily manufactured. They found that about 90 percent of the capsids predicted by their machine learning models successfully delivered their cargo to human liver cells and met five other key criteria. They also found that their machine learning model correctly predicted protein behavior in macaque monkeys even though it was only trained on data from human and mouse cells. This finding suggests that the new method could help scientists more quickly design AAVs that work across species, which is essential for translating gene therapies to humans.
The findings, which recently appeared in Nature Communications, come from the lab of Ben Deverman, a staff scientist at the institute and director of vector engineering at the Broad’s Stanley Center for Psychiatric Research. Fatma-Elzahraa Eid, a senior machine learning scientist in Deverman’s group, was the paper’s first author.
“This was a really unique approach,” Deverman said. “It highlights the importance of lab biologists working with machine learning scientists from the beginning to design experiments that generate data that enables machine learning, rather than doing so as an afterthought.”
Group leader Ken Chan, graduate student Albert Chen, research associate Isabelle Tobey and scientific advisor Alina Chan, all in Deverman’s lab, also contributed significantly to the study.
Make way for the machines
Traditional approaches to designing AAVs involve generating large libraries containing millions of capsid protein variants and then testing them in cells and animals in multiple rounds of selection. This process can be expensive and time-consuming, and typically results in researchers identifying only a handful of capsids that have a specific trait. With so few candidates to choose from, it is difficult to find capsids that meet multiple criteria.
Other groups have used machine learning to speed up large-scale analysis, but most methods optimized proteins for one function at the expense of another.
Deverman and Eid realized that existing, large AAV library-based datasets weren’t well-suited for training machine learning models. “Rather than just taking data and giving it to machine learning scientists, we thought, ‘What do we need to train machine learning models better?’” Eid said. “Figuring that out was really game-changing.”
They first used an initial round of machine learning modeling to generate a new, moderately sized library, called Fit4Function, that contained capsids that were predicted to package genetic cargo well. The team screened the library in human cells and mice to find capsids that had specific functions important for gene therapy in each species. They then used that data to build multiple machine learning models that could predict a certain function from a capsid’s amino acid sequence. Finally, they used the models in combination to create “multifunctional” AAV libraries optimized for multiple traits at once.
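The final step, using several single-function models together to pick sequences that satisfy every criterion at once, can be illustrated with a minimal Python sketch. This is not the team's code or their models: the predictors here are toy scoring functions, and the function names, thresholds and peptide strings are all hypothetical, chosen only to show the filtering logic.

```python
from typing import Callable, Dict, List

# A predictor maps an amino acid sequence to a score for one function.
# In practice these would be models trained on screening data; here they
# are hypothetical toy stand-ins.
Predictor = Callable[[str], float]


def fraction_of(residues: str) -> Predictor:
    """Toy predictor: fraction of positions whose residue is in `residues`."""
    def predict(seq: str) -> float:
        return sum(aa in residues for aa in seq) / len(seq)
    return predict


def select_multifunctional(
    candidates: List[str],
    predictors: Dict[str, Predictor],
    thresholds: Dict[str, float],
) -> List[str]:
    """Keep candidates whose predicted score meets the threshold for every function."""
    return [
        seq
        for seq in candidates
        if all(predictors[name](seq) >= thresholds[name] for name in predictors)
    ]


# Hypothetical functions and cutoffs, for illustration only.
predictors = {
    "production": fraction_of("AGST"),
    "liver_targeting": fraction_of("KR"),
}
thresholds = {"production": 0.4, "liver_targeting": 0.2}

candidates = ["AGSTKRA", "KKKKKKK", "AAAAAAA", "GSKRTAL"]
selected = select_multifunctional(candidates, predictors, thresholds)
print(selected)  # ['AGSTKRA', 'GSKRTAL']
```

A sequence that scores very well on one function but fails another, such as the all-lysine example, is dropped. That is the difference between optimizing for a single trait and requiring multiple traits at once.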
The future of protein design
As a proof of concept, Eid and other researchers in Deverman’s lab combined six models to design a library of capsids that had multiple desired functions, including the ability to be manufactured and targeted to the liver in human and mouse cells. Nearly 90 percent of these proteins exhibited all of the desired functions simultaneously.
The researchers also found that the model, trained solely on data from mice and human cells, correctly predicted how AAVs distribute across different organs in macaques, suggesting that these AAVs do so through a mechanism that carries across species. That could allow gene therapy researchers to more quickly identify capsids with multiple desirable properties for human use.
In the future, Eid and Deverman say their models could help other groups create gene therapies that target or specifically avoid the liver. They also hope other labs will use their approach to generate models and libraries of their own that, taken together, could form a machine learning atlas — a resource that could predict the performance of AAV capsids across dozens of features to speed gene therapy development.