With generative AI, chemists quickly calculate 3D genomic structures

Every cell in your body contains the same genetic sequence, yet each cell expresses only a subset of those genes. These cell-specific gene expression patterns, which ensure that a brain cell is different from a skin cell, are partly determined by the three-dimensional structure of the genetic material, which controls the accessibility of each gene.

MIT chemists have now come up with a new way to determine those 3D genome structures, using generative artificial intelligence. Their technique can predict thousands of structures in just minutes, making it much faster than existing experimental methods for analyzing the structures.

Using this technique, researchers could more easily study how the 3D organization of the genome affects individual cells' gene expression patterns and functions.

“Our goal was to try to predict the three-dimensional genome structure from the underlying DNA sequence,” says Bin Zhang, an associate professor of chemistry and the senior author of the study. “Now that we can do that, which puts this technique on par with the cutting-edge experimental techniques, it can really open up a lot of interesting opportunities.”

MIT graduate students Greg Schuette and Zhuohan Lao are the lead authors of the paper, which appears today in Science Advances.

From sequence to structure

Inside the cell nucleus, DNA and proteins form a complex called chromatin, which has several levels of organization, allowing cells to cram 2 meters of DNA into a nucleus that is only one-hundredth of a millimeter in diameter. Long strands of DNA wind around proteins called histones, giving rise to a structure somewhat like beads on a string.

Chemical tags known as epigenetic modifications can be attached to DNA at specific locations, and these tags, which vary by cell type, affect the folding of the chromatin and the accessibility of nearby genes. These differences in chromatin conformation help determine which genes are expressed in different cell types, or at different times within a given cell.

Over the past 20 years, scientists have developed experimental techniques for determining chromatin structures. One widely used technique, known as Hi-C, works by linking together neighboring DNA strands in the cell's nucleus. Researchers can then determine which segments are located near each other by shredding the DNA into many tiny pieces and sequencing it.

This method can be used on large populations of cells to calculate an average structure for a section of chromatin, or on single cells to determine structures within that specific cell. However, Hi-C and similar techniques are labor-intensive, and it can take about a week to generate data from one cell.
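The difference between a single-cell structure and a population average can be illustrated with a minimal Python sketch. It is not the Hi-C protocol or any published analysis pipeline: the coordinates, the distance cutoff, and the function names `contact_map` and `average_contact_map` are all hypothetical, chosen only to show how per-cell contacts turn into population-level contact frequencies.

```python
import math

def contact_map(coords, cutoff):
    """Return an n x n 0/1 matrix: 1 where two segments lie within `cutoff`."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]

def average_contact_map(maps):
    """Average per-cell contact maps into population contact frequencies."""
    n = len(maps[0])
    return [
        [sum(m[i][j] for m in maps) / len(maps) for j in range(n)]
        for i in range(n)
    ]

# Two hypothetical cells, four chromatin segments each (arbitrary units).
cell_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]  # extended chain
cell_b = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # folded into a loop

maps = [contact_map(c, cutoff=1.5) for c in (cell_a, cell_b)]
population = average_contact_map(maps)

# Segments 0 and 3 touch in one cell but not the other.
print(population[0][3])
```

In this toy case the first and last segments are in contact in only one of the two cells, so the averaged map records a contact frequency between 0 and 1 that neither individual cell shows on its own.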

To overcome those limitations, Zhang and his students developed a model that takes advantage of recent advances in generative AI to create a fast, accurate way to predict chromatin structures in single cells. The AI model that they designed can quickly analyze DNA sequences and predict the chromatin structures that those sequences might produce in a cell.

“Deep learning is really good at pattern recognition,” says Zhang. “It allows us to analyze very long DNA segments, thousands of base pairs, and figure out what is the important information encoded in those DNA base pairs.”

ChromoGen, the model that the researchers created, has two components. The first component, a deep learning model taught to “read” the genome, analyzes the information encoded in the underlying DNA sequence and in chromatin accessibility data, the latter of which is widely available and cell type-specific.

The second component is a generative AI model that predicts physically accurate chromatin conformations, having been trained on more than 11 million chromatin conformations. These data were generated from experiments using Dip-C (a variant of Hi-C) on 16 cells from a human lymphocyte cell line.

When integrated, the first component informs the generative model how the cell type-specific environment influences the formation of different chromatin structures, and this scheme effectively captures sequence-structure relationships. For each sequence, the researchers use their model to generate many possible structures. That is because DNA is a very disordered molecule, so a single DNA sequence can give rise to many different possible conformations.
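The two-stage arrangement described above, an encoder that conditions a generator which is then sampled many times, can be sketched roughly as follows. This is a toy under loud assumptions, not ChromoGen's architecture: `encode_region` computes two hand-picked summary features instead of learning anything, `sample_conformation` is a plain 3D random walk standing in for the generative model, and the rule tying accessibility to step length is invented purely for illustration.

```python
import math
import random

def encode_region(sequence, accessibility):
    """Toy stand-in for the first component: summarize the inputs as features.

    Hypothetical: returns GC fraction and mean accessibility; nothing is learned.
    """
    gc = sum(base in "GC" for base in sequence) / len(sequence)
    mean_access = sum(accessibility) / len(accessibility)
    return (gc, mean_access)

def sample_conformation(features, n_beads, rng):
    """Toy stand-in for the generative component: a 3D random walk.

    Hypothetical rule: more accessible regions take longer steps (looser fold).
    """
    _, mean_access = features
    step = 0.5 + mean_access
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Draw a random direction and rescale it to the step length.
        v = [rng.gauss(0, 1) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        last = coords[-1]
        coords.append(tuple(last[k] + step * v[k] / norm for k in range(3)))
    return coords

def end_to_end(coords):
    """Distance between the first and last bead of one conformation."""
    return math.dist(coords[0], coords[-1])

rng = random.Random(0)
features = encode_region("ACGTGGCCAT", [0.2, 0.8, 0.5])

# One input region, many sampled conformations: the output is a distribution.
ensemble = [sample_conformation(features, n_beads=20, rng=rng) for _ in range(1000)]
distances = [end_to_end(c) for c in ensemble]
mean = sum(distances) / len(distances)
spread = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
print(f"{len(ensemble)} conformations, end-to-end {mean:.2f} +/- {spread:.2f}")
```

The point of the sketch is the shape of the workflow: the same conditioning features yield a spread of structures rather than a single answer, and it is that spread that gets summarized.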

“A major complicating factor of predicting the structure of the genome is that there isn’t a single solution that we’re aiming for. There’s a distribution of structures, no matter what portion of the genome you’re looking at. Predicting that very complicated, high-dimensional statistical distribution is something that is incredibly challenging to do,” says Schuette.

Quick analysis

Once trained, the model can generate predictions on a much faster timescale than Hi-C or other experimental techniques.

“Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU,” says Schuette.
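For a rough sense of scale, the figures in that quote can be turned into a per-structure comparison. The sketch below makes two assumptions that are not from the study: six months is taken as 182 days, and the quote's dozens of structures is taken as 36. It also compares calendar time for experiments against compute time for one region, so the result is an order-of-magnitude illustration only.

```python
MINUTES_PER_DAY = 24 * 60

def minutes_per_structure(total_minutes, n_structures):
    """Average time spent per structure."""
    return total_minutes / n_structures

# Assumptions (not from the study): six months ~ 182 days, dozens = 36.
experimental = minutes_per_structure(182 * MINUTES_PER_DAY, 36)

# From the quote: a thousand structures in 20 minutes on one GPU.
model = minutes_per_structure(20, 1000)

speedup = experimental / model
print(f"experiment: {experimental:.0f} min/structure")
print(f"model: {model:.2f} min/structure")
print(f"speedup: ~{speedup:,.0f}x")
```

Changing the assumed counts shifts the ratio, but under any reasonable reading of the quote the gap spans several orders of magnitude.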

After training their model, the researchers used it to generate structure predictions for more than 2,000 DNA sequences, then compared them with the experimentally determined structures for those sequences. They discovered that the structures generated by the model were the same or very similar to those observed in the experimental data.

“We typically look at hundreds or thousands of conformations for each sequence, and that gives you a reasonable representation of the diversity of the structures that a particular region can have,” says Zhang. “If you repeat your experiment multiple times, in different cells, you will very likely end up with a very different conformation. That’s what our model is trying to predict.”

The researchers also found that the model could make accurate predictions for data from cell types other than the one it was trained on. This suggests that the model could be useful for analyzing how chromatin structures differ between cell types, and how those differences affect their function. The model could also be used to explore different chromatin states that can exist within a single cell, and how those changes affect gene expression.

Another possible application would be to explore how mutations in a particular DNA sequence change the conformation of chromatin, which could shed light on how such mutations can cause disease.

“There are many interesting questions that I think we can address with this type of model,” says Zhang.

The researchers have made all of their data and the model available to others who wish to use them.

The research was funded by the National Institutes of Health.