Get dynamic information from static snapshots

Imagine predicting the exact finishing order of the Kentucky Derby from a photograph taken 10 seconds into the race.

That challenge pales in comparison to what researchers face when they use single-cell RNA sequencing (scRNA-seq) to study how embryos develop, cells differentiate, cancers form, and the immune system reacts.

In a paper published today in Proceedings of the National Academy of Sciences, researchers from UChicago’s Pritzker School of Molecular Engineering and the Department of Chemistry introduce TopicVelo, a powerful new method that uses static scRNA-seq snapshots to study how cells and genes change over time.

The team took an interdisciplinary collaborative approach, incorporating concepts from classical machine learning, computational biology, and chemistry.

“In terms of unsupervised machine learning, we use a very simple and well-established idea. And in terms of the transcriptional model that we use, it’s also a very simple and old idea. But when you put them together, they make something more powerful than you would expect,” said Samantha Riesenfeld, PME assistant professor of molecular engineering and medicine, who wrote the paper with Professor Suriyanarayanan Vaikuntanathan of the Department of Chemistry and their joint student, UChicago chemistry doctoral candidate Cheng Frank Gao.

The problem with pseudotime

Researchers use scRNA-seq to obtain measurements that are powerful and detailed but static in nature.

“We developed TopicVelo to infer cell state transitions from scRNA-seq data,” Riesenfeld said. “It’s difficult to do that from this type of data because scRNA-seq is destructive. When you measure the cell this way, you destroy the cell.”

This leaves researchers with a snapshot of the moment the cell was measured and destroyed. While scRNA-seq provides the best available snapshot of the entire transcriptome, the information many researchers need is how cells transition over time. They need to know how a cell becomes cancerous or how a particular genetic program behaves during an immune response.

To help uncover dynamic processes from a static snapshot, researchers traditionally use what is called “pseudotime.” It’s impossible to watch the expression of an individual cell or gene change and grow in a still image, but that image also captured other cells and genes of the same type that might be a little further along in the same process. If scientists connect the dots correctly, they can gain valuable information about what the process looks like over time.

Connecting those dots is difficult guesswork, based on the assumption that similar-looking cells are simply at different points along the same path. Biology is much more complicated, with false starts, stops, bursts, and multiple chemical forces tugging at every gene.

Instead of traditional pseudotime approaches, which analyze expression similarity between the transcriptional profiles of cells, RNA velocity approaches analyze the dynamics of mRNA transcription, splicing, and degradation within those cells.

It is a promising but early technology.

“The persistent gap between the promise and reality of RNA velocity has greatly restricted its application,” the authors wrote in the paper.

To close this gap, TopicVelo abandons deterministic models and adopts (and draws insights from) a much more difficult stochastic model that reflects the inescapable randomness of biology.

“Cells, when you think about them, are inherently random,” said Gao, the paper’s first author. “You can have genetically identical twins or cells that will grow to be very different. TopicVelo introduces the use of a stochastic model. We can better capture the biophysics underlying transcription processes that are important for mRNA transcription.”
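Gao’s point about randomness can be illustrated with a small simulation. The sketch below is not TopicVelo’s model or code. It is a minimal Gillespie simulation of a generic transcription, splicing, and degradation scheme for one gene, and the function name `simulate_cell`, the rate names (`alpha`, `beta`, `gamma`), and their values are assumptions chosen purely for illustration.

```python
import random


def simulate_cell(alpha, beta, gamma, t_end, rng):
    """Gillespie simulation of one gene in one cell (illustrative only).

    Reactions (rates are assumed values, not taken from the paper):
      transcription: nothing   -> unspliced  at rate alpha
      splicing:      unspliced -> spliced    at rate beta * u
      degradation:   spliced   -> nothing    at rate gamma * s
    Returns the (unspliced, spliced) counts at time t_end.
    """
    t, u, s = 0.0, 0, 0
    while True:
        rates = (alpha, beta * u, gamma * s)
        total = sum(rates)
        # The waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            return u, s
        # Choose which reaction fires, with probability proportional to its rate.
        r = rng.random() * total
        if r < rates[0]:
            u += 1          # transcription
        elif r < rates[0] + rates[1]:
            u -= 1          # splicing converts one unspliced molecule
            s += 1
        else:
            s -= 1          # degradation


# 200 "identical" cells: same gene, same rates, same elapsed time.
rng = random.Random(0)
cells = [simulate_cell(alpha=10.0, beta=1.0, gamma=0.5, t_end=20.0, rng=rng)
         for _ in range(200)]
mean_u = sum(u for u, _ in cells) / len(cells)
mean_s = sum(s for _, s in cells) / len(cells)
print("distinct (unspliced, spliced) outcomes:", len(set(cells)))
print("mean unspliced:", mean_u, "mean spliced:", mean_s)
```

Every simulated cell shares the same rates, yet the counts differ from cell to cell. A deterministic version of the same scheme would assign all of them identical steady-state values, `alpha / beta` unspliced and `alpha / gamma` spliced molecules, and the cell-to-cell spread would be lost.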

Machine learning shows the way

The team also realized that another assumption limits standard RNA velocity. “Most methods assume that all cells express basically the same large genetic program, but you can imagine that cells have to carry out different types of processes simultaneously, to different degrees,” Riesenfeld said. Unraveling these processes is a challenge.

Probabilistic topic modeling (a machine learning tool traditionally used to identify topics from written documents) provided the UChicago team with a strategy. TopicVelo groups scRNA-seq data not by cell types or genes, but by the processes in which those cells and genes are involved. Processes are inferred from data, rather than imposed by external knowledge.

“If you look at a science magazine, it will be organized into topics like ‘physics,’ ‘chemistry,’ and ‘astrophysics,’ that kind of thing,” Gao said. “We applied this organizing principle to single-cell RNA sequencing data. So now we can organize our data by topics, such as ‘ribosomal synthesis,’ ‘differentiation,’ ‘immune response,’ and ‘cell cycle.’ And we can fit stochastic transcriptional models specific to each process.”

After TopicVelo untangles this chaos of processes and organizes them by topic, it applies the topic weights back onto the cells, to take into account what percentage of each cell’s transcriptional profile is involved in which activity.

According to Riesenfeld, “This approach helps us observe the dynamics of different processes and understand their importance in different cells. And that is especially useful when there are branching points or when a cell is pulled in different directions.”
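The weighting step described above can be sketched in a few lines. This is a hedged illustration, not the authors’ implementation: it assumes each topic yields its own cell-to-cell transition matrix and that each cell carries topic weights summing to 1, and it simply blends the per-topic matrices row by row. The function `combine_transitions` and all the numbers are hypothetical.

```python
def combine_transitions(topic_transitions, cell_topic_weights):
    """Blend per-topic transition matrices using per-cell topic weights.

    topic_transitions[k][i][j]: probability that cell i moves toward cell j
        according to topic k alone (each row sums to 1).
    cell_topic_weights[i][k]: share of cell i's profile assigned to topic k
        (each row sums to 1).
    Both inputs are hypothetical stand-ins for quantities a real pipeline
    would estimate from data.
    """
    n_cells = len(cell_topic_weights)
    combined = [[0.0] * n_cells for _ in range(n_cells)]
    for k, transitions in enumerate(topic_transitions):
        for i in range(n_cells):
            weight = cell_topic_weights[i][k]
            for j in range(n_cells):
                combined[i][j] += weight * transitions[i][j]
    return combined


# Toy example: 3 cells, 2 topics (say, "differentiation" and "cell cycle").
differentiation = [[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]]
cell_cycle = [[0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]
weights = [[0.8, 0.2],   # cell 0 is mostly differentiating
           [0.5, 0.5],   # cell 1 is pulled in two directions
           [0.0, 1.0]]   # cell 2 is entirely in the second topic

combined = combine_transitions([differentiation, cell_cycle], weights)
for row in combined:
    print([round(p, 3) for p in row])
```

Because each cell’s weights sum to 1, every row of the blended matrix is still a valid probability distribution. In this toy case cell 1, which is split evenly between the two topics, ends up with two equally likely destinations, the kind of branching situation Riesenfeld describes.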

The results of combining the stochastic model with the topic model are surprising. For example, TopicVelo was able to reconstruct trajectories that previously required special experimental techniques to recover. These improvements greatly expand the potential applications.

Gao compared the paper’s findings to the paper itself, a product of many areas of study and expertise.

“In PME, if you have a chemistry project, there’s probably a physics or engineering student working on it,” he said. “It’s never just chemistry.”