A team of researchers from the National University of Singapore (NUS) has introduced a new method for analysing single-cell RNA sequencing (scRNA-seq) data. This method promises to improve both the accuracy and speed of data interpretation, potentially accelerating progress in numerous areas of biomedical research, including studies on cancer and Alzheimer’s disease.
The innovative framework, named scAMF (Single-cell Analysis via Manifold Fitting), was developed by a team of scientists led by Associate Professor Zhigang Yao from the Department of Statistics and Data Science in the NUS Faculty of Science. The framework employs advanced mathematical techniques to fit a low-dimensional manifold within the high-dimensional space where gene expression data is measured. By doing so, scAMF effectively reduces noise while preserving crucial biological information. This enables more accurate characterisation of cell types and states.
This research was conducted in collaboration with Professor Yau Shing-Tung of Tsinghua University. Their findings were published in the journal Proceedings of the National Academy of Sciences on September 3, 2024.
Leveraging manifold fitting techniques to overcome data analysis bottlenecks
Single-cell RNA sequencing has emerged as a crucial tool in genomics research, offering unprecedented insights into cellular diversity and disease mechanisms. However, noise in scRNA-seq data, arising from both biological variability and technical errors, has long posed challenges for accurate analysis. Traditional scRNA-seq analysis methods, including genomic imputation approaches, graph-based methods, and deep learning-based algorithms, often struggle to characterise relationships between cells accurately because of this noise.
The scAMF framework is designed to overcome these limitations. It fits a low-dimensional manifold within the ambient space of gene expression data, that is, the original high-dimensional space in which the data is measured. At the heart of scAMF is the manifold fitting module, which denoises scRNA-seq data by unfolding its distribution in the ambient space. The technique aims to reconstruct a uniform manifold within that space, capturing the low-dimensional structure of the data while minimising information loss.
The key innovation of scAMF lies in its ability to improve the spatial distribution of data, bringing gene expression vectors from cells of the same type closer together while maintaining a clear separation between different cell types. This improvement allows for more accurate and reliable clustering in downstream analyses.
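To make the idea of "bringing cells of the same type closer together" concrete, here is a minimal toy sketch. It is not the scAMF algorithm: it swaps in a much simpler stand-in (replacing each point by the mean of its nearest neighbours) and runs on synthetic data with two hypothetical "cell types". The function names, the choice of k, and the data are all assumptions made for illustration.

```python
import numpy as np

def knn_mean_smooth(X, k=10):
    """Replace each point by the mean of its k nearest neighbours (itself included).

    A crude stand-in for manifold-based denoising, NOT the scAMF procedure.
    """
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Indices of the k closest points to each row.
    idx = np.argsort(d2, axis=1)[:, :k]
    return X[idx].mean(axis=1)

def within_between_ratio(X, labels):
    """Mean within-group distance divided by mean between-group distance."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff**2).sum(axis=2))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)
    return d[same & off_diag].mean() / d[~same].mean()

# Synthetic data: two hypothetical cell types in a 50-gene space.
rng = np.random.default_rng(0)
n_per_type, n_genes = 100, 50
centers = np.zeros((2, n_genes))
centers[1, :5] = 4.0  # the types differ in 5 hypothetical marker genes
labels = np.repeat([0, 1], n_per_type)
X = centers[labels] + rng.normal(scale=1.0, size=(2 * n_per_type, n_genes))

ratio_before = within_between_ratio(X, labels)
ratio_after = within_between_ratio(knn_mean_smooth(X, k=10), labels)
print(f"within/between ratio before: {ratio_before:.3f}, after: {ratio_after:.3f}")
```

A smaller ratio means cells of the same type sit closer together relative to cells of different types, which is the property that makes downstream clustering easier.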
“Our method effectively removes noise from RNA sequencing data by fitting a low-dimensional manifold into a high-dimensional space,” explained Associate Professor Yao. “This method significantly improves the accuracy of cell type classification and the clarity of data visualization.”
The scAMF method employs a unique combination of data transformation, manifold fitting using shared nearest neighbor metrics, and unsupervised clustering validation. Compared to other methods, scAMF demonstrates superior performance in several key areas, including more effective noise reduction, improved clustering accuracy, better preservation of biological information, competitive computational efficiency, clearer visualization, and robust performance on diverse datasets. These improvements position scAMF as a powerful new tool in single-cell analysis, potentially enabling researchers to uncover previously hidden cellular heterogeneity and rare cell populations.
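For readers unfamiliar with shared nearest neighbour (SNN) measures, the sketch below shows one common textbook form: the similarity of two cells is the Jaccard overlap of their k-nearest-neighbour lists. Whether scAMF uses exactly this definition is not stated here, so treat the formula, the function names, and the toy data as assumptions.

```python
import numpy as np

def knn_indices(X, k):
    """Return the indices of each point's k nearest neighbours (self excluded)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def snn_similarity(X, k=5):
    """Jaccard overlap of k-NN lists: |N(i) & N(j)| / |N(i) | N(j)|."""
    n = X.shape[0]
    idx = knn_indices(X, k)
    # Binary membership matrix: A[i, j] = 1 if j is among i's k neighbours.
    A = np.zeros((n, n), dtype=int)
    A[np.arange(n)[:, None], idx] = 1
    shared = A @ A.T  # shared[i, j] = number of neighbours i and j have in common
    return shared / (2 * k - shared)  # each list has size k, so the union is 2k - shared

# Toy data: two tight, well-separated groups of six points each.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(6, 2)),
    rng.normal(loc=10.0, scale=0.1, size=(6, 2)),
])
S = snn_similarity(X, k=3)
print("mean similarity within first group:", S[:6, :6].mean())
print("max similarity across groups:", S[:6, 6:].max())
```

Because the similarity depends on neighbourhood overlap rather than raw distance, it tends to be less sensitive to the distance distortions that affect high-dimensional data, which is one reason SNN-style measures are popular in single-cell clustering.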
Future work: driving greater understanding of cellular diversity and function
Building on the success of scAMF, the research team is now developing a new framework for building high-resolution multi-scale cell atlases. This new approach aims to overcome current methodological limitations in cell atlas construction, such as the difficulty of identifying small cell populations and the reliance on outdated unsupervised learning techniques.
A key goal is the development of a scAMF-based multi-resolution cellular analysis framework. This advanced framework aims to identify rare cell populations and contribute to the construction of comprehensive cell atlases. The multi-resolution approach will enable researchers to analyze cellular heterogeneity at various levels of granularity, from general cell types to subtle subpopulations. This is particularly crucial for identifying rare cell types that may be missed by conventional analysis methods.
“Our ongoing work has already shown promising results on numerous benchmark datasets, revealing new biological insights,” noted Associate Professor Yao. “We have applied it to the Human Brain Cell Atlas and identified new subtypes and marker genes for various cell types.”
This ongoing research promises to further push the boundaries of single-cell analysis, potentially revolutionising our understanding of cellular diversity and function in various biological systems.