Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Republic of Singapore.
Yau Mathematical Sciences Center, Jingzhai, Tsinghua University, Beijing 100084, China.
Proc Natl Acad Sci U S A. 2024 Sep 10;121(37):e2400002121. doi: 10.1073/pnas.2400002121. Epub 2024 Sep 3.
Single-cell RNA sequencing (scRNA-seq) data are susceptible to noise arising from biological variability and technical errors. This noise can distort gene expression analysis and impair cell similarity assessments, particularly in heterogeneous populations. Current methods, including deep learning approaches, often struggle to characterize cell relationships accurately because of this inherent noise. To address these challenges, we introduce scAMF (Single-cell Analysis via Manifold Fitting), a framework designed to enhance clustering accuracy and data visualization in scRNA-seq studies. At the heart of scAMF lies the manifold fitting module, which denoises scRNA-seq data by unfolding their distribution in the ambient space. This unfolding aligns the gene expression vector of each cell more closely with its underlying structure, bringing it spatially closer to other cells of the same cell type. To assess the impact of scAMF comprehensively, we compile a collection of 25 publicly available scRNA-seq datasets spanning various sequencing platforms, species, and organ types, forming an extensive RNA data bank. In comparative studies benchmarking scAMF against existing scRNA-seq analysis algorithms on this data bank, we consistently observe that scAMF outperforms them in clustering efficiency and data visualization clarity. Further experimental analysis reveals that this enhanced performance stems from scAMF's ability to improve the spatial distribution of the data and capture class-consistent neighborhoods. These findings underscore the potential of manifold fitting as a tool in scRNA-seq analysis and indicate a significant improvement in the precision and reliability of data interpretation.
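The intuition behind denoising that pulls each cell toward others of its type, and behind "class-consistent neighborhoods", can be illustrated with a small sketch. This is not the scAMF manifold fitting module, whose details the abstract does not give. It substitutes a much simpler technique, k-nearest-neighbor local averaging, and every name in it (`knn_indices`, `local_mean_denoise`, `neighbour_purity`, the parameters `k` and `step`) is a hypothetical choice made for illustration. It assumes only NumPy and a cells-by-features matrix `X`.

```python
import numpy as np


def knn_indices(X, k):
    """Return the indices of the k nearest neighbours of each row (self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # never pick a cell as its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def local_mean_denoise(X, k=10, step=0.5):
    """Move each cell part of the way toward the mean of its k nearest neighbours.

    Illustrative stand-in only: simple local averaging, not scAMF's manifold fitting.
    """
    X = np.asarray(X, dtype=float)
    nbrs = knn_indices(X, k)
    local_mean = X[nbrs].mean(axis=1)  # shape (n_cells, n_features)
    return (1.0 - step) * X + step * local_mean


def neighbour_purity(X, labels, k=10):
    """Fraction of k-nearest-neighbour pairs in which both cells share a label."""
    labels = np.asarray(labels)
    nbrs = knn_indices(np.asarray(X, dtype=float), k)
    return float(np.mean(labels[nbrs] == labels[:, None]))


# Synthetic stand-in for two cell types; real scRNA-seq data would first need
# normalization and dimensionality reduction, which this sketch omits.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(3.0, 1.0, size=(100, 5))])
labels = np.array([0] * 100 + [1] * 100)

X_denoised = local_mean_denoise(X, k=10, step=0.5)
print("purity before:", neighbour_purity(X, labels))
print("purity after: ", neighbour_purity(X_denoised, labels))
```

Under these assumptions, neighbor purity plays the role of a class-consistency measure: the closer it is to 1, the more each cell's neighborhood consists of cells with the same label.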