Nonlinear embedding and integration of omics data: a fast and tuning-free approach.

Author information

Liu Shengjie, Yu Tianwei

Affiliations

School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 2001 Longxiang Boulevard, Longgang District, Shenzhen 518172, Guangdong, P.R. China.

Publication information

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf184.

Abstract

The rapid progress of single-cell technology has facilitated cost-effective acquisition of diverse omics data, allowing biologists to unravel the complexities of cell populations, disease states, and more. Additionally, single-cell multi-omics technologies have opened new avenues for studying biological interactions. However, the high dimensionality and sparsity of omics data present significant analytical challenges. Dimension reduction (DR) techniques are therefore essential for analyzing such complex data, yet many existing methods have inherent limitations. Linear methods like principal component analysis (PCA) struggle to capture intricate associations within data. In response, nonlinear techniques have emerged, but they may face scalability issues, be restricted to single-omics data, or prioritize visualization over generating informative embeddings. Here, we introduce dissimilarity based on conditional ordered list (DCOL) correlation, a novel measure for quantifying nonlinear relationships between variables. Based on this measure, we propose DCOL-PCA and DCOL-Canonical Correlation Analysis for dimension reduction and integration of single- and multi-omics data. In simulations, our methods outperformed nine DR methods and four joint dimension reduction methods, demonstrating stable performance across various settings. We also validated these methods on real datasets, where they detected intricate signals within and between omics data and generated lower-dimensional embeddings that preserve the essential information and latent structures.
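The abstract does not spell out how DCOL is computed, so the following is only an illustrative sketch of the general idea behind an ordered-list dissimilarity, not the paper's definition of DCOL correlation. It assumes a simple statistic (sort the samples by one variable, then average the absolute differences between consecutive values of the other) and contrasts it with the Pearson correlation on synthetic data. The function name `ordered_list_dissimilarity`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def ordered_list_dissimilarity(x, y):
    """Mean absolute difference between consecutive y values after sorting by x.

    Assumed illustrative statistic, not the paper's DCOL correlation.
    A small value suggests y varies smoothly with x, whether or not
    the relationship is linear.
    """
    order = np.argsort(x, kind="stable")
    return float(np.mean(np.abs(np.diff(y[order]))))


rng = np.random.default_rng(0)
n = 2000

# A purely nonlinear (quadratic) relationship plus a little noise.
x = rng.uniform(-1.0, 1.0, size=n)
y_dependent = x**2 + rng.normal(scale=0.05, size=n)

# Shuffling y keeps its marginal distribution but breaks the link to x.
y_independent = rng.permutation(y_dependent)

pearson = float(np.corrcoef(x, y_dependent)[0, 1])
d_dep = ordered_list_dissimilarity(x, y_dependent)
d_ind = ordered_list_dissimilarity(x, y_independent)

print(f"Pearson correlation (x, y_dependent): {pearson:.3f}")
print(f"Ordered-list dissimilarity, dependent pair:   {d_dep:.3f}")
print(f"Ordered-list dissimilarity, independent pair: {d_ind:.3f}")
```

In this toy setting the Pearson correlation stays close to zero because the relationship is symmetric around x = 0, while the ordered-list statistic is clearly smaller for the dependent pair than for the shuffled one. This is the kind of nonlinear signal that a correlation-matrix-based linear method can miss.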

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6534/12009717/e6d9673f7adc/bbaf184f1.jpg
