EDClust: an EM-MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing.

Affiliations

Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.

Publication information

Bioinformatics. 2022 May 13;38(10):2692-2699. doi: 10.1093/bioinformatics/btac168.

Abstract

MOTIVATION

Single-cell RNA sequencing (scRNA-seq) has revolutionized biological research by enabling the measurement of transcriptomic profiles at the single-cell level. With the increasing application of scRNA-seq in larger-scale studies, the problem of appropriately clustering cells emerges when the scRNA-seq data are from multiple subjects. One challenge is the subject-specific variation; systematic heterogeneity from multiple subjects may have a significant impact on clustering accuracy. Existing methods seeking to address such effects suffer from several limitations.

RESULTS

We develop a novel statistical method, EDClust, for multi-subject scRNA-seq cell clustering. EDClust models the sequence read counts by a mixture of Dirichlet-multinomial distributions and explicitly accounts for cell-type heterogeneity, subject heterogeneity and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets with various tissue types and species demonstrates the substantial accuracy improvement of EDClust compared to existing methods.
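To make the modeling idea concrete, the following is a minimal sketch of the two ingredients the abstract names: the Dirichlet-multinomial likelihood for a cell's count vector, and the E-step that turns per-cluster likelihoods into soft cluster memberships (the "clustering uncertainty"). It is not the EDClust implementation. It assumes the cluster parameters are already known, omits the subject-specific effects entirely, and does not include the MM update for the Dirichlet parameters. The function names `dm_log_pmf` and `e_step` and the toy data are hypothetical.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log-probability of one count vector under a Dirichlet-multinomial.

    counts: non-negative integer read counts, one per gene.
    alpha:  positive Dirichlet parameters, one per gene.
    """
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient numerator plus the ratio Gamma(A) / Gamma(n + A).
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        # Per-gene term: Gamma(x + alpha) / (Gamma(alpha) * x!).
        log_p += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return log_p


def e_step(cells, weights, alphas):
    """Posterior cluster probabilities for each cell (soft assignment).

    weights: mixing proportions, one per cluster (sum to 1).
    alphas:  one Dirichlet parameter vector per cluster.
    """
    posteriors = []
    for counts in cells:
        log_joint = [
            math.log(w) + dm_log_pmf(counts, a)
            for w, a in zip(weights, alphas)
        ]
        # Log-sum-exp trick: subtract the maximum before exponentiating.
        m = max(log_joint)
        unnorm = [math.exp(v - m) for v in log_joint]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors


# Toy data (made up): two cells, three genes, two clusters.
cells = [[30, 2, 1], [1, 3, 28]]
weights = [0.5, 0.5]
alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
post = e_step(cells, weights, alphas)
```

Each row of `post` sums to one and can be read as the probability that the cell belongs to each cluster. In a full EM-style fit, these posteriors would weight the parameter updates in the next step. The abstract does not describe that update, so it is left out here.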

AVAILABILITY AND IMPLEMENTATION

The R package is freely available at https://github.com/weix21/EDClust.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
