DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

Author information

Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, USA.

Division of Pulmonary Medicine, Allergy and Immunology and Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2018 Jan 1;34(1):139-146. doi: 10.1093/bioinformatics/btx490.

DOI:10.1093/bioinformatics/btx490

PMID:29036318

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6454475/

Abstract

MOTIVATION

Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these technological advances, statistical methods and computational tools for analyzing droplet-based scRNA-Seq data are still lacking. In particular, model-based approaches for clustering large-scale single cell transcriptomic data remain under-explored.

RESULTS

We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods.
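To make the idea of per-cell clustering uncertainty concrete, the following is a minimal sketch, not the DIMM-SC implementation. It assumes that each cluster is described by a Dirichlet parameter vector over genes, that a cell's UMI counts follow the resulting Dirichlet-multinomial distribution, and that the cluster parameters and mixing weights are already known. The function names, the toy parameters and the three-gene cells are all hypothetical.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a UMI count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts per gene for one cell.
    alpha:  positive Dirichlet parameters per gene for one cluster.
    """
    n = sum(counts)
    a = sum(alpha)
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return log_p


def cluster_posteriors(counts, alphas, weights):
    """Posterior probability that one cell belongs to each cluster.

    alphas:  one Dirichlet parameter vector per cluster (assumed known here).
    weights: mixing proportions of the clusters, summing to 1.
    """
    log_joint = [
        math.log(w) + dirichlet_multinomial_logpmf(counts, a)
        for a, w in zip(alphas, weights)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_joint)
    unnorm = [math.exp(v - m) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical toy setup: two clusters, three genes.
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]
weights = [0.5, 0.5]

cell_a = [9, 1, 0]  # dominated by gene 1, so cluster 1 should be favored
cell_b = [3, 4, 3]  # symmetric in genes 1 and 3, so the assignment is ambiguous

post_a = cluster_posteriors(cell_a, alphas, weights)
post_b = cluster_posteriors(cell_b, alphas, weights)
print(post_a, post_b)
```

In this toy setup the first cell is assigned to the first cluster with near certainty, while the second cell receives equal posterior weight for both clusters. A full method would also have to estimate the Dirichlet parameters and mixing weights from the data, for example by iterating this posterior step with a parameter update; that part is omitted here.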

AVAILABILITY AND IMPLEMENTATION

DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~wec47/singlecell.html.

CONTACT

wei.chen@chp.edu or hum@ccf.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
