DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data.

Author affiliations

Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, United States of America.

The Grainger College of Engineering, The University of Illinois Urbana-Champaign, Urbana and Champaign, Champaign, Illinois, United States of America.

Publication information

PLoS Comput Biol. 2022 Apr 11;18(4):e1008885. doi: 10.1371/journal.pcbi.1008885. eCollection 2022 Apr.

Abstract

Single-cell mass cytometry, also known as cytometry by time of flight (CyTOF), is a powerful high-throughput technology that allows analysis of up to 50 protein markers per cell for the quantification and classification of single cells. Traditional manual gating, used to identify new cell populations, has been inadequate, inefficient, unreliable, and difficult to use, and no algorithm that identifies both calibration and new cell populations has been well established. Deep learning with graphic cluster (DGCyTOF) visualization is developed as a new integrated embedding visualization approach for identifying canonical and new cell types. DGCyTOF combines deep-learning classification and hierarchical stable-clustering methods to sequentially build a tri-layer construct for known cell types and the identification of new cell types. First, a deep classification model is constructed to distinguish calibration cell populations from all cells by softmax classification assignment under a probability threshold; graph embedding clustering is then used to identify new cell populations. Between these two layers, cell labels are automatically adjusted between new and unknown cell populations via a feedback loop using an iterative calibration system to reduce the error rate in cell-type identification. Finally, a 3-dimensional (3D) visualization platform is developed to display the cell clusters with all cell-population types annotated.
Using two benchmark CyTOF databases comprising up to 43 million cells, we compared the accuracy and speed of cell-type identification among DGCyTOF, DeepCyTOF, and other approaches that combine dimension reduction with clustering: Principal Component Analysis (PCA), Factor Analysis (FA), Independent Component Analysis (ICA), Isometric Feature Mapping (Isomap), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), each with k-means clustering and Gaussian mixture clustering. We observed that DGCyTOF represents a robust, complete learning system with high accuracy, speed, and visualization, as assessed by eight measurement criteria. DGCyTOF achieved F-scores of 0.9921 for the CyTOF1 dataset and 0.9992 for the CyTOF2 dataset, whereas the corresponding scores were only 0.507 and 0.529 for t-SNE+k-means and 0.565 and 0.59 for UMAP+k-means. A comparison of accuracy between DGCyTOF and t-SNE and UMAP visualization showed DGCyTOF to be superior by approximately 35% in predicting cell types. In addition, cell-population distributions were more intuitive to observe in the 3D visualization of DGCyTOF than in t-SNE and UMAP visualizations. The DGCyTOF model can automatically assign known labels to single cells with high accuracy by combining deep-learning classification with traditional graph-clustering and dimension-reduction strategies. Guided by a calibration system, the model seeks an optimal balance of accuracy between calibration cell populations and unknown cell types, yielding a complete and robust learning system that identifies cell populations more accurately than other methods in the analysis of single-cell CyTOF data. The DGCyTOF method for identifying cell populations could be extended to the analysis of single-cell RNA-Seq data and other omics data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa10/9060369/a70aa2e4181b/pcbi.1008885.g001.jpg
