The art of using t-SNE for single-cell transcriptomics.

Affiliations

Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.

Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany.

Publication information

Nat Commun. 2019 Nov 28;10(1):5416. doi: 10.1038/s41467-019-13056-x.

DOI: 10.1038/s41467-019-13056-x

PMID: 31780648

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6882829/

Abstract

Single-cell transcriptomics yields ever-growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings; for example, the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
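Three of the protocol components named above (PCA initialisation, a high learning rate, and multi-scale similarity kernels) can be illustrated with a short NumPy sketch. This is not the authors' code. The helper names (`pca_init`, `learning_rate`, `conditional_p`, `affinities`) are hypothetical, and the constants are assumptions chosen as commonly used defaults rather than values stated in the abstract: the initialisation is rescaled so that its first coordinate has standard deviation 0.0001, the learning rate is n/12 with a floor of 200 (the heuristic of Belkina et al., 2019), and the multi-scale kernel is formed by averaging Gaussian conditional affinities calibrated to two perplexities (for example 30 and n/100). Affinities are computed exactly over all pairs, which needs O(n²) memory; tools aimed at large data sets, such as FIt-SNE, use nearest-neighbour approximations instead. Exaggeration, downsampling-based initialisation, and the gradient-descent optimisation itself are not shown.

```python
import numpy as np


def pca_init(X, n_components=2, target_std=1e-4):
    """Project centred data onto its top principal components and rescale
    so the first embedding coordinate has standard deviation `target_std`
    (0.0001 is an assumed default, not taken from the abstract)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n_components].T
    return Y / Y[:, 0].std() * target_std


def learning_rate(n, floor=200.0):
    """Assumed heuristic: learning rate n / 12, never below `floor`."""
    return max(floor, n / 12.0)


def conditional_p(d2, perplexity, tol=1e-5, max_iter=100):
    """Gaussian conditional affinities p_{j|i} for one point, given squared
    distances `d2` to all other points. The precision beta is found by
    bisection so that exp(entropy) matches `perplexity`."""
    d = d2 - d2.min()  # shift for numerical stability; cancels after normalising
    target = np.log(perplexity)
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = np.exp(-beta * d)
        z = p.sum()
        entropy = np.log(z) + beta * (p @ d) / z  # Shannon entropy in nats
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # kernel too wide: increase precision
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:  # kernel too narrow: decrease precision
            hi = beta
            beta = (beta + lo) / 2.0
    return p / z


def affinities(X, perplexities=(30.0,)):
    """Symmetric t-SNE affinities P. With several perplexities, the
    row-normalised conditionals are averaged (one way to build a
    multi-scale kernel; assumed here, not taken from the abstract)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    P = np.zeros((n, n))
    for perp in perplexities:
        for i in range(n):
            others = np.arange(n) != i  # exclude the point itself
            P[i, others] += conditional_p(d2[i, others], perp)
    P /= len(perplexities)
    # Symmetrise and normalise so that all entries sum to 1.
    return (P + P.T) / (2.0 * n)
```

For an expression matrix one would typically pass a reduced representation (for instance the leading principal components) as `X` and call something like `affinities(X, perplexities=(30, X.shape[0] / 100))`; both choices are illustrative assumptions. Initialising from PCA rather than at random is what lets the optimisation start from, and largely retain, the coarse global arrangement of the data.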

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21f/6882829/928a6337a14f/41467_2019_13056_Fig1_HTML.jpg

Similar articles

Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets.

Nat Commun. 2019 Nov 28;10(1):5415. doi: 10.1038/s41467-019-13055-y.

Shape-aware stochastic neighbor embedding for robust data visualisations.

BMC Bioinformatics. 2022 Nov 14;23(1):477. doi: 10.1186/s12859-022-05028-8.

Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data.

Nat Methods. 2019 Mar;16(3):243-245. doi: 10.1038/s41592-018-0308-4. Epub 2019 Feb 11.

Visualization of Single Cell RNA-Seq Data Using t-SNE in R.

Methods Mol Biol. 2020;2117:159-167. doi: 10.1007/978-1-0716-0301-7_8.

Assessing single-cell transcriptomic variability through density-preserving data visualization.

Nat Biotechnol. 2021 Jun;39(6):765-774. doi: 10.1038/s41587-020-00801-7. Epub 2021 Jan 18.

Dimensionality Reduction of Single-Cell RNA-Seq Data.

Methods Mol Biol. 2021;2284:331-342. doi: 10.1007/978-1-0716-1307-8_18.

Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations.

Mach Learn Knowl Discov Databases. 2020;11906:124-139. doi: 10.1007/978-3-030-46150-8_8. Epub 2020 Apr 30.

Dimensionality reduction by UMAP reinforces sample heterogeneity analysis in bulk transcriptomic data.

Cell Rep. 2021 Jul 27;36(4):109442. doi: 10.1016/j.celrep.2021.109442.

Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis.

Int J Mol Sci. 2020 Aug 12;21(16):5797. doi: 10.3390/ijms21165797.

Cited by

DeepAtlas: a tool for effective manifold learning.

bioRxiv. 2025 Aug 31:2025.08.26.672474. doi: 10.1101/2025.08.26.672474.

DeepAtlas: a tool for effective manifold learning.

ArXiv. 2025 Aug 26:arXiv:2508.19479v1.

CD4 T Cell Subsets and as Novel Biomarkers of Immune Dysregulation in Dilated Cardiomyopathy.

Int J Mol Sci. 2025 Aug 13;26(16):7806. doi: 10.3390/ijms26167806.

Investigating Liquid-Liquid Phase Separation in Lung Adenocarcinoma to Improve Prognostic Accuracy and Treatment Efficacy.

J Cell Mol Med. 2025 Aug;29(16):e70807. doi: 10.1111/jcmm.70807.

A single-cell, spatial transcriptomic atlas of the Arabidopsis life cycle.

Nat Plants. 2025 Aug 19. doi: 10.1038/s41477-025-02072-z.

Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models.

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxaf024.

Machine learning-driven multi-omics analysis identifies a prognostic gene signature associated with programmed cell death and metabolism in hepatocellular carcinoma.

Biol Proced Online. 2025 Aug 9;27(1):29. doi: 10.1186/s12575-025-00286-1.

A comparative study of manifold learning methods for scRNA-seq with a trajectory-aware metric.

Sci Rep. 2025 Aug 7;15(1):28923. doi: 10.1038/s41598-025-14301-8.

In toto analysis of embryonic organisation reduces tissue diversity to two archetypes requiring specific cadherins.

Nat Commun. 2025 Jul 25;16(1):6872. doi: 10.1038/s41467-025-62127-9.

scGGC: a two-stage strategy for single-cell clustering through cellular gene pathway construction.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf368.
