The art of using t-SNE for single-cell transcriptomics.

Affiliations

Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.

Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany.

Publication information

Nat Commun. 2019 Nov 28;10(1):5416. doi: 10.1038/s41467-019-13056-x.

Abstract

Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
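
Two ingredients of the protocol named above, PCA initialisation and a high learning rate, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `pca_initialisation` and `suggested_learning_rate`, the rescaling of the first component to a standard deviation of 1e-4, the n/12 learning-rate heuristic with a floor of 200, and the synthetic input matrix are all assumptions made for this example and are not stated in the abstract. Multi-scale similarity kernels, exaggeration, and downsampling-based initialisation are not covered.

```python
import numpy as np


def pca_initialisation(X, n_components=2, target_std=1e-4):
    """Return the leading principal-component scores of X, rescaled so that
    the first component has standard deviation `target_std` (an assumed value)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature
    # Thin SVD of the centred data: the PC scores are U * S.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    # Shrink the embedding so that the optimisation starts from small coordinates.
    return scores / scores[:, 0].std() * target_std


def suggested_learning_rate(n_cells, early_exaggeration=12.0, floor=200.0):
    """Assumed heuristic: scale the learning rate with the sample size,
    but never go below a conventional default."""
    return max(floor, n_cells / early_exaggeration)


# Synthetic stand-in for a (cells x features) expression matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))

init = pca_initialisation(X)
learning_rate = suggested_learning_rate(X.shape[0])
```

The resulting `init` array and `learning_rate` value could then be handed to any t-SNE implementation that accepts a custom initial embedding and step size, for example through the `init` and `learning_rate` arguments of scikit-learn's `TSNE`. Learning-rate conventions differ between implementations, so the value may need adapting.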

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21f/6882829/928a6337a14f/41467_2019_13056_Fig1_HTML.jpg
