Department of Pathology and Laboratory Medicine, Boston University School of Medicine, Boston, MA, 02118, USA.
Flow Cytometry Core Facility, Boston University School of Medicine, Boston, MA, 02118, USA.
Nat Commun. 2019 Nov 28;10(1):5415. doi: 10.1038/s41467-019-13055-y.
Accurate and comprehensive extraction of information from high-dimensional single-cell datasets necessitates faithful visualizations to assess biological populations. A state-of-the-art algorithm for non-linear dimension reduction, t-SNE, requires multiple heuristics and fails to produce clear representations of datasets when millions of cells are projected. We develop opt-SNE, an automated toolkit for t-SNE parameter selection that utilizes Kullback-Leibler divergence evaluation in real time to tailor the early exaggeration and overall number of gradient descent iterations in a dataset-specific manner. The precise calibration of early exaggeration together with opt-SNE adjustment of gradient descent learning rate dramatically improves computation time and enables high-quality visualization of large cytometry and transcriptomics datasets, overcoming limitations of analysis tools with hard-coded parameters that often produce poorly resolved or misleading maps of fluorescence and mass cytometry data. In summary, opt-SNE enables superior data resolution in t-SNE space and thereby more accurate data interpretation.
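The idea of using Kullback-Leibler divergence readings taken during optimization to decide when to end early exaggeration and when to stop iterating can be sketched as follows. This is a minimal illustration and not the opt-SNE implementation: the function names, the "first drop in relative improvement" rule, and the `tolerance` value are all assumptions chosen for the example, and the abstract does not specify the actual criteria.

```python
from typing import List, Optional, Sequence


def relative_kl_change(kl_values: Sequence[float]) -> List[float]:
    """Relative decrease in KL divergence between consecutive checkpoints.

    Assumes every KL value is strictly positive.
    """
    return [(prev - curr) / prev for prev, curr in zip(kl_values, kl_values[1:])]


def exaggeration_stop_index(kl_values: Sequence[float]) -> Optional[int]:
    """Return the checkpoint index at which to end early exaggeration.

    Hypothetical rule: stop at the first checkpoint where the relative
    improvement is smaller than at the previous checkpoint, i.e. just
    after the improvement rate has peaked. Returns None if the rate has
    not yet started to fall.
    """
    changes = relative_kl_change(kl_values)
    for i in range(1, len(changes)):
        if changes[i] < changes[i - 1]:
            # changes[i] describes the step from checkpoint i to i + 1.
            return i + 1
    return None


def has_converged(kl_values: Sequence[float], tolerance: float = 1e-4) -> bool:
    """Return True when the latest KL improvement is negligible.

    Hypothetical rule: the last absolute decrease is smaller than
    `tolerance` times the current KL divergence.
    """
    if len(kl_values) < 2:
        return False
    prev, curr = kl_values[-2], kl_values[-1]
    return (prev - curr) < tolerance * curr


# Made-up KL divergence trace recorded at successive checkpoints.
trace = [5.0, 4.9, 4.5, 3.6, 3.3, 3.2]
print(exaggeration_stop_index(trace))  # -> 4
print(has_converged(trace))            # -> False
```

In a real run the trace would come from the optimizer at regular intervals, and a noisy trace would likely need smoothing before applying a peak rule like this one.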