Armstrong George, Martino Cameron, Rahman Gibraan, Gonzalez Antonio, Vázquez-Baeza Yoshiki, Mishne Gal, Knight Rob
Department of Pediatrics, School of Medicine, University of California, San Diego, California, USA.
Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California, USA.
mSystems. 2021 Oct 26;6(5):e0069121. doi: 10.1128/mSystems.00691-21. Epub 2021 Oct 5.
Microbiome data are sparse and high dimensional, so effective visualization of these data requires dimensionality reduction. To date, the most commonly used method for dimensionality reduction in the microbiome is calculation of between-sample microbial differences (beta diversity), followed by principal-coordinate analysis (PCoA). Uniform Manifold Approximation and Projection (UMAP) is an alternative method that can reduce the dimensionality of beta diversity distance matrices. Here, we demonstrate the benefits and limitations of using UMAP for dimensionality reduction on microbiome data. Using real data, we demonstrate that UMAP can improve the representation of clusters, especially when the clusters are composed of multiple subgroups. Additionally, we show that UMAP provides improved correlation of biological variation along a gradient with a reduced number of coordinates of the resulting embedding. Finally, we provide parameter recommendations that emphasize the preservation of global geometry. We therefore conclude that UMAP should be routinely used as a complementary visualization method for microbiome beta diversity studies.
UMAP provides an additional method to visualize microbiome data. The method is extensible to any beta diversity metric used with PCoA, and our results demonstrate that UMAP can indeed improve visualization quality and correspondence with biological and technical variables of interest. The software to perform this analysis is available under an open-source license and can be obtained at https://github.com/knightlab-analyses/umap-microbiome-benchmarking; additionally, we have provided a QIIME 2 plugin for UMAP at https://github.com/biocore/q2-umap.
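The workflow summarized above (a beta diversity distance matrix, then either PCoA or UMAP) can be illustrated with a minimal Python sketch. It is not the authors' code and does not use the q2-umap plugin. The function names, the toy count table, and the choice of Bray-Curtis as the beta diversity metric are assumptions made for illustration, and the `n_neighbors` and `min_dist` defaults shown are the umap-learn library defaults, not the parameter recommendations of the paper. The UMAP step assumes the `umap-learn` package is installed and is therefore defined but not called here.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity for a samples x features count table."""
    counts = np.asarray(counts, dtype=float)
    # Sum of |x_i - x_j| over features, for every pair of samples
    diff = np.abs(counts[:, None, :] - counts[None, :, :]).sum(axis=2)
    total = counts.sum(axis=1)[:, None] + counts.sum(axis=1)[None, :]
    # Two empty samples would give 0/0; their dissimilarity is set to 0
    return np.divide(diff, total, out=np.zeros_like(diff), where=total > 0)


def pcoa(dist, n_components=2):
    """Classical PCoA: eigendecomposition of the double-centered squared distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def umap_embed(dist, n_neighbors=15, min_dist=0.1, n_components=2, seed=0):
    """UMAP on a precomputed distance matrix (assumes umap-learn is installed)."""
    import umap  # imported lazily so the rest of the sketch runs without it

    reducer = umap.UMAP(
        n_neighbors=n_neighbors,
        min_dist=min_dist,
        n_components=n_components,
        metric="precomputed",  # treat the input as distances, not features
        random_state=seed,
    )
    return reducer.fit_transform(dist)


# Toy table: 4 samples x 3 taxa (made-up counts, for illustration only)
table = np.array([[10, 0, 0], [8, 2, 0], [0, 5, 5], [0, 0, 10]])
dm = bray_curtis(table)
coords = pcoa(dm, n_components=2)
```

On a real data set with many more samples than `n_neighbors`, `umap_embed(dm)` would return an embedding with the same shape as `coords`, so the two can be plotted side by side. In UMAP, larger `n_neighbors` values generally favor global over local structure, which is the direction the abstract's parameter recommendations point in.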