Zhai Peng, Yang Longshu, Guo Xiao, Wang Zhe, Guo Jiangtao, Wang Xiaoqi, Zhu Huaiqiu
State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.
Center for Quantitative Biology, Peking University, Beijing, 100871, China.
BMC Bioinformatics. 2017 Oct 2;18(1):434. doi: 10.1186/s12859-017-1849-8.
During the past decade, the development of high-throughput nucleic acid sequencing and mass spectrometry analysis techniques has enabled the characterization of microbial communities through metagenomics, metatranscriptomics, metaproteomics and metabolomics data. To reveal the diversity of microbial communities and the interactions between living conditions and microbes, it is necessary to introduce comparative analysis based on the integration of all four types of data. Comparative meta-omics, especially comparative metagenomics, has been established as a routine process for highlighting significant differences in taxon composition and functional gene abundance among microbiota samples. Meanwhile, biologists are increasingly concerned with the correlations between meta-omics features and environmental factors, which may help decipher the adaptation strategy of a microbial community.
We developed MetaComp, a graphical, comprehensive analysis software package comprising a series of statistical analysis approaches with visualized results for comparing metagenomics and other meta-omics data. The software can read files generated by a variety of upstream programs. After data loading, it offers multivariate statistics; hypothesis testing for two-sample, multi-sample, and two-group sample comparisons; and a novel function, regression analysis of environmental factors. Here, the regression analysis treats meta-omic features as independent variables and environmental factors as dependent variables. Moreover, MetaComp can automatically choose an appropriate two-group sample test based on the traits of the input abundance profiles. We further evaluate the performance of this choice and demonstrate applications to metagenomics, metaproteomics and metabolomics samples.
MetaComp, an integrative software package applicable to all meta-omics data, takes an original approach by using regression analysis to distill the influence of the living environment on the microbial community. Moreover, since the automatically chosen two-group sample test was verified to perform better than the alternatives, MetaComp is friendly to users without adequate statistical training. These improvements aim to overcome the new challenges that the big data era poses for all meta-omics data. MetaComp is available at: http://cqb.pku.edu.cn/ZhuLab/MetaComp/ and https://github.com/pzhaipku/MetaComp/ .
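As a rough illustration of the regression setup described in the abstract (a meta-omic feature as the independent variable, an environmental factor as the dependent variable), the sketch below fits an ordinary least-squares line in plain Python. It is not MetaComp's implementation: the function name `fit_simple_regression`, the single-feature model, and the abundance and temperature values are all hypothetical and chosen only to show the direction of the regression.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Hypothetical helper, not part
    of MetaComp.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Coefficient of determination; defined as 1.0 when y is constant.
    r_squared = 1.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: relative abundance of one functional gene in five samples
# (the meta-omic feature, used as the independent variable).
abundance = [0.10, 0.20, 0.30, 0.40, 0.50]
# Hypothetical environmental factor for the same samples, e.g. temperature
# in degrees Celsius (the dependent variable).
temperature = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept, r2 = fit_simple_regression(abundance, temperature)
print(f"slope={slope:.3f} intercept={intercept:.3f} r_squared={r2:.3f}")
```

A real analysis would typically involve many features and some form of multiple regression or feature selection; the single-feature case is used here only to keep the roles of the two variables clear.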