Thang Mike W C, Chua Xin-Yi, Price Gareth, Gorse Dominique, Field Matt A
Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, 4000, Australia.
Queensland Facility for Advanced Bioinformatics, University of Queensland, Brisbane, Queensland, 4000, Australia.
F1000Res. 2019 May 23;8:726. doi: 10.12688/f1000research.18866.2. eCollection 2019.
Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. To gain maximum value from metagenomic sequence data, we must improve the existing analysis environment by providing accessible and scalable computational workflows that generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired-end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options is available, giving users a high level of control over their data and analyses. Additionally, users can input complex sample-specific metadata, which can be incorporated into the differential analysis and used for grouping and colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping read pairs, non-overlapping read pairs, and pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform, we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community, with the hope that more tools will be added as existing methods mature.
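To illustrate what "OTUs exhibiting the greatest change under differing conditions" can mean in practice, the following is a minimal sketch in plain Python. It is not the MetaDEGalaxy implementation: the abstract does not name the statistical method used, so this sketch substitutes a simple log2 fold change of mean relative abundance between two conditions. The function name, the input layout, the pseudocount, and the sample data are all hypothetical and chosen only for illustration.

```python
import math


def rank_otus_by_log2_fold_change(counts, groups, pseudocount=1e-6):
    """Rank OTUs by absolute log2 fold change between two conditions.

    counts: {otu_id: {sample_id: read_count}}  (hypothetical layout)
    groups: {sample_id: condition_label}       (sample metadata)
    """
    conditions = sorted(set(groups.values()))
    if len(conditions) != 2:
        raise ValueError("this sketch compares exactly two conditions")
    baseline, comparison = conditions

    # Total reads per sample, used to convert counts to relative abundances.
    totals = {
        sample: sum(per_sample.get(sample, 0) for per_sample in counts.values())
        for sample in groups
    }
    if any(total == 0 for total in totals.values()):
        raise ValueError("every sample needs at least one read")

    def mean_relative_abundance(otu, condition):
        samples = [s for s, c in groups.items() if c == condition]
        return sum(counts[otu].get(s, 0) / totals[s] for s in samples) / len(samples)

    ranked = []
    for otu in counts:
        # The pseudocount avoids division by zero for OTUs absent in one condition.
        ratio = (mean_relative_abundance(otu, comparison) + pseudocount) / (
            mean_relative_abundance(otu, baseline) + pseudocount
        )
        ranked.append((otu, math.log2(ratio)))

    # Largest absolute change first.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Made-up example data: three OTUs, two samples per condition.
counts = {
    "OTU_1": {"s1": 10, "s2": 10, "s3": 80, "s4": 80},
    "OTU_2": {"s1": 50, "s2": 50, "s3": 50, "s4": 50},
    "OTU_3": {"s1": 40, "s2": 40, "s3": 20, "s4": 20},
}
groups = {"s1": "control", "s2": "control", "s3": "treated", "s4": "treated"}

ranked = rank_otus_by_log2_fold_change(counts, groups)
for otu, log2_fc in ranked:
    print(f"{otu}\t{log2_fc:+.3f}")
```

A fold-change ranking like this ignores sampling variance and multiple testing, which is why dedicated differential abundance methods are generally preferred for real analyses; it is shown only to make the input (an OTU count table plus per-sample metadata) and the output (a ranked OTU list) concrete.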