European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Nucleic Acids Res. 2020 Jan 8;48(D1):D570-D578. doi: 10.1093/nar/gkz1035.
MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored to the input data and formally described using the Common Workflow Language, enabling greater provenance, reusability and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2), as well as expanded protein functional annotations. Predictions of biochemical pathways and systems have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.