EMBL-EBI European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
School of Mathematics & Statistics, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
Nucleic Acids Res. 2018 Jan 4;46(D1):D726-D735. doi: 10.1093/nar/gkx967.
EBI metagenomics (http://www.ebi.ac.uk/metagenomics) is a free-to-use platform for analysing and archiving sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to this increase in throughput, the underlying analysis pipeline has been overhauled to incorporate new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignment, which now bases assignments on both the large and small subunit RNA marker genes and covers all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain, and from these assemblies we have built a searchable, non-redundant protein database of over 50 million sequences. To improve access to the data stored within the resource, we have developed a programmatic interface to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimates of diversity and comparisons between samples.
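The abstract mentions statistical analyses that estimate diversity and compare samples, but does not name the measures used. As a rough illustration only, the sketch below assumes two widely used measures, the Shannon index for within-sample diversity and the Bray-Curtis dissimilarity for between-sample comparison. The function names and the taxon counts are hypothetical and are not taken from the resource or its pipeline.

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Assumption: the source does not state which diversity estimator is used;
    the Shannon index is chosen here purely as a common example.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )


def bray_curtis(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    Returns 0.0 for identical profiles and 1.0 when no taxa are shared.
    Assumption: chosen as a typical sample-comparison measure, not one
    confirmed by the source.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total


# Hypothetical phylum-level read counts for two samples (illustrative only).
soil = {"Proteobacteria": 50, "Actinobacteria": 30, "Firmicutes": 20}
gut = {"Firmicutes": 60, "Bacteroidetes": 40}

if __name__ == "__main__":
    print(f"Shannon (soil): {shannon_index(soil):.3f}")
    print(f"Shannon (gut):  {shannon_index(gut):.3f}")
    print(f"Bray-Curtis:    {bray_curtis(soil, gut):.3f}")
```

In practice, counts like these would come from the taxonomic assignment results of an analysis run. Plain dictionaries keep the sketch free of third-party dependencies; an established ecology or statistics library would be the more usual choice for real analyses.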