A poor man's BLASTX--high-throughput metagenomic protein database search using PAUDA.

Author information

Singapore Centre on Environmental Life Sciences Engineering, School of Biological Sciences, Nanyang Technological University, Singapore 637551; Center for Bioinformatics, University of Tübingen, 72076 Tübingen, Germany; and Life Sciences Institute, National University of Singapore, Singapore 117456.

Publication information

Bioinformatics. 2014 Jan 1;30(1):38-9. doi: 10.1093/bioinformatics/btt254. Epub 2013 May 7.

Abstract

SUMMARY

In the context of metagenomics, we introduce a new approach to protein database search called PAUDA, which runs ~10,000 times faster than BLASTX while achieving about one-third of the assignment rate of reads to KEGG orthology groups and producing gene and taxon abundance profiles that are highly correlated with those obtained with BLASTX. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil. A previous BLASTX analysis of a 176-million-read subset of this dataset reportedly required 800,000 CPU hours, and both analyses lead to the same clustering of samples by functional profiles.
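As a rough sanity check, the CPU-hour figures quoted above can be normalized per million reads, since the two analyses covered different numbers of reads. The sketch below uses only the numbers stated in the summary. The function and variable names are illustrative and not part of PAUDA. Because the PAUDA figure is an upper bound (<80 CPU hours), the resulting ratio is a lower bound, and it ignores any differences in hardware between the two analyses.

```python
# Back-of-the-envelope comparison using only figures quoted in the summary.
# All names here are illustrative; nothing below calls PAUDA or BLASTX.

BLASTX_CPU_HOURS = 800_000   # reported for the earlier BLASTX analysis
BLASTX_READS = 176_000_000   # subset of reads analyzed with BLASTX
PAUDA_CPU_HOURS = 80         # upper bound: the summary states "<80 CPU hours"
PAUDA_READS = 246_000_000    # full permafrost dataset analyzed with PAUDA


def cpu_hours_per_million_reads(cpu_hours: float, reads: int) -> float:
    """Normalize a CPU-hour total to a cost per one million reads."""
    return cpu_hours / (reads / 1_000_000)


blastx_rate = cpu_hours_per_million_reads(BLASTX_CPU_HOURS, BLASTX_READS)
pauda_rate = cpu_hours_per_million_reads(PAUDA_CPU_HOURS, PAUDA_READS)

# Ratio of per-read costs; a lower bound because PAUDA_CPU_HOURS is an upper bound.
speedup = blastx_rate / pauda_rate

print(f"BLASTX: {blastx_rate:.1f} CPU hours per million reads")
print(f"PAUDA:  {pauda_rate:.3f} CPU hours per million reads (upper bound)")
print(f"Per-read speedup: at least {speedup:,.0f}x")
```

On these numbers the per-read ratio comes out at roughly 14,000, which is the same order of magnitude as the ~10,000-fold speedup stated in the summary.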

AVAILABILITY

PAUDA is freely available from http://ab.inf.uni-tuebingen.de/software/pauda. Supplementary method details are also available from this website.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6453/3866550/8014c58ac5b1/btt254f1ap.jpg
