Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611-0700, USA.
ISME J. 2010 Jul;4(7):852-61. doi: 10.1038/ismej.2010.16. Epub 2010 Feb 25.
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, has been compiled into a pipeline called PANGEA. The PANGEA pipeline is written in Perl and runs on Mac OS X, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided either as individual scripts for each step in the process or as a single program, called the 'backbone', that joins all steps except the chi-square (χ²) step.
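The abstract names a chi-square step for comparing microbial communities but does not describe how it is computed. As a rough illustration only, the following Python sketch computes a Pearson chi-square statistic on per-taxon read counts from two samples. It is not PANGEA's Perl code: the function name `chi_square_two_samples`, the taxon labels and the counts are all hypothetical, and PANGEA's actual test may be set up differently.

```python
from typing import Dict, Tuple


def chi_square_two_samples(
    counts_a: Dict[str, int], counts_b: Dict[str, int]
) -> Tuple[float, int]:
    """Pearson chi-square statistic for a 2 x K table of taxon counts.

    Hypothetical helper for illustration; not taken from PANGEA.
    Returns (statistic, degrees_of_freedom).
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one classified read")
    grand_total = total_a + total_b

    # Keep only taxa observed in at least one sample, so that every
    # expected count below is strictly positive.
    taxa = [
        t
        for t in sorted(set(counts_a) | set(counts_b))
        if counts_a.get(t, 0) + counts_b.get(t, 0) > 0
    ]

    statistic = 0.0
    for taxon in taxa:
        observed_a = counts_a.get(taxon, 0)
        observed_b = counts_b.get(taxon, 0)
        column_total = observed_a + observed_b
        for observed, row_total in ((observed_a, total_a), (observed_b, total_b)):
            # Expected count under the null hypothesis that both samples
            # share the same taxon proportions.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected

    # A 2 x K table has (2 - 1) * (K - 1) degrees of freedom.
    degrees_of_freedom = len(taxa) - 1
    return statistic, degrees_of_freedom


# Invented counts, for illustration only (not data from the study).
sample_1 = {"taxon_A": 10, "taxon_B": 20}
sample_2 = {"taxon_A": 20, "taxon_B": 10}
stat, dof = chi_square_two_samples(sample_1, sample_2)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

The sketch returns only the statistic and degrees of freedom. Converting these to a p-value requires the chi-square distribution, which is not in the Python standard library, so that step is left out here.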