DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.

Author information

Wei Ze-Gang, Zhang Shao-Wu

Affiliations

Key Laboratory of Information Fusion Technology of Ministry of Education, College of Automation, Northwestern Polytechnical University, Xi'an 710072, China.

Publication information

J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26.

DOI:10.1016/j.jtbi.2017.04.019

PMID:28454900

Abstract

The recent sequencing revolution driven by high-throughput technologies has led to a rapid accumulation of 16S rRNA sequences from microbial communities. Clustering short sequences into operational taxonomic units (OTUs) is a crucial initial step in analyzing metagenomic data. Although many heuristic methods with low computational complexity have been proposed for OTU inference, they select only one sequence as the seed for each cluster, and the results are sensitive to the choice of these representative sequences. To address this issue, we present a de Bruijn graph-based heuristic clustering method (DBH) for clustering massive 16S rRNA sequences into OTUs by introducing a novel seed selection strategy and a greedy clustering approach. Compared with existing widely used methods on several simulated and real-life metagenomic datasets, DBH shows higher clustering performance and low memory usage, and helps reduce the overestimation of the number of OTUs. DBH is also more effective at handling large-scale metagenomic datasets. The DBH software can be freely downloaded from https://github.com/nwpu134/DBH.git for academic users.
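To make the single-seed greedy clustering that the abstract criticizes more concrete, here is a minimal Python sketch. It is not the DBH algorithm: the function names (`kmers`, `greedy_cluster`), the Jaccard similarity on k-mer sets, the length-based processing order, and the parameter values are all assumptions chosen for illustration. The only link to de Bruijn graphs is that the k-mer set of a sequence is the node set of its de Bruijn graph.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers in seq (the node set of its de Bruijn graph)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(seqs: List[str], k: int = 4,
                   threshold: float = 0.5) -> Dict[int, List[int]]:
    """Generic greedy seed-based clustering (illustrative, not DBH itself).

    Returns a mapping {seed index: [member indices]}.
    """
    # Assumption: longer sequences are processed first, so they become seeds.
    order = sorted(range(len(seqs)), key=lambda i: -len(seqs[i]))
    seed_kmers: Dict[int, Set[str]] = {}
    clusters: Dict[int, List[int]] = {}
    for i in order:
        ks = kmers(seqs[i], k)
        best, best_sim = None, 0.0
        # Compare the sequence against the single seed of every cluster.
        for s, sk in seed_kmers.items():
            union = ks | sk
            sim = len(ks & sk) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = s, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)   # join the most similar seed's cluster
        else:
            seed_kmers[i] = ks         # no seed is close enough: new cluster
            clusters[i] = [i]
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
    print(greedy_cluster(reads, k=4, threshold=0.5))
```

Because each cluster is represented by one seed only, the outcome depends on which sequence happens to become the seed. That sensitivity is the issue the abstract says DBH addresses with its own seed selection strategy, which this sketch does not reproduce.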

Similar articles

DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.

J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26.

MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs.

Mol Biosyst. 2015 Jul;11(7):1907-13. doi: 10.1039/c5mb00089k.

DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.

Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.

DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences.

Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6.

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering.

Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6.

bioOTU: An Improved Method for Simultaneous Taxonomic Assignments and Operational Taxonomic Units Clustering of 16s rRNA Gene Sequences.

J Comput Biol. 2016 Apr;23(4):229-38. doi: 10.1089/cmb.2015.0214. Epub 2016 Mar 7.

MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.

J Microbiol Methods. 2013 Sep;94(3):347-55. doi: 10.1016/j.mimet.2013.07.004. Epub 2013 Jul 28.

A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data.

J Comput Biol. 2019 Jun;26(6):618-624. doi: 10.1089/cmb.2018.0170. Epub 2018 Dec 5.

Piphillin predicts metagenomic composition and dynamics from DADA2-corrected 16S rDNA sequences.

BMC Genomics. 2020 Jan 17;21(1):56. doi: 10.1186/s12864-019-6427-1.

CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment.

PLoS One. 2016 Mar 8;11(3):e0151064. doi: 10.1371/journal.pone.0151064. eCollection 2016.

Cited by

pathMap: a path-based mapping tool for long noisy reads with high sensitivity.

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae107.

invMap: a sensitive mapping tool for long noisy reads with inversion structural variants.

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad726.

Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences.

Front Microbiol. 2021 Mar 24;12:644012. doi: 10.3389/fmicb.2021.644012. eCollection 2021.

Metagenomic data of bacterial community from different land uses at the river basin, Kelantan.

Data Brief. 2020 Sep 28;33:106351. doi: 10.1016/j.dib.2020.106351. eCollection 2020 Dec.

smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.

BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w.

DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.

Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

BMC Bioinformatics. 2018 May 22;19(1):177. doi: 10.1186/s12859-018-2208-0.
