Rice University, Department of Computer Science, Houston, TX, 77005, USA.
Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine and Texas Children's Hospital, Houston, TX, 77030, USA.
Nat Commun. 2022 Nov 10;13(1):6799. doi: 10.1038/s41467-022-34409-z.
Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can affect the interpretation of findings reported in microbiome studies, especially in low-biomass environments. Contamination from DNA extraction kits or sampling lab environments leaves taxonomic "bread crumbs" across multiple distinct sample types. Here we describe Squeegee, a de novo contamination detection tool based on this principle, which allows microbial contaminants to be detected when negative controls are unavailable. On low-biomass samples, we compare Squeegee predictions to experimental negative control data and show that Squeegee accurately recovers putative contaminants. We also analyze samples of varying biomass from the Human Microbiome Project and identify likely kit contamination that has not previously been reported. Collectively, our results show that Squeegee can identify microbial contaminants with high precision and thus provides a computational approach to contaminant detection when negative controls are unavailable.
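The "bread crumbs" principle can be illustrated with a minimal sketch. This is not Squeegee's actual algorithm, which the abstract does not specify. It only shows the underlying idea, assuming that each sample has already been reduced to a set of detected taxa and labeled with a sample type. The function name `candidate_contaminants`, the thresholds `min_prevalence` and `min_type_fraction`, and the toy taxon names are all hypothetical.

```python
from collections import defaultdict


def candidate_contaminants(samples, min_prevalence=0.5, min_type_fraction=1.0):
    """Flag taxa that recur across distinct sample types.

    samples: iterable of (sample_type, taxa) pairs, where taxa is an
             iterable of taxon names detected in that sample.
    min_prevalence: hypothetical threshold; fraction of samples within a
             type that must contain the taxon for the type to count.
    min_type_fraction: hypothetical threshold; fraction of sample types
             in which the taxon must be prevalent to be flagged.
    """
    # Group the per-sample taxon sets by sample type.
    by_type = defaultdict(list)
    for sample_type, taxa in samples:
        by_type[sample_type].append(set(taxa))

    # With fewer than two sample types there is nothing to compare across.
    if len(by_type) < 2:
        return set()

    # Count, for each taxon, the number of sample types in which it is prevalent.
    types_with_taxon = defaultdict(int)
    for taxa_sets in by_type.values():
        counts = defaultdict(int)
        for taxa in taxa_sets:
            for taxon in taxa:
                counts[taxon] += 1
        for taxon, count in counts.items():
            if count / len(taxa_sets) >= min_prevalence:
                types_with_taxon[taxon] += 1

    n_types = len(by_type)
    return {
        taxon
        for taxon, k in types_with_taxon.items()
        if k / n_types >= min_type_fraction
    }


# Toy input (hypothetical, not data from the paper): taxon_X appears in
# every sample type, while the other taxa are specific to one type.
toy_samples = [
    ("oral", {"taxon_A", "taxon_X"}),
    ("oral", {"taxon_A", "taxon_X"}),
    ("gut", {"taxon_B", "taxon_X"}),
    ("skin", {"taxon_C", "taxon_X"}),
]
flagged = candidate_contaminants(toy_samples)
```

A taxon shared by ecologically distinct sample types is only a candidate: a real tool would need further evidence to separate true contaminants from organisms that genuinely inhabit several body sites.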