The human "contaminome": bacterial, viral, and computational contamination in whole genome sequences from 1000 families.

Author affiliations

Department of Bioengineering, Stanford University, Stanford, USA.

Department of Biomedical Data Science, Stanford University, Stanford, USA.

Publication information

Sci Rep. 2022 Jun 14;12(1):9863. doi: 10.1038/s41598-022-13269-z.

Abstract

The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into the common viral, bacterial, and computational contamination that plagues whole genome sequencing studies. We present several notable results: (1) in addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines; (2) the sequencing plate and the biological source of a sample strongly influence its contamination profile; and (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination are prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.
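
The abstract does not say how unmapped and poorly aligned reads were selected, so the following is only a minimal sketch of one common way to do it from SAM-formatted alignments. The function names `is_candidate` and `select_candidates` are hypothetical, and the MAPQ cutoff of 10 is an illustrative assumption, not a setting from the paper. The only fixed facts used are from the SAM format: column 2 is the bitwise FLAG (bit 0x4 marks an unmapped segment), column 5 is MAPQ, and header lines start with `@`.

```python
# Illustrative sketch only: names and the MAPQ threshold are assumptions,
# not the authors' pipeline.
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_candidate(sam_line, min_mapq=10):
    """Return True if the read is unmapped or its MAPQ is below min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    return bool(flag & UNMAPPED_FLAG) or mapq < min_mapq


def select_candidates(sam_lines, min_mapq=10):
    """Yield (read_name, sequence) for candidate reads, skipping header lines."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        if is_candidate(line, min_mapq):
            fields = line.rstrip("\n").split("\t")
            yield fields[0], fields[9]  # column 1: QNAME, column 10: SEQ


# Tiny made-up example: one confident alignment, one unmapped read,
# and one low-MAPQ alignment.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGA\tIIIIIIII",
    "r3\t16\tchrY\t500\t3\t8M\t*\t0\t0\tGGCATTAC\tIIIIIIII",
]
print(list(select_candidates(example)))
```

Reads selected this way would then be passed to a taxonomic classifier or aligned against microbial references. Result (3) of the abstract is a caution that hits obtained at that stage can be mismapped human sequence and not true microbial signal.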

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6673/9198055/b1c69e58bfd4/41598_2022_13269_Fig1_HTML.jpg
