Petabase-scale sequence alignment catalyses viral discovery.

Affiliations

Independent researcher, Corte Madera, CA, USA.

Independent researcher, Vancouver, British Columbia, Canada.

Publication information

Nature. 2022 Feb;602(7895):142-147. doi: 10.1038/s41586-021-04332-2. Epub 2022 Jan 26.

DOI:10.1038/s41586-021-04332-2

PMID:35082445

Abstract

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

Similar articles

Petabase-scale sequence alignment catalyses viral discovery.

Nature. 2022 Feb;602(7895):142-147. doi: 10.1038/s41586-021-04332-2. Epub 2022 Jan 26.

A structural and primary sequence comparison of the viral RNA-dependent RNA polymerases.

Nucleic Acids Res. 2003 Apr 1;31(7):1821-9. doi: 10.1093/nar/gkg277.

Unmapped RNA Virus Diversity in Termites and their Symbionts.

Viruses. 2020 Oct 9;12(10):1145. doi: 10.3390/v12101145.

Expansion of the global RNA virome reveals diverse clades of bacteriophages.

Cell. 2022 Oct 13;185(21):4023-4037.e18. doi: 10.1016/j.cell.2022.08.023. Epub 2022 Sep 28.

Evolution of tertiary structure of viral RNA dependent polymerases.

PLoS One. 2014 May 9;9(5):e96070. doi: 10.1371/journal.pone.0096070. eCollection 2014.

RNA Viromics of Southern California Wastewater and Detection of SARS-CoV-2 Single-Nucleotide Variants.

Appl Environ Microbiol. 2021 Nov 10;87(23):e0144821. doi: 10.1128/AEM.01448-21. Epub 2021 Sep 22.

Ever-increasing viral diversity associated with the red imported fire ant Solenopsis invicta (Formicidae: Hymenoptera).

Virol J. 2021 Jan 6;18(1):5. doi: 10.1186/s12985-020-01469-w.

Structure Unveils Relationships between RNA Virus Polymerases.

Viruses. 2021 Feb 17;13(2):313. doi: 10.3390/v13020313.

Depicting the RNA Virome of Hematophagous Arthropods from Belgrade, Serbia.

Viruses. 2020 Sep 2;12(9):975. doi: 10.3390/v12090975.

RNA virome analysis of questing ticks from Hokuriku District, Japan, and the evolutionary dynamics of tick-borne phleboviruses.

Ticks Tick Borne Dis. 2020 Mar;11(2):101364. doi: 10.1016/j.ttbdis.2019.101364. Epub 2019 Dec 27.

Cited by

Expanding the diversity of Celavirus, the most divergent genus in the family Potyviridae.

Virus Genes. 2025 Sep 11. doi: 10.1007/s11262-025-02184-w.

A prevalent huge phage clade in human and animal gut microbiomes.

Res Sq. 2025 Aug 19:rs.3.rs-7356405. doi: 10.21203/rs.3.rs-7356405/v1.

A prevalent huge phage clade in human and animal gut microbiomes.

bioRxiv. 2025 Aug 11:2025.08.10.669567. doi: 10.1101/2025.08.10.669567.

Population-scale sequencing resolves correlates and determinants of latent Epstein-Barr Virus infection.

bioRxiv. 2025 Jul 18:2025.07.18.665549. doi: 10.1101/2025.07.18.665549.

Blood virome profiling reveals subtype-specific viral signatures and reduced diversity in non-Hodgkin lymphoma.

Virulence. 2025 Dec;16(1):2542457. doi: 10.1080/21505594.2025.2542457. Epub 2025 Aug 17.

Insights into diversity, host range, and evolution of iflaviruses in Lepidoptera through transcriptome mining.

Virus Evol. 2025 Jul 7;11(1):veaf051. doi: 10.1093/ve/veaf051. eCollection 2025.

An Update on RNA Virus Discovery: Current Challenges and Future Perspectives.

Viruses. 2025 Jul 15;17(7):983. doi: 10.3390/v17070983.

Machine Learning and Artificial Intelligence for Infectious Disease Surveillance, Diagnosis, and Prognosis.

Viruses. 2025 Jun 23;17(7):882. doi: 10.3390/v17070882.

Derailing the host machinery to achieve replication: how viroid and viroid-like RNAs successfully copy their genomes in hostile territory.

RNA Biol. 2025 Dec;22(1):1-19. doi: 10.1080/15476286.2025.2538269. Epub 2025 Aug 26.

SegFinder: an automated tool for identifying complete RNA virus genome segments through co-occurrence in multiple sequenced samples.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf358.

References

STAT: a fast, scalable, MinHash-based k-mer tool to assess Sequence Read Archive next-generation sequence submissions.

Genome Biol. 2021 Sep 20;22(1):270. doi: 10.1186/s13059-021-02490-0.

Slippery when wet: cross-species transmission of divergent coronaviruses in bony and jawless fish and the evolutionary history of the .

Virus Evol. 2021 May 31;7(2):veab050. doi: 10.1093/ve/veab050. eCollection 2021.

Sensitive protein alignments at tree-of-life scale using DIAMOND.

Nat Methods. 2021 Apr;18(4):366-368. doi: 10.1038/s41592-021-01101-x. Epub 2021 Apr 7.

Hepatitis delta virus-like circular RNAs from diverse metazoans encode conserved hammerhead ribozymes.

Virus Evol. 2021 Feb 18;7(1):veab016. doi: 10.1093/ve/veab016. eCollection 2021 Jan.

Identification of novel avian and mammalian deltaviruses provides new insights into deltavirus evolution.

Virus Evol. 2021 Feb 12;7(1):veab003. doi: 10.1093/ve/veab003. eCollection 2021 Jan.

Massive expansion of human gut bacteriophage diversity.

Cell. 2021 Feb 18;184(4):1098-1109.e9. doi: 10.1016/j.cell.2021.01.029.

VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses.

Microbiome. 2021 Feb 1;9(1):37. doi: 10.1186/s40168-020-00990-y.

Diversification of mammalian deltaviruses by host shifting.

Proc Natl Acad Sci U S A. 2021 Jan 19;118(3). doi: 10.1073/pnas.2019907118.

CheckV assesses the quality and completeness of metagenome-assembled viral genomes.

Nat Biotechnol. 2021 May;39(5):578-585. doi: 10.1038/s41587-020-00774-7. Epub 2020 Dec 21.

A genomic catalog of Earth's microbiomes.

Nat Biotechnol. 2021 Apr;39(4):499-509. doi: 10.1038/s41587-020-0718-6. Epub 2020 Nov 9.
