Efficient discovery of frequently co-occurring mutations in a sequence database with matrix factorization.

Author information

Kolar Michael Robert, Kobzarenko Valerie, Mitra Debasis

Affiliation

BiC Lab, Department of Electrical Engineering and Computer Science, Florida Institute of Technology, Melbourne, Florida, United States of America.

Publication information

PLoS Comput Biol. 2025 Apr 24;21(4):e1012391. doi: 10.1371/journal.pcbi.1012391. eCollection 2025 Apr.

Abstract

We have developed a robust method for efficiently tracking multiple co-occurring mutations in a sequence database. Evolution often hinges on the interaction of several mutations to produce significant phenotypic changes that lead to the proliferation of a variant. However, identifying numerous simultaneous mutations across a vast database of sequences poses a significant computational challenge. Our approach leverages a matrix factorization technique to automatically and efficiently pinpoint subsets of positions where co-mutations occur in a substantial number of sequences within the database. We validated our method on approximately seven hundred thousand SARS-CoV-2 Spike protein receptor-binding domain sequences, demonstrating superior performance compared to a reasonably exhaustive brute-force method. Furthermore, we explore the biological significance of the identified co-mutational positions (CMPs) and their potential impact on the virus's evolution and functionality, identifying key mutations in the Delta and Omicron variants. This analysis underscores the significant role of the identified CMPs in understanding the virus's evolutionary trajectory. By tracking the "birth" and "death" of CMPs, we can elucidate the persistence and impact of specific groups of mutations across different viral strains, providing valuable insights into the virus's adaptability and thus possibly aiding vaccine design strategies.
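
The abstract does not say which factorization the authors use, so the following is only a minimal sketch of the general idea, not their method. It assumes the database has been encoded as a binary matrix (rows are sequences, columns are positions, 1 means the position differs from a reference) and uses a truncated SVD from NumPy as a stand-in factorization. The function names, the relative threshold, the minimum support, and the toy data are all hypothetical.

```python
import numpy as np


def find_comutation_groups(X, rank, rel_threshold=0.5, min_support=1):
    """Return candidate co-mutational position groups with their support.

    X is a binary matrix (sequences x positions). Assumption: truncated SVD
    stands in for whatever factorization the paper actually uses.
    """
    # Each right singular vector assigns a weight to every position.
    _, _, vt = np.linalg.svd(X.astype(float), full_matrices=False)
    groups = []
    for component in np.abs(vt[:rank]):
        # Keep positions whose weight is close to the component's maximum.
        positions = np.flatnonzero(component >= rel_threshold * component.max())
        if len(positions) < 2:
            continue  # a single position is not a co-mutation
        # Support = number of sequences mutated at every position in the group.
        support = int(X[:, positions].all(axis=1).sum())
        if support >= min_support:
            groups.append((tuple(int(p) for p in positions), support))
    return groups


def toy_matrix():
    """Hypothetical data: 105 sequences over 6 positions."""
    X = np.zeros((105, 6), dtype=int)
    X[:40, [0, 1, 2]] = 1   # 40 sequences share mutations at positions 0, 1, 2
    X[40:70, [3, 4]] = 1    # 30 sequences share mutations at positions 3, 4
    X[70:75, 5] = 1         # 5 sequences carry an isolated mutation
    return X                # remaining 30 rows match the reference


groups = find_comutation_groups(toy_matrix(), rank=2, min_support=10)
print(groups)  # [((0, 1, 2), 40), ((3, 4), 30)]
```

The point of the sketch is the cost profile: a brute-force search would enumerate position subsets and count their support one by one, whereas a low-rank factorization proposes a handful of candidate subsets that are then checked against the data in a single pass each.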

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eace/12273922/73684ba77575/pcbi.1012391.g001.jpg