Nyeo Sherry S, Cumming Erin M, Burren Oliver S, Pagadala Meghana S, Gutierrez Jacob C, Ali Thahmina A, Kida Laura C, Chen Yifan, Hu Fengyuan, Hollis Benjamin, Fabre Margarete, MacArthur Stewart, Wang Quanli, Ludwig Leif S, Dey Kushal K, Petrovski Slavé, Dhindsa Ryan S, Lareau Caleb A
Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Tri-Institutional Program in Computational Biology, Weill Cornell School of Medicine, New York, NY, USA.
bioRxiv. 2025 Jul 18:2025.07.18.665549. doi: 10.1101/2025.07.18.665549.
Epstein-Barr Virus (EBV) is an endemic herpesvirus implicated in autoimmunity, cancer, and neurological disorders. Though primary infection typically resolves with subclinical symptoms, long-term complications can arise due to immune dysregulation or viral latency, in which EBV DNA is detectable in blood for decades. Despite the ubiquity of this virus, we have an incomplete understanding of the highly variable responses to EBV that range from asymptomatic infection to a trigger for severe disease. Here, we demonstrate that existing whole genome sequencing (WGS) data contains ample non-human DNA sequences to reconstruct a molecular biomarker of latent EBV infection consistent with orthogonal phenotypes, including viral serology. Using the UK Biobank (n = 490,560) and All of Us (n = 245,394), we uncover reproducible complex trait associations that nominate latent blood-derived EBV DNA as a respiratory, autoimmune, and cardiovascular disease biomarker. Further, we evaluate the genetic determinants of persistent EBV DNA via genome-wide and exome-wide association studies, uncovering protein-altering variants from 147 genes. Single-cell and pathway-scale enrichment analyses implicate variable antigen processing and presentation as a primary genetic determinant of latent EBV persistence, with gene programs expressed in B cells and antigen-presenting cells. Using predicted viral epitope presentation affinities, we implicate genetic variation in MHC class II as a key modulator of EBV DNA persistence. Our analyses demonstrate how existing WGS data can derive novel molecular biomarkers, which may generalize to dozens of viruses comprising the blood virome.