Li Qiuhui, Keskus Ayse G, Wagner Justin, Izydorczyk Michal B, Timp Winston, Sedlazeck Fritz J, Klein Alison P, Zook Justin M, Kolmogorov Mikhail, Schatz Michael C
Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA.
Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA.
Genome Res. 2025 Apr 14;35(4):599-620. doi: 10.1101/gr.280041.124.
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements needed to scale to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and opportunities for leveraging machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.