Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants.

Affiliations

National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China.

National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China.

Publication information

Genomics Proteomics Bioinformatics. 2023 Jun;21(3):649-661. doi: 10.1016/j.gpb.2022.02.002. Epub 2022 Mar 8.

Abstract

Non-coding variants in the human genome significantly influence human traits and complex diseases through their regulatory and modifying effects. Hence, an increasing number of computational methods have been developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate methods from the dozens available. To address this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from the database of clinically relevant sequence variants (ClinVar), (2) rare somatic variants from the Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under different conditions, indicating that each has distinct strengths and weaknesses depending on the scenario. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar, with an area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033, but poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the performance of the 24 methods in predicting non-coding de novo mutations in autism spectrum disorder and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods performed better. In summary, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice on proper tool selection and guidance for the development of new techniques for interpreting non-coding variants.
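An AUROC of 0.5 corresponds to a ranking no better than chance, which is why the eQTL and GWAS ranges reported above indicate poor discrimination. As a minimal sketch (not the authors' evaluation code), AUROC can be computed from a method's variant scores with the standard rank-based (Mann-Whitney) formulation, assuming binary labels where 1 marks a pathogenic/functional variant and 0 a benign/control variant. The function name `auroc` is hypothetical and chosen for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen positive variant (label 1) receives a
    higher score than a randomly chosen negative variant (label 0).

    Ties between a positive and a negative score count as half a win.
    This O(n_pos * n_neg) pairwise version is for illustration only.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative variant")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Perfect separation of pathogenic (1) from benign (0) variants -> 1.0
    print(auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
    # Half of the positive/negative pairs ordered correctly -> 0.5 (chance level)
    print(auroc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0]))
```

The pairwise form makes the probabilistic meaning of AUROC explicit; for benchmark sets with many thousands of variants, a sort-based implementation (or an established library routine) would be the practical choice.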

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a9b/10787016/046d8573eaaa/gr1.jpg