Effects of using deep learning to predict the geographic origin of barley genebank accessions on genome-environment association studies.

Authors

Chang Che-Wei, Schmid Karl

Affiliation

University of Hohenheim, Stuttgart, Germany.

Publication

Theor Appl Genet. 2025 Aug 12;138(9):211. doi: 10.1007/s00122-025-05003-w.

Abstract

Genome-environment association (GEA) is an approach for identifying adaptive loci by combining genetic variation with environmental parameters, offering potential for improving crop resilience. However, its application to genebank accessions is limited by missing geographic origin data. To address this limitation, we explored the use of neural networks to predict the geographic origins of barley accessions and integrate imputed environmental data into GEA. Neural networks demonstrated high accuracy in cross-validation but occasionally produced ecologically implausible predictions as models solely considered geographical proximity. For example, some predicted origins were located within non-arable regions, such as the Mediterranean Sea. Using barley flowering time genes as benchmarks, GEA integrating imputed environmental data displayed partially concordant yet complementary detection of genomic regions near flowering time genes compared to regular GEA, highlighting the potential of GEA with imputed data to complement regular GEA in uncovering novel adaptive loci. Also, contrary to our initial hypothesis anticipating a significant improvement in GEA performance by increasing sample size, our simulations yield unexpected insights. Our study suggests potential limitations in the sensitivity of GEA approaches to the considerable expansion in sample size achieved through predicting missing geographical data. Overall, our study provides insights into leveraging incomplete geographical origin data by integrating deep learning with GEA. Our findings indicate the need for further development of GEA approaches to optimize the use of imputed environmental data, such as incorporating regional GEA patterns instead of solely focusing on global associations between allele frequencies and environmental gradients across large-scale landscapes.
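
As a rough illustration of the general idea behind GEA, the sketch below scores each locus by the absolute correlation between allele dosage and one environmental variable, then picks the strongest association. It is not the method used in the study: the function names (`pearson`, `gea_scan`), the toy data, and the choice of a plain Pearson correlation are all assumptions made for illustration. Real analyses typically also correct for population structure, which this sketch omits.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic locus or constant environment
    return sxy / math.sqrt(sxx * syy)

def gea_scan(genotypes, env):
    """Return |r| per locus.

    genotypes[i][j] is the allele dosage (0 or 2 here) of accession i
    at locus j; env[i] is the environmental value at its origin.
    """
    n_loci = len(genotypes[0])
    return [abs(pearson([row[j] for row in genotypes], env))
            for j in range(n_loci)]

# Hypothetical toy data: 200 accessions, 5 loci.
# Locus 2 is simulated to track the environment; the others are random.
rng = random.Random(0)
env = [rng.uniform(0, 30) for _ in range(200)]
genotypes = []
for e in env:
    row = [rng.choice([0, 2]) for _ in range(5)]
    row[2] = 2 if e + rng.gauss(0, 5) > 15 else 0
    genotypes.append(row)

scores = gea_scan(genotypes, env)
top_locus = max(range(len(scores)), key=scores.__getitem__)
print("association score per locus:", [round(s, 2) for s in scores])
print("strongest association at locus", top_locus)
```

In this setting, accessions with imputed origins would simply contribute extra rows to `genotypes` and `env`, with `env` taken at the predicted rather than the recorded location.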
