Hu Haixiao, Rincent Renaud, Runcie Daniel E
Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA.
GQE - Le Moulon Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France.
Genetics. 2025 Jan 8;229(1):1-41. doi: 10.1093/genetics/iyae171.
Multienvironment trials (METs) are crucial for identifying varieties that perform well across a target population of environments. However, METs are typically too small to sufficiently represent all relevant environment-types, and face challenges from changing environment-types due to climate change. Statistical methods that enable prediction of variety performance for new environments beyond the METs are needed. We recently developed MegaLMM, a statistical model that can leverage hundreds of trials to significantly improve genetic value prediction accuracy within METs. Here, we extend MegaLMM to enable genomic prediction in new environments by learning regressions of latent factor loadings on Environmental Covariates (ECs) across trials. We evaluated the extended MegaLMM using the maize Genome-To-Fields dataset, consisting of 4,402 varieties cultivated in 195 trials with 87.1% of phenotypic values missing, and demonstrated its high accuracy in genomic prediction under various breeding scenarios. Furthermore, we showcased MegaLMM's superiority over univariate GBLUP in predicting trait performance of experimental genotypes in new environments. Finally, we explored the use of higher-dimensional quantitative ECs and discussed when and how detailed environmental data can be leveraged for genomic prediction from METs. We propose that MegaLMM can be applied to plant breeding of diverse crops and different fields of genetics where large-scale linear mixed models are utilized.
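The core idea in the abstract, regressing latent factor loadings on Environmental Covariates so that loadings can be predicted for an untested environment, can be illustrated with a minimal sketch. This is not the MegaLMM implementation: it replaces the model's own estimation with a plain ridge regression, and it assumes the factor scores and per-trial loadings have already been estimated. The function names, the `ridge` parameter, and the simulated data are all hypothetical and exist only for illustration.

```python
import numpy as np


def predict_new_env_loadings(loadings, ecs_train, ecs_new, ridge=1.0):
    """Predict factor loadings for new environments from their ECs.

    loadings  : (K, T) array, loadings of K latent factors in T observed trials
    ecs_train : (T, P) array, environmental covariates of the observed trials
    ecs_new   : (M, P) array, environmental covariates of the new environments
    ridge     : ridge penalty (hypothetical stand-in for a prior on the
                regression coefficients)
    Returns a (K, M) array of predicted loadings.
    """
    ec_mean = ecs_train.mean(axis=0)
    X = ecs_train - ec_mean          # centre ECs on the training trials
    X_new = ecs_new - ec_mean        # apply the same centring to new ECs
    L = loadings.T                   # (T, K): one row of loadings per trial
    l_mean = L.mean(axis=0)          # intercept for each factor
    # Ridge solution: B = (X'X + ridge * I)^-1 X'(L - mean), shape (P, K)
    B = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ (L - l_mean))
    return (l_mean + X_new @ B).T


def predict_genetic_values(factor_scores, new_loadings):
    """Combine variety-level factor scores (n, K) with loadings (K, M)."""
    return factor_scores @ new_loadings


if __name__ == "__main__":
    # Simulated toy data (not the Genomes-To-Fields dataset).
    rng = np.random.default_rng(0)
    n_var, K, T, P, M = 50, 3, 40, 4, 5
    ecs_train = rng.normal(size=(T, P))
    ecs_new = rng.normal(size=(M, P))
    B_true = rng.normal(size=(P, K))
    loadings = (ecs_train @ B_true).T + 0.1 * rng.normal(size=(K, T))
    factor_scores = rng.normal(size=(n_var, K))

    new_loadings = predict_new_env_loadings(loadings, ecs_train, ecs_new)
    predicted = predict_genetic_values(factor_scores, new_loadings)
    print(predicted.shape)  # one predicted value per variety and new environment
```

In this sketch, a variety's predicted performance in a new environment depends only on its factor scores and on the ECs of that environment, which is why no phenotypes from the new environment are required.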