Marzella Dario F, Crocioni Giulia, Radusinović Tadija, Lepikhov Daniil, Severin Heleen, Bodor Dani L, Rademaker Daniel T, Lin ChiaYu, Georgievska Sonja, Renaud Nicolas, Kessler Amy L, Lopez-Tarifa Pablo, Buschow Sonja I, Bekkers Erik, Xue Li C
Medical BioSciences department, Radboudumc, Radboud University Medical Center, 6525 GA, Nijmegen, The Netherlands.
Netherlands eScience Center, Amsterdam, The Netherlands.
Commun Biol. 2024 Dec 19;7(1):1661. doi: 10.1038/s42003-024-07292-1.
The interaction between peptides and major histocompatibility complex (MHC) molecules is pivotal in autoimmunity, pathogen recognition and tumor immunity. Recent advances in cancer immunotherapies demand more accurate computational prediction of MHC-bound peptides. We address the generalizability challenge of MHC-bound peptide prediction, revealing limitations in current sequence-based approaches. Our structure-based methods, which leverage geometric deep learning (GDL), demonstrate promising improvements in generalizability to unseen MHC alleles. Further, we tackle data efficiency by introducing a self-supervised learning approach on structures (3D-SSL). Without being exposed to any binding affinity data, our 3D-SSL outperforms sequence-based methods trained on ~90 times more data points. Finally, we demonstrate the resilience of structure-based GDL methods to biases in binding data in a hepatitis B virus vaccine immunopeptidomics case study. This proof-of-concept study highlights the potential of structure-based methods to enhance generalizability and data efficiency, with possible implications for data-intensive fields like T-cell receptor specificity prediction.