Department of Health Technology, Section for Bioinformatics, Technical University of Denmark (DTU), 2800 Kgs. Lyngby, Denmark.
Department of Health Technology, Section for Experimental and Translational Immunology, Technical University of Denmark (DTU), 2800 Kgs. Lyngby, Denmark.
Commun Biol. 2021 Sep 10;4(1):1060. doi: 10.1038/s42003-021-02610-3.
Prediction of T-cell receptor (TCR) interactions with MHC-peptide complexes remains highly challenging, primarily because of three factors: data accuracy, data scarcity, and problem complexity. Here, we show that "shallow" convolutional neural network (CNN) architectures are adequate to deal with the problem complexity imposed by the length variations of TCRs. We demonstrate that current public bulk CDR3β-pMHC binding data are overall of low quality and that the development of accurate prediction models is contingent on paired α/β TCR sequence data corresponding to at least 150 distinct pairs for each investigated pMHC. In comparison, models trained on CDR3α or CDR3β data alone showed a variable, pMHC-specific drop in relative performance. Together, these findings support that T-cell specificity is predictable given the availability of accurate and sufficient paired TCR sequence data. NetTCR-2.0 is publicly available at https://services.healthtech.dtu.dk/service.php?NetTCR-2.0 .
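The abstract's point that a shallow CNN can cope with TCR length variation can be illustrated with a minimal sketch. This is not the NetTCR-2.0 implementation: the kernel widths (1, 3, 5), the number of kernels, the random untrained weights, the one-hot encoding, and the two example sequence strings are all assumptions chosen for illustration. The sketch only shows the general idea that convolution followed by global max pooling maps sequences of different lengths to a feature vector of fixed size.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as a list of 20-dimensional one-hot vectors."""
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def make_kernel(width, rng):
    """Random, untrained kernel of shape width x 20 (illustration only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in AMINO_ACIDS] for _ in range(width)]


def conv_max_pool(encoded, kernel):
    """Slide one kernel along the sequence, apply ReLU, keep the maximum."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also returned if the sequence is shorter than the kernel
    for start in range(len(encoded) - width + 1):
        activation = sum(
            encoded[start + k][j] * kernel[k][j]
            for k in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        best = max(best, activation)
    return best


def featurize(seq, kernels):
    """Return one pooled activation per kernel, independent of len(seq)."""
    encoded = one_hot(seq)
    return [conv_max_pool(encoded, kernel) for kernel in kernels]


# Hypothetical setup: 4 kernels for each assumed width.
rng = random.Random(0)
KERNELS = [make_kernel(w, rng) for w in (1, 3, 5) for _ in range(4)]

# Two illustrative CDR3-like strings of different lengths.
short = featurize("CASSLGF", KERNELS)
long = featurize("CASSIRSSYEQYF", KERNELS)
print(len(short), len(long))
```

Because each kernel contributes exactly one pooled value, both sequences yield a vector with one entry per kernel, which a downstream dense layer could consume without padding to a common length.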