Seale Colm, Gonçalves Joana P
Pattern Recognition & Bioinformatics, Department of Intelligent Systems, EEMCS Faculty, Delft University of Technology, 2628 XE Delft, The Netherlands.
Holland Proton Therapy Centre (HollandPTC), 2629 JH Delft, The Netherlands.
Bioinform Adv. 2025 Jul 2;5(1):vbaf157. doi: 10.1093/bioadv/vbaf157. eCollection 2025.
Controlling the outcomes of CRISPR editing is crucial for the success of gene therapy. Since donor template-based editing is often inefficient, alternative strategies have emerged that leverage mutagenic end-joining repair instead. Existing machine learning models can accurately predict end-joining repair outcomes; however, generalisability beyond the specific cell line used for training remains a challenge, and interpretability is typically limited by suboptimal feature representation and model architecture.
We propose X-CRISP, a flexible and interpretable neural network for predicting repair outcome frequencies based on a minimal set of outcome and sequence features, including microhomologies (MH). Outperforming prior models on detailed and aggregate outcome predictions, X-CRISP prioritised MH location over MH sequence properties such as GC content for deletion outcomes. Through transfer learning, we adapted X-CRISP pre-trained on wild-type mESC data to target human cell lines K562, HAP1, U2OS, and mESC lines with altered DNA repair function. Adapted X-CRISP models improved over direct training on target data from as few as 50 samples, suggesting that this strategy could be leveraged to build models for new domains using a fraction of the data required to train models from scratch.
X-CRISP is available at https://github.com/joanagoncalveslab/xcrisp.
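As an illustration of the microhomology (MH) features mentioned in the abstract, the sketch below computes, for one candidate deletion, the MH length, the MH GC fraction, and the position of the deletion relative to the cut site. The function name, its arguments, and the returned feature names are hypothetical and chosen for illustration only; this is not the X-CRISP feature extraction code, whose actual feature set is defined in the repository above.

```python
def microhomology_features(seq, cut, start, length):
    """Describe the microhomology (MH) flanking one candidate deletion.

    Hypothetical helper for illustration, not part of X-CRISP.
    seq    : target DNA sequence (string over A/C/G/T)
    cut    : index of the first base to the right of the cut site
    start  : index of the first deleted base
    length : number of deleted bases
    """
    end = start + length
    if length <= 0 or not (0 <= start <= cut <= end <= len(seq)):
        raise ValueError("deletion must be non-empty and span the cut site")

    # Count how many leading deleted bases match the bases
    # immediately after the deleted segment.
    k = 0
    while k < length and end + k < len(seq) and seq[start + k] == seq[end + k]:
        k += 1

    mh = seq[start:start + k]
    gc = (mh.count("G") + mh.count("C")) / k if k else 0.0
    return {
        "mh_length": k,                  # MH sequence property
        "mh_gc": gc,                     # MH sequence property
        "mh_start_offset": start - cut,  # MH location relative to the cut
        "deletion_length": length,
    }


features = microhomology_features("AAGCTTTTGCAA", cut=5, start=2, length=6)
```

Here the deleted segment `GCTTTT` is followed by `GC`, so the deletion is flanked by a 2 bp microhomology. Keeping location (`mh_start_offset`) separate from sequence properties (`mh_length`, `mh_gc`) mirrors the distinction the abstract draws between MH location and MH sequence properties.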