Zhang Guishan, Zeng Tian, Dai Zhiming, Dai Xianhua
Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou 515063, China.
School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China.
Comput Struct Biotechnol J. 2021 Mar 7;19:1445-1457. doi: 10.1016/j.csbj.2021.03.001. eCollection 2021.
CRISPR/Cas9 is a preferred genome editing tool and has been widely adopted across a range of disciplines, from molecular biology to gene therapy. A key prerequisite for the success of CRISPR/Cas9 is the capacity of a single guide RNA (sgRNA) to distinguish its on-target site from homologous off-target sites. Thus, optimizing sgRNA design by maximizing on-target activity and minimizing potential off-target mutations is a crucial concern for this system. Several deep learning models have been developed to better understand sgRNA cleavage efficacy and specificity. Although these methods perform well by automatically learning a suitable representation from the input data, there is still room to improve both accuracy and interpretability. Here, we propose two novel interpretable attention-based convolutional neural networks, CRISPR-ONT and CRISPR-OFFT, for predicting CRISPR/Cas9 sgRNA on-target and off-target activities, respectively. Experiments on public datasets show that our models achieve satisfactory accuracy and interpretability. Our findings contribute to the understanding of how RNA-guided Cas9 nucleases scan the mammalian genome. Data and source code are available at https://github.com/Peppags/CRISPRont-CRISPRofft.
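To make the idea of an attention-based convolutional model over sgRNA sequences more concrete, the following is a minimal sketch in plain Python. It is not the CRISPR-ONT or CRISPR-OFFT architecture, whose layers and parameters are not described in this abstract. Everything here is an assumption chosen for illustration: the one-hot encoding over A/C/G/T, the single hand-set filter that responds to a "GG" dinucleotide, the self-scoring softmax used as attention, the example 23-nt sequence, and all function names. A real model would learn its filters and attention parameters from data.

```python
import math

NUCLEOTIDES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES]
            for base in seq.upper()]


def conv1d_relu(x, kernel, bias=0.0):
    """Slide one filter along the sequence (no padding) and apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = bias
        for k in range(width):
            for c in range(len(NUCLEOTIDES)):
                total += x[start + k][c] * kernel[k][c]
        out.append(max(0.0, total))
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention_pool(features):
    """Toy attention: score each position by its own activation,
    normalize with softmax, and return (weights, weighted sum)."""
    weights = softmax(features)
    pooled = sum(w * f for w, f in zip(weights, features))
    return weights, pooled


# Hypothetical 23-nt input: 20-nt protospacer followed by a 3-nt PAM.
sgrna = "GACGCATAAAGATGAGACGCTGG"

# Hand-set (not learned) filter of width 2 that responds to "GG".
gg_kernel = [[0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]]

encoded = one_hot(sgrna)
features = conv1d_relu(encoded, gg_kernel)
weights, pooled = attention_pool(features)

# The position with the largest attention weight is the one the toy
# model treats as most informative for this filter.
top_position = max(range(len(weights)), key=lambda i: weights[i])
print("most attended window starts at index", top_position)
```

The attention weights form a distribution over sequence positions, so inspecting them is one simple way such a model can be made interpretable: positions with large weights are those that contribute most to the pooled feature.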