Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad259.
As an important group of proteins discovered in phages, anti-CRISPR proteins inhibit the activity of the bacterial immune system (i.e. CRISPR-Cas), offering promise for gene editing and phage therapy. However, predicting and discovering anti-CRISPR proteins is challenging because of their high variability and fast evolution. Existing biological studies rely on known CRISPR and anti-CRISPR pairs, which may not be practical given the huge number of potential pairs. Existing computational methods, meanwhile, achieve only limited prediction performance. To address these issues, we propose AcrNET, a novel deep neural network for anti-CRISPR analysis, which achieves strong prediction performance.
In both cross-fold and cross-dataset validation, our method outperforms the state-of-the-art methods. Notably, on the cross-dataset test, AcrNET improves the F1 score by at least 15% compared with the state-of-the-art deep learning method. Moreover, AcrNET is the first computational method to predict detailed anti-CRISPR classes, which may help elucidate anti-CRISPR mechanisms. By taking advantage of ESM-1b, a Transformer protein language model pre-trained on 250 million protein sequences, AcrNET overcomes the data scarcity problem. Extensive experiments and analysis suggest that the Transformer model features, evolutionary features, and local structure features complement each other, which points to critical properties of anti-CRISPR proteins. AlphaFold prediction, motif analysis, and docking experiments further demonstrate that AcrNET implicitly captures the evolutionarily conserved patterns and the interaction between anti-CRISPR proteins and their targets.
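The abstract reports that Transformer (ESM-1b), evolutionary, and local structure features complement each other. The snippet below is a minimal sketch of one generic way to combine such heterogeneous per-residue features into a single protein-level vector: mean pooling each feature type, then concatenating the results (early fusion). It is an illustrative assumption, not AcrNET's actual architecture. The function names are hypothetical, and the toy dimensions stand in for real ones (for example, ESM-1b's 1280-dimensional residue embeddings or a 20-column PSSM).

```python
from typing import List

# Hypothetical per-residue feature matrix: one row (vector) per residue.
Matrix = List[List[float]]


def mean_pool(per_residue: Matrix) -> List[float]:
    """Average per-residue vectors into one fixed-length protein vector."""
    if not per_residue:
        raise ValueError("empty feature matrix")
    dim = len(per_residue[0])
    if any(len(row) != dim for row in per_residue):
        raise ValueError("all residue vectors must have the same length")
    n = len(per_residue)
    return [sum(row[j] for row in per_residue) / n for j in range(dim)]


def fuse_features(*feature_blocks: Matrix) -> List[float]:
    """Mean-pool each feature type, then concatenate them (early fusion).

    Assumption for illustration only: each block describes the same protein,
    e.g. language-model embeddings, PSSM rows, secondary-structure one-hots.
    """
    fused: List[float] = []
    for block in feature_blocks:
        fused.extend(mean_pool(block))
    return fused


if __name__ == "__main__":
    # Toy two-residue protein with made-up values.
    esm_like = [[1.0, 2.0], [3.0, 4.0]]              # stand-in for ESM-1b embeddings
    pssm_like = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]   # stand-in for a PSSM
    ss_like = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # stand-in for 3-state SS one-hots
    print(fuse_features(esm_like, pssm_like, ss_like))
```

Mean pooling makes the fused vector length independent of protein length, which lets a standard classifier consume it. A learned model would more likely process each feature type with its own encoder before fusing them.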
Web server: https://proj.cse.cuhk.edu.hk/aihlab/AcrNET/. Training code and pre-trained model are available at.