DeepCRISTL: deep transfer learning to predict CRISPR/Cas9 functional and endogenous on-target editing efficiency.

Affiliation

School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.

Publication information

Bioinformatics. 2022 Jun 24;38(Suppl 1):i161-i168. doi: 10.1093/bioinformatics/btac218.

Abstract

MOTIVATION

CRISPR/Cas9 technology has been revolutionizing the field of gene editing in recent years. Guide RNAs (gRNAs) enable Cas9 proteins to target specific genomic loci for editing. However, editing efficiency varies between gRNAs. Thus, computational methods were developed to predict editing efficiency for any gRNA of interest. High-throughput datasets of Cas9 editing efficiencies were produced to train machine-learning models to predict editing efficiency. However, these high-throughput datasets have low correlation with functional and endogenous editing. Another difficulty arises from the fact that functional and endogenous editing efficiency is more difficult to measure, and as a result, functional and endogenous datasets are too small to train accurate machine-learning models on.

RESULTS

We developed DeepCRISTL, a deep-learning model to predict the on-target efficiency given a gRNA sequence. DeepCRISTL takes advantage of high-throughput datasets to learn general patterns of gRNA on-target editing efficiency, and then uses transfer learning (TL) to fine-tune the model and fit it to the functional and endogenous prediction task. We pre-trained the DeepCRISTL model on more than 150 000 gRNAs, produced through the DeepHF study as a high-throughput dataset of three Cas9 enzymes. We improved the DeepHF model by multi-task and ensemble techniques and achieved state-of-the-art results over each of the three enzymes: up to 0.89 in Spearman correlation between predicted and measured on-target efficiencies. To fine-tune model weights to predict on-target efficiency of functional or endogenous datasets, we tested several TL approaches, with gradual learning being the overall best performer, both when pre-trained on DeepHF and when pre-trained on CRISPROn, another high-throughput dataset. DeepCRISTL outperformed state-of-the-art methods on all functional and endogenous datasets. Using saliency maps, we identified and compared the important features learned by the model in each dataset. We believe DeepCRISTL will improve prediction performance in many other CRISPR/Cas9 editing contexts by leveraging TL to utilize both high-throughput datasets, and smaller and more biologically relevant datasets, such as functional and endogenous datasets.
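
The Spearman correlation reported above compares the rank order of predicted and measured on-target efficiencies. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function names `average_ranks` and `spearman` and the example values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window over all entries tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(predicted, measured):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical efficiencies for four gRNAs (illustrative values only).
predicted = [0.10, 0.40, 0.35, 0.80]
measured = [0.20, 0.50, 0.30, 0.90]
rho = spearman(predicted, measured)
```

Because the metric depends only on ranks, a model can score well even when its predictions are on a different scale from the measured efficiencies, as long as it orders the gRNAs correctly.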

AVAILABILITY AND IMPLEMENTATION

DeepCRISTL is available via github.com/OrensteinLab/DeepCRISTL.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
