A fusion framework of deep learning and machine learning for predicting sgRNA cleavage efficiency.

Affiliations

Department of Biomedical Informatics, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, Beijing, China.

Publication information

Comput Biol Med. 2023 Oct;165:107476. doi: 10.1016/j.compbiomed.2023.107476. Epub 2023 Sep 6.

Abstract

The CRISPR/Cas9 system is a powerful tool for genome editing. Numerous studies have shown that the choice of sgRNA strongly affects editing efficiency. However, the rules for designing sgRNAs with high cleavage efficiency are still unclear. Several machine learning and deep learning methods have been developed to predict sgRNA cleavage efficiency, but their prediction accuracy remains unsatisfactory. Here we propose a fusion framework of deep learning and machine learning. It first processes the primary sequence and secondary structure features of sgRNAs with both a convolutional neural network (CNN) and a recurrent neural network (RNN), and then uses the features extracted by these deep networks to train a conventional machine learning model with LGBM. The new approach outperformed previous methods. The Spearman's correlation coefficient between predicted and measured sgRNA cleavage efficiency of our model (0.917) is more than 5% higher than that of the most advanced previous method (0.865), and the mean squared error decreases from 7.89 × 10 to 4.75 × 10. Finally, we developed an online tool, CRISep (http://www.cuilab.cn/CRISep), that evaluates the usability of sgRNAs based on our models.
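The abstract describes a two-stage design: deep networks extract features from each sgRNA, and a gradient-boosted tree model (LGBM) is then trained on those features. The following is a minimal, dependency-free Python sketch of that structure under stated assumptions; it is not the authors' code. Assumptions: the sequence is one-hot encoded; fixed convolution kernels with ReLU and global max pooling stand in for the trained CNN/RNN (the RNN branch and the secondary-structure features are omitted); hand-rolled gradient boosting of decision stumps stands in for LightGBM; all function names are hypothetical. The sketch also computes the two reported metrics (Spearman's correlation and mean squared error) and the relative gain implied by 0.917 vs. 0.865, which is about 6% and thus consistent with "more than 5%".

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as 4-dim one-hot rows (assumed input format)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv_maxpool_features(seq, kernels):
    """Stage 1 stand-in: 1-D convolution, ReLU, then global max pooling.
    Each kernel is `width` rows of 4 weights; the paper uses trained CNN/RNN layers."""
    x = one_hot(seq)
    feats = []
    for kernel in kernels:
        width = len(kernel)
        best = 0.0  # starting at 0 applies ReLU: max(0, max over positions)
        for start in range(len(x) - width + 1):
            window = x[start:start + width]
            s = sum(w * v for row, pos in zip(kernel, window) for w, v in zip(row, pos))
            best = max(best, s)
        feats.append(best)
    return feats

def fit_stump(X, residuals):
    """Best single-feature threshold split under squared error, or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return None if best is None else best[1:]

def fit_boosting(X, y, n_rounds=50, lr=0.1):
    """Stage 2 stand-in for LGBM: gradient boosting of stumps on squared loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature can be split; stop early
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def relative_improvement(new, old):
    return (new - old) / old

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: random 20-nt guides, target = GC fraction (NOT real efficiencies).
    seqs = ["".join(rng.choice(BASES) for _ in range(20)) for _ in range(60)]
    y = [sum(b in "GC" for b in s) / len(s) for s in seqs]
    kernels = [[[rng.uniform(-1, 1) for _ in BASES] for _ in range(3)] for _ in range(8)]
    X = [conv_maxpool_features(s, kernels) for s in seqs]
    model = fit_boosting(X, y)
    preds = [predict(model, row) for row in X]
    print("train Spearman:", spearman(preds, y), "train MSE:", mse(preds, y))
    print("relative gain in reported rho:", relative_improvement(0.917, 0.865))
```

In a real pipeline, one would train the CNN/RNN on measured cleavage efficiencies, take the activations of an intermediate layer as the feature vector, and pass those vectors to a gradient-boosting library such as LightGBM. The toy target here (GC fraction) is synthetic and says nothing about actual sgRNA activity.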
