NeuroPpred-MSN: A Neuropeptide Prediction Model Based on Multi-feature Fusion and Siamese Networks.

Authors

Wen Jian, Chen Minyu, Shen Yongqi, Wang Honghong, Wei Zhuoyu, Gu Lichuan, Zhu Xiaolei

Affiliations

School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China.

Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, 230036, China.

Publication information

Interdiscip Sci. 2025 Jun 3. doi: 10.1007/s12539-025-00730-6.

Abstract

The discovery of neuropeptides offers numerous opportunities for identifying novel drugs and targets to treat a variety of diseases. Although various computational methods have been proposed, there is still room to improve their performance. In this work, we introduce NeuroPpred-MSN, an efficient neuropeptide prediction model that leverages multi-feature fusion and Siamese networks. To represent the information in neuropeptides comprehensively, the peptide sequences are encoded with four schemes: token embedding, word2vec embedding, protein language model embedding, and handcrafted features. The token and word2vec embeddings are fed into a Siamese network channel. In the other channel of the model, the peptide sequences and their secondary structure sequences are fed into the ProtT5-XL-UniRef50 model to generate embedding features, while handcrafted encoding techniques extract the physicochemical information. These two kinds of features are then fused and fed into a bidirectional gated recurrent unit (Bi-GRU) network for further processing. Finally, the outputs of the two channels are integrated in a fully connected layer, which generates the final prediction. The results on the independent test set indicate that NeuroPpred-MSN achieves an area under the receiver operating characteristic curve (AUROC) of 98.3%, exceeding the performance of other state-of-the-art predictors. Specifically, compared with the best results of the other predictors, the model improves accuracy (ACC) by 1.52%, F1 score (F1) by 1.52%, Matthews correlation coefficient (MCC) by 3.2%, and AUROC by 1.55%. The model was further evaluated on imbalanced datasets, where it achieved the highest values of AUROC, ACC, MCC, sensitivity (SN), and F1, further demonstrating its robustness and generalization.
The model is available in the following GitHub repository: https://github.com/wenjean/NeuroPpred-MSN
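
The evaluation metrics named in the abstract (ACC, SN, F1, MCC, and AUROC) follow standard definitions for binary classification. The sketch below shows one way to compute them in plain Python. It is an illustration only and is not taken from the NeuroPpred-MSN repository: the function names `binary_metrics` and `auroc` are hypothetical, and the counts and scores in the usage lines are made-up numbers, not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, SN, F1 and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; not part of the authors' code.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # Sensitivity (recall): fraction of true neuropeptides that are recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * sn / (precision + sn) if (precision + sn) else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "F1": f1, "MCC": mcc}


def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win. This pairwise form is O(P * N), which is
    fine for a small example but slow for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative numbers only (not from the paper).
example = binary_metrics(tp=90, fp=10, tn=90, fn=10)
example_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

Because MCC and AUROC account for both classes, they are the more informative of these metrics when positives and negatives are imbalanced, as in the additional evaluation the abstract mentions.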
