Huang Jianwei, Zhang Youli, Ren Shulin, Wang Ziyang, Jin Xiaocheng, Lu Xiaoli, Zhang Yu, Min Xiaoping, Ge Shengxiang, Zhang Jun, Xia Ningshao
Institute of Artificial Intelligence, School of Informatics, Xiamen University, No. 422 Siming South Rd, 361005, Xiamen, Fujian, China.
National Institute of Diagnostics and Vaccine Development in Infectious Diseases, School of Public Health, Xiamen University, No. 422 Siming South Rd, 361005, Xiamen, Fujian, China.
Brief Bioinform. 2025 May 3;26(3). doi: 10.1093/bib/bbaf230.
Liquid-liquid phase separation plays a critical role in cellular processes, including protein aggregation and RNA metabolism, by forming membraneless subcellular structures. Accurate identification of phase-separated proteins is essential for understanding and controlling these processes. Traditional identification methods are effective but often costly and time-consuming. Recent machine learning methods have reduced these costs, but most models are restricted to classifying scaffold and client proteins under a limited set of experimental conditions. To address this limitation, we developed a Mamba-based encoder trained with contrastive learning that incorporates separation probability, protein type, and experimental conditions. Our model achieved 95.2% accuracy in predicting phase-separated proteins and an ROC AUC of 0.87 in classifying scaffold and client proteins. Further validation in the DgHBP-2 drug delivery system demonstrated its potential for condition modulation in drug development. This study provides an effective framework for the accurate identification and control of phase separation, facilitating advances in biomedical research and therapeutic applications.
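The two metrics reported above, accuracy and ROC AUC, can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not data from the study, and the function names `accuracy` and `roc_auc` are hypothetical helpers rather than the authors' evaluation code. ROC AUC is computed here through its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = scaffold, 0 = client
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"ROC AUC  = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen decision threshold, whereas ROC AUC depends only on how the scores rank the examples, which is one reason the two figures are often reported side by side.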