School of Science, Dalian Maritime University, Dalian 116026, China.
School of Bioengineering, Dalian University of Technology, Dalian 116024, China.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad299.
The formation of biomolecular condensates by liquid-liquid phase separation (LLPS) has emerged as a universal mechanism for the spatiotemporal coordination of biological activities in cells, and it has been widely observed to directly regulate key cellular processes involved in cancer cell pathology. However, LLPS proteins have complex sequences and diverse, inherently disordered conformations, which poses great challenges for both computational and experimental research on them. Herein, we proposed a novel predictor named PredLLPS_PSSM that identifies LLPS proteins based only on sequence evolution information. Because real and reliable samples are the cornerstone of building predictors, we newly collected and collated LLPS proteins from the latest versions of three databases. After comparing the performance of the position-specific score matrix (PSSM) and word embedding, we built PredLLPS_PSSM by combining PSSM-based information with two deep learning frameworks. Independent tests on three existing independent test datasets and two newly constructed ones demonstrated that PredLLPS_PSSM outperformed state-of-the-art methods. Furthermore, we tested PredLLPS_PSSM on nine experimentally identified LLPS proteins from three insects that were not included in any of the databases. In addition, the SHapley Additive exPlanations (SHAP) algorithm and heatmaps were applied to find the amino acids most critical to LLPS.
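To make the notion of "sequence evolution information" concrete, the following is a minimal sketch of how a PSSM-like log-odds matrix can be derived from a set of aligned sequences. It is not the authors' pipeline: the abstract does not state how the PSSMs were generated, and the function name `build_pssm`, the toy alignment, the uniform background frequency, and the pseudocount of 1 are all assumptions made purely for illustration.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a simple log-odds PSSM from equal-length aligned sequences.

    Hypothetical helper for illustration only. Returns one dict per
    alignment column, mapping each amino acid to a log2-odds score
    against a uniform background (an assumption; real tools use
    empirical background frequencies and sequence weighting).
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Start every residue at the pseudocount so unseen residues
        # receive a finite (negative) score instead of log(0).
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            residue = seq[col]
            if residue in counts:  # gaps and non-standard letters are skipped
                counts[residue] += 1.0
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS}
        )
    return pssm


# Toy alignment (made-up sequences, not data from the paper).
toy_alignment = ["MKV", "MRV", "MKL"]
pssm = build_pssm(toy_alignment)
```

In this sketch, a residue conserved across the alignment (such as M in the first column) receives a positive score, while residues never observed at that position receive a negative one. A matrix of this shape, one row per sequence position and one column per amino acid, is the kind of input a PSSM-based deep learning model would consume.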