Kennedy Luke, Sandhu Jagdeep K, Harper Mary-Ellen, Cuperlovic-Culf Miroslava
Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada.
Ottawa Institute of Systems Biology, University of Ottawa, 451 Smyth Road, Ottawa, ON, K1H 8M5, Canada.
BMC Bioinformatics. 2025 Feb 11;26(1):48. doi: 10.1186/s12859-025-06051-1.
Alterations in metabolism, including changes in mitochondrial metabolism and glutathione (GSH) metabolism, are a well-recognized hallmark of many cancers. Mitochondrial GSH (mGSH) transport is a poorly characterized aspect of GSH metabolism, which we investigate here in the context of cancer. Existing functional annotation approaches that use machine learning (ML) or deep learning (DL) models based only on protein sequences were unable to annotate functions in specific biological contexts.
We develop a flexible ML framework for functional annotation from diverse feature data. This hybrid ML framework uses cancer cell line multi-omics data and other biological knowledge data as features to identify genes potentially involved in mGSH metabolism and membrane transport in cancers. The framework achieves strong performance across functional annotation tasks and across several cell line and primary tumor cancer samples. In our application, classification models predict that the known mGSH transporter SLC25A39, but not SLC25A40, is highly likely to be related to mGSH metabolism in cancers. SLC25A10, SLC25A50, and the orphan SLC25A24 and SLC25A43 are predicted to be associated with mGSH metabolism in multiple biological contexts, and structural analysis of these proteins reveals similarities between their potential substrate-binding regions and the binding residues of SLC25A39.
These findings propose potential novel functional annotations of genes, with implications for understanding cancer cell metabolism and for identifying novel therapeutic targets related to GSH metabolism. The hybrid ML framework proposed here can be applied to other biological function classifications or multi-omics datasets to generate hypotheses in various biological contexts. Code and a tutorial for generating models and predictions in this framework are available at https://github.com/lkenn012/mGSH_cancerClassifiers.
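To make the idea of a hybrid feature set concrete, the following is a minimal sketch and not the authors' implementation, which is available in the repository linked above. It assumes each gene has two feature blocks, one derived from omics data and one from prior biological knowledge, joins them into a single vector, fits a simple classifier on labelled genes, and scores unlabelled candidates. The gene names (`POS_1`, `CAND_1`, ...), the feature values, and the choice of logistic regression are all illustrative assumptions; none of them comes from the paper.

```python
import math


def concat_features(omics, knowledge):
    """Join per-gene feature blocks into one vector per gene (the 'hybrid' step)."""
    return {g: list(omics[g]) + list(knowledge[g]) for g in omics if g in knowledge}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a gene belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic, hypothetical data: two omics-derived features per gene
# (e.g. correlation with known pathway genes) and one knowledge feature
# (e.g. a mitochondrial-localization flag). Values are made up.
omics = {
    "POS_1": [0.8, 0.7], "POS_2": [0.9, 0.6],
    "NEG_1": [0.1, 0.2], "NEG_2": [0.0, -0.1],
    "CAND_1": [0.85, 0.65], "CAND_2": [0.05, 0.1],
}
knowledge = {
    "POS_1": [1.0], "POS_2": [1.0],
    "NEG_1": [0.0], "NEG_2": [1.0],
    "CAND_1": [1.0], "CAND_2": [0.0],
}
labels = {"POS_1": 1, "POS_2": 1, "NEG_1": 0, "NEG_2": 0}

features = concat_features(omics, knowledge)
train_genes = sorted(labels)
w, b = train_logistic([features[g] for g in train_genes],
                      [labels[g] for g in train_genes])

# Score the unlabelled candidates and rank them by predicted probability.
scores = {g: predict_proba(w, b, features[g]) for g in features if g not in labels}
ranked = sorted(scores, key=scores.get, reverse=True)
for gene in ranked:
    print(f"{gene}: {scores[gene]:.3f}")
```

In this toy setup the candidate whose features resemble the labelled positives is ranked first. A real analysis would need many more genes, held-out evaluation, and a model chosen for the data at hand.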