PRADclass: Hybrid Gleason Grade-Informed Computational Strategy Identifies Consensus Biomarker Features Predictive of Aggressive Prostate Adenocarcinoma.

Affiliations

Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed to be University, Thanjavur, India.

Department of Pharmaceutical Technology, UCE, Anna University (BIT campus), Trichy, India.

Publication information

Technol Cancer Res Treat. 2024 Jan-Dec;23:15330338231222389. doi: 10.1177/15330338231222389.

PMID: 38226611

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10793196/

Abstract

BACKGROUND

Prostate adenocarcinoma (PRAD) is a common cancer diagnosis in men worldwide, yet large gaps persist in our knowledge of the molecular bases of its progression and aggressiveness. Most cases are indolent and slow-growing, but aggressive prostate cancers must be recognized early so that treatment can be optimized and mortality reduced.

METHODS

Using TCGA transcriptomic data for PRAD and the associated clinical metadata, we determined the Gleason grade of each sample and used it to perform: (i) Gleason grade-wise linear modeling, followed by five grade-versus-control contrasts and ten pairwise between-grade contrasts; and (ii) Gleason grade-wise network modeling via weighted gene correlation network analysis (WGCNA). Candidate biomarkers were obtained from both analyses, and their consensus was identified. The consensus biomarkers were then used as the feature space to train machine learning (ML) models that classify a sample as benign, indolent, or aggressive.
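The contrast counts in (i) follow from simple counting: with five Gleason grades, each grade is compared once against controls (5 contrasts) and once against every other grade (C(5,2) = 10 contrasts). The minimal Python sketch below enumerates these contrast names and, as an assumption, treats "consensus" as the per-grade intersection of the genes flagged by both strategies. The grade labels and the `build_contrasts` and `consensus` helpers are hypothetical illustrations, not taken from the PRADclass code.

```python
from itertools import combinations

# Hypothetical labels for the five Gleason grade groups and the benign controls.
GRADES = [f"Grade{g}" for g in range(1, 6)]
CONTROL = "Control"

def build_contrasts(grades, control=CONTROL):
    """Return grade-vs-control and grade-vs-grade contrast names."""
    vs_control = [f"{g}-{control}" for g in grades]
    # Every unordered pair of grades exactly once: C(5, 2) = 10 contrasts.
    between = [f"{later}-{earlier}" for earlier, later in combinations(grades, 2)]
    return vs_control, between

def consensus(linear_hits, wgcna_hits):
    """Assumed consensus rule: genes flagged by BOTH strategies for the same grade."""
    grades = set(linear_hits) | set(wgcna_hits)
    return {g: set(linear_hits.get(g, ())) & set(wgcna_hits.get(g, ())) for g in grades}

vs_control, between = build_contrasts(GRADES)
print(len(vs_control), len(between))  # 5 10
```

An intersection is the strictest way to combine the two gene lists; a looser rule (for example, a gene reported by either method) would yield more, but less corroborated, candidates.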

RESULTS

The statistical modeling yielded 77 Gleason grade-salient genes, while the WGCNA algorithm yielded 1003 trait-specific key genes in grade-wise significant modules. Consensus analysis of the two approaches identified two genes in Grade-1 (SLC43A1 and PHGR1), 26 genes in Grade-4 (including LOC100128675, PPP1R3C, NECAB1, UBXN10, SERPINA5, CLU, RASL12, DGKG, FHL1, NCAM1, and CEND1), and seven genes in Grade-5 (CBX2, DPYS, FAM72B, SHCBP1, TMEM132A, TPX2, and UBE2C). A random forest model trained and optimized on these 35 biomarkers for the ternary (three-class) classification problem achieved a balanced accuracy of ~86% on external validation.
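Balanced accuracy, the metric reported above, is the mean of per-class recall, so each class contributes equally regardless of how many samples it has. The minimal pure-Python sketch below computes it for a benign/indolent/aggressive problem; the labels and predictions are hypothetical toy values and do not reproduce the paper's results. scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same quantity.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Hypothetical toy labels for the ternary benign/indolent/aggressive task.
y_true = ["benign"] * 2 + ["indolent"] * 4 + ["aggressive"] * 4
y_pred = (["benign", "benign"]
          + ["indolent", "indolent", "indolent", "aggressive"]
          + ["aggressive", "aggressive", "indolent", "indolent"])

print(balanced_accuracy(y_true, y_pred))  # 0.75 (plain accuracy would be 0.7)
```

Here the per-class recalls are 1.0, 0.75, and 0.5, so the balanced accuracy (0.75) differs from the plain accuracy (7 of 10 correct, 0.7) because the small benign class is weighted as heavily as the larger ones.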

CONCLUSIONS

The consensus of multiple parallel computational strategies has unmasked candidate Gleason grade-specific biomarkers. PRADclass, a validated AI model that uses these biomarkers as features, achieved good performance and could be trialed to predict the differentiation of prostate cancers. PRADclass is available for academic use at https://apalania.shinyapps.io/pradclass (online) and https://github.com/apalania/pradclass (command-line interface).
