Beder Thomas, Hansen Björn-Thore, Hartmann Alina M, Zimmermann Johannes, Amelunxen Eric, Wolgast Nadine, Walter Wencke, Zaliova Marketa, Antić Željko, Chouvarine Philippe, Bartsch Lorenz, Barz Malwine J, Bultmann Miriam, Horns Johanna, Bendig Sonja, Kässens Jan, Kaleta Christoph, Cario Gunnar, Schrappe Martin, Neumann Martin, Gökbuget Nicola, Bergmann Anke Katharina, Trka Jan, Haferlach Claudia, Brüggemann Monika, Baldus Claudia D, Bastian Lorenz
Medical Department II, Hematology and Oncology, University Hospital Schleswig-Holstein, Kiel, Germany.
Clinical Research Unit "CATCH ALL" (KFO 5010/1) funded by the Deutsche Forschungsgemeinschaft, Bonn, Germany.
Hemasphere. 2023 Aug 25;7(9):e939. doi: 10.1097/HS9.0000000000000939. eCollection 2023 Sep.
Current classifications (World Health Organization-HAEM5/ICC) define up to 26 molecular B-cell precursor acute lymphoblastic leukemia (BCP-ALL) disease subtypes by genomic driver aberrations and corresponding gene expression signatures. Identification of driver aberrations by transcriptome sequencing (RNA-Seq) is well established, while systematic approaches for gene expression analysis are less advanced. Therefore, we developed ALLCatchR, a machine learning-based classifier that uses RNA-Seq gene expression data to allocate BCP-ALL samples to all 21 gene expression-defined molecular subtypes. Trained on n = 1869 transcriptome profiles with established subtype definitions (4 cohorts; 55% pediatric / 45% adult), ALLCatchR allowed subtype allocation in 3 independent hold-out cohorts (n = 1018; 75% pediatric / 25% adult) with 95.7% accuracy (averaged sensitivity across subtypes: 91.1% / specificity: 99.8%). High-confidence predictions were achieved in 83.7% of samples with 98.9% accuracy. Only 1.2% of samples remained unclassified. ALLCatchR outperformed existing tools and identified novel driver candidates in previously unassigned samples. Additional modules provided predictions of sample blast counts, patient sex, and immunophenotype, allowing imputation in cases where this information is missing. We established a novel RNA-Seq reference of human B-lymphopoiesis using 7 FACS-sorted progenitor stages from healthy bone marrow donors. Implementation in ALLCatchR enabled projection of BCP-ALL samples onto this trajectory. This identified shared proximity patterns of BCP-ALL subtypes to normal lymphopoiesis stages, extending immunophenotypic classifications with a novel framework for developmental comparisons of BCP-ALL. ALLCatchR enables routine application of RNA-Seq in BCP-ALL diagnostics, with systematic gene expression analysis for accurate subtype allocation and novel insights into underlying developmental trajectories.
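The abstract reports overall accuracy together with sensitivity and specificity averaged across subtypes. As a rough illustration of how such figures can be derived from predicted and reference labels, the sketch below computes one-vs-rest sensitivity and specificity per subtype and takes their unweighted mean. It is not taken from ALLCatchR: the function names, the toy labels "A", "B", "C", and the choice of an unweighted average are assumptions made for this example only.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (sensitivity, specificity)} from one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        # Undefined ratios (no positives or no negatives) are reported as NaN.
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        spec = tn / (tn + fp) if (tn + fp) else float("nan")
        metrics[label] = (sens, spec)
    return metrics


def summarize(y_true, y_pred):
    """Overall accuracy plus unweighted (macro) mean sensitivity and specificity."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    metrics = per_class_metrics(y_true, y_pred)
    # Average only over subtypes present in the reference labels (assumption).
    true_labels = sorted(set(y_true))
    macro_sens = sum(metrics[l][0] for l in true_labels) / len(true_labels)
    macro_spec = sum(metrics[l][1] for l in true_labels) / len(true_labels)
    return {
        "accuracy": accuracy,
        "macro_sensitivity": macro_sens,
        "macro_specificity": macro_spec,
    }


# Hypothetical toy data: six samples, three made-up subtype labels.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
summary = summarize(y_true, y_pred)
print(summary)
```

Averaging per subtype rather than per sample gives rare subtypes the same weight as common ones, which is one reason an averaged sensitivity can sit below the overall accuracy.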