Warnat-Herresthal Stefanie, Perrakis Konstantinos, Taschler Bernd, Becker Matthias, Baßler Kevin, Beyer Marc, Günther Patrick, Schulte-Schrepping Jonas, Seep Lea, Klee Kathrin, Ulas Thomas, Haferlach Torsten, Mukherjee Sach, Schultze Joachim L
LIMES-Institute, Department for Genomics and Immunoregulation, University of Bonn, Carl-Troll-Str. 31, 53115 Bonn, Germany.
Statistics and Machine Learning, German Center for Neurodegenerative Diseases, Venusberg-Campus 1, Building 99, 53127 Bonn, Germany.
iScience. 2020 Jan 24;23(1):100780. doi: 10.1016/j.isci.2019.100780. Epub 2019 Dec 18.
Acute myeloid leukemia (AML) is a severe, mostly fatal hematopoietic malignancy. We were interested in whether transcriptomic-based machine learning could predict AML status without requiring expert input. Using 12,029 samples from 105 different studies, we present a large-scale study of machine learning-based prediction of AML in which we address key questions relating to the combination of machine learning and transcriptomics and their practical use. We find data-driven, high-dimensional approaches, in which multivariate signatures are learned directly from genome-wide data with no prior knowledge, to be accurate and robust. Importantly, these approaches are highly scalable with low marginal cost, essentially matching human expert annotation in a near-automated workflow. Our results support the notion that transcriptomics combined with machine learning could be used as part of an integrated -omics approach wherein risk prediction, differential diagnosis, and subclassification of AML are achieved by genomics while diagnosis could be assisted by transcriptomic-based machine learning.
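The idea of learning a multivariate signature directly from all genes, with no prior gene selection, can be illustrated with a minimal sketch. The abstract does not name a classifier, so the L1-penalized logistic regression used here is an assumption chosen for illustration. The data are synthetic and centered, and every function name and parameter (`make_synthetic`, `fit_l1_logistic`, `lam`, `lr`) is hypothetical rather than taken from the study.

```python
import math
import random


def make_synthetic(n_samples, n_genes, n_informative, shift, rng):
    """Toy centered 'expression' matrix; NOT data from the study.

    Cases (label 1) have the first n_informative genes shifted up by
    `shift`; controls (label 0) have them shifted down by `shift`.
    """
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += shift if label == 1 else -shift
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.2, n_iter=100):
    """Proximal gradient descent for L1-penalized logistic regression."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            score = b + sum(wj * xj for wj, xj in zip(w, row))
            err = sigmoid(score) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    """Label a sample 1 when its linear score is positive."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0
            for row in X]


def accuracy(y_true, y_pred):
    return sum(a == c for a, c in zip(y_true, y_pred)) / len(y_true)


# Assumed toy sizes; the study itself used 12,029 real samples.
rng = random.Random(0)
X_train, y_train = make_synthetic(120, 40, 5, 1.0, rng)
X_test, y_test = make_synthetic(100, 40, 5, 1.0, rng)
w, b = fit_l1_logistic(X_train, y_train)
print("held-out accuracy:", accuracy(y_test, predict(X_test, w, b)))
print("genes with nonzero weight:", sum(1 for wj in w if wj != 0.0))
```

The L1 penalty is used here because it tends to set the weights of uninformative genes to zero, so the fitted weight vector can be read as a learned signature. In a multi-study setting like the one described, the held-out samples would ideally come from studies not seen during training; this sketch does not model that split.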