Dörrich Marion, Balk Matthias, Heusinger Tatjana, Beyer Sandra, Mirbagheri Hamed, Fischer David J, Kanso Hassan, Matek Christian, Hartmann Arndt, Iro Heinrich, Eckstein Markus, Gostian Antoniu-Oreste, Kist Andreas M
Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Department of Otolaryngology - Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Nat Commun. 2025 Aug 4;16(1):7163. doi: 10.1038/s41467-025-62386-6.
Head and neck cancer is a common disease and is associated with a poor prognosis. A promising approach to improving patient outcomes is personalized treatment, which uses information from a variety of modalities. However, little progress has been made due to the lack of large public datasets. We present a multimodal dataset, HANCOCK, that comprises monocentric, real-world data from 763 head and neck cancer patients. The dataset contains demographic, pathological, and blood data as well as surgery reports and histologic images, which can be explored in a low-dimensional representation. We show that combining these modalities using machine learning is superior to using a single modality, and that integrating imaging data using foundation models helps in endpoint prediction. We believe that HANCOCK will not only offer new insights into head and neck cancer pathology but also serve as a major resource for research on multimodal machine-learning methodologies in precision oncology.