Google DeepMind, London, UK.
Google Research, London, UK.
Nat Med. 2024 Apr;30(4):1166-1173. doi: 10.1038/s41591-024-02838-6. Epub 2024 Apr 10.
Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during development and the data encountered during deployment. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge often cannot be readily addressed by targeted data acquisition and 'labeling' by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of some conditions or the limited availability of clinical expertise. We hypothesize that advances in generative artificial intelligence can help address this unmet need in a steerable fashion, by enriching the training dataset with synthetic examples that compensate for the shortfall of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fairer, both in distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
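The core data-mixing idea, complementing real samples with synthetic ones for underrepresented (label, subgroup) cells, can be illustrated with a minimal sketch. It assumes a hypothetical conditional sampler `generate(label, subgroup)` standing in for a trained diffusion model, and it uses a simple "top up every observed cell to the size of the largest cell" rule. Both the function names and the balancing rule are illustrative assumptions, not the procedure reported in the paper.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

# A sample is (image, label, subgroup); images are opaque objects here.
Sample = Tuple[object, Hashable, Hashable]


def synthetic_quota(cells: List[Hashable]) -> Dict[Hashable, int]:
    """Synthetic samples needed per cell so every observed cell matches the largest one."""
    counts = Counter(cells)
    target = max(counts.values())
    return {cell: target - n for cell, n in counts.items()}


def enrich(real: List[Sample],
           generate: Callable[[Hashable, Hashable], object]) -> List[Sample]:
    """Return real samples plus synthetic ones that rebalance (label, subgroup) cells.

    `generate` is a hypothetical stand-in for a label- and subgroup-conditional
    diffusion sampler; it is called once per synthetic sample requested.
    """
    cells = [(label, group) for _, label, group in real]
    quota = synthetic_quota(cells)
    synthetic = [
        (generate(label, group), label, group)
        for (label, group), n in quota.items()
        for _ in range(n)
    ]
    return real + synthetic


if __name__ == "__main__":
    real = [("img0", "benign", "A"), ("img1", "benign", "A"), ("img2", "benign", "A"),
            ("img3", "malignant", "A"), ("img4", "benign", "B")]
    mixed = enrich(real, lambda label, group: f"synthetic-{label}-{group}")
    print(Counter((label, group) for _, label, group in mixed))
```

Only cells that already contain at least one real example receive synthetic samples in this sketch. A cell with no real data at all (for example, a condition never observed in a given subgroup) would require an explicit target list, which is a design choice left open here.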