Garst Swier, Dekker Julian, Reinders Marcel
Intelligent Systems, Delft University of Technology, van Mourik Broekmanweg 6, Delft, Zuid-Holland 2628 XE, The Netherlands.
Database (Oxford). 2025 Mar 19;2025. doi: 10.1093/database/baaf016.
Federated learning is an emerging machine learning paradigm that allows data from multiple sources to be used for training classifiers without the data leaving the source where it originally resides. This can be highly valuable for use cases such as medical research, where gathering data at a central location can be complicated by privacy and legal concerns. In such cases, federated learning has the potential to vastly speed up the research cycle. Although federated and central learning have been compared from a theoretical perspective, an extensive experimental comparison of their performance and learning behavior is still lacking. We have performed a comprehensive experimental comparison between federated and centralized learning, evaluating various classifiers on various datasets and exploring the influence of different sample distributions as well as different class distributions across the clients. The results show similar performance between the federated and central learning strategies under a wide variety of settings. Federated learning is able to deal with various imbalances in the data distributions. Like central learning, it is sensitive to batch effects between different datasets when these coincide with location, but in the federated setting this may go unobserved more easily. Federated learning appears robust to various challenges such as skewed data distributions, high data dimensionality, multiclass problems, and complex models. Taken together, the insights from our comparison give much promise for applying federated learning as an alternative to sharing data. Code for reproducing the results in this work can be found at: https://github.com/swiergarst/FLComparison.