Institute of Computational Biology, Helmholtz Munich, Neuherberg 85764, Germany.
Life and Medical Sciences Institute, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn 53115, Germany.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad531.
Federated Learning (FL) is gaining traction in various fields, such as healthcare, as it enables integrative data analysis without sharing sensitive data. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that relies on being able to compute sample means and sample covariances, and to construct known linearly independent vectors, on the data owner side.
We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Moreover, the attack algorithm is robust to defense strategies that add random noise. We demonstrate the limitations of existing frameworks, propose potential defense strategies, and analyze the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.
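To make the mechanism concrete, the following NumPy sketch (our illustration under stated assumptions, not the authors' implementation; the variable names and toy data are ours) shows how an attacker who can construct n-1 known linearly independent vectors on the data owner side and query the sample mean and sample covariances obtains a full-rank linear system whose unique solution is the private data vector.

import numpy as np

rng = np.random.default_rng(0)

# Private data held by the data owner: n values of one variable.
n = 8
x = rng.normal(loc=50.0, scale=10.0, size=n)

# Attacker-chosen probes: n-1 known, linearly independent vectors whose
# centered versions span the subspace orthogonal to the all-ones vector.
V = rng.normal(size=(n - 1, n))

# Aggregate statistics an FL framework might legitimately return:
x_mean = x.mean()                                   # sample mean of x
covs = np.array([np.cov(x, v)[0, 1] for v in V])    # cov(x, v_i), ddof = 1

# Each covariance is linear in x:
#   cov(x, v_i) = (v_i - mean(v_i) * 1)^T x / (n - 1),
# and the mean adds one more equation: 1^T x / n = x_mean.
A = np.vstack([(V - V.mean(axis=1, keepdims=True)) / (n - 1),
               np.ones((1, n)) / n])
b = np.concatenate([covs, [x_mean]])

# Solving the n x n system recovers every private record exactly.
x_hat = np.linalg.solve(A, b)
print(np.allclose(x_hat, x))  # True

Because each covariance depends only on the centered probe vector, n-1 independent probes plus the mean equation suffice to determine x. This also suggests why purely noise-based defenses are fragile: noise added to the returned statistics only perturbs b, leaving the linear structure of the attack intact.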
The code examples are available on GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset used in the manuscript is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).