Institute of Computational Biology, Helmholtz Munich, Neuherberg 85764, Germany.
Life and Medical Sciences Institute, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn 53115, Germany.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad531.
Federated Learning (FL) is gaining traction in various fields, such as healthcare, as it enables integrative data analysis without sharing sensitive data. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that relies on being able to compute sample means and sample covariances, and to construct known linearly independent vectors, on the data owner side.
We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Moreover, the attack algorithm is robust to defense strategies that add random noise. We demonstrate the limitations of existing frameworks, propose potential defense strategies, and analyze the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.
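To make the mechanism concrete, the following NumPy sketch (our illustration under stated assumptions, not the authors' implementation; the variable names and toy data are ours) shows how an attacker who can construct n-1 known linearly independent vectors on the data owner side and query the sample mean and sample covariances obtains a full-rank linear system whose unique solution is the private data vector.

import numpy as np

rng = np.random.default_rng(0)

# Private data held by the data owner: n values of one variable.
n = 8
x = rng.normal(loc=50.0, scale=10.0, size=n)

# Attacker-chosen probes: n-1 known, linearly independent vectors whose
# centered versions span the subspace orthogonal to the all-ones vector.
V = rng.normal(size=(n - 1, n))

# Aggregate statistics an FL framework might legitimately return:
x_mean = x.mean()                                   # sample mean of x
covs = np.array([np.cov(x, v)[0, 1] for v in V])    # cov(x, v_i), ddof = 1

# Each covariance is linear in x:
#   cov(x, v_i) = (v_i - mean(v_i) * 1)^T x / (n - 1),
# and the mean adds one more equation: 1^T x / n = x_mean.
A = np.vstack([(V - V.mean(axis=1, keepdims=True)) / (n - 1),
               np.ones((1, n)) / n])
b = np.concatenate([covs, [x_mean]])

# Solving the n x n system recovers every private record exactly.
x_hat = np.linalg.solve(A, b)
print(np.allclose(x_hat, x))  # True

Because each covariance depends only on the centered probe vector, n-1 independent probes plus the mean equation suffice to determine x. This also suggests why purely noise-based defenses are fragile: noise added to the returned statistics only perturbs b, leaving the linear structure of the attack intact.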
The code examples are available on GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset used in the manuscript is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).