Zhi Degui, Jiang Xiaoqian, Harmanci Arif
Department of Bioinformatics and Systems Medicine, D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas 77030, USA.
Department of Health Data Science and Artificial Intelligence, D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas 77030, USA.
Genome Res. 2025 Feb 14;35(2):326-339. doi: 10.1101/gr.278934.124.
One of the major challenges in genomic data sharing is protecting participants' privacy, both in collaborative studies and when genomic data are outsourced for analysis tasks such as genotype imputation services and federated collaborative genomic analysis. Although numerous cryptographic methods have been developed, these methods may not yet be computationally practical for population-scale tasks, rely on high-level expertise in security, and require each algorithm to be implemented from scratch. In this study, we focus on the outsourcing of genotype imputation, a fundamental task that utilizes population-level reference panels, and develop protocols that use "proxy panels" to protect genotype panels while the imputation task is outsourced to servers. The proxy panels are generated through a series of protection mechanisms, such as haplotype sampling, allele hashing, and coordinate anonymization, that protect the underlying sensitive panel's genetic variant coordinates, genetic maps, and chromosome-wide haplotypes. Although the resulting proxy panels are almost entirely distinct from the sensitive panels, they are valid panels that can be used as input to imputation methods such as Beagle. We demonstrate that proxy-based imputation protects against well-known attacks with only a minor decrease in imputation accuracy for variants across a wide range of allele frequencies.
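As a rough illustration of two of the mechanisms named above, the following toy sketch shows what coordinate anonymization and a keyed allele recoding could look like on a small binary haplotype panel. It is not the authors' protocol: the function names, the HMAC-based per-variant flip, and the evenly spaced proxy coordinates are all assumptions made for illustration, and haplotype sampling and genetic-map protection are omitted.

```python
import hashlib
import hmac
from typing import List, Tuple


def allele_flip_bit(secret: bytes, variant_index: int) -> int:
    """Hypothetical keyed hash: returns 0 or 1 for a variant index."""
    digest = hmac.new(secret, str(variant_index).encode(), hashlib.sha256).digest()
    return digest[0] & 1


def make_proxy_panel(
    positions: List[int], haplotypes: List[List[int]], secret: bytes
) -> Tuple[List[int], List[List[int]]]:
    """Build a toy proxy panel from a sensitive panel (illustrative only).

    positions: sorted variant coordinates of the sensitive panel.
    haplotypes: one list of 0/1 alleles per haplotype, one allele per variant.
    """
    # Coordinate anonymization (assumed scheme): keep only the variant order
    # and replace real coordinates with evenly spaced placeholders.
    proxy_positions = [(i + 1) * 1000 for i in range(len(positions))]

    # Allele recoding (assumed scheme): flip the 0/1 coding at each variant
    # according to a secret-keyed bit, so the server sees recoded alleles.
    flips = [allele_flip_bit(secret, i) for i in range(len(positions))]
    proxy_haplotypes = [[a ^ f for a, f in zip(hap, flips)] for hap in haplotypes]
    return proxy_positions, proxy_haplotypes


def decode_alleles(alleles: List[int], secret: bytes) -> List[int]:
    """Map alleles returned in proxy coding back to the original coding."""
    return [a ^ allele_flip_bit(secret, i) for i, a in enumerate(alleles)]
```

In this sketch, only the holder of `secret` can map results returned by the imputation server back to the original allele coding. The proxy panel remains a syntactically valid binary panel with ordered coordinates.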