HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation.

Author information

Schlumbohm Simon, Neumann Julia E, Neumann Philipp

Affiliations

Chair for High Performance Computing, Helmut-Schmidt-University, University of the Federal Armed Forces Hamburg, Holstenhofweg 85, 22043, Hamburg, Hamburg, Germany.

Center for Molecular Neurobiology Hamburg (ZMNH), University Medical Center Hamburg-Eppendorf (UKE), Falkenried 94, 20251, Hamburg, Hamburg, Germany.

Publication information

BMC Bioinformatics. 2025 Feb 11;26(1):47. doi: 10.1186/s12859-025-06073-9.

Abstract

BACKGROUND

Data adjustment is an essential tool for increasing statistical power during analysis, for example in the case of complex multi-experiment data from (single-cell) RNA, proteomics and other omics data. Despite its benefits, data integration introduces internal biases, so-called batch effects. Because such methods inherently produce missing values, and data integration introduces additional ones, established algorithms such as ComBat and limma are unable to perform batch effect adjustment on these data. Recently, the HarmonizR framework was presented for these cases as a tool for missing-value-tolerant data adjustment.
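HarmonizR avoids imputation by dissecting the data matrix into sub-matrices according to the batches in which each feature is observed, so that ComBat or limma is applied only to batches that actually hold data for those features. The Python sketch below illustrates that grouping step under simplifying assumptions. The function name `dissect_by_batch_pattern`, the `min_values` threshold, and the rule that features observed in fewer than two batches are set aside are hypothetical and not taken from the HarmonizR package. The adjustment call itself is omitted.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional

# Rows are features, columns are samples; None marks a missing value.
Matrix = List[List[Optional[float]]]


def dissect_by_batch_pattern(
    data: Matrix,
    batch_of_sample: List[int],
    min_values: int = 2,  # hypothetical threshold, not taken from HarmonizR
) -> Dict[FrozenSet[int], List[int]]:
    """Group feature (row) indices by the set of batches in which they are observed.

    In this sketch a batch counts as available for a feature if it holds at
    least `min_values` non-missing values. Features available in fewer than
    two batches are set aside, since there is nothing to adjust them against.
    Each resulting group would form one sub-matrix for batch effect adjustment.
    """
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for feature, row in enumerate(data):
        # Count observed (non-missing) values per batch for this feature.
        counts: Dict[int, int] = defaultdict(int)
        for value, batch in zip(row, batch_of_sample):
            if value is not None:
                counts[batch] += 1
        usable = frozenset(b for b, c in counts.items() if c >= min_values)
        if len(usable) >= 2:
            groups[usable].append(feature)
    return dict(groups)


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0, 4.0],    # observed in both batches
        [1.0, 2.0, None, None],  # batch 0 only -> set aside
        [5.0, 6.0, 7.0, 8.0],    # observed in both batches
        [None, 1.0, 2.0, 3.0],   # only a single value in batch 0
    ]
    batches = [0, 0, 1, 1]
    print(dissect_by_batch_pattern(data, batches))
    # -> {frozenset({0, 1}): [0, 2]}
    print(dissect_by_batch_pattern(data, batches, min_values=1))
    # -> {frozenset({0, 1}): [0, 2, 3]}
```

The last row shows how a lone observed value in one batch can decide whether a feature is kept for adjustment at all under this sketch's threshold.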

RESULTS

In this contribution, we provide significant improvements to the HarmonizR approach. A novel blocking strategy is introduced to substantially reduce runtime while still supporting parallel architectures. Additionally, a "unique removal" strategy has been integrated into HarmonizR to retain even more features for adjustment. In this work, we show (1) substantially improved runtime for both small and large real datasets and (2) the ability to retain more features from the integrated dataset during adjustment, with a feature rescue of up to 103.9% for our tested datasets.
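One way to see why blocking can cut runtime: if every distinct pattern of batch presence yields its own sub-matrix, the number of adjustment runs grows with the number of patterns, and grouping batches into blocks collapses many patterns into one. The Python sketch below counts patterns with and without blocking. The function `group_by_block_pattern`, the parameter `block_size`, and the blocking rule (batch `b` belongs to block `b // block_size`; a block is present if any of its batches is present) are hypothetical illustrations, not the rule used in HarmonizR.

```python
from typing import Dict, FrozenSet, List, Set


def group_by_block_pattern(
    presence: List[Set[int]], block_size: int = 1
) -> Dict[FrozenSet[int], List[int]]:
    """Group features by the set of blocks they are present in.

    presence[f] is the set of batch ids in which feature f is available.
    Hypothetical blocking rule: batch b belongs to block b // block_size,
    and a block counts as present if any of its batches is present.
    With block_size=1 this is plain per-batch grouping.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    groups: Dict[FrozenSet[int], List[int]] = {}
    for feature, batches in enumerate(presence):
        pattern = frozenset(b // block_size for b in batches)
        groups.setdefault(pattern, []).append(feature)
    return groups


if __name__ == "__main__":
    presence = [{0, 1, 2, 3}, {0, 1, 2}, {0, 1}, {2, 3}, {1, 3}]
    for k in (1, 2, 4):
        # Each distinct pattern would become one sub-matrix to adjust.
        print(k, len(group_by_block_pattern(presence, k)))
    # -> 1 5
    # -> 2 3
    # -> 4 1
```

With B batches there are up to 2^B − 1 non-empty presence patterns, but with blocks of size k only up to 2^⌈B/k⌉ − 1, so fewer sub-matrices need to be processed. How batches are adjusted within a block is not described in this abstract and is left out of the sketch.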

CONCLUSION

The proposed improvements address the shortcomings of the previously published HarmonizR version. Since HarmonizR was mainly developed for dataset integration on rare tumor entities, it did not include runtime improvements beyond parallelization; this update addresses that gap. A further update improving feature rescue additionally enhances the algorithm's ability to perform batch effect reduction quickly and robustly.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d942/11817103/38bfcbeeba0a/12859_2025_6073_Fig1_HTML.jpg
