Pirmani Ashkan, De Brouwer Edward, Geys Lotte, Parciak Tina, Moreau Yves, Peeters Liesbet M
ESAT, STADIUS, KU Leuven, Leuven, Belgium.
Biomedical Research Institute, Hasselt University, Diepenbeek, Belgium.
JMIR Med Inform. 2023 Nov 9;11:e48030. doi: 10.2196/48030.
Investigating low-prevalence diseases such as multiple sclerosis is challenging because relatively few individuals are affected and real-world data are scattered across numerous data sources. These obstacles impair data integration, standardization, and analysis, which in turn hinders the generation of meaningful clinical evidence.
This study aims to present a comprehensive, research question-agnostic, multistakeholder-driven end-to-end data analysis pipeline that accommodates 3 prevalent data-sharing streams: individual data sharing, core data set sharing, and federated model sharing.
A demand-driven methodology was employed for standardization, followed by 3 streams of data acquisition, a data quality enhancement process, a data integration procedure, and a concluding analysis stage to fulfill real-world data-sharing requirements. The pipeline's effectiveness was demonstrated through its successful implementation in the COVID-19 and multiple sclerosis global data sharing initiative.
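As a rough illustration of how 3 acquisition streams could feed one integration step, the following minimal Python sketch reduces each stream to counts over a shared set of harmonized variables and sums them. Everything in it is an assumption made for illustration: the variable names in `KEYS`, the function names, and the idea that the federated stream returns only per-site aggregate counts are hypothetical and are not taken from the pipeline itself.

```python
from collections import Counter

# Hypothetical harmonized variables; the actual data dictionary is not given here.
KEYS = ("age_band", "ms_type", "hospitalized")


def counts_from_records(records):
    """Reduce patient-level rows to counts per combination of KEYS."""
    return Counter(tuple(row[k] for k in KEYS) for row in records)


def integrate(individual_rows, core_rows, federated_counts):
    """Combine the 3 streams into one table of counts.

    individual_rows  -- patient-level rows from individual data sharing
    core_rows        -- patient-level rows from core data set sharing
    federated_counts -- one dict of aggregate counts per site (assumed
                        format); no patient-level rows leave the site
    """
    total = Counter()
    total += counts_from_records(individual_rows)
    total += counts_from_records(core_rows)
    for site_counts in federated_counts:
        total += Counter(site_counts)
    return total


if __name__ == "__main__":
    individual = [
        {"age_band": "18-49", "ms_type": "RRMS", "hospitalized": False},
        {"age_band": "50+", "ms_type": "SPMS", "hospitalized": True},
    ]
    core = [{"age_band": "18-49", "ms_type": "RRMS", "hospitalized": False}]
    federated = [{("18-49", "RRMS", False): 3, ("50+", "SPMS", True): 1}]
    for key, n in sorted(integrate(individual, core, federated).items()):
        print(key, n)
```

Working on counts rather than rows is only one possible design; it lets a stream that cannot share patient-level data still contribute to the unified table, at the cost of restricting later analyses to the variables chosen up front.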
The global data sharing initiative yielded multiple scientific publications and provided extensive worldwide guidance for the multiple sclerosis community. The pipeline facilitated gathering pertinent data from various sources, accommodating distinct sharing streams and integrating them into a unified data set for subsequent statistical analysis or secure data examination. The pipeline contributed to the assembly of the largest data set of people with multiple sclerosis infected with COVID-19.
The proposed data analysis pipeline exemplifies the potential of global stakeholder collaboration and underlines the significance of evidence-based decision-making. It serves as a model for how data sharing initiatives can advance health care, and it is adaptable enough to address diverse research questions.