Department of Epidemiology, College of Medicine & College of Public Health and Health Professions, University of Florida, Gainesville, FL, 32610, USA.
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, 32610, USA.
BMC Med Inform Decis Mak. 2018 Dec 29;18(1):139. doi: 10.1186/s12911-018-0719-2.
Current research in the biomedical sciences often attaches the term 'precision' to medicine and public health, alongside companion terms such as big data, data science, and deep learning. Technological advances permit the collection and merging of large heterogeneous datasets from different sources, ranging from genome sequences to social media posts and from electronic health records to wearables. In addition, complex algorithms supported by high-performance computing make it possible to transform these large datasets into knowledge. Despite such progress, many barriers still stand in the way of precision medicine and precision public health interventions that benefit both the individual and the population.
The present work analyzes both the technical and societal hurdles in developing prediction models of health risks, diagnoses, and outcomes from integrated biomedical databases. Methodological challenges include improving the semantics of study designs: medical record data are inherently biased, and even the most advanced deep learning methods, such as denoising autoencoders, cannot overcome that bias unless it is handled a priori by design. Societal challenges include the evaluation of ethically actionable risk factors at the individual and population level; for instance, gender, race, or ethnicity, when used as risk modifiers rather than as biological variables, could be replaced by modifiable environmental proxies such as lifestyle and dietary habits, household income, or access to educational resources.
Data science for precision medicine and public health warrants an informatics-oriented formalization of the study design and interoperability throughout all levels of the knowledge inference process, from the research semantics, to model development, and ultimately to implementation.