Zeng Ping, Shao Zhonghe, Zhou Xiang
Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China.
Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China.
Comput Struct Biotechnol J. 2021 May 26;19:3209-3224. doi: 10.1016/j.csbj.2021.05.042. eCollection 2021.
Mediation analysis investigates the intermediate mechanism through which an exposure exerts its influence on the outcome of interest. It is becoming increasingly popular in high-throughput genomics studies, where a common goal is to identify molecular-level traits, such as gene expression or methylation, that actively mediate genetic or environmental effects on the outcome. Mediation analysis in genomics studies is particularly challenging, however, owing to the large number of potential mediators measured in these studies and to the composite null nature of the mediation effect hypothesis. Indeed, while standard univariate and multivariate mediation methods are well established for analyzing one or multiple mediators, they are not well suited to genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, over the past few years many new high-dimensional mediation methods have been developed for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these important recent methodological advances in high-dimensional mediation analysis. Specifically, we describe in detail more than ten high-dimensional mediation methods, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, and future directions. We hope our review will serve as a useful guide for statisticians and computational biologists who develop methods for high-dimensional mediation analysis, as well as for analysts who apply mediation methods to high-throughput genomics studies.