Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification.

Affiliations

Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Australia.

Department of Paediatrics, University of Melbourne, Australia.

Publication information

Int J Epidemiol. 2023 Aug 2;52(4):1268-1275. doi: 10.1093/ije/dyad008.

Abstract

Researchers faced with incomplete data are encouraged to consider whether their data are 'missing completely at random' (MCAR), 'missing at random' (MAR) or 'missing not at random' (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defined by Rubin in the 1970s. First, when there are missing data in multiple variables, the plausibility of the MAR assumption is difficult to assess using substantive knowledge and is more stringent than is generally appreciated. Second, although MCAR and MAR are sufficient conditions for consistent estimation with specific methods, they are not necessary conditions and therefore this categorization does not directly determine the best approach for handling the missing data in an analysis. How best to handle missing data depends on the assumed causal relationships between variables and their missingness, and what these relationships imply in terms of the 'recoverability' of the target estimand (the population parameter that encodes the answer to the underlying research question). Recoverability is defined as whether the estimand can be consistently estimated from the patterns and associations in the observed data without needing to invoke external information on the extent to which the distribution of missing values might differ from that of observed values. In this manuscript we outline an approach for deciding which method to use to handle multivariable missing data in an analysis, using directed acyclic graphs to depict missingness assumptions and determining the implications in terms of recoverability of the target estimand.
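The point that MCAR and MAR are sufficient but not necessary for consistent estimation can be illustrated with a small simulation. The sketch below is not the authors' procedure (the paper works with directed acyclic graphs); it is a hypothetical example in which every name, coefficient and sample size is an assumption chosen for illustration. It relies on a standard result: a complete-case regression of Y on X remains consistent when the chance of being a complete case depends on the covariate X (even on X's own unobserved value, which is MNAR in Rubin's terms) but not on Y given X.

```python
import numpy as np


def simulate_slopes(n=100_000, seed=0):
    """Return the slope of Y on X from (full data, scenario A, scenario B).

    All numeric choices here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Data-generating model: the target estimand is the slope 2.0.
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    def expit(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Scenario A: X is missing with a probability that depends on X itself.
    # This is MNAR, yet missingness is independent of Y given X.
    complete_a = rng.random(n) < expit(0.5 - 1.5 * x)

    # Scenario B: a record is incomplete with a probability that depends
    # on the outcome Y, so complete cases are selected on Y.
    complete_b = rng.random(n) < expit(0.5 - 1.5 * y)

    def slope(mask):
        # Ordinary least squares slope of y on x among the selected rows.
        return np.polyfit(x[mask], y[mask], 1)[0]

    full = slope(np.ones(n, dtype=bool))
    return full, slope(complete_a), slope(complete_b)


if __name__ == "__main__":
    full, cc_a, cc_b = simulate_slopes()
    print(f"full data slope:             {full:.3f}")
    print(f"complete cases, scenario A:  {cc_a:.3f}")
    print(f"complete cases, scenario B:  {cc_b:.3f}")
```

Under these assumptions the scenario A estimate should sit close to the true slope while the scenario B estimate is attenuated. Both scenarios would be labelled MNAR, yet only one of them prevents the slope from being recovered by a complete-case analysis, which is why the abstract argues that the label alone does not determine the analysis.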

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d088/10396404/a5d208dd847c/dyad008f1.jpg
