Department of Public Health and Primary Care, KU Leuven, Kapucijnenvoer 33, J building, 3000, Leuven, Belgium.
Open Analytics NV, Antwerp, Belgium.
BMC Med Res Methodol. 2021 Apr 2;21(1):62. doi: 10.1186/s12874-021-01256-3.
In case-control studies, most matching algorithms allow controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar units in observational data is of utmost importance, since confounding and selection bias are present. The aim was twofold: first, to create a method that accommodates the option that a control is not resampled, and second, to display several scenarios that identify changes in odds ratios (ORs) as the balance of the matched sample increases.
The algorithm was derived iteratively, from the pre-processing steps used to derive the data through to its application in a study investigating the effect of antibiotics on the risk of colorectal cancer in the INTEGO registry (Flanders, Belgium). Different scenarios were developed to investigate the fluctuation of ORs using combinations of exact and varying variables, with or without replacement of controls. To achieve balance in the population, we introduced the Comorbidity Index (CI) variable, the sum of a patient's chronic diseases, as a means of obtaining comparable units for drawing valid associations.
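As a minimal illustration of the CI described above, the sketch below counts the chronic diseases recorded for one patient. The abstract does not say which diseases enter the index, so the condition names and the record layout here are hypothetical placeholders, not the registry's actual coding.

```python
# Hypothetical list of chronic conditions; the abstract does not specify
# which diseases are counted in the Comorbidity Index.
CHRONIC_CONDITIONS = ("diabetes", "hypertension", "copd")


def comorbidity_index(patient, conditions=CHRONIC_CONDITIONS):
    """Return the number of listed chronic conditions flagged for a patient."""
    # A condition missing from the record is treated as absent.
    return sum(1 for name in conditions if patient.get(name, False))


# Hypothetical patient record with two of the three conditions present.
patient = {"id": "p1", "diabetes": True, "hypertension": True, "copd": False}
ci = comorbidity_index(patient)
```

Matching cases and controls on such a count is one way to compare patients with a similar overall disease burden.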
The algorithm is fast and optimal. On simulated data, the run-time of matching was minimal even with millions of patients. It is optimal in that the closest control is always captured (through appropriate ordering and the creation of auxiliary variables), and when a case has only one control, we ensure that this control is matched to that case, thus maximizing the number of cases used in the analysis. In total, 72 scenarios were displayed, showing the fluctuation of ORs and revealing patterns, notably a drop when the population was balanced.
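The idea of matching without replacement while protecting cases that have a single eligible control can be sketched as follows. This is not the published R implementation: the function name, the record layout, and the caliper parameter are assumptions made for illustration, and the ordering rule (cases with the fewest eligible controls are processed first) is one plausible reading of "appropriate ordering". It is a greedy heuristic and does not guarantee a globally optimal assignment.

```python
from collections import defaultdict


def match_without_replacement(cases, controls, exact_keys, near_key, caliper, k=1):
    """Greedy 1:k matching in which each control is used at most once.

    Hypothetical sketch: records are dicts with an "id" field, exact_keys
    are matched exactly, near_key is matched within +/- caliper.
    """
    # Group controls into strata defined by the exact-match variables.
    strata = defaultdict(list)
    for ctrl in controls:
        strata[tuple(ctrl[key] for key in exact_keys)].append(ctrl)

    def candidates(case):
        pool = strata.get(tuple(case[key] for key in exact_keys), [])
        return [c for c in pool if abs(c[near_key] - case[near_key]) <= caliper]

    # Cases with the fewest eligible controls go first, so a case with a
    # single eligible control is served before cases that have alternatives.
    ordered = sorted(cases, key=lambda case: len(candidates(case)))

    used = set()
    matches = {}
    for case in ordered:
        free = [c for c in candidates(case) if c["id"] not in used]
        # Closest control first; ties are broken by id for reproducibility.
        free.sort(key=lambda c: (abs(c[near_key] - case[near_key]), c["id"]))
        chosen = free[:k]
        if chosen:
            used.update(c["id"] for c in chosen)
            matches[case["id"]] = [c["id"] for c in chosen]
    return matches


# Case A has only c1 within the caliper; case B could use c1 or c2.
cases = [
    {"id": "B", "sex": "F", "age": 62},
    {"id": "A", "sex": "F", "age": 60},
]
controls = [
    {"id": "c1", "sex": "F", "age": 61},
    {"id": "c2", "sex": "F", "age": 66},
]
matches = match_without_replacement(cases, controls, ["sex"], "age", caliper=5)
```

In this toy data, processing B first would give c1 to B and leave A unmatched. Ordering by the number of eligible controls assigns c1 to A and c2 to B, so both cases remain in the analysis.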
We created an optimal and computationally efficient algorithm to derive a matched case-control sample with or without replacement of controls. The code and functions are publicly available as an open-source R package. Finally, we emphasize the importance of displaying several scenarios and assessing the differences in ORs when an index is used to balance the population in observational data.
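To make the comparison of ORs across scenarios concrete, the sketch below computes the OR for a 1:1 matched sample using the standard matched-pair estimator, the ratio of discordant pairs. The abstract does not state how the ORs were estimated, so this is an illustrative assumption rather than the authors' procedure; the function name and the example counts are hypothetical.

```python
def matched_pair_odds_ratio(pairs):
    """Odds ratio from 1:1 matched pairs of (case_exposed, control_exposed)."""
    pairs = list(pairs)
    # Only discordant pairs carry information in a matched-pair analysis.
    case_only = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)
    control_only = sum(1 for case_exp, ctrl_exp in pairs if ctrl_exp and not case_exp)
    if control_only == 0:
        return None  # estimator is undefined without control-only pairs
    return case_only / control_only


# Hypothetical matched sample: 6 case-only, 3 control-only, 11 concordant pairs.
pairs = (
    [(True, False)] * 6
    + [(False, True)] * 3
    + [(True, True)] * 4
    + [(False, False)] * 7
)
odds_ratio = matched_pair_odds_ratio(pairs)
```

Recomputing this quantity for each matching scenario, for instance with and without the CI among the matching variables, gives the kind of side-by-side comparison of ORs that the abstract recommends.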