Department of Gastroenterology and Hepatology, Erasmus MC University Medical Center, Rotterdam, Netherlands.
Department of Gastroenterology and Hepatology, Albert Schweitzer Hospital, Dordrecht, Netherlands.
Cochrane Database Syst Rev. 2022 Jun 6;6(6):CD009276. doi: 10.1002/14651858.CD009276.pub2.
BACKGROUND: Worldwide, many countries have adopted colorectal cancer (CRC) screening programmes, often based on faecal occult blood tests (FOBTs). CRC screening aims to detect advanced neoplasia (AN), which is defined as CRC or advanced adenomas. FOBTs fall into two categories based on detection technique and the detected blood component: qualitative guaiac-based FOBTs (gFOBTs) and faecal immunochemical tests (FITs), which can be qualitative or quantitative. Screening with gFOBTs reduces CRC-related mortality.
OBJECTIVES: To compare the diagnostic test accuracy of gFOBT and FIT screening for detecting advanced colorectal neoplasia in average-risk individuals.
SEARCH METHODS: We searched CENTRAL, MEDLINE, Embase, BIOSIS Citation Index, Science Citation Index Expanded, and Google Scholar. We searched the reference lists and PubMed-related articles of included studies to identify additional studies.
SELECTION CRITERIA: We included prospective and retrospective studies that provided the number of true positives, false positives, false negatives, and true negatives for gFOBTs, FITs, or both, with colonoscopy as reference standard. We excluded case-control studies. We included studies in which all participants underwent both index test and reference standard ("reference standard: all"), and studies in which only participants with a positive index test underwent the reference standard while participants with a negative test were followed for at least one year for development of interval carcinomas ("reference standard: positive"). The target population consisted of asymptomatic, average-risk individuals undergoing CRC screening. The target conditions were CRC and advanced neoplasia (advanced adenomas and CRC combined).
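As a brief illustration of why the four counts named above are required, the sketch below derives sensitivity and specificity from a single 2x2 table. The class name `TwoByTwo` and the counts are hypothetical and chosen only for illustration; they are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an index test with the reference standard."""
    tp: int  # index test positive, target condition present
    fp: int  # index test positive, target condition absent
    fn: int  # index test negative, target condition present
    tn: int  # index test negative, target condition absent

    def sensitivity(self) -> float:
        # Proportion of participants with the condition whom the test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of participants without the condition who test negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only (not taken from the review).
example = TwoByTwo(tp=30, fp=60, fn=70, tn=840)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

In "reference standard: positive" studies the false negatives are not observed directly by colonoscopy; under the design described above they are approximated by interval carcinomas found during follow-up.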
DATA COLLECTION AND ANALYSIS: Two review authors independently screened and selected studies for inclusion. In case of disagreement, a third review author made the final decision. We used the Rutter and Gatsonis hierarchical summary receiver operating characteristic model to explore differences between tests and identify potential sources of heterogeneity, and the bivariate hierarchical model to estimate sensitivity and specificity at common thresholds: 10 µg haemoglobin (Hb)/g faeces and 20 µg Hb/g faeces. We performed indirect comparisons of the accuracy of the two tests and direct comparisons when both index tests were evaluated in the same population.
MAIN RESULTS: We ran the initial search on 25 June 2019, which yielded 63 studies for inclusion. We ran a top-up search on 14 September 2021, which yielded one potentially eligible study, currently awaiting classification.
We included a total of 33 "reference standard: all" published articles involving 104,640 participants. Six studies evaluated only gFOBTs, 23 studies evaluated only FITs, and four studies included both gFOBTs and FITs. The cut-off for positivity of FITs varied between 2.4 µg and 50 µg Hb/g faeces. For each Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 domain, we assessed risk of bias as high in less than 20% of studies. The summary curve showed that FITs had a higher discriminative ability than gFOBTs for AN (P < 0.001) and CRC (P = 0.004). For the detection of AN, the summary sensitivity of gFOBTs was 15% (95% confidence interval (CI) 12% to 20%), which was significantly lower than that of FITs at both the 10 µg and 20 µg Hb/g cut-offs, with summary sensitivities of 33% (95% CI 27% to 40%; P < 0.001) and 26% (95% CI 21% to 31%; P = 0.002), respectively. Results were simulated in a hypothetical cohort of 10,000 screening participants with 1% CRC prevalence and 10% AN prevalence. Out of 1000 participants with AN, gFOBTs missed 850, while FITs missed 670 (10 µg Hb/g cut-off) and 740 (20 µg Hb/g cut-off). No significant differences in summary specificity for AN detection were found between gFOBTs (94%; 95% CI 92% to 96%) and FITs at the 10 µg Hb/g cut-off (93%; 95% CI 90% to 95%) and at the 20 µg Hb/g cut-off (97%; 95% CI 95% to 98%). So, among 9000 participants without AN, 540 were offered (unnecessary) colonoscopy with gFOBTs compared to 630 (10 µg Hb/g) and 270 (20 µg Hb/g) with FITs. Similarly, for the detection of CRC, the summary sensitivity of gFOBTs, 39% (95% CI 25% to 55%), was significantly lower than that of FITs at the 10 µg and 20 µg Hb/g cut-offs: 76% (95% CI 57% to 88%; P = 0.001) and 65% (95% CI 46% to 80%; P = 0.035), respectively. So, out of 100 participants with CRC, gFOBTs missed 61, and FITs missed 24 (10 µg Hb/g) and 35 (20 µg Hb/g). No significant differences in summary specificity for CRC were found between gFOBTs (94%; 95% CI 91% to 96%) and FITs at the 10 µg Hb/g cut-off (94%; 95% CI 87% to 97%) and the 20 µg Hb/g cut-off (96%; 95% CI 91% to 98%). So, out of 9900 participants without CRC, 594 were offered (unnecessary) colonoscopy with gFOBTs versus 594 (10 µg Hb/g) and 396 (20 µg Hb/g) with FITs. In five studies that compared FITs and gFOBTs in the same population, FITs showed a higher discriminative ability for AN than gFOBTs (P = 0.003).
We included a total of 30 "reference standard: positive" studies involving 3,664,934 participants. Of these, eight were gFOBT-only studies, 18 were FIT-only studies, and four studies combined both gFOBTs and FITs. The cut-off for positivity of FITs varied between 5 µg and 250 µg Hb/g faeces. For each QUADAS-2 domain, we assessed risk of bias as high in less than 20% of studies. The summary curve showed that FITs had a higher discriminative ability for detecting CRC than gFOBTs (P < 0.001). The summary sensitivity for CRC of gFOBTs, 59% (95% CI 55% to 64%), was significantly lower than that of FITs at the 10 µg Hb/g cut-off, 89% (95% CI 80% to 95%; P < 0.001), and the 20 µg Hb/g cut-off, 89% (95% CI 85% to 92%; P < 0.001). So, in the hypothetical cohort with 100 participants with CRC, gFOBTs missed 41, while FITs missed 11 (10 µg Hb/g) and 11 (20 µg Hb/g). The summary specificity of gFOBTs was 98% (95% CI 98% to 99%), which was higher than that of FITs at both the 10 µg and 20 µg Hb/g cut-offs: 94% (95% CI 92% to 95%; P < 0.001) and 95% (95% CI 94% to 96%; P < 0.001), respectively. So, out of 9900 participants without CRC, 198 were offered (unnecessary) colonoscopy with gFOBTs compared to 594 (10 µg Hb/g) and 495 (20 µg Hb/g) with FITs. At specificities of 90% and 95%, FITs had a higher sensitivity than gFOBTs.
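The hypothetical-cohort figures reported above follow directly from the summary point estimates. The sketch below reproduces that arithmetic for AN in the "reference standard: all" studies; the function name `cohort_outcomes` and the data layout are illustrative assumptions, and confidence intervals are ignored.

```python
COHORT_SIZE = 10_000  # hypothetical screening cohort used in the abstract


def cohort_outcomes(sensitivity_pct: int, specificity_pct: int,
                    prevalence_pct: int, cohort: int = COHORT_SIZE):
    """Return (missed cases, unnecessary colonoscopies) for one test.

    All inputs are whole percentages so the arithmetic stays in integers.
    """
    with_condition = cohort * prevalence_pct // 100
    without_condition = cohort - with_condition
    # False negatives: participants with the condition who test negative.
    missed = with_condition * (100 - sensitivity_pct) // 100
    # False positives: participants without the condition who test positive
    # and are therefore referred for colonoscopy.
    unnecessary = without_condition * (100 - specificity_pct) // 100
    return missed, unnecessary


# Summary (sensitivity %, specificity %) for AN, "reference standard: all".
an_estimates = {
    "gFOBT": (15, 94),
    "FIT, 10 µg Hb/g": (33, 93),
    "FIT, 20 µg Hb/g": (26, 97),
}

for test, (sens, spec) in an_estimates.items():
    missed, unnecessary = cohort_outcomes(sens, spec, prevalence_pct=10)
    print(f"{test}: missed {missed} of 1000 with AN, "
          f"{unnecessary} unnecessary colonoscopies among 9000 without AN")
```

The CRC figures can be checked the same way with `prevalence_pct=1`; for example, a sensitivity of 59% and a specificity of 98% give 41 missed cancers and 198 unnecessary colonoscopies.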
AUTHORS' CONCLUSIONS: FITs are superior to gFOBTs in detecting AN and CRC in average-risk individuals. Specificity of both tests was similar in "reference standard: all" studies, whereas specificity was significantly higher for gFOBTs than for FITs in "reference standard: positive" studies. However, at pre-specified specificities, the sensitivity of FITs was significantly higher than that of gFOBTs.