National Clinical Research Center for Laboratory Medicine, Department of Laboratory Medicine, The First Hospital of China Medical University, Shenyang, China.
J Med Internet Res. 2023 Jul 17;25:e45651. doi: 10.2196/45651.
Reference intervals (RIs) play an important role in clinical decision-making. However, because establishing RIs by direct means is costly in time, labor, and money, indirect methods based on big data previously accumulated by clinical laboratories are attracting increasing attention. Combining different indirect techniques with different data transformation and outlier removal methods may yield different RIs, yet few studies have evaluated these combinations systematically.
Using RIs derived by direct methods as the reference standard, this study evaluated the accuracy of combinations of data transformation, outlier removal, and indirect techniques in establishing complete blood count (CBC) RIs from large-scale data.
CBC data of individuals aged ≥18 years who underwent physical examinations between January 2010 and December 2011 were retrieved from the First Affiliated Hospital of China Medical University in northern China. After repeated individuals were excluded, we applied the parametric, nonparametric, Hoffmann, Bhattacharya, and truncation points and Kolmogorov-Smirnov distance (kosmic) indirect methods, combined with log or BoxCox transformation and with Reed-Dixon, Tukey, or iterative mean (3SD) outlier removal, to derive the RIs of 8 CBC parameters, and compared the results with RIs previously established by the direct method. Bias ratios (BRs) were then calculated to determine which combination of indirect technique, data transformation pattern, and outlier removal method is preferable.
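Each combination described above chains three steps: transform the data, remove outliers, and estimate the limits. The following is a minimal standard-library sketch of that pipeline (Python 3.8+ for `statistics.quantiles`). It covers only the log/Box-Cox transformations, the three outlier filters, and the parametric (mean ± 1.96 SD) and nonparametric (2.5th/97.5th percentile) estimators; the Hoffmann, Bhattacharya, and kosmic techniques are not reproduced. The function names, the fixed Box-Cox λ (normally estimated from the data), the Tukey multiplier k = 1.5, and the one-third-of-range Reed-Dixon criterion are assumptions for illustration, not the study's implementation.

```python
import math
import statistics

def boxcox(x, lam):
    """Box-Cox transform of a positive value; lam == 0 is the natural log."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def inv_boxcox(y, lam):
    """Back-transform a Box-Cox value to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1
    if base <= 0:
        raise ValueError("limit falls outside the invertible Box-Cox range")
    return base ** (1 / lam)

def tukey_filter(values, k=1.5):
    """Keep values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

def iterative_3sd_filter(values, z=3.0):
    """Drop values outside mean +/- z*SD, recomputing until none are removed."""
    kept = list(values)
    while len(kept) > 2:
        m, s = statistics.mean(kept), statistics.stdev(kept)
        new = [v for v in kept if abs(v - m) <= z * s]
        if len(new) == len(kept):
            break
        kept = new
    return kept

def reed_dixon_filter(values):
    """Drop an extreme value while its gap to its neighbour exceeds 1/3 of the range."""
    kept = sorted(values)
    while len(kept) > 2 and kept[-1] > kept[0]:
        third = (kept[-1] - kept[0]) / 3
        if kept[1] - kept[0] > third:
            kept.pop(0)
        elif kept[-1] - kept[-2] > third:
            kept.pop()
        else:
            break
    return kept

def parametric_ri(values):
    """Central 95% interval assuming normality: mean +/- 1.96 SD."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return m - 1.96 * s, m + 1.96 * s

def nonparametric_ri(values):
    """2.5th and 97.5th percentiles (linear interpolation)."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def estimate_ri(values, lam=0, outlier_filter=tukey_filter, parametric=True):
    """Transform -> remove outliers -> estimate limits -> back-transform."""
    transformed = [boxcox(v, lam) for v in values]
    cleaned = outlier_filter(transformed)
    lo, hi = parametric_ri(cleaned) if parametric else nonparametric_ri(cleaned)
    return inv_boxcox(lo, lam), inv_boxcox(hi, lam)
```

For instance, with `data = list(range(1, 31)) + [1000]`, each of the three filters applied to the untransformed values discards only the value 1000, while `estimate_ri(data, lam=0, outlier_filter=reed_dixon_filter, parametric=False)` runs one log-scale, Reed-Dixon, nonparametric combination.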
The raw data showed that the white blood cell (WBC) count, platelet (PLT) count, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and mean corpuscular volume (MCV) were considerably more skewed than the other CBC parameters. After log or BoxCox transformation combined with Tukey or iterative mean (3SD) processing, these data approximated a Gaussian distribution. Tukey-based outlier removal identified the largest number of outliers. For WBC (male), PLT (male), hemoglobin (HGB; male), MCH (male/female), and MCV (female), the lower-limit bias exceeded the corresponding upper-limit bias for more than half of the 30 indirect methods. The preferred indirect computational choices for CBC parameters differed between males and females. The direct-method RI of MCHC for females was narrow; for this parameter, the kosmic method was markedly superior, in contrast to the methods that performed best for CBC parameters with high |BR| qualification rates in males. Among the top 10 methodologies with high |BR| qualification rates among males for the WBC count, PLT count, HGB, MCV, and MCHC, the Bhattacharya, Hoffmann, and parametric methods outperformed the other 2 indirect methods.
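The results above are expressed through skewness and through lower- and upper-limit bias ratios. The abstract does not state the formulas, so the sketch below assumes the Fisher-Pearson skewness coefficient and a bias-ratio definition used in the RI literature: BR = (indirect limit − direct limit) / SD_RI, with SD_RI = (direct upper limit − direct lower limit) / 3.92, and an RI counted as qualified when |BR| ≤ 0.375 for both limits. The threshold, the both-limits rule, the function names, and the example numbers are hypothetical, not the study's definitions or data. Returning the two BRs separately makes the lower-limit versus upper-limit comparison explicit.

```python
import math

def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def bias_ratios(indirect_ri, direct_ri):
    """(BR_lower, BR_upper) of an indirect RI against the direct RI.

    Assumed definition: BR = (indirect limit - direct limit) / SD_RI,
    with SD_RI = (direct upper limit - direct lower limit) / 3.92.
    """
    lrl, url = direct_ri
    sd_ri = (url - lrl) / 3.92
    return (indirect_ri[0] - lrl) / sd_ri, (indirect_ri[1] - url) / sd_ri

def qualifies(indirect_ri, direct_ri, threshold=0.375):
    """Assumed rule: an RI qualifies when both |BR| values are <= threshold."""
    return all(abs(br) <= threshold for br in bias_ratios(indirect_ri, direct_ri))

def qualification_rate(candidate_ris, direct_ri, threshold=0.375):
    """Share of candidate indirect RIs (one per method combination) that qualify."""
    hits = sum(qualifies(ri, direct_ri, threshold) for ri in candidate_ris)
    return hits / len(candidate_ris)

# Hypothetical numbers for illustration only (not values from the study).
doubling = [1, 2, 4, 8, 16, 32, 64]
print(skewness(doubling) > abs(skewness([math.log(v) for v in doubling])))  # True
direct = (3.5, 9.5)
candidates = [(3.9, 9.6), (4.2, 9.5)]
print(bias_ratios(candidates[0], direct))      # lower-limit BR exceeds upper-limit BR
print(qualification_rate(candidates, direct))  # 0.5
```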
Compared with the results derived by the direct method, the outlier removal method and the indirect technique markedly influence the final RIs, whereas data transformation has negligible effects except on markedly skewed data. Specifically, the Tukey and iterative mean (3SD) methods remove outliers with almost equivalent efficiency. Furthermore, the choice of indirect technique depends more on the characteristics of the analyte itself. This study provides scientific evidence to help clinical laboratories establish RIs from their existing data sets.