Text mining to support abstract screening for knowledge syntheses: a semi-automated workflow.

Affiliations

Knowledge Translation Program, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Unity Health Toronto, 209 Victoria St, Toronto, Ontario, M5B 1T8, Canada.

Department of Software Engineering, University of Belgrade, Jove Ilica 154, Belgrade, 11000, Serbia.

Publication information

Syst Rev. 2021 May 26;10(1):156. doi: 10.1186/s13643-021-01700-x.

Abstract

BACKGROUND

Current text mining tools supporting abstract screening in systematic reviews are not widely used, in part because they lack sensitivity and precision. We set out to develop an accessible, semi-automated "workflow" to conduct abstract screening for systematic reviews and other knowledge synthesis methods.

METHODS

We adopt widely recommended text-mining and machine-learning methods to (1) process title-abstracts into numerical training data; and (2) train a classification model to predict eligible abstracts. The predicted abstracts are screened by human reviewers for ("true") eligibility, and the newly eligible abstracts are used to identify similar abstracts, using near-neighbor methods, which are also screened. These abstracts, as well as their eligibility results, are used to update the classification model, and the above steps are iterated until no new eligible abstracts are identified. The workflow was implemented in R and evaluated using a systematic review of insulin formulations for type-1 diabetes (14,314 abstracts) and a scoping review of knowledge-synthesis methods (17,200 abstracts). Workflow performance was evaluated against the recommended practice of screening abstracts by 2 reviewers, independently. Standard measures were examined: sensitivity (inclusion of all truly eligible abstracts), specificity (exclusion of all truly ineligible abstracts), precision (inclusion of all truly eligible abstracts among all abstracts screened as eligible), F1-score (harmonic average of sensitivity and precision), and accuracy (correctly predicted eligible or ineligible abstracts). Workload reduction was measured as the hours the workflow saved, given only a subset of abstracts needed human screening.
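The iterative loop described above can be sketched in a few lines. This is a minimal illustration only, not the authors' R implementation: the bag-of-words features, the nearest-centroid classifier, the cosine near-neighbor search, the parameter `k`, and every function name and toy abstract below are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter

def vectorize(text):
    """Step (1): turn a title-abstract into term counts (toy stand-in)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Sum of term-count vectors; scale does not matter for cosine."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total

def run_workflow(abstracts, seed_labels, human_review, k=2):
    """Iterate predict -> screen -> near-neighbor -> retrain until no new hits."""
    vecs = {i: vectorize(text) for i, text in abstracts.items()}
    labels = dict(seed_labels)  # human-verified eligibility so far
    while True:
        # Step (2): (re)train a toy nearest-centroid classifier on all labels.
        eligible_c = centroid(vecs[i] for i, y in labels.items() if y)
        ineligible_c = centroid(vecs[i] for i, y in labels.items() if not y)
        predicted = [i for i in vecs if i not in labels
                     and cosine(vecs[i], eligible_c) > cosine(vecs[i], ineligible_c)]
        # Humans screen the predicted-eligible abstracts for true eligibility.
        newly_eligible = []
        for i in predicted:
            labels[i] = human_review(i)
            if labels[i]:
                newly_eligible.append(i)
        # Near-neighbor step: screen the k most similar unscreened abstracts.
        neighbor_hits = []
        for i in newly_eligible:
            pool = [j for j in vecs
                    if j not in labels and cosine(vecs[i], vecs[j]) > 0]
            pool.sort(key=lambda j: cosine(vecs[i], vecs[j]), reverse=True)
            for j in pool[:k]:
                labels[j] = human_review(j)
                if labels[j]:
                    neighbor_hits.append(j)
        # Stop once a full round yields no new eligible abstracts.
        if not newly_eligible and not neighbor_hits:
            return labels

# Hypothetical toy data; "truth" plays the role of the human reviewers.
abstracts = {
    1: "insulin glargine trial in type 1 diabetes",
    2: "insulin detemir randomized trial diabetes",
    3: "statin therapy for cholesterol",
    4: "long acting insulin analogues type 1 diabetes",
    5: "knee surgery outcomes cohort",
    6: "aspirin dose cholesterol cohort",
}
truth = {1: True, 2: True, 3: False, 4: True, 5: False, 6: False}
screened = []  # records which abstracts needed human screening

def human_review(i):
    screened.append(i)
    return truth[i]

labels = run_workflow(abstracts, {1: True, 3: False}, human_review)
print(sorted(i for i, y in labels.items() if y), "screened by hand:", screened)
```

In this toy run only the abstracts the model flags are passed to `human_review`; the unflagged ones are never screened by hand, which is the source of the workload reduction the abstract measures.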

RESULTS

With respect to the systematic and scoping reviews respectively, the workflow attained 88%/89% sensitivity, 99%/99% specificity, 71%/72% precision, an F1-score of 79%/79%, 98%/97% accuracy, 63%/55% workload reduction, with 12%/11% fewer abstracts for full-text retrieval and screening, and 0%/1.5% missed studies in the completed reviews.
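The measures reported above all follow from the four confusion-matrix counts, and the F1-score can be sanity-checked from the reported sensitivity and precision. The sketch below uses the standard definitions. The function names and the example counts are illustrative assumptions, not data from the study; only the percentages in `reported` come from the abstract.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # eligible abstracts correctly included
    specificity = tn / (tn + fp)   # ineligible abstracts correctly excluded
    precision = tp / (tp + fp)     # truly eligible among predicted eligible
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}

def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Made-up counts, for illustration only (not taken from the paper).
example = screening_metrics(tp=80, fp=20, tn=890, fn=10)

# Reported (sensitivity, precision, F1) for the two reviews, from the abstract.
reported = {
    "systematic review": (0.88, 0.71, 0.79),
    "scoping review": (0.89, 0.72, 0.79),
}
recomputed = {name: f1_score(prec, sens)
              for name, (sens, prec, _) in reported.items()}
for name, value in recomputed.items():
    print(f"{name}: recomputed F1 = {value:.3f}, reported = {reported[name][2]:.2f}")
```

Because the published sensitivity and precision are themselves rounded, the recomputed F1 values land within about one percentage point of the reported 79% rather than matching it exactly.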

CONCLUSION

The workflow was a sensitive, precise, and efficient alternative to the recommended practice of screening abstracts with 2 reviewers. All eligible studies were identified in the first case, while 6 studies (1.5%) were missed in the second that would likely not impact the review's conclusions. We have described the workflow in language accessible to reviewers with limited exposure to natural language processing and machine learning, and have made the code available to reviewers.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a9a/8157694/528d6a026c5f/13643_2021_1700_Fig1_HTML.jpg
