Guetterman Timothy C, Chang Tammy, DeJonckheere Melissa, Basu Tanmay, Scruggs Elizabeth, Vydiswaran V G Vinod
Department of Family Medicine, University of Michigan, Ann Arbor, MI, United States.
Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, MI, United States.
J Med Internet Res. 2018 Jun 29;20(6):e231. doi: 10.2196/jmir.9702.
Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has compared automatic coding using NLP techniques with qualitative coding, a comparison that is critical to establishing the viability of NLP as a useful, rigorous analysis procedure.
The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods.
We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyzing data generated through 2 text message (short message service) survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned one question to each of 2 experienced qualitative analysis teams, who coded and analyzed it independently before receiving the NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis.
The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions.
NLP provides both a foundation to code qualitatively more quickly and a method to validate qualitative findings. NLP methods were able to identify major themes found with traditional qualitative analysis but were not useful in identifying nuances. Traditional qualitative text analysis added important details and context.
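The time comparison reported above can be made concrete with a little arithmetic. The following is a minimal sketch that uses only the minute counts stated in the results. The dictionary keys (`nlp_only`, `qual_only`, `qual_first`, `nlp_first`) and the function name `time_savings` are illustrative labels introduced here, not terms from the study.

```python
# Analysis times in minutes, copied from the reported results.
# Key names are hypothetical labels chosen for this sketch:
#   nlp_only   - NLP-only analysis
#   qual_only  - qualitative-only analysis
#   qual_first - qualitative analysis followed by augmented analysis
#   nlp_first  - augmented approach beginning with NLP
TIMES = {
    "drug": {"nlp_only": 120, "qual_only": 270, "qual_first": 450, "nlp_first": 240},
    "police": {"nlp_only": 40, "qual_only": 270, "qual_first": 390, "nlp_first": 220},
}


def time_savings(times):
    """Return minutes and percentage saved by the faster approach per question."""
    summary = {}
    for question, t in times.items():
        only_saved = t["qual_only"] - t["nlp_only"]
        first_saved = t["qual_first"] - t["nlp_first"]
        summary[question] = {
            "nlp_only_saved_min": only_saved,
            "nlp_only_saved_pct": round(100 * only_saved / t["qual_only"], 1),
            "nlp_first_saved_min": first_saved,
            "nlp_first_saved_pct": round(100 * first_saved / t["qual_first"], 1),
        }
    return summary


if __name__ == "__main__":
    for question, stats in time_savings(TIMES).items():
        print(question, stats)
```

Under these figures, the NLP-only approach saved 150 minutes on the drug question and 230 minutes on the police question relative to qualitative-only analysis, and starting with NLP saved 210 and 170 minutes, respectively, relative to starting with qualitative analysis.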