Institute for High Performance Computing and Networking (ICAR), National Research Council (CNR), Italy.
Comput Biol Med. 2023 May;158:106876. doi: 10.1016/j.compbiomed.2023.106876. Epub 2023 Apr 5.
The paper proposes a methodology based on Natural Language Processing (NLP) and Sentiment Analysis (SA) to gain insight into sentiments and opinions toward COVID-19 vaccination in Italy. The dataset consists of vaccine-related tweets published in Italy from January 2021 to February 2022. For this period, 353,217 tweets were analyzed, obtained by filtering 1,602,940 tweets containing the word "vaccin". A main novelty of the approach is the categorization of opinion holders into four classes (Common users, Media, Medicine, Politics), obtained by applying NLP tools, enhanced with large-scale domain-specific lexicons, to the short bios published by the users themselves. Feature-based sentiment analysis is enriched with an Italian sentiment lexicon containing polarized words, which express semantic orientation, and intensive words, which give cues for identifying the tone of voice of each user category. The analysis shows an overall negative sentiment throughout the considered period, especially among Common users, and differing attitudes among opinion holders towards specific important events, such as deaths after vaccination, that occurred on particular days within the 14 months examined.
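The two lexicon-driven steps named in the abstract, categorizing opinion holders from their bios and scoring tweets with polarized and intensive words, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny lexicons, the function names, the "most lexicon hits wins" rule, and the multiplicative intensifier weights are all assumptions made for illustration. Only the "vaccin" keyword match and the four category names come from the abstract; the further filtering that reduced the corpus to 353,217 tweets is not described there and is not modeled here.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons (assumption): the paper uses large-scale
# domain-specific lexicons that are not reproduced here.
CATEGORY_LEXICON = {
    "Media": {"giornalista", "redazione", "quotidiano", "news"},
    "Medicine": {"medico", "infermiere", "virologo", "farmacista"},
    "Politics": {"deputato", "senatore", "sindaco", "partito"},
}
# Polarized words with an assumed semantic orientation in {-1, +1}.
POLARITY = {"sicuro": 1.0, "efficace": 1.0, "pericoloso": -1.0, "paura": -1.0}
# Intensive words with assumed multiplicative weights.
INTENSIFIERS = {"molto": 1.5, "davvero": 1.5, "poco": 0.5}

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def is_vaccine_tweet(text):
    """Keyword match on the substring 'vaccin', as named in the abstract."""
    return "vaccin" in text.lower()


def categorize_user(bio):
    """Assign the category whose lexicon matches the most bio tokens.

    Users with no lexicon hit fall back to 'Common users' (assumed rule).
    """
    tokens = set(tokenize(bio))
    best, best_hits = "Common users", 0
    for category, words in CATEGORY_LEXICON.items():
        hits = len(tokens & words)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def sentiment_score(text):
    """Sum polarities, scaling each by the intensifiers directly before it."""
    score, weight = 0.0, 1.0
    for tok in tokenize(text):
        if tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in POLARITY:
            score += weight * POLARITY[tok]
            weight = 1.0
        else:
            weight = 1.0  # an intensifier only affects the next word
    return score


def mean_sentiment_by_category(tweets):
    """Average the score of vaccine-related tweets per user category.

    `tweets` is an iterable of (bio, text) pairs.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for bio, text in tweets:
        if not is_vaccine_tweet(text):
            continue
        category = categorize_user(bio)
        totals[category] += sentiment_score(text)
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in counts}


sample = [
    ("Medico e virologo", "Il vaccino è molto efficace"),
    ("Amo il calcio", "Ho paura del vaccino"),
    ("Amo il calcio", "Che bella giornata"),  # dropped: no 'vaccin' match
]
result = mean_sentiment_by_category(sample)
print(result)
```

A lexicon-only approach like this is easy to inspect, but it ignores negation and context, which is presumably part of what the NLP tooling mentioned in the abstract addresses.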