Department of Computer Science, The Johns Hopkins University, Baltimore, Maryland, United States of America.
Geographic Data Science Lab, Department of Geography and Planning, University of Liverpool, Liverpool, United Kingdom.
PLoS One. 2024 Jun 6;19(6):e0301488. doi: 10.1371/journal.pone.0301488. eCollection 2024.
The COVID-19 pandemic prompted governments worldwide to implement a range of containment measures, including mass gathering restrictions, social distancing, and school closures. Despite these efforts, vaccines continue to be the safest and most effective means of combating such viruses. Yet vaccine hesitancy persists, posing a significant public health concern, particularly with the emergence of new COVID-19 variants. To address this issue effectively, timely data is crucial for understanding the various factors contributing to vaccine hesitancy. While previous research has largely relied on traditional surveys for this information, newer sources of data, such as social media, have gained attention. However, the potential of social media data as a reliable proxy for information on population hesitancy, especially when compared with survey data, remains underexplored. This paper aims to bridge this gap. Our approach uses social, demographic, and economic data to predict vaccine hesitancy levels in the ten most populous US metropolitan statistical areas (MSAs). We employ machine learning algorithms to compare a set of baseline models that contain only these variables with models that incorporate survey data and social media data separately. Our results show that the XGBoost algorithm consistently outperforms Random Forest and Linear Regression, although its margin over Random Forest is small. This was especially the case with models that incorporate survey or social media data, thus highlighting the promise of social media data as a complementary information source. Results also reveal that influential variables, such as age, ethnicity, occupation, and political inclination, vary across the five hesitancy classes. Further, applying models trained on one MSA to other MSAs yields mixed results, emphasizing the uniqueness of communities and the need for complementary data approaches.
In summary, this study underscores social media data's potential for understanding vaccine hesitancy, emphasizes the importance of tailoring interventions to specific communities, and suggests the value of combining different data sources.
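The comparison described above (baseline feature sets versus feature sets augmented with an additional data source, each fitted with several algorithms) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than the study's, the task is framed as regression, the single `x_extra` column is a hypothetical stand-in for a survey-derived or social-media-derived signal, and scikit-learn's `GradientBoostingRegressor` is swapped in for XGBoost so that the example depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): four "baseline" covariates plus one extra
# signal that stands in for an additional data source.
rng = np.random.default_rng(0)
n = 400
X_base = rng.normal(size=(n, 4))
x_extra = rng.normal(size=(n, 1))
y = (
    X_base @ np.array([0.5, -0.3, 0.2, 0.1])
    + 2.0 * x_extra[:, 0]
    + rng.normal(scale=0.1, size=n)
)

# Two feature sets: baseline only, and baseline plus the extra signal.
feature_sets = {
    "baseline": X_base,
    "baseline+extra": np.hstack([X_base, x_extra]),
}

# Model factories; gradient boosting is a stand-in for XGBoost here.
models = {
    "LinearRegression": lambda: LinearRegression(),
    "RandomForest": lambda: RandomForestRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": lambda: GradientBoostingRegressor(random_state=0),
}


def compare(feature_sets, models, y, seed=0):
    """Return held-out RMSE for every (feature set, model) pair."""
    results = {}
    for fs_name, X in feature_sets.items():
        # The same seed gives the same train/test rows for each feature set.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.25, random_state=seed
        )
        for m_name, make_model in models.items():
            model = make_model().fit(X_tr, y_tr)
            rmse = float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))
            results[(fs_name, m_name)] = rmse
    return results


results = compare(feature_sets, models, y)
for (fs_name, m_name), rmse in sorted(results.items()):
    print(f"{fs_name:15s} {m_name:18s} RMSE={rmse:.3f}")
```

Because the synthetic target depends strongly on the extra column, the augmented models should show lower held-out error than the baseline ones; with real data, whether an added source helps is an empirical question, which is what such a comparison is meant to test.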