Zhang Youshan, Allem Jon-Patrick, Unger Jennifer Beth, Boley Cruz Tess
Department of Computer Science, Lehigh University, Bethlehem, PA, United States.
Keck School of Medicine of USC, Los Angeles, CA, United States.
J Med Internet Res. 2018 Nov 21;20(11):e10513. doi: 10.2196/10513.
Instagram, with millions of posts per day, can be used to inform public health surveillance targets and policies. However, current research on image-based data often relies on hand coding of images, which is time-consuming and costly and ultimately limits the scope of a study. Current best practices in automated image classification (eg, support vector machine [SVM], backpropagation neural network, and artificial neural network) are limited in their capacity to accurately distinguish between objects within images.
This study aimed to demonstrate how a convolutional neural network (CNN) can be used to extract unique features within an image and how SVM can then be used to classify the image.
Images of waterpipes, or hookahs (an emerging tobacco product with harms similar to those of cigarettes), were collected from Instagram and used in the analyses (N=840). A CNN was used to extract unique features from images identified as containing waterpipes. An SVM classifier was built to distinguish between images with and without waterpipes. Methods for image classification were then compared to show how a CNN+SVM classifier could improve accuracy.
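The two-stage idea described above (convolutional feature extraction, then an SVM on the extracted features) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not state the CNN architecture, the SVM kernel, or the training procedure. Here the filters are hand-set rather than learned, the images are small synthetic grids (a vertical bar stands in for the "hookah" class), and the names `FILTERS`, `extract_features`, `make_image`, `train_linear_svm`, and `run_demo` are all hypothetical.

```python
import random

def conv2d_valid(img, kernel):
    """Valid-mode 2D cross-correlation of a 2D list with a square kernel."""
    k = len(kernel)
    rows, cols = len(img), len(img[0])
    return [
        [sum(img[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
         for j in range(cols - k + 1)]
        for i in range(rows - k + 1)
    ]

# ASSUMPTION: fixed, hand-set filters standing in for learned CNN filters.
FILTERS = [
    [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],   # responds to vertical lines
    [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],   # responds to horizontal lines
]

def extract_features(img):
    """Convolution -> ReLU -> global max pooling; one feature per filter."""
    feats = []
    for kernel in FILTERS:
        fmap = conv2d_valid(img, kernel)
        feats.append(max(max(0.0, v) for row in fmap for v in row))
    return feats

def make_image(label, rng, size=8):
    """Synthetic image: vertical bar for +1, horizontal bar for -1, plus noise."""
    img = [[rng.uniform(0.0, 0.2) for _ in range(size)] for _ in range(size)]
    pos = rng.randrange(1, size - 1)
    for t in range(size):
        if label == 1:
            img[t][pos] = 1.0
        else:
            img[pos][t] = 1.0
    return img

def train_linear_svm(X, y, rng, lr=0.01, lam=0.01, epochs=50):
    """Linear SVM fitted by stochastic subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]      # L2 regularization step
            if margin < 1.0:                             # hinge loss is active
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def run_demo(seed=0, n_train=40, n_test=20):
    """Train on synthetic features and return accuracy on a held-out set."""
    rng = random.Random(seed)

    def dataset(n):
        labels = [1 if i % 2 == 0 else -1 for i in range(n)]
        feats = [extract_features(make_image(lab, rng)) for lab in labels]
        return feats, labels

    X_train, y_train = dataset(n_train)
    X_test, y_test = dataset(n_test)
    w, b = train_linear_svm(X_train, y_train, rng)
    correct = sum(predict(w, b, x) == lab for x, lab in zip(X_test, y_test))
    return correct / n_test

print(f"held-out accuracy on synthetic data: {run_demo():.3f}")
```

The design point is the separation of concerns: the convolutional stage turns each image into a short feature vector, and the SVM only ever sees those vectors. In practice the feature extractor would be a trained CNN and the classifier a library SVM; both are swapped for simple stand-ins here so the sketch runs without external packages.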
As the number of validated training images increased, the total number of extracted features increased. In addition, as the number of features learned by the SVM classifier increased, the average level of accuracy increased. Overall, 99.5% (418/420) of images classified were correctly identified as either hookah or nonhookah images. This level of accuracy was an improvement over earlier methods that used SVM, CNN, or bag-of-features alone.
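As a quick check of the reported figure, the accuracy can be recomputed from the counts given above. This is only the arithmetic; the function name `accuracy` is an illustrative choice. The 420 classified images are half of the 840 collected, but how the images were split is not stated in this abstract.

```python
def accuracy(correct, total):
    """Fraction of classified images whose predicted label matched the true label."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

correct, total = 418, 420          # counts reported in the abstract
acc = accuracy(correct, total)
misclassified = total - correct

print(f"accuracy: {acc:.1%}, misclassified: {misclassified}")
```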
A CNN extracts more features from images, allowing an SVM classifier to be better informed and resulting in higher accuracy than methods that extract fewer features. Future research can use this method to expand the scope of image-based studies. The methods presented here might help detect increases in the popularity of certain tobacco products over time on social media. By using images of waterpipes from Instagram, we place our methods in a context that can inform health researchers who analyze social media to understand user experience with emerging tobacco products, as well as public health surveillance targets and policies.