Burnap Pete, Colombo Gualtiero, Amery Rosie, Hodorog Andrei, Scourfield Jonathan
School of Computer Science & Informatics, Cardiff University, UK.
Office for National Statistics, Newport, UK.
Online Soc Netw Media. 2017 Aug;2:32-44. doi: 10.1016/j.osnem.2017.08.001.
The World Wide Web, and online social networks in particular, have increased connectivity between people to the point that information can reach millions of people within minutes. This form of online collective contagion has brought many benefits to society, such as providing reassurance and supporting emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example, about which concerns have been raised, is the spread of suicidal ideation in online social networks. In this paper we report the results of a number of machine classifiers built to classify text relating to suicide on Twitter. The classifiers distinguish between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. They also aim to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of the base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class, to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts in which we further evaluate the classification approach, showing sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type.
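The Maximum Probability voting step mentioned above can be illustrated with a minimal sketch. It assumes one common reading of the rule: for each class, take the highest probability that any base classifier assigns to it, then predict the class with the largest such value. The function name, the three class labels and the probability values are hypothetical and chosen only for illustration; they are not the paper's seven classes, its data, or its implementation.

```python
from typing import List, Sequence, Tuple


def max_probability_vote(
    base_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> Tuple[str, List[float]]:
    """Combine base-classifier outputs with a maximum-probability rule.

    base_probs holds one probability distribution per base classifier,
    each ordered like `labels`. (Hypothetical helper, for illustration.)
    """
    if not base_probs:
        raise ValueError("need at least one base classifier output")
    if any(len(row) != len(labels) for row in base_probs):
        raise ValueError("each distribution must have one entry per label")

    # For every class, keep the largest probability any classifier gave it.
    combined = [max(column) for column in zip(*base_probs)]

    # Renormalise so the combined scores sum to 1.
    total = sum(combined)
    if total <= 0:
        raise ValueError("probabilities must not all be zero")
    normalised = [score / total for score in combined]

    # Predict the class with the highest combined score.
    best = max(range(len(labels)), key=normalised.__getitem__)
    return labels[best], normalised


# Hypothetical labels and outputs from three base classifiers.
labels = ["ideation", "flippant", "other"]
base_probs = [
    [0.5, 0.3, 0.2],
    [0.2, 0.7, 0.1],
    [0.4, 0.4, 0.2],
]
predicted, scores = max_probability_vote(base_probs, labels)
print(predicted, scores)
```

Under this rule a single confident base classifier can decide the outcome, which differs from majority voting, where each classifier contributes one equally weighted vote.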