National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Chinese Center for Tropical Diseases Research; WHO Collaborating Centre for Tropical Diseases; National Center for International Research on Tropical Diseases, Ministry of Science and Technology; NHC Key Laboratory of Parasite and Vector Biology, Shanghai, 200025, China.
School of Global Health, Chinese Center for Tropical Diseases Research, Shanghai Jiao Tong University School of Medicine; One Health Center, The University of Edinburgh, Shanghai Jiao Tong University, Shanghai, 200025, China.
Infect Dis Poverty. 2021 May 20;10(1):74. doi: 10.1186/s40249-021-00852-1.
Oncomelania hupensis is the only intermediate snail host of Schistosoma japonicum, and the distribution of O. hupensis is an important indicator for the surveillance of schistosomiasis. This study explored the feasibility of a random forest algorithm weighted by spatial distance for risk prediction of schistosomiasis distribution in the Yangtze River Basin in China, with the aim of producing an improved precision reference for the national schistosomiasis control programme by reducing the number of snail survey sites without losing predictive accuracy.
Snail presence and absence records were collected from Anhui, Hunan, Hubei, Jiangxi and Jiangsu provinces in 2018. A random forest machine learning algorithm based on a set of environmental and climatic variables was developed to predict the breeding sites of O. hupensis, the intermediate snail host of S. japonicum. Different cell sizes of a hexagonal grid system were compared to estimate the number of snail sampling sites required. The predictive accuracy in relation to the geographic distance between snail sampling sites was estimated by calculating Kappa and the area under the curve (AUC).
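The two accuracy measures named above can be illustrated with a minimal sketch. It assumes binary presence/absence labels and a predicted infestation probability per site; the toy values, the 0.5 classification threshold, and the function names `cohen_kappa` and `auc` are illustrative assumptions and do not come from the study.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels: 1 = snails present, 0 = absent."""
    n = len(y_true)
    # Observed agreement between survey records and predictions.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal presence rates.
    p_true_pos = sum(y_true) / n
    p_pred_pos = sum(y_pred) / n
    expected = p_true_pos * p_pred_pos + (1 - p_true_pos) * (1 - p_pred_pos)
    return (observed - expected) / (1 - expected)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen presence site
    scores higher than a randomly chosen absence site (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the 2018 survey.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"Kappa = {cohen_kappa(y_true, y_pred):.3f}")
print(f"AUC   = {auc(y_true, scores):.3f}")
```

Kappa depends on the threshold used to turn probabilities into presence/absence calls, whereas AUC is computed from the ranking of the scores alone, which is one reason the two are commonly reported together.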
The highest accuracy (AUC = 0.889 and Kappa = 0.618) was achieved at the 5 km distance weight. The five factors with the strongest correlation to O. hupensis infestation probability were: (1) distance to lake (48.9%), (2) distance to river (36.6%), (3) isothermality (29.5%), (4) mean daily difference in temperature (28.1%), and (5) altitude (26.0%). The risk map showed that areas characterized by snail infestation were mainly located along the Yangtze River, with the highest probability in the dividing, slow-flowing river arms in the middle and lower reaches of the Yangtze River in Anhui, followed by areas near the shores of China's two main lakes, Dongting Lake in Hunan and Hubei and Poyang Lake in Jiangxi.
Applying the random forest machine learning algorithm made it feasible to precisely predict snail infestation probability, an approach that could improve the sensitivity of the Chinese schistosomiasis surveillance system. Redesigning the snail surveillance system through spatial bias correction of O. hupensis infestation in the Yangtze River Basin would reduce the number of sites to be investigated from 2369 to 1747.
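As a quick arithmetic check on the scale of that reduction, the short sketch below computes the absolute and relative drop in survey sites from the two counts reported above. The variable names are illustrative, and the percentage is derived here rather than stated in the abstract.

```python
# Site counts as reported in the abstract.
sites_before = 2369
sites_after = 1747

removed = sites_before - sites_after
reduction_pct = 100 * removed / sites_before

print(f"{removed} fewer sites ({reduction_pct:.1f}% reduction)")
# prints "622 fewer sites (26.3% reduction)"
```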