Rahman Md Siddikur, Chowdhury Arman Hossain, Amrin Miftahuzzannat
Department of Statistics, Begum Rokeya University, Rangpur, Bangladesh.
PLOS Glob Public Health. 2022 May 18;2(5):e0000495. doi: 10.1371/journal.pgph.0000495. eCollection 2022.
Accurate predictive time series modelling is important for public health planning and response during the emergence of a novel pandemic. The aims of this study are therefore threefold: (a) to model the overall trend of COVID-19 confirmed cases and deaths in Bangladesh; (b) to generate a short-term, 8-week forecast of COVID-19 cases and deaths; and (c) to compare the predictive accuracy of the Autoregressive Integrated Moving Average (ARIMA) and eXtreme Gradient Boosting (XGBoost) models in precisely modelling the non-linear features and seasonal trends of the time series. Data covering the period from the onset of the epidemic in Bangladesh were collected from the Directorate General of Health Service (DGHS) and the Institute of Epidemiology, Disease Control and Research (IEDCR). The daily COVID-19 confirmed cases and deaths in Bangladesh over 633 days were divided into several training and test sets. The ARIMA and XGBoost models were fitted on the training data, the test sets were used to evaluate each model's forecasting ability, and the predictive performance was then averaged across all splits to select the best model. Predictive accuracy was assessed using the mean absolute error (MAE), mean percentage error (MPE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The findings reveal a nonlinear trend and weekly seasonality in the dataset. For both COVID-19 confirmed cases and deaths, the average error measures of the ARIMA model were lower than those of the XGBoost model. Hence, in our study, the ARIMA model performed better than the XGBoost model in predicting COVID-19 confirmed cases and deaths in Bangladesh. The suggested prediction model might play a critical role in estimating the spread of a novel pandemic in Bangladesh and similar countries.
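The abstract describes splitting the series into several training and test sets, scoring each test window with MAE, MPE, RMSE and MAPE, and averaging the scores to pick a model. The following is a minimal sketch of that kind of evaluation under stated assumptions, not the authors' code. It assumes an expanding-window (rolling-origin) split, defines the error as actual minus forecast (so MPE is signed), and plugs in a hypothetical `seasonal_naive` forecaster with period 7 (reflecting the weekly seasonality the study reports) in place of the ARIMA and XGBoost models, which it does not reproduce. All function names, split sizes and the toy series are illustrative assumptions.

```python
import math
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical interface: a forecaster takes a training series and a horizon
# and returns that many point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def error_metrics(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """MAE, MPE, RMSE and MAPE for one test window.

    Assumed convention: error = actual - forecast, and MPE/MAPE are in percent,
    so a negative MPE means the model over-forecast on average.
    MPE and MAPE are undefined if any actual value is zero.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MPE/MAPE are undefined when an actual value is 0")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def expanding_window_splits(n: int, initial_train: int, horizon: int,
                            step: int) -> Iterator[Tuple[int, int]]:
    """Yield (train_end, test_end) indices; every training set starts at index 0."""
    train_end = initial_train
    while train_end + horizon <= n:
        yield train_end, train_end + horizon
        train_end += step


def seasonal_naive(train: Sequence[float], horizon: int, period: int = 7) -> List[float]:
    """Hypothetical stand-in forecaster: repeat the last observed week."""
    if len(train) < period:
        raise ValueError("need at least one full season of training data")
    last_season = list(train[-period:])
    return [last_season[i % period] for i in range(horizon)]


def average_metrics(series: Sequence[float], forecaster: Forecaster,
                    initial_train: int, horizon: int, step: int) -> Dict[str, float]:
    """Forecast every test window, then average each error measure across splits."""
    per_split = []
    for train_end, test_end in expanding_window_splits(len(series), initial_train,
                                                       horizon, step):
        train, test = series[:train_end], series[train_end:test_end]
        per_split.append(error_metrics(test, forecaster(train, horizon)))
    if not per_split:
        raise ValueError("series is too short for the requested splits")
    return {k: sum(m[k] for m in per_split) / len(per_split) for k in per_split[0]}


if __name__ == "__main__":
    # Synthetic toy series (linear trend + weekly pattern), NOT the DGHS/IEDCR data.
    toy = [50 + 2 * i + 10 * (i % 7) for i in range(84)]
    # Assumed split sizes: 56-day initial training window, 14-day test windows.
    print(average_metrics(toy, seasonal_naive, initial_train=56, horizon=14, step=14))
```

To compare models in the way the abstract describes, one would wrap an ARIMA-based and an XGBoost-based forecaster in the same `Forecaster` signature, pass each to `average_metrics`, and prefer the one with smaller averaged errors. "Smaller" applies directly to MAE, RMSE and MAPE; because MPE is signed, it is its distance from zero that matters.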