Wang Lu, Zhu Dongxiao
Dept. of Computer Science, Wayne State University, Detroit, MI 48202.
Data Min Knowl Discov. 2021 May;35(3):1134-1161. doi: 10.1007/s10618-021-00746-8. Epub 2021 Mar 23.
Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression predicts ordinal labels and finds a wide range of applications in data-rich domains such as the natural, health, and social sciences. Most existing ordinal regression approaches work well for independent and identically distributed (IID) instances by formulating a single ordinal regression task. However, for heterogeneous non-IID instances with well-defined local geometric structures, e.g., subpopulation groups, multi-task learning (MTL) provides a promising framework to encode task (subgroup) relatedness, bridge data from all tasks, and simultaneously learn multiple related tasks in an effort to improve generalization performance. Even though MTL methods have been extensively studied, hardly any existing work investigates MTL for heterogeneous data with ordinal labels. We tackle this important problem via sparse and deep multi-task approaches. Specifically, we develop a regularized multi-task ordinal regression (MTOR) model for smaller datasets and a deep neural network-based MTOR model for large-scale datasets. We evaluate the performance using three real-world healthcare datasets with applications to multi-stage disease progression diagnosis. Our experiments indicate that the proposed MTOR models markedly improve prediction performance compared with single-task ordinal regression models.