Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210;
Department of Statistics, The Ohio State University, Columbus, OH 43210.
Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15849-15854. doi: 10.1073/pnas.1903070116. Epub 2019 Jul 24.
Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in modern machine-learning practice. The bias-variance trade-off implies that a model should balance underfitting and overfitting: Rich enough to express underlying structure in data and simple enough to avoid fitting spurious patterns. However, in modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered overfitted, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This "double-descent" curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine-learning models delineates the limits of classical analyses and has implications for both the theory and the practice of machine learning.
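The double-descent behavior the abstract describes can be illustrated in a few lines. Below is a minimal sketch, assuming a toy 1-D regression task fit with random Fourier features and the minimum-norm least-squares solution (via np.linalg.lstsq); the dataset, frequency scale, and feature counts are illustrative assumptions, not the paper's exact experimental setup. One would expect the test error to spike near the interpolation threshold (n_features ≈ n_train) and then descend again as capacity grows past it.

```python
# Sketch of a double-descent experiment with random Fourier features (RFF).
# All sizes and scales below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task (assumed for illustration).
n_train, n_test = 40, 500
x_train = rng.uniform(-1, 1, size=(n_train, 1))
x_test = rng.uniform(-1, 1, size=(n_test, 1))
f = lambda x: np.sin(4 * np.pi * x)                  # ground-truth signal
y_train = f(x_train).ravel() + 0.1 * rng.standard_normal(n_train)
y_test = f(x_test).ravel()

def rff(x, w, b):
    """Random Fourier features: cos(x @ w + b)."""
    return np.cos(x @ w + b)

# Sweep model capacity across the interpolation threshold (n_features = n_train).
for n_features in [5, 10, 20, 40, 80, 160, 640]:
    w = rng.standard_normal((1, n_features)) * 8.0   # frequency scale: assumption
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    Phi_train, Phi_test = rff(x_train, w, b), rff(x_test, w, b)
    # For n_features > n_train the system is underdetermined; lstsq's SVD-based
    # solver then returns the minimum-norm interpolating solution.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"{n_features:4d} features: test MSE = {test_mse:.3f}")
```

The minimum-norm fit is the essential ingredient here: past the interpolation threshold, larger feature counts let the solver pick interpolants with smaller norm, which is in line with the mechanism the paper posits for the second descent.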