Shinichi Nakagawa, Innes C. Cuthill
Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK.
Biol Rev Camb Philos Soc. 2007 Nov;82(4):591-605. doi: 10.1111/j.1469-185X.2007.00027.x.
Null hypothesis significance testing (NHST) is the dominant statistical approach in biology, although it has many, frequently unappreciated, problems. Most importantly, NHST does not provide us with two crucial pieces of information: (1) the magnitude of an effect of interest, and (2) the precision of the estimate of the magnitude of that effect. All biologists should ultimately be interested in biological importance, which can be assessed from the magnitude of an effect but not from its statistical significance. Therefore, we advocate presentation of measures of the magnitude of effects (i.e. effect size statistics) and their confidence intervals (CIs) in all biological journals. Combined use of an effect size and its CIs enables one to assess the relationships within data more effectively than the use of p values, regardless of statistical significance. In addition, routine presentation of effect sizes will encourage researchers to view their results in the context of previous research and will facilitate the incorporation of results into future meta-analyses; meta-analysis is increasingly used as the standard method of quantitative review in biology. In this article, we extensively discuss two dimensionless (and thus standardised) classes of effect size statistics: d statistics (standardised mean difference) and r statistics (correlation coefficient), because these can be calculated from almost all study designs and because their calculation is essential for meta-analysis. However, our focus on these standardised effect size statistics does not mean that unstandardised effect size statistics (e.g. mean difference and regression coefficient) are less important. We provide potential solutions for four main technical problems researchers may encounter when calculating effect sizes and CIs: (1) when covariates exist, (2) when bias in estimating effect size is possible, (3) when data have non-normal error structure and/or variances, and (4) when data are non-independent.
Although interpretations of effect sizes are often difficult, we provide some pointers to help researchers. This paper serves both as a beginner's instruction manual and a stimulus for changing statistical practice for the better in the biological sciences.
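The abstract names the d and r statistics and their CIs without giving formulas. The following is a minimal Python sketch, assuming two independent groups with roughly normal data. It uses common textbook approximations (pooled-SD d, the approximate Hedges small-sample correction, one widely used large-sample SE for d, and Fisher's z transformation for r), which are not necessarily the exact formulas the authors tabulate; other SE variants exist. All function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(group1, group2):
    """Standardised mean difference using the pooled sample SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample SDs (n - 1 denominator)
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled


def hedges_g(d, n1, n2):
    """Approximate small-sample bias correction of d (Hedges' g)."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


def d_confidence_interval(d, n1, n2, level=0.95):
    """Normal-approximation CI using one common large-sample SE of d."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return d - z * se, d + z * se


def r_confidence_interval(r, n, level=0.95):
    """CI for a correlation via Fisher's z transformation."""
    zr = math.atanh(r)            # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)     # SE of Fisher's z
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(zr - z * se), math.tanh(zr + z * se)  # back-transform


def d_to_r(d, n1, n2):
    """Convert d to a point-biserial r (assumes two independent groups)."""
    a = (n1 + n2) ** 2 / (n1 * n2)  # equals 4 when group sizes are equal
    return d / math.sqrt(d**2 + a)


if __name__ == "__main__":
    # Hypothetical measurements, invented purely for illustration.
    treatment = [5.1, 5.8, 6.2, 5.5, 6.0, 5.9]
    control = [4.8, 5.0, 5.3, 4.9, 5.2, 5.1]
    n1, n2 = len(treatment), len(control)
    d = cohens_d(treatment, control)
    g = hedges_g(d, n1, n2)
    print(f"d = {d:.2f}, g = {g:.2f}, 95% CI for g = {d_confidence_interval(g, n1, n2)}")
    r = d_to_r(d, n1, n2)
    print(f"r = {r:.2f}, 95% CI for r = {r_confidence_interval(r, n1 + n2)}")
```

Reporting the point estimate together with its interval, rather than a p value alone, is the practice the abstract advocates; the conversion from d to r shows why the two classes are interchangeable in a meta-analysis.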