--- title: "Effect Size from Test Statistics" output: rmarkdown::html_vignette: toc: true fig_width: 10.08 fig_height: 6 tags: [r, effect size, standardization, cohen d] vignette: > %\VignetteIndexEntry{Effect Size from Test Statistics} \usepackage[utf8]{inputenc} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console bibliography: bibliography.bib --- ```{r message=FALSE, warning=FALSE, include=FALSE} library(knitr) library(effectsize) knitr::opts_chunk$set(comment = ">") options(digits = 2) options(knitr.kable.NA = "") set.seed(747) .eval_if_requireNamespace <- function(...) { pkgs <- c(...) knitr::opts_chunk$get("eval") && all(sapply(pkgs, requireNamespace, quietly = TRUE)) } ``` # Introduction In many real world applications there are no straightforward ways of obtaining standardized effect sizes. However, it is possible to get approximations of most of the effect size indices ($d$, $r$, $\eta^2_p$...) with the use of test statistics. These conversions are based on the idea that **test statistics are a function of effect size and sample size**. Thus information about samples size (or more often of degrees of freedom) is used to reverse-engineer indices of effect size from test statistics. This idea and these functions also power our [***Effect Sizes From Test Statistics*** *shiny app*](https://easystats4u.shinyapps.io/statistic2effectsize/). The measures discussed here are, in one way or another, ***signal to noise ratios***, with the "noise" representing the unaccounted variance in the outcome variable^[Note that for generalized linear models (Poisson, Logistic...), where the outcome is never on an arbitrary scale, estimates themselves **are** indices of effect size! Thus this vignette is relevant only to general linear models.]. The indices are: - Percent variance explained ($\eta^2_p$, $\omega^2_p$, $\epsilon^2_p$). - Measure of association ($r$). - Measure of difference ($d$). ## (Partial) Percent Variance Explained These measures represent the ratio of $Signal^2 / (Signal^2 + Noise^2)$, with the "noise" having all other "signals" partial-ed out (be they of other fixed or random effects). The most popular of these indices is $\eta^2_p$ (Eta; which is equivalent to $R^2$). The conversion of the $F$- or $t$-statistic is based on @friedman1982simplified. Let's look at an example: ```{r, eval = .eval_if_requireNamespace("afex"), message=FALSE} library(afex) data(md_12.1) aov_fit <- aov_car(rt ~ angle * noise + Error(id / (angle * noise)), data = md_12.1, anova_table = list(correction = "none", es = "pes") ) aov_fit ``` Let's compare the $\eta^2_p$ (the `pes` column) obtained here with ones recovered from `F_to_eta2()`: ```{r} library(effectsize) options(es.use_symbols = TRUE) # get nice symbols when printing! (On Windows, requires R >= 4.2.0) F_to_eta2( f = c(40.72, 33.77, 45.31), df = c(2, 1, 2), df_error = c(18, 9, 18) ) ``` **They are identical!**^[Note that these are *partial* percent variance explained, and so their sum can be larger than 1.] (except for the fact that `F_to_eta2()` also provides confidence intervals^[Confidence intervals for all indices are estimated using the non-centrality parameter method; These methods search for a the best non-central parameter of the non-central $F$/$t$ distribution for the desired tail-probabilities, and then convert these ncps to the corresponding effect sizes.] 
In this case we were able to easily obtain the effect size (thanks to `afex`!), but in other cases it might not be as easy, and using estimates based on test statistics offers a good approximation. For example:

### In Simple Effect and Contrast Analysis

```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
library(emmeans)

joint_tests(aov_fit, by = "noise")

F_to_eta2(
  f = c(8, 51),
  df = 2,
  df_error = 9
)
```

We can also use `t_to_eta2()` for contrast analysis:

```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
pairs(emmeans(aov_fit, ~angle))

t_to_eta2(
  t = c(-6.2, -8.2, -3.2),
  df_error = 9
)
```

### In Linear Mixed Models

```{r, eval = .eval_if_requireNamespace("lmerTest"), message=FALSE}
library(lmerTest)

fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

anova(fit_lmm)

F_to_eta2(45.9, 1, 17)
```

We can also use `t_to_eta2()` for the slope of `Days` (which in this case gives the same result):

```{r, eval = .eval_if_requireNamespace("lmerTest")}
parameters::model_parameters(fit_lmm, effects = "fixed", ci_method = "satterthwaite")

t_to_eta2(6.77, df_error = 17)
```

### Bias-Corrected Indices

Alongside $\eta^2_p$ there are also the less biased $\omega^2_p$ (partial Omega-squared) and $\epsilon^2_p$ (partial Epsilon-squared; sometimes called $\text{Adj. } \eta^2_p$, which is equivalent to $R^2_{adj}$) [@albers2018power; @mordkoff2019simple].

```{r}
F_to_eta2(45.9, 1, 17)
F_to_epsilon2(45.9, 1, 17)
F_to_omega2(45.9, 1, 17)
```

## Measure of Association

Similar to $\eta^2_p$, $r$ is a signal-to-noise ratio, and is in fact equal to $\sqrt{\eta^2_p}$ (so it is really a *partial* $r$). It is often used instead of $\eta^2_p$ when discussing the *strength* of association (but I suspect people use it instead of $\eta^2_p$ because it gives a bigger number, which looks better).

### For Slopes

```{r, eval = .eval_if_requireNamespace("lmerTest")}
parameters::model_parameters(fit_lmm, effects = "fixed", ci_method = "satterthwaite")

t_to_r(6.77, df_error = 17)
```

In a fixed-effect linear model, this returns the **partial** correlation. Compare:

```{r}
fit_lm <- lm(rating ~ complaints + critical, data = attitude)

parameters::model_parameters(fit_lm)

t_to_r(
  t = c(7.46, 0.01),
  df_error = 27
)
```

to:

```{r, eval=.eval_if_requireNamespace("correlation")}
correlation::correlation(attitude,
  select = "rating",
  select2 = c("complaints", "critical"),
  partial = TRUE
)
```

### In Contrast Analysis

This measure is also sometimes used in contrast analysis, where it is called the point-biserial correlation, $r_{pb}$ [@cohen1965some; @rosnow2000contrasts]:

```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
pairs(emmeans(aov_fit, ~angle))

t_to_r(
  t = c(-6.2, -8.2, -3.2),
  df_error = 9
)
```

## Measures of Difference

These indices represent $Signal / Noise$, with the "signal" being the difference between two means. This is akin to Cohen's $d$, and is a close approximation when comparing two groups of equal size [@wolf1986meta; @rosnow2000contrasts].

These can be useful in contrast analyses.

### Between-Subject Contrasts

```{r, eval = .eval_if_requireNamespace("emmeans")}
m <- lm(breaks ~ tension, data = warpbreaks)

em_tension <- emmeans(m, ~tension)
pairs(em_tension)

t_to_d(
  t = c(2.53, 3.72, 1.20),
  df_error = 51
)
```

However, these are merely approximations of a *true* Cohen's *d*. It is advisable to estimate Cohen's *d* directly whenever possible. For example, here with `emmeans::eff_size()`:

```{r, eval = .eval_if_requireNamespace("emmeans")}
eff_size(em_tension, sigma = sigma(m), edf = df.residual(m))
```
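As with `F_to_eta2()`, these $t$-based conversions reduce to simple closed forms: $r = t / \sqrt{t^2 + df_{error}}$ and, for independent groups, $d = 2t / \sqrt{df_{error}}$. A minimal sketch; `r_manual()` and `d_manual()` are hypothetical names for illustration, not part of *effectsize*:

```{r}
# Hypothetical helpers, for illustration only: the closed forms that
# the t_to_r() and t_to_d() point estimates are based on.
r_manual <- function(t, df_error) {
  t / sqrt(t^2 + df_error)
}

d_manual <- function(t, df_error, paired = FALSE) {
  # For paired (within-subject) designs, the 2 is dropped
  if (paired) t / sqrt(df_error) else 2 * t / sqrt(df_error)
}

# Reproduces the d values from the warpbreaks contrasts above:
d_manual(t = c(2.53, 3.72, 1.20), df_error = 51)

# And the partial r for the slope of Days from earlier:
r_manual(t = 6.77, df_error = 17)
```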
### Within-Subject Contrasts

```{r, eval = .eval_if_requireNamespace("afex", "emmeans")}
pairs(emmeans(aov_fit, ~angle))

t_to_d(
  t = c(-6.2, -8.2, -3.2),
  df_error = 9,
  paired = TRUE
)
```

(Note: set `paired = TRUE` to avoid overestimating the size of the effect [@rosenthal1991meta; @rosnow2000contrasts].)

# References