Title:  Indices of Effect Size 

Description:  Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
Authors:  Mattan S. Ben-Shachar [aut, cre] (<https://orcid.org/0000-0002-4287-4801>, @mattansb), Dominique Makowski [aut] (<https://orcid.org/0000-0001-5375-9967>, @Dom_Makowski), Daniel Lüdecke [aut] (<https://orcid.org/0000-0002-8895-3206>, @strengejacke), Indrajeet Patil [aut] (<https://orcid.org/0000-0003-1995-6531>, @patilindrajeets), Brenton M. Wiernik [aut] (<https://orcid.org/0000-0001-9560-6336>, @bmwiernik), Rémi Thériault [aut] (<https://orcid.org/0000-0003-4315-6788>, @rempsyc), Ken Kelley [ctb], David Stanley [ctb], Aaron Caldwell [ctb], Jessica Burnett [rev], Johannes Karreth [rev], Philip Waggoner [aut, ctb]
Maintainer:  Mattan S. Ben-Shachar <[email protected]>
License:  MIT + file LICENSE 
Version:  0.8.9 
Built:  2024-07-21 07:23:39 UTC
Source:  https://github.com/easystats/effectsize 
$\chi^2$ to $\phi$ and Other Correlation-like Effect Sizes
Convert between $\chi^2$ (chi-square), $\phi$ (phi), Cramer's $V$, Tschuprow's $T$, Cohen's $w$, פ (Fei) and Pearson's $C$ for contingency tables or goodness of fit.
chisq_to_phi(chisq, n, nrow = 2, ncol = 2, adjust = TRUE, ci = 0.95, alternative = "greater", ...)
chisq_to_cohens_w(chisq, n, nrow, ncol, p, ci = 0.95, alternative = "greater", ...)
chisq_to_cramers_v(chisq, n, nrow, ncol, adjust = TRUE, ci = 0.95, alternative = "greater", ...)
chisq_to_tschuprows_t(chisq, n, nrow, ncol, adjust = TRUE, ci = 0.95, alternative = "greater", ...)
chisq_to_fei(chisq, n, nrow, ncol, p, ci = 0.95, alternative = "greater", ...)
chisq_to_pearsons_c(chisq, n, nrow, ncol, ci = 0.95, alternative = "greater", ...)
phi_to_chisq(phi, n, ...)
chisq 
The $\chi^2$ (chi-square) statistic.
n 
Total sample size. 
nrow , ncol

The number of rows/columns in the contingency table. 
adjust 
Should the effect size be corrected for small-sample bias?
Defaults to TRUE.
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
... 
Arguments passed to or from other methods. 
p 
Vector of expected values. See 
phi 
The $\phi$ (phi) statistic.
These functions use the following formulas:
$\phi = w = \sqrt{\chi^2 / n}$
$\textrm{Cramer's } V = \phi / \sqrt{\min(\textit{nrow}, \textit{ncol}) - 1}$
$\textrm{Tschuprow's } T = \phi / \sqrt[4]{(\textit{nrow} - 1) \times (\textit{ncol} - 1)}$
$פ = \phi / \sqrt{[1 / \min(p_E)] - 1}$
Where $p_E$ are the expected probabilities.
$\textrm{Pearson's } C = \sqrt{\chi^2 / (\chi^2 + n)}$
For versions adjusted for small-sample bias of $\phi$, $V$, and $T$, see Bergsma, 2013.
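The unadjusted formulas can be checked by hand in base R. The following is a minimal sketch, not the package's implementation: all input values are hypothetical, the variable names are chosen only for illustration, and no small-sample adjustment or confidence interval is computed.

```r
# Hypothetical inputs, chosen only for illustration
chisq <- 50    # chi-square statistic
n     <- 200   # total sample size
n_row <- 3     # rows in the contingency table
n_col <- 3     # columns in the contingency table

phi <- sqrt(chisq / n)                            # phi (equal to Cohen's w)
v   <- phi / sqrt(min(n_row, n_col) - 1)          # Cramer's V
tt  <- phi / ((n_row - 1) * (n_col - 1))^(1 / 4)  # Tschuprow's T
cc  <- sqrt(chisq / (chisq + n))                  # Pearson's C

# Fei applies to goodness-of-fit tests; p_E holds the expected probabilities
# (hypothetical here, and reusing the same phi purely for illustration)
p_E <- c(0.5, 0.25, 0.25)
fei <- phi / sqrt(1 / min(p_E) - 1)

round(c(phi = phi, V = v, T = tt, C = cc, Fei = fei), 3)
```

For a square table such as this one, $V$ and $T$ coincide, because $\min(nrow, ncol) - 1$ equals $\sqrt{(nrow - 1)(ncol - 1)}$.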
A data frame with the effect size(s), and confidence interval(s). See cramers_v().
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
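The search for the ncp can be sketched in base R with `pt()` and `uniroot()`. This is an illustration under stated assumptions, not the package's internal routine: probabilities are lower-tail (as `pt()` reports them), the search interval is an arbitrary choice that brackets both roots, and the final conversion assumes a one-sample design with n = df + 1, so that d = ncp / sqrt(n).

```r
t_obs <- 2.0  # observed t statistic (from the example above)
df    <- 50   # degrees of freedom

# Find the ncp at which the observed t has lower-tail probability `prob`
find_ncp <- function(prob) {
  uniroot(function(ncp) pt(t_obs, df, ncp = ncp) - prob,
          interval = c(-3, 8), tol = 1e-10)$root
}

ncp_low  <- find_ncp(0.975)  # lower confidence bound on the ncp
ncp_high <- find_ncp(0.025)  # upper confidence bound on the ncp

# Assumption: one-sample design, n = df + 1, so d = ncp / sqrt(n)
n <- df + 1
c(CI_low = ncp_low, CI_high = ncp_high) / sqrt(n)
```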
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532-574.
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Bergsma, W. (2013). A bias-correction for Cramer's V and Tschuprow's T. Journal of the Korean Statistical Society, 42(3), 323-328.
Johnston, J. E., Berry, K. J., & Mielke Jr, P. W. (2006). Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and motor skills, 103(2), 412-414.
Rosenberg, M. S. (2010). A generalized formula for converting chi-square tests to effect sizes for meta-analysis. PloS one, 5(4), e10059.
phi() for more details.
Other effect size from test statistic: F_to_eta2(), t_to_d()
data("Music_preferences")

# chisq.test(Music_preferences)
#>
#>  Pearson's Chi-squared test
#>
#> data:  Music_preferences
#> X-squared = 95.508, df = 6, p-value < 2.2e-16
#>

chisq_to_cohens_w(95.508,
  n = sum(Music_preferences),
  nrow = nrow(Music_preferences),
  ncol = ncol(Music_preferences)
)

data("Smoking_FASD")

# chisq.test(Smoking_FASD, p = c(0.015, 0.010, 0.975))
#>
#>  Chi-squared test for given probabilities
#>
#> data:  Smoking_FASD
#> X-squared = 7.8521, df = 2, p-value = 0.01972

chisq_to_fei(
  7.8521,
  n = sum(Smoking_FASD),
  nrow = 1,
  ncol = 3,
  p = c(0.015, 0.010, 0.975)
)
Compute effect size indices for standardized differences: Cohen's d,
Hedges' g and Glass’s delta ($\Delta$). (This function returns the
population estimate.) Pair with any reported stats::t.test().
Both Cohen's d and Hedges' g estimate the standardized
difference between the means of two populations. Hedges' g provides a
correction for small-sample bias (using the exact method) to Cohen's d. For
sample sizes > 20, the results for both statistics are roughly equivalent.
Glass’s delta is appropriate when the standard deviations are significantly
different between the populations, as it uses only the second group's
standard deviation.
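As a rough illustration of how the three indices relate, they can be computed by hand in base R. This is a sketch, not the package's code: the variable names are illustrative, and the gamma-function correction factor `J` is the standard exact small-sample correction, assumed here to correspond to the "exact method" mentioned above.

```r
data(mtcars)
x <- mtcars$mpg[mtcars$am == 0]  # first group
y <- mtcars$mpg[mtcars$am == 1]  # second group

n1 <- length(x)
n2 <- length(y)
df <- n1 + n2 - 2

# Cohen's d: mean difference over the pooled standard deviation
sd_pool <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df)
d <- (mean(x) - mean(y)) / sd_pool

# Hedges' g: d shrunk by the exact correction factor J (assumed form)
J <- exp(lgamma(df / 2) - lgamma((df - 1) / 2)) / sqrt(df / 2)
g <- d * J

# Glass's delta: standardize by the second group's SD only
delta <- (mean(x) - mean(y)) / sd(y)

round(c(Cohens_d = d, Hedges_g = g, Glass_delta = delta), 2)
```

Because `J` is always below 1, Hedges' g is slightly closer to zero than Cohen's d, and the gap shrinks as the degrees of freedom grow.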
cohens_d(x, y = NULL, data = NULL, pooled_sd = TRUE, mu = 0, paired = FALSE, adjust = FALSE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)
hedges_g(x, y = NULL, data = NULL, pooled_sd = TRUE, mu = 0, paired = FALSE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)
glass_delta(x, y = NULL, data = NULL, mu = 0, adjust = TRUE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)
x , y

A numeric vector, or a character name of one in 
data 
An optional data frame containing the variables. 
pooled_sd 
If 
mu 
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). 
paired 
If 
adjust 
Should the effect size be adjusted for small-sample bias using
Hedges' method? Note that 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
Set pooled_sd = FALSE for effect sizes that are to accompany a Welch's
t-test (Delacre et al, 2021).
A data frame with the effect size (Cohens_d, Hedges_g, Glass_delta) and their CIs (CI_low and CI_high).
For additional details on estimation and troubleshooting, see effectsize_CIs.
The indices here give the population estimated standardized difference. Some statistical packages give the sample estimate instead (without applying Bessel's correction).
Algina, J., Keselman, H. J., & Penfield, R. D. (2006). Confidence intervals for an effect size when variances are not equal. Journal of Modern Applied Statistical Methods, 5(1), 2.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Delacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021, May 7). Why Hedges’ g*s based on the non-pooled standard deviation should be reported with Welch's t-test. doi:10.31234/osf.io/tu6mp
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Sage.
rm_d(), sd_pooled(), t_to_d(), r_to_d()
Other standardized differences: mahalanobis_d(), means_ratio(), p_superiority(), rank_biserial(), repeated_measures_d()
data(mtcars)
mtcars$am <- factor(mtcars$am)

# Two Independent Samples ----------

(d <- cohens_d(mpg ~ am, data = mtcars))
# Same as:
# cohens_d("mpg", "am", data = mtcars)
# cohens_d(mtcars$mpg[mtcars$am=="0"], mtcars$mpg[mtcars$am=="1"])

# More options:
cohens_d(mpg ~ am, data = mtcars, pooled_sd = FALSE)
cohens_d(mpg ~ am, data = mtcars, mu = 5)
cohens_d(mpg ~ am, data = mtcars, alternative = "less")
hedges_g(mpg ~ am, data = mtcars)
glass_delta(mpg ~ am, data = mtcars)

# One Sample ----------

cohens_d(wt ~ 1, data = mtcars)
# same as:
# cohens_d("wt", data = mtcars)
# cohens_d(mtcars$wt)

# More options:
cohens_d(wt ~ 1, data = mtcars, mu = 3)
hedges_g(wt ~ 1, data = mtcars, mu = 3)

# Paired Samples ----------

data(sleep)

cohens_d(Pair(extra[group == 1], extra[group == 2]) ~ 1, data = sleep)
# same as:
# cohens_d(sleep$extra[sleep$group == 1], sleep$extra[sleep$group == 2], paired = TRUE)
# cohens_d(sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2])
# rm_d(sleep$extra[sleep$group == 1], sleep$extra[sleep$group == 2], method = "z", adjust = FALSE)

# More options:
cohens_d(Pair(extra[group == 1], extra[group == 2]) ~ 1, data = sleep, mu = 1, verbose = FALSE)
hedges_g(Pair(extra[group == 1], extra[group == 2]) ~ 1, data = sleep, verbose = FALSE)

# Interpretation ----------

interpret_cohens_d(1.48, rules = "cohen1988")
interpret_hedges_g(1.48, rules = "sawilowsky2009")
interpret_glass_delta(1.48, rules = "gignac2016")
# Or:
interpret(d, rules = "sawilowsky2009")

# Common Language Effect Sizes
d_to_u3(1.48)
# Or:
print(d, append_CLES = TRUE)
Cohen's g is an effect size of asymmetry (or marginal heterogeneity) for
dependent (paired) contingency tables ranging between 0 (perfect symmetry)
and 0.5 (perfect asymmetry) (see stats::mcnemar.test()). (Note this is
not a measure of (dis)agreement between the pairs, but of (a)symmetry.)
cohens_g(x, y = NULL, ci = 0.95, alternative = "two.sided", ...)
x 
a numeric vector or matrix. 
y 
a numeric vector; ignored if 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
... 
Ignored 
A data frame with the effect size (Cohens_g, Risk_ratio (possibly with the prefix log_), Cohens_h) and its CIs (CI_low and CI_high).
Confidence intervals are based on the proportion ($P = g + 0.5$)
confidence intervals returned by stats::prop.test() (minus 0.5), which give
a good close approximation.
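To make the relation between Cohen's g and stats::prop.test() concrete, here is a minimal base R sketch for a 2-by-2 paired table. The counts are hypothetical, and the sketch assumes that the proportion $P$ is taken over the discordant (off-diagonal) pairs, with $g = P - 0.5$; it is not the package's code.

```r
# Hypothetical paired 2x2 table (rows: first measure, columns: second measure)
tab <- matrix(c(30, 5, 15, 50), nrow = 2,
              dimnames = list(First = c("yes", "no"),
                              Second = c("yes", "no")))

n_upper <- tab[1, 2]  # discordant pairs: yes -> no
n_lower <- tab[2, 1]  # discordant pairs: no -> yes
n_disc  <- n_upper + n_lower

# Proportion of discordant pairs falling in the larger cell, and Cohen's g
P <- max(n_upper, n_lower) / n_disc
g <- P - 0.5

# CI for P from prop.test(), shifted by 0.5 to the g scale
ci <- as.numeric(prop.test(max(n_upper, n_lower), n_disc,
                           conf.level = 0.95)$conf.int) - 0.5

c(Cohens_g = g, CI_low = ci[1], CI_high = ci[2])
```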
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Other effect sizes for contingency table: oddsratio(), phi()
data("screening_test")

phi(screening_test$Diagnosis, screening_test$Test1)
phi(screening_test$Diagnosis, screening_test$Test2)

# Both tests seem comparable - but are the tests actually different?
(tests <- table(Test1 = screening_test$Test1, Test2 = screening_test$Test2))
mcnemar.test(tests)
cohens_g(tests)
# Test 2 gives a negative result more than test 1!
Enables a conversion between different indices of effect size, such as standardized difference (Cohen's d), (point-biserial) correlation r or (log) odds ratios.
d_to_r(d, n1, n2, ...)
r_to_d(r, n1, n2, ...)
oddsratio_to_d(OR, log = FALSE, ...)
logoddsratio_to_d(logOR, log = TRUE, ...)
d_to_oddsratio(d, log = FALSE, ...)
d_to_logoddsratio(d, log = TRUE, ...)
oddsratio_to_r(OR, n1, n2, log = FALSE, ...)
logoddsratio_to_r(logOR, log = TRUE, ...)
r_to_oddsratio(r, n1, n2, log = FALSE, ...)
r_to_logoddsratio(r, n1, n2, log = TRUE, ...)
d , r , OR , logOR

Standardized difference value (Cohen's d), correlation coefficient (r), Odds ratio, or logged Odds ratio. 
n1 , n2

Group sample sizes. If either is missing, groups are assumed to be of equal size. 
... 
Arguments passed to or from other methods. 
log 
Take in or output the log of the ratio (such as in logistic models), e.g. when the desired input or output are log odds ratios instead of odds ratios.
Conversions between d and OR are done through these formulae:
$d = \frac{\log(OR)\times\sqrt{3}}{\pi}$
$\log(OR) = d \times \frac{\pi}{\sqrt{3}}$
Converting between d and r is done through these formulae:
$d = \frac{\sqrt{h} \times r}{\sqrt{1 - r^2}}$
$r = \frac{d}{\sqrt{d^2 + h}}$
Where $h = \frac{n_1 + n_2 - 2}{n_1} + \frac{n_1 + n_2 - 2}{n_2}$.
When groups are of equal size, h reduces to approximately 4. The resulting
r is also called the binomial effect size display (BESD; Rosenthal et al.,
1982).
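The conversion formulae can be reproduced in a few lines of base R. The helper names below (`h_factor`, `r_to_d_sketch`, and so on) are hypothetical and exist only for this illustration; the sketch assumes h = 4 when group sizes are not supplied, in line with the equal-groups approximation described above.

```r
# h as a function of the two group sizes
h_factor <- function(n1, n2) (n1 + n2 - 2) / n1 + (n1 + n2 - 2) / n2

# d <-> r, assuming h = 4 (equal group sizes) unless stated otherwise
r_to_d_sketch <- function(r, h = 4) sqrt(h) * r / sqrt(1 - r^2)
d_to_r_sketch <- function(d, h = 4) d / sqrt(d^2 + h)

# d <-> log odds ratio
d_to_logor_sketch <- function(d) d * pi / sqrt(3)
logor_to_d_sketch <- function(logOR) logOR * sqrt(3) / pi

d  <- r_to_d_sketch(0.5)          # r = 0.5 expressed as d
or <- exp(d_to_logor_sketch(d))   # the same effect as an odds ratio
r  <- d_to_r_sketch(1)            # d = 1 expressed as r

h_factor(50, 50)                  # 3.92, close to 4 for equal groups
```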
Converted index.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Converting among effect sizes. Introduction to meta-analysis, 45-49.
Jacobs, P., & Viechtbauer, W. (2017). Estimation of the biserial correlation and its sampling variance for use in meta-analysis. Research synthesis methods, 8(2), 161-180. doi:10.1002/jrsm.1218
Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of educational psychology, 74(2), 166.
Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological methods, 8(4), 448.
Other convert between effect sizes: diff_to_cles, eta2_to_f2(), odds_to_probs(), oddsratio_to_riskratio(), w_to_fei()
r_to_d(0.5)
d_to_oddsratio(1.154701)
oddsratio_to_r(8.120534)

d_to_r(1)
r_to_oddsratio(0.4472136, log = TRUE)
oddsratio_to_d(1.813799, log = TRUE)
Convert Standardized Differences to Common Language Effect Sizes
d_to_p_superiority(d)
rb_to_p_superiority(rb)
rb_to_vda(rb)
d_to_u2(d)
d_to_u1(d)
d_to_u3(d)
d_to_overlap(d)
rb_to_wmw_odds(rb)
d , rb

A numeric vector of Cohen's d / rank-biserial correlation or
the output from 
These functions use the following formulae for Cohen's d:
$Pr(superiority) = \Phi(d/\sqrt{2})$
$\textrm{Cohen's } U_3 = \Phi(d)$
$\textrm{Cohen's } U_2 = \Phi(d/2)$
$\textrm{Cohen's } U_1 = (2\times U_2 - 1)/U_2$
$Overlap = 2 \times \Phi(-|d|/2)$
And the following for the rank-biserial correlation:
$Pr(superiority) = (r_{rb} + 1)/2$
$WMW_{Odds} = Pr(superiority) / (1 - Pr(superiority))$
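These formulae map directly onto `pnorm()`, the standard normal distribution function $\Phi$ in base R. The sketch below uses illustrative input values and variable names; Overlap is computed as $2 \times \Phi(-|d|/2)$ so that it cannot exceed 1.

```r
d <- 1.48  # illustrative Cohen's d

p_sup   <- pnorm(d / sqrt(2))        # probability of superiority
u3      <- pnorm(d)                  # Cohen's U3
u2      <- pnorm(d / 2)              # Cohen's U2
u1      <- (2 * u2 - 1) / u2         # Cohen's U1
overlap <- 2 * pnorm(-abs(d) / 2)    # overlap of the two distributions

rb <- 0.5  # illustrative rank-biserial correlation

p_sup_rb <- (rb + 1) / 2                 # probability of superiority
wmw_odds <- p_sup_rb / (1 - p_sup_rb)    # Wilcoxon-Mann-Whitney odds

round(c(p_sup = p_sup, U3 = u3, U2 = u2, U1 = u1, Overlap = overlap), 3)
```

With d = 0 the sketch returns the null values described for these indices: a probability of superiority of 0.5, U2 of 0.5, U1 of 0 and an Overlap of 1.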
A list of Cohen's U3, Overlap, Pr(superiority), a
numeric vector of Pr(superiority), or a data frame, depending
on the input.
For d, these calculations assume that the populations have equal variance and are normally distributed.
Vargha and Delaney's A is an alias for the nonparametric probability of superiority.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Routledge.
Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society, 48(3), 413-418.
Ruscio, J. (2008). A probability-based measure of effect size: robustness to base rates and other factors. Psychological methods, 13(1), 19–30.
cohens_u3() for descriptions of the effect sizes (also, cohens_d(), rank_biserial()).
Other convert between effect sizes: d_to_r(), eta2_to_f2(), odds_to_probs(), oddsratio_to_riskratio(), w_to_fei()
effectsize API
Read the Support functions for model extensions vignette.
.es_aov_simple(aov_table, type = c("eta", "omega", "epsilon"), partial = TRUE, generalized = FALSE, include_intercept = FALSE, ci = 0.95, alternative = "greater", verbose = TRUE)
.es_aov_strata(aov_table, DV_names, type = c("eta", "omega", "epsilon"), partial = TRUE, generalized = FALSE, include_intercept = FALSE, ci = 0.95, alternative = "greater", verbose = TRUE)
.es_aov_table(aov_table, type = c("eta", "omega", "epsilon"), partial = TRUE, generalized = FALSE, include_intercept = FALSE, ci = 0.95, alternative = "greater", verbose = TRUE)
aov_table 
Input data frame 
type 
Which effect size to compute? 
partial , generalized , ci , alternative , verbose

See 
include_intercept 
Should the intercept ( 
DV_names 
A character vector with the names of all the predictors,
including the grouping variable (e.g., 
More information regarding Confidence (Compatibility) Intervals and how they are computed in effectsize.
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
Some effect sizes are directionless: they do have a minimum value that would
be interpreted as "no effect", but they cannot cross it. For example, a null
value of Kendall's W is 0, indicating no difference between
groups, but it can never have a negative value. The same goes for
U2 and Overlap: the null value of $U_2$ is
0.5, but it can never be smaller than 0.5; an Overlap of 1 means "full
overlap" (no difference), but it cannot be larger than 1.
When bootstrapping CIs for such effect sizes, the bounds of the CIs will
never cross (and often will never cover) the null. Therefore, these CIs
should not be used for statistical inference.
Typically, CIs are constructed as two-tailed intervals, with an equal
proportion of the cumulative probability distribution above and below the
interval. CIs can also be constructed as one-sided intervals,
giving only a lower bound or upper bound. This is analogous to computing a
1-tailed p value or conducting a 1-tailed hypothesis test.
Significance tests conducted using CIs (whether a value is inside the interval)
and using p values (whether p < alpha for that value) are only guaranteed
to agree when both are constructed using the same number of sides/tails.
Most effect sizes are not bounded by zero (e.g., r, d, g), and as such
are generally tested using 2-tailed tests and 2-sided CIs.
Some effect sizes are strictly positive: they do have a minimum value, of 0.
For example, $R^2$, $\eta^2$, $sr^2$, and other variance-accounted-for effect
sizes, as well as Cramer's V and multiple R, range from 0 to 1. These
typically involve F or $\chi^2$ statistics and are generally tested
using 1-tailed tests which test whether the estimated effect size is
larger than the hypothesized null value (e.g., 0). In order for a CI to
yield the same significance decision it must then be a 1-sided CI,
estimating only a lower bound. This is the default CI computed by
effectsize for these effect sizes, where alternative = "greater" is set.
This lower bound interval indicates the smallest effect size that is not
significantly different from the observed effect size. That is, it is the
minimum effect size compatible with the observed data, background model
assumptions, and $\alpha$ level. This type of interval does not indicate
a maximum effect size value; anything up to the maximum possible value of the
effect size (e.g., 1) is in the interval.
One-sided CIs can also be used to test against a maximum effect size value
(e.g., is $R^2$ significantly smaller than a perfect correlation of 1.0?)
by setting alternative = "less". This estimates a CI with only an
upper bound; anything from the minimum possible value of the effect size
(e.g., 0) up to this upper bound is in the interval.
We can also obtain a 2-sided interval by setting alternative = "two.sided".
These intervals can be interpreted in the same way as other 2-sided
intervals, such as those for r, d, or g.
An alternative approach to aligning significance tests using CIs and 1-tailed
p values that can often be found in the literature is to construct a
2-sided CI at a lower confidence level (e.g., 100(1 - 2$\alpha$)% = 100 -
2*5% = 90%). This estimates the lower bound and upper bound for the above
1-sided intervals simultaneously. These intervals are commonly reported when
conducting equivalence tests. For example, a 90% 2-sided interval gives
the bounds for an equivalence test with $\alpha$ = .05. However, be aware
that this interval does not give 95% coverage for the underlying effect size
parameter value. For that, construct a 95% 2-sided CI.
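The link between a 95% 1-sided interval and a 90% 2-sided interval can be seen directly on the noncentrality scale. The base R sketch below uses a hypothetical $\chi^2$ statistic and degrees of freedom, and an arbitrary search interval chosen to bracket the roots; it illustrates the pivot idea only and is not the package's optimizer.

```r
chisq <- 20  # hypothetical observed chi-square statistic
df    <- 6   # hypothetical degrees of freedom

# ncp at which the observed statistic has lower-tail probability `prob`;
# returns 0 when even the central distribution falls short of `prob`
ncp_at <- function(prob) {
  f <- function(ncp) pchisq(chisq, df, ncp = ncp) - prob
  if (f(0) < 0) return(0)
  uniroot(f, interval = c(0, 75), tol = 1e-10)$root
}

one_sided_95 <- c(low = ncp_at(0.95),  high = Inf)            # "greater"
two_sided_90 <- c(low = ncp_at(0.95),  high = ncp_at(0.05))
two_sided_95 <- c(low = ncp_at(0.975), high = ncp_at(0.025))

rbind(one_sided_95, two_sided_90, two_sided_95)
```

The 90% 2-sided interval shares its lower bound with the 95% 1-sided interval, while the 95% 2-sided interval is wider at both ends.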
data("hardlyworking")
fit <- lm(salary ~ n_comps, data = hardlyworking)

eta_squared(fit) # default, ci = 0.95, alternative = "greater"
#> For one-way between subjects designs, partial eta squared is equivalent
#>   to eta squared. Returning eta squared.
#> # Effect Size for ANOVA
#>
#> Parameter | Eta2 |       95% CI
#> -------------------------------
#> n_comps   | 0.19 | [0.14, 1.00]
#>
#> - One-sided CIs: upper bound fixed at [1.00].

eta_squared(fit, alternative = "less") # Test if eta is smaller than some value
#> For one-way between subjects designs, partial eta squared is equivalent
#>   to eta squared. Returning eta squared.
#> # Effect Size for ANOVA
#>
#> Parameter | Eta2 |       95% CI
#> -------------------------------
#> n_comps   | 0.19 | [0.00, 0.24]
#>
#> - One-sided CIs: lower bound fixed at [0.00].

eta_squared(fit, alternative = "two.sided") # 2-sided bounds for alpha = .05
#> For one-way between subjects designs, partial eta squared is equivalent
#>   to eta squared. Returning eta squared.
#> # Effect Size for ANOVA
#>
#> Parameter | Eta2 |       95% CI
#> -------------------------------
#> n_comps   | 0.19 | [0.14, 0.25]

eta_squared(fit, ci = 0.9, alternative = "two.sided") # both 1-sided bounds for alpha = .05
#> For one-way between subjects designs, partial eta squared is equivalent
#>   to eta squared. Returning eta squared.
#> # Effect Size for ANOVA
#>
#> Parameter | Eta2 |       90% CI
#> -------------------------------
#> n_comps   | 0.19 | [0.14, 0.24]
For very large sample sizes or effect sizes, the width of the CI can be smaller than the tolerance of the optimizer, resulting in CIs of width 0. This can also result in the estimated CIs excluding the point estimate.
In these cases, consider an alternative method for computing CIs, such as the bootstrap.
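As a rough illustration of the bootstrap alternative, the following base R sketch computes a percentile bootstrap interval for Eta squared in a one-way design. The data set (mtcars), the helper name eta2_oneway, the number of resamples and the percentile method are all assumptions chosen for the example; they are not taken from the package.

```r
# Percentile bootstrap CI for eta squared in a one-way design (base R only).
set.seed(123) # for reproducibility

# Hypothetical helper: in a one-way design, eta squared equals the model R-squared.
eta2_oneway <- function(data) {
  summary(lm(mpg ~ factor(cyl), data = data))$r.squared
}

eta2_hat <- eta2_oneway(mtcars)

# Resample rows with replacement and recompute the statistic each time.
boot_eta2 <- replicate(1000, {
  rows <- sample(nrow(mtcars), replace = TRUE)
  eta2_oneway(mtcars[rows, ])
})

# 95% percentile interval
boot_ci <- quantile(boot_eta2, probs = c(0.025, 0.975))
eta2_hat
boot_ci
```

Other bootstrap intervals (e.g., bias-corrected ones) may behave better near the boundaries of the effect size's range.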
Bauer, P., & Kieser, M. (1996). A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika, 83(4), 934–937. doi:10.1093/biomet/83.4.934
Rafi, Z., & Greenland, S. (2020). Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20(1), Article 244. doi:10.1186/s12874-020-01105-9
Schweder, T., & Hjort, N. L. (2016). Confidence, likelihood, probability: Statistical inference with confidence distributions. Cambridge University Press. doi:10.1017/CBO9781139046671
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164–182. doi:10.1037/1082-989X.9.2.164
Xie, M., & Singh, K. (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81(1), 3–39. doi:10.1111/insr.12000
Deprecated / Defunct Functions
convert_odds_to_probs(...)
convert_probs_to_odds(...)
convert_d_to_r(...)
convert_r_to_d(...)
convert_oddsratio_to_d(...)
convert_d_to_oddsratio(...)
convert_oddsratio_to_r(...)
convert_r_to_oddsratio(...)
interpret_d(...)
interpret_g(...)
interpret_delta(...)
interpret_parameters(...)
normalized_chi(...)
chisq_to_normalized(...)
convert_d_to_common_language(...)
d_to_common_language(...)
convert_rb_to_common_language(...)
rb_to_common_language(...)
common_language(...)
... 
Arguments to the deprecated function. 
effectsize options
Currently, the following global options are supported:
es.use_symbols
logical: Should proper symbols be printed (TRUE) instead of transliterated effect size names (FALSE; default).
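Global options of this kind are normally set with base R's options(). A minimal sketch, assuming the option only needs to hold for the current session:

```r
# Set the option for the current R session; options() returns the old values.
old <- options(es.use_symbols = TRUE)
getOption("es.use_symbols")

# Restore the previous setting when done.
options(old)
```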
This function tries to return the best effect-size measure for the provided input model. See details.
## S3 method for class 'BFBayesFactor'
effectsize(model, type = NULL, ci = 0.95, test = NULL, verbose = TRUE, ...)

effectsize(model, ...)

## S3 method for class 'aov'
effectsize(model, type = NULL, ...)

## S3 method for class 'htest'
effectsize(model, type = NULL, verbose = TRUE, ...)
model 
An object of class 
type 
The effect size of interest. See details. 
ci 
Value or vector of probability of the CI (between 0 and 1)
to be estimated. Default to 
test 
The indices of effect existence to compute. Character (vector) or
list with one or more of these options: 
verbose 
Toggle off warnings. 
... 
Arguments passed to or from other methods. See details. 
For an object of class htest, data is extracted via insight::get_data(), and passed to the relevant function according to:
A t-test, depending on type: "cohens_d" (default), "hedges_g", or one of "p_superiority", "u1", "u2", "u3", "overlap".
For a paired t-test, depending on type: "rm_rm", "rm_av", "rm_b", "rm_d", "rm_z".
A Chi-squared test of independence or Fisher's Exact Test, depending on type: "cramers_v" (default), "tschuprows_t", "phi", "cohens_w", "pearsons_c", "cohens_h", "oddsratio", "riskratio", "arr", or "nnt".
A Chi-squared test of goodness-of-fit, depending on type: "fei" (default), "cohens_w", "pearsons_c".
A one-way ANOVA test, depending on type: "eta" (default), "omega" or "epsilon" squared, "f", or "f2".
A McNemar test returns Cohen's g.
A Wilcoxon test, depending on type: returns "rank_biserial" correlation (default) or one of "p_superiority", "vda", "u2", "u3", "overlap".
A Kruskal-Wallis test, depending on type: "epsilon" (default) or "eta".
A Friedman test returns Kendall's W.
(Where applicable, ci and alternative are taken from the htest if not otherwise provided.)
For an object of class BFBayesFactor, using bayestestR::describe_posterior():
A t-test, depending on type: "cohens_d" (default) or one of "p_superiority", "u1", "u2", "u3", "overlap".
A correlation test returns r.
A contingency table test, depending on type: "cramers_v" (default), "phi", "tschuprows_t", "cohens_w", "pearsons_c", "cohens_h", "oddsratio", "riskratio", "arr", or "nnt".
A proportion test returns p.
Objects of class anova, aov, aovlist or afex_aov, depending on type: "eta" (default), "omega" or "epsilon" squared, "f", or "f2".
Other objects are passed to parameters::standardize_parameters().
For statistical models it is recommended to directly use the listed functions, for the full range of options they provide.
A data frame with the effect size (depending on input) and its
CIs (CI_low and CI_high).
see
The see package contains relevant plotting functions. See the plotting vignette in the see package.
vignette(package = "effectsize")
## Hypothesis Testing
data("Music_preferences")
Xsq <- chisq.test(Music_preferences)
effectsize(Xsq)
effectsize(Xsq, type = "cohens_w")

Tt <- t.test(1:10, y = c(7:20), alternative = "less")
effectsize(Tt)

Tt <- t.test(
  x = c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30),
  y = c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29),
  paired = TRUE
)
effectsize(Tt, type = "rm_b")

Aov <- oneway.test(extra ~ group, data = sleep, var.equal = TRUE)
effectsize(Aov)
effectsize(Aov, type = "omega")

Wt <- wilcox.test(1:10, 7:20, mu = 3, alternative = "less", exact = FALSE)
effectsize(Wt)
effectsize(Wt, type = "u2")

## Models and Anova Tables
fit <- lm(mpg ~ factor(cyl) * wt + hp, data = mtcars)
effectsize(fit, method = "basic")

anova_table <- anova(fit)
effectsize(anova_table)
effectsize(anova_table, type = "epsilon")

## Bayesian Hypothesis Testing
bf_prop <- BayesFactor::proportionBF(3, 7, p = 0.3)
effectsize(bf_prop)

bf_corr <- BayesFactor::correlationBF(attitude$rating, attitude$complaints)
effectsize(bf_corr)

data(RCT_table)
bf_xtab <- BayesFactor::contingencyTableBF(RCT_table, sampleType = "poisson", fixedMargin = "cols")
effectsize(bf_xtab)
effectsize(bf_xtab, type = "oddsratio")
effectsize(bf_xtab, type = "arr")

bf_ttest <- BayesFactor::ttestBF(sleep$extra[sleep$group == 1],
  sleep$extra[sleep$group == 2],
  paired = TRUE, mu = 1
)
effectsize(bf_ttest)
Perform a Test for Practical Equivalence for indices of effect size.
## S3 method for class 'effectsize_table'
equivalence_test(
  x,
  range = "default",
  rule = c("classic", "cet", "bayes"),
  ...
)
x 
An effect size table, such as returned by 
range 
The range of practical equivalence of an effect. For one-sided
CIs, a single value can be provided for the lower / upper bound to test
against (but see more details below). For two-sided CIs, a single value is
duplicated to 
rule 
How should acceptance and rejection be decided? See details. 
... 
Arguments passed to or from other methods. 
The CIs used in the equivalence test are the ones in the provided effect size
table. For results equivalent (ha!) to those that can be obtained using the
TOST approach (e.g., Lakens, 2017), appropriate CIs should be extracted using
the function used to make the effect size table (cohens_d, eta_squared,
F_to_r, etc), with alternative = "two.sided". See examples.
"classic" - the classic method:
If the CI is completely within the ROPE - Accept H0
Else, if the CI does not contain 0 - Reject H0
Else - Undecided
"cet" - conditional equivalence testing:
If the CI does not contain 0 - Reject H0
Else, if the CI is completely within the ROPE - Accept H0
Else - Undecided
"bayes" - the Bayesian approach, as put forth by Kruschke:
If the CI is completely outside the ROPE - Reject H0
Else, if the CI is completely within the ROPE - Accept H0
Else - Undecided
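The three rules differ only in the order in which the conditions are checked. The sketch below spells this out for a single CI. The helper decide_equivalence and its output labels are hypothetical and not part of the package, and treating the ROPE limits as inclusive is an assumption.

```r
# Hypothetical helper (not part of effectsize): apply one of the three
# decision rules to a single CI and a region of practical equivalence (ROPE).
decide_equivalence <- function(ci_low, ci_high, rope, rule = "classic") {
  within_rope  <- ci_low >= rope[1] && ci_high <= rope[2] # assumed inclusive
  outside_rope <- ci_high < rope[1] || ci_low > rope[2]
  excludes_0   <- ci_low > 0 || ci_high < 0

  if (rule == "classic") {
    if (within_rope) "Accepted" else if (excludes_0) "Rejected" else "Undecided"
  } else if (rule == "cet") {
    if (excludes_0) "Rejected" else if (within_rope) "Accepted" else "Undecided"
  } else if (rule == "bayes") {
    if (outside_rope) "Rejected" else if (within_rope) "Accepted" else "Undecided"
  } else {
    stop("Unknown rule: ", rule)
  }
}

rope <- c(-0.2, 0.2)
# A narrow CI that excludes 0 but lies inside the ROPE: the rules disagree.
res_classic <- decide_equivalence(0.05, 0.15, rope, rule = "classic")
res_cet     <- decide_equivalence(0.05, 0.15, rope, rule = "cet")
res_bayes   <- decide_equivalence(0.05, 0.15, rope, rule = "bayes")
c(classic = res_classic, cet = res_cet, bayes = res_bayes)
```

For this CI, the "classic" and "bayes" rules accept H0 because the interval sits inside the ROPE, while "cet" rejects it because the interval excludes 0.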
A data frame with the results of the equivalence test.
see
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Campbell, H., & Gustafson, P. (2018). Conditional equivalence testing: An alternative remedy for publication bias. PLOS ONE, 13(4), e0195145. doi:10.1371/journal.pone.0195145
Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press
Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270–280. doi:10.1177/2515245918771304
Lakens, D. (2017). Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science, 8(4), 355–362. doi:10.1177/1948550617697177
For more details, see bayestestR::equivalence_test()
.
data("hardlyworking")
model <- aov(salary ~ age + factor(n_comps) * cut(seniority, 3), data = hardlyworking)
es <- eta_squared(model, ci = 0.9, alternative = "two.sided")
equivalence_test(es, range = c(0, 0.15)) # TOST

data("RCT_table")
OR <- oddsratio(RCT_table, alternative = "greater")
equivalence_test(OR, range = c(0, 1))

ds <- t_to_d(
  t = c(0.45, 0.65, 7, 2.2, 2.25),
  df_error = c(675, 525, 2000, 900, 1875),
  ci = 0.9, alternative = "two.sided" # TOST
)
# Can also plot
if (require(see)) plot(equivalence_test(ds, range = 0.2))
if (require(see)) plot(equivalence_test(ds, range = 0.2, rule = "cet"))
if (require(see)) plot(equivalence_test(ds, range = 0.2, rule = "bayes"))
$\eta^2$ and Other Effect Sizes for ANOVA
Functions to compute effect size measures for ANOVAs, such as Eta
($\eta$), Omega ($\omega$) and Epsilon ($\epsilon$) squared,
and Cohen's f (or their partialled versions) for ANOVA tables. These indices
represent an estimate of how much variance in the response variables is
accounted for by the explanatory variable(s).
When passing models, effect sizes are computed using the sums of squares
obtained from anova(model), which might not always be appropriate. See
details.
eta_squared(
  model,
  partial = TRUE,
  generalized = FALSE,
  ci = 0.95,
  alternative = "greater",
  verbose = TRUE,
  ...
)

omega_squared(
  model,
  partial = TRUE,
  ci = 0.95,
  alternative = "greater",
  verbose = TRUE,
  ...
)

epsilon_squared(
  model,
  partial = TRUE,
  ci = 0.95,
  alternative = "greater",
  verbose = TRUE,
  ...
)

cohens_f(
  model,
  partial = TRUE,
  generalized = FALSE,
  squared = FALSE,
  method = c("eta", "omega", "epsilon"),
  model2 = NULL,
  ci = 0.95,
  alternative = "greater",
  verbose = TRUE,
  ...
)

cohens_f_squared(
  model,
  partial = TRUE,
  generalized = FALSE,
  squared = TRUE,
  method = c("eta", "omega", "epsilon"),
  model2 = NULL,
  ci = 0.95,
  alternative = "greater",
  verbose = TRUE,
  ...
)

eta_squared_posterior(
  model,
  partial = TRUE,
  generalized = FALSE,
  ss_function = stats::anova,
  draws = 500,
  verbose = TRUE,
  ...
)
model 
An ANOVA table (or an ANOVA-like table, e.g., outputs from

partial 
If 
generalized 
A character vector of observed (non-manipulated) variables
to be used in the estimation of a generalized Eta Squared. Can also be

ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods.

squared 
Return Cohen's f or Cohen's f-squared? 
method 
What effect size should be used as the basis for Cohen's f? 
model2 
Optional second model for Cohen's f (/squared). If specified, returns the effect size for R-squared-change between the two models. 
ss_function 
For Bayesian models, the function used to extract
sum-of-squares. Uses 
draws 
For Bayesian models, an integer indicating the number of draws from the posterior predictive distribution to return. Larger numbers take longer to run, but provide estimates that are more stable. 
For aov (or lm), aovlist and afex_aov models, and for anova objects
that provide Sums-of-Squares, the effect sizes are computed directly using
Sums-of-Squares. (For maov (or mlm) models, effect sizes are computed for
each response separately.)
For other ANOVA tables and models (converted to ANOVA-like tables via
anova() methods), effect sizes are approximated via test statistic
conversion of the omnibus F statistic provided by the table (see F_to_eta2()
for more details).
When model is a statistical model, the sums of squares (or F statistics)
used for the computation of the effect sizes are based on those returned by
anova(model). Different models have different default output types. For
example, for aov and aovlist these are type-1 sums of squares, but for
lmerMod (and lmerModLmerTest) these are type-3 sums of squares. Make
sure these are the sums of squares you are interested in. You might want to
convert your model to an ANOVA(-like) table yourself and then pass the result
to eta_squared(). See examples below for use of car::Anova() and the
afex package.
For type-3 sums of squares, it is generally recommended to fit models with
orthogonal factor weights (e.g., contr.sum) and centered covariates,
for sensible results. See examples and the afex package.
Both Omega and Epsilon are unbiased estimators of the
population's Eta, which is especially important in small samples. But
which to choose?
Though Omega is the more popular choice (Albers and Lakens, 2018), Epsilon is
analogous to adjusted R2 (Allen, 2017, p. 382), and has been found to be less
biased (Carroll & Nordholm, 1975).
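The analogy between Epsilon squared and adjusted R2 can be checked by hand in a one-way design. The sketch below uses base R only. The data set (mtcars) and the textbook sums-of-squares formulas for Epsilon and Omega squared are assumptions made for the illustration, not the package's internal code.

```r
# One-way ANOVA on a built-in data set (base R only).
fit <- lm(mpg ~ factor(cyl), data = mtcars)
tab <- anova(fit)

ss_effect <- tab[["Sum Sq"]][1]
ss_error  <- tab[["Sum Sq"]][2]
df_effect <- tab[["Df"]][1]
ms_error  <- tab[["Mean Sq"]][2]
ss_total  <- ss_effect + ss_error

# Textbook formulas (assumed here) for the three estimators
eta2     <- ss_effect / ss_total
epsilon2 <- (ss_effect - df_effect * ms_error) / ss_total
omega2   <- (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

c(eta2 = eta2, epsilon2 = epsilon2, omega2 = omega2)

# In this one-way design, epsilon squared matches the adjusted R-squared.
all.equal(epsilon2, summary(fit)$adj.r.squared)
```

Both corrections shrink the estimate relative to Eta squared, with Omega squared shrinking it slightly more because of the extra term in its denominator.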
Cohen's f can take on values between zero, when the population means are all
equal, and an indefinitely large number as standard deviation of means
increases relative to the average standard deviation within each group.
When comparing two models in a sequential regression analysis, Cohen's f for
R-square change is the ratio between the increase in R-square
and the percent of unexplained variance.
Cohen has suggested that the values of 0.10, 0.25, and 0.40 represent small,
medium, and large effect sizes, respectively.
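For the R-square change case, the ratio can be computed by hand from two nested models. This is a sketch under the assumption that the "unexplained variance" in the denominator is that of the larger model, giving Cohen's f squared. The models and variable names are illustrative only.

```r
# Two nested regression models on a built-in data set (base R only).
model_a  <- lm(mpg ~ wt, data = mtcars)      # reduced model
model_ab <- lm(mpg ~ wt + hp, data = mtcars) # adds one predictor

r2_a  <- summary(model_a)$r.squared
r2_ab <- summary(model_ab)$r.squared

# Increase in R-squared relative to the variance the larger model leaves unexplained
f2_change <- (r2_ab - r2_a) / (1 - r2_ab)
f_change  <- sqrt(f2_change)

c(f2 = f2_change, f = f_change)
```

With a single added predictor, f2_change multiplied by the residual degrees of freedom of the larger model equals the F statistic from anova(model_a, model_ab), which is a convenient sanity check.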
For Bayesian models (fit with brms or rstanarm),
eta_squared_posterior() simulates data from the posterior predictive
distribution (ppd) and for each simulation the Eta Squared is computed for
the model's fixed effects. This means that the returned values are the
population level effect size as implied by the posterior model (and not the
effect size in the sample data). See rstantools::posterior_predict() for
more info.
A data frame with the effect size(s) between 0-1 (Eta2, Epsilon2,
Omega2, Cohens_f or Cohens_f2, possibly with the partial or
generalized suffix), and their CIs (CI_low and CI_high).
For eta_squared_posterior(), a data frame containing the ppd of the Eta
squared for each fixed effect, which can then be passed to
bayestestR::describe_posterior() for summary stats.
A data frame containing the effect size values and their confidence intervals.
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$
distribution that places the observed
t, F, or $\chi^2$
test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
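The search for the noncentrality parameter can be sketched in base R with pf() and uniroot(). The helper ncp_ci_eta2 below is hypothetical and is not the package's internal code. It assumes a 2-sided interval and that the ncp bounds are mapped to partial Eta squared through the same relation as the point estimate, $\eta_p^2 = \frac{F \times df_{num}}{F \times df_{num} + df_{den}}$ with $F = ncp / df_{num}$.

```r
# Hypothetical helper (not the package's internal code): 2-sided CI for a
# partial eta squared from an F statistic via the noncentrality parameter method.
ncp_ci_eta2 <- function(f, df, df_error, ci = 0.95, max_ncp = 1000) {
  alpha <- 1 - ci
  # Find the ncp that places the observed F at a target cumulative probability.
  find_ncp <- function(prob) {
    g <- function(ncp) pf(f, df, df_error, ncp = ncp) - prob
    if (g(0) < 0) return(0) # no non-negative ncp reaches `prob`; bound is 0
    uniroot(g, interval = c(0, max_ncp), tol = 1e-10)$root
  }
  ncp_low  <- find_ncp(1 - alpha / 2)
  ncp_high <- find_ncp(alpha / 2)
  # With F = ncp / df, F * df / (F * df + df_error) reduces to ncp / (ncp + df_error).
  to_eta2 <- function(ncp) ncp / (ncp + df_error)
  c(CI_low = to_eta2(ncp_low), CI_high = to_eta2(ncp_high))
}

eta2_ci <- ncp_ci_eta2(f = 16.501, df = 1, df_error = 9)
eta2_ci
```

The max_ncp argument is only an upper limit for the root search and would need to be raised for very large test statistics.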
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187–195.
Allen, R. (2017). Statistics and Experimental Design for Psychologists: A Model Comparison Approach. World Scientific Publishing Company.
Carroll, R. M., & Nordholm, L. A. (1975). Sampling Characteristics of Kelley's epsilon and Hays' omega. Educational and Psychological Measurement, 35(3), 541–554.
Kelley, T. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21(9), 554–559.
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychological Methods, 8(4), 434.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.
Other effect sizes for ANOVAs:
rank_epsilon_squared()
data(mtcars)
mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

model <- aov(mpg ~ am_f * cyl_f, data = mtcars)

(eta2 <- eta_squared(model))

# More types:
eta_squared(model, partial = FALSE)
eta_squared(model, generalized = "cyl_f")
omega_squared(model)
epsilon_squared(model)
cohens_f(model)

model0 <- aov(mpg ~ am_f + cyl_f, data = mtcars) # no interaction
cohens_f_squared(model0, model2 = model)

## Interpretation of effect sizes
interpret_omega_squared(0.10, rules = "field2013")
interpret_eta_squared(0.10, rules = "cohen1992")
interpret_epsilon_squared(0.10, rules = "cohen1992")

interpret(eta2, rules = "cohen1992")

plot(eta2) # Requires the {see} package

# Recommended: Type-2 or 3 effect sizes + effects coding
contrasts(mtcars$am_f) <- contr.sum
contrasts(mtcars$cyl_f) <- contr.sum

model <- aov(mpg ~ am_f * cyl_f, data = mtcars)
model_anova <- car::Anova(model, type = 3)

epsilon_squared(model_anova)

# afex takes care of both type-3 effects and effects coding:
data(obk.long, package = "afex")
model <- afex::aov_car(value ~ gender + Error(id / (phase * hour)),
  data = obk.long, observed = "gender"
)

omega_squared(model)
eta_squared(model, generalized = TRUE) # observed vars are pulled from the afex model.

## Approx. effect sizes for mixed models
model <- lme4::lmer(mpg ~ am_f * cyl_f + (1 | vs), data = mtcars)
omega_squared(model)

## Bayesian Models (PPD)
fit_bayes <- rstanarm::stan_glm(
  mpg ~ factor(cyl) * wt + qsec,
  data = mtcars, family = gaussian(),
  refresh = 0
)

es <- eta_squared_posterior(fit_bayes,
  verbose = FALSE,
  ss_function = car::Anova, type = 3
)
bayestestR::describe_posterior(es, test = NULL)

# compare to:
fit_freq <- lm(mpg ~ factor(cyl) * wt + qsec,
  data = mtcars
)
aov_table <- car::Anova(fit_freq, type = 3)
eta_squared(aov_table)
Convert Between ANOVA Effect Sizes
eta2_to_f2(es)

eta2_to_f(es)

f2_to_eta2(f2)

f_to_eta2(f)
es 
Any measure of variance explained such as Eta-, Epsilon-, Omega-, or R-Squared, partial or otherwise. See details. 
f , f2

Cohen's f or f-squared. 
Any measure of variance explained can be converted to a corresponding Cohen's
f via:
$f^2 = \frac{\eta^2}{1 - \eta^2}$
$\eta^2 = \frac{f^2}{1 + f^2}$
If a partial Eta-Squared is used, the resulting Cohen's f is a
partial-Cohen's f; if a less biased estimate of variance explained is used
(such as Epsilon- or Omega-Squared), the resulting Cohen's f is likewise a
less biased estimate of Cohen's f.
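The two conversion formulas are inverses of each other, which can be verified numerically. The function names below are hypothetical stand-ins written for this illustration, not the package's own functions.

```r
# Hypothetical stand-ins for the two conversions (base R only).
eta2_to_f2_sketch <- function(es) es / (1 - es)
f2_to_eta2_sketch <- function(f2) f2 / (1 + f2)

eta2 <- c(0.01, 0.06, 0.14)
f2 <- eta2_to_f2_sketch(eta2)
f  <- sqrt(f2) # Cohen's f is the square root of f-squared

# Converting back recovers the original values.
all.equal(f2_to_eta2_sketch(f2), eta2)
```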
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.
eta_squared()
for more details.
Other convert between effect sizes:
d_to_r()
,
diff_to_cles
,
odds_to_probs()
,
oddsratio_to_riskratio()
,
w_to_fei()
$\eta^2$ and Other ANOVA Effect Sizes
These functions are convenience functions to convert F and t test statistics
to partial Eta ($\eta$), Omega ($\omega$), Epsilon ($\epsilon$) squared
(an alias for the adjusted Eta squared) and Cohen's
f. These are useful in cases where the various Sum of Squares and Mean
Squares are not easily available or their computation is not straightforward
(e.g., in linear mixed models, contrasts, etc.). For test statistics derived
from lm and aov models, these functions give exact results. For all other
cases, they return close approximations.
See Effect Size from Test Statistics vignette.
F_to_eta2(f, df, df_error, ci = 0.95, alternative = "greater", ...)

t_to_eta2(t, df_error, ci = 0.95, alternative = "greater", ...)

F_to_epsilon2(f, df, df_error, ci = 0.95, alternative = "greater", ...)

t_to_epsilon2(t, df_error, ci = 0.95, alternative = "greater", ...)

F_to_eta2_adj(f, df, df_error, ci = 0.95, alternative = "greater", ...)

t_to_eta2_adj(t, df_error, ci = 0.95, alternative = "greater", ...)

F_to_omega2(f, df, df_error, ci = 0.95, alternative = "greater", ...)

t_to_omega2(t, df_error, ci = 0.95, alternative = "greater", ...)

F_to_f(
  f,
  df,
  df_error,
  squared = FALSE,
  ci = 0.95,
  alternative = "greater",
  ...
)

t_to_f(t, df_error, squared = FALSE, ci = 0.95, alternative = "greater", ...)

F_to_f2(
  f,
  df,
  df_error,
  squared = TRUE,
  ci = 0.95,
  alternative = "greater",
  ...
)

t_to_f2(t, df_error, squared = TRUE, ci = 0.95, alternative = "greater", ...)
df , df_error

Degrees of freedom of numerator or of the error estimate (i.e., the residuals). 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
... 
Arguments passed to or from other methods. 
t , f

The t or the F statistics. 
squared 
Return Cohen's f or Cohen's f-squared? 
These functions use the following formulae:
$\eta_p^2 = \frac{F \times df_{num}}{F \times df_{num} + df_{den}}$
$\epsilon_p^2 = \frac{(F - 1) \times df_{num}}{F \times df_{num} + df_{den}}$
$\omega_p^2 = \frac{(F - 1) \times df_{num}}{F \times df_{num} + df_{den} + 1}$
$f_p = \sqrt{\frac{\eta_p^2}{1-\eta_p^2}}$
For t, the conversion is based on the equality of $t^2 = F$ when $df_{num}=1$.
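These formulae translate directly into a few lines of base R. The helper f_to_es_sketch below is a hypothetical illustration of the point estimates only (no CIs), not the package's implementation. The F value with 1 and 9 degrees of freedom is the one used in the examples for these functions.

```r
# Hypothetical helper implementing the point-estimate formulae (base R only).
f_to_es_sketch <- function(f, df, df_error) {
  eta2 <- (f * df) / (f * df + df_error)
  c(
    eta2_partial     = eta2,
    epsilon2_partial = ((f - 1) * df) / (f * df + df_error),
    omega2_partial   = ((f - 1) * df) / (f * df + df_error + 1),
    cohens_f_partial = sqrt(eta2 / (1 - eta2))
  )
}

# F statistic with 1 and 9 degrees of freedom
from_f <- f_to_es_sketch(16.501, df = 1, df_error = 9)

# The same effect expressed as a t statistic: t^2 = F when df = 1
t_value <- sqrt(16.501)
from_t <- f_to_es_sketch(t_value^2, df = 1, df_error = 9)

from_f
all.equal(from_f, from_t)
```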
Both Omega and Epsilon are unbiased estimators of the population Eta. But which to choose? Though Omega is the more popular choice, it should be noted that:
The formula given above for Omega is only an approximation for complex designs.
Epsilon has been found to be less biased (Carroll & Nordholm, 1975).
A data frame with the effect size(s) between 0-1 (Eta2_partial,
Epsilon2_partial, Omega2_partial, Cohens_f_partial or
Cohens_f2_partial), and their CIs (CI_low and CI_high).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$
distribution that places the observed
t, F, or $\chi^2$
test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Adjusted (partial) Eta-squared is an alias for (partial) Epsilon-squared.
Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195. doi:10.31234/osf.io/b7z4q
Carroll, R. M., & Nordholm, L. A. (1975). Sampling Characteristics of Kelley's epsilon and Hays' omega. Educational and Psychological Measurement, 35(3), 541-554.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532-574.
Friedman, H. (1982). Simplified determinations of statistical power, magnitude of effect and research sample sizes. Educational and Psychological Measurement, 42(2), 521-526. doi:10.1177/001316448204200214
Mordkoff, J. T. (2019). A Simple Method for Removing Bias From a Popular Measure of Standardized Effect Size: Adjusted Partial Eta Squared. Advances in Methods and Practices in Psychological Science, 2(3), 228-232. doi:10.1177/2515245919855053
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E. J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic bulletin & review, 23(1), 103-123.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
eta_squared() for more details.
Other effect size from test statistic: chisq_to_phi(), t_to_d()
mod <- aov(mpg ~ factor(cyl) * factor(am), mtcars)
anova(mod)

(etas <- F_to_eta2(
  f = c(44.85, 3.99, 1.38),
  df = c(2, 1, 2),
  df_error = 26
))

if (require(see)) plot(etas)

# Compare to:
eta_squared(mod)

fit <- lmerTest::lmer(extra ~ group + (1 | ID), sleep)
# anova(fit)
# #> Type III Analysis of Variance Table with Satterthwaite's method
# #>       Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)
# #> group 12.482  12.482     1     9  16.501 0.002833 **
# #> ---
# #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

F_to_eta2(16.501, 1, 9)
F_to_omega2(16.501, 1, 9)
F_to_epsilon2(16.501, 1, 9)
F_to_f(16.501, 1, 9)

## Use with emmeans based contrasts
## --------------------------------
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

jt <- emmeans::joint_tests(warp.lm, by = "wool")
F_to_eta2(jt$F.ratio, jt$df1, jt$df2)
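As a cross-check of the point estimates in the example above, partial eta squared can be reproduced by hand. The sketch assumes the standard identity eta2_p = (F * df) / (F * df + df_error); f_to_eta2_sketch() is a hypothetical name used only here, and it returns no confidence interval.

```r
# Hand computation of partial eta squared from an F statistic (base R).
# Assumes the standard identity eta2_p = (F * df) / (F * df + df_error);
# f_to_eta2_sketch() is a hypothetical name, not the package function.
f_to_eta2_sketch <- function(f, df, df_error) {
  (f * df) / (f * df + df_error)
}

# Values from the lmer example above
eta2_lmer <- f_to_eta2_sketch(16.501, 1, 9)
eta2_lmer

# Vectorised over several model terms, as in the aov example
eta2_terms <- f_to_eta2_sketch(f = c(44.85, 3.99, 1.38), df = c(2, 1, 2), df_error = 26)
eta2_terms
```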
Fictional data.
A 2-by-3 table.
data("food_class")
food_class
#>           Soy Milk Meat
#> Vegan      47    0    0
#> Not-Vegan   0   12   21
Other effect size datasets: Music_preferences, Music_preferences2, RCT_table, Smoking_FASD, hardlyworking, rouder2016, screening_test
Transform a standardized vector into character, e.g., c("-1 SD", "Mean", "+1 SD").
format_standardize(
  x,
  reference = x,
  robust = FALSE,
  digits = 1,
  protect_integers = TRUE,
  ...
)
x 
A standardized numeric vector. 
reference 
The reference vector from which to compute the mean and SD. 
robust 
Logical, if 
digits 
Number of digits for rounding or significant figures. May also
be 
protect_integers 
Should integers be kept as integers (i.e., without decimals)? 
... 
Other arguments to pass to 
format_standardize(c(-1, 0, 1))
format_standardize(c(-1, 0, 1, 2), reference = rnorm(1000))
format_standardize(c(-1, 0, 1, 2), reference = rnorm(1000), robust = TRUE)

format_standardize(standardize(mtcars$wt), digits = 1)
format_standardize(standardize(mtcars$wt, robust = TRUE), digits = 1)
A sample (simulated) dataset, used in tests and some examples.
A data frame with 500 rows and 6 variables:
salary: Salary, in Shmekels
xtra_hours: Number of overtime hours (on average, per week)
n_comps: Number of compliments given to the boss (observed over the last week)
age: Age in years
seniority: How many years with the company
is_senior: Has this person been working here for more than 4 years?
data("hardlyworking")
head(hardlyworking, n = 5)
#>     salary xtra_hours n_comps age seniority is_senior
#> 1 19744.65       4.16       1  32         3     FALSE
#> 2 11301.95       1.62       0  34         3     FALSE
#> 3 20635.62       1.19       3  33         5      TRUE
#> 4 23047.16       7.19       1  35         3     FALSE
#> 5 27342.15      11.26       0  33         4     FALSE
Other effect size datasets: Music_preferences, Music_preferences2, RCT_table, Smoking_FASD, food_class, rouder2016, screening_test
Interpret a value based on a set of rules. See rules().
interpret(x, ...)

## S3 method for class 'numeric'
interpret(x, rules, name = attr(rules, "rule_name"), transform = NULL, ...)

## S3 method for class 'effectsize_table'
interpret(x, rules, transform = NULL, ...)
x 
Vector of value break points (edges defining categories), or a data
frame of class 
... 
Currently not used. 
rules 
Set of 
name 
Name of the set of rules (will be printed). 
transform 
a function (or name of a function) to apply to 
For numeric input: A character vector of interpretations.
For data frames: the x input with an additional Interpretation column.
rules_grid <- rules(c(0.01, 0.05), c("very significant", "significant", "not significant"))
interpret(0.001, rules_grid)
interpret(0.021, rules_grid)
interpret(0.08, rules_grid)
interpret(c(0.01, 0.005, 0.08), rules_grid)

interpret(c(0.35, 0.15), c("small" = 0.2, "large" = 0.4), name = "Cohen's Rules")
interpret(c(0.35, 0.15), rules(c(0.2, 0.4), c("small", "medium", "large")))

bigness <- rules(c(1, 10), c("small", "medium", "big"))
interpret(abs(-5), bigness)
interpret(-5, bigness, transform = abs)

# ----------
d <- cohens_d(mpg ~ am, data = mtcars)
interpret(d, rules = "cohen1988")

d <- glass_delta(mpg ~ am, data = mtcars)
interpret(d, rules = "gignac2016")

interpret(d, rules = rules(1, c("tiny", "yeah okay")))

m <- lm(formula = wt ~ am * cyl, data = mtcars)
eta2 <- eta_squared(m)
interpret(eta2, rules = "field2013")

X <- chisq.test(mtcars$am, mtcars$cyl == 8)
interpret(oddsratio(X), rules = "chen2010")
interpret(cramers_v(X), "lovakov2021")
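The idea behind threshold rules (n break points defining n + 1 labelled categories, with an optional transform applied first) can be sketched in base R. interpret_sketch() is a hypothetical helper written for this illustration; how the package treats values that fall exactly on a break point is governed by rules(), not by this sketch.

```r
# Sketch of threshold-based interpretation in base R.
# interpret_sketch() is hypothetical and is not the package implementation.
interpret_sketch <- function(x, breaks, labels, transform = NULL) {
  stopifnot(length(labels) == length(breaks) + 1)
  if (!is.null(transform)) x <- match.fun(transform)(x)
  # findInterval() returns 0 below the first break, 1 between the first two, ...
  labels[findInterval(x, breaks) + 1]
}

bigness <- c(1, 10)
sizes <- c("small", "medium", "big")

res_plain <- interpret_sketch(c(0.5, 5, 50), bigness, sizes)
res_plain

# Same idea as transform = abs in the example above
res_abs <- interpret_sketch(-5, bigness, sizes, transform = abs)
res_abs
```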
Interpret Bayes Factor (BF)
interpret_bf(
  bf,
  rules = "jeffreys1961",
  log = FALSE,
  include_value = FALSE,
  protect_ratio = TRUE,
  exact = TRUE
)
bf 
Value or vector of Bayes factor (BF) values. 
rules 
Can be 
log 
Is the 
include_value 
Include the value in the output. 
protect_ratio 
Should values smaller than 1 be represented as ratios? 
exact 
Should very large or very small values be reported with a scientific format (e.g., 4.24e5), or as truncated values (as "> 1000" and "< 1/1000"). 
Argument names can be partially matched.
Rules apply to BF as ratios, so BF of 10 is as extreme as a BF of 0.1 (1/10).
Jeffreys (1961) ("jeffreys1961"; default)
BF = 1 - No evidence
1 < BF <= 3 - Anecdotal
3 < BF <= 10 - Moderate
10 < BF <= 30 - Strong
30 < BF <= 100 - Very strong
BF > 100 - Extreme
Raftery (1995) ("raftery1995")
BF = 1 - No evidence
1 < BF <= 3 - Weak
3 < BF <= 20 - Positive
20 < BF <= 150 - Strong
BF > 150 - Very strong
Jeffreys, H. (1961), Theory of Probability, 3rd ed., Oxford University Press, Oxford.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological methodology, 25, 111-164.
Jarosz, A. F., & Wiley, J. (2014). What are the odds? A practical guide to computing and reporting Bayes factors. The Journal of Problem Solving, 7(1), 2.
interpret_bf(1)
interpret_bf(c(5, 2, 0.01))
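The statement that rules apply to BFs as ratios can be made concrete with a small base R sketch that folds values below 1 onto the same scale before applying the Jeffreys (1961) thresholds listed above. bf_label_sketch() and its output columns are hypothetical and are not the package's implementation or output format.

```r
# Sketch: apply the Jeffreys (1961) thresholds to BFs treated as ratios.
# bf_label_sketch() is a hypothetical helper, not the package implementation.
bf_label_sketch <- function(bf) {
  # Fold values below 1 onto the same scale: BF = 0.1 is treated like BF = 10
  ratio <- ifelse(bf < 1, 1 / bf, bf)
  label <- as.character(cut(
    ratio,
    breaks = c(1, 3, 10, 30, 100, Inf),
    labels = c("Anecdotal", "Moderate", "Strong", "Very strong", "Extreme"),
    right = TRUE # intervals are (lower, upper], matching "1 < BF <= 3"
  ))
  label[ratio == 1] <- "No evidence"
  direction <- ifelse(bf > 1, "in favour of", ifelse(bf < 1, "against", NA))
  data.frame(bf = bf, ratio = ratio, label = label, direction = direction)
}

bf_res <- bf_label_sketch(c(1, 5, 2, 0.02))
bf_res
```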
Interpretation of standardized differences using different sets of rules of thumb.
interpret_cohens_d(d, rules = "cohen1988", ...)

interpret_hedges_g(g, rules = "cohen1988")

interpret_glass_delta(delta, rules = "cohen1988")
d, g, delta
Value or vector of effect size values. 
rules 
Can be 
... 
Not directly used. 
Rules apply equally to positive and negative d (i.e., they are given as absolute values).
Cohen (1988) ("cohen1988"; default)
d < 0.2 - Very small
0.2 <= d < 0.5 - Small
0.5 <= d < 0.8 - Medium
d >= 0.8 - Large
Sawilowsky (2009) ("sawilowsky2009")
d < 0.1 - Tiny
0.1 <= d < 0.2 - Very small
0.2 <= d < 0.5 - Small
0.5 <= d < 0.8 - Medium
0.8 <= d < 1.2 - Large
1.2 <= d < 2 - Very large
d >= 2 - Huge
Lovakov & Agadullina (2021) ("lovakov2021")
d < 0.15 - Very small
0.15 <= d < 0.36 - Small
0.36 <= d < 0.65 - Medium
d >= 0.65 - Large
Gignac & Szodorai (2016) ("gignac2016", based on the d_to_r() conversion, see interpret_r())
d < 0.2 - Very small
0.2 <= d < 0.41 - Small
0.41 <= d < 0.63 - Moderate
d >= 0.63 - Large
Lovakov, A., & Agadullina, E. R. (2021). Empirically Derived Guidelines for Effect Size Interpretation in Social Psychology. European Journal of Social Psychology.
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and individual differences, 102, 74-78.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Sawilowsky, S. S. (2009). New effect size rules of thumb.
interpret_cohens_d(.02)
interpret_cohens_d(c(.5, .02))
interpret_cohens_d(.3, rules = "lovakov2021")
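The "gignac2016" thresholds for d listed above can be traced back to the r thresholds (0.1, 0.2, 0.3) given under interpret_r(). The sketch below assumes the equal-group-size conversion r = d / sqrt(d^2 + 4), inverted to d = 2r / sqrt(1 - r^2); r_to_d_sketch() is a hypothetical helper for this illustration.

```r
# Where the "gignac2016" d thresholds come from: converting r thresholds to d.
# Assumes r = d / sqrt(d^2 + 4), i.e. d = 2 * r / sqrt(1 - r^2);
# r_to_d_sketch() is a hypothetical helper.
r_to_d_sketch <- function(r) 2 * r / sqrt(1 - r^2)

r_thresholds <- c(0.1, 0.2, 0.3)
d_thresholds <- round(r_to_d_sketch(r_thresholds), 2)
d_thresholds
#> [1] 0.20 0.41 0.63
```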
Interpret Cohen's g
interpret_cohens_g(g, rules = "cohen1988", ...)
g 
Value or vector of effect size values. 
rules 
Can be 
... 
Not directly used. 
Rules apply equally to positive and negative g (i.e., they are given as absolute values).
Cohen (1988) ("cohen1988"; default)
g < 0.05 - Very small
0.05 <= g < 0.15 - Small
0.15 <= g < 0.25 - Medium
g >= 0.25 - Large
"Since g is so transparently clear a unit, it is expected that workers in any given substantive area of the behavioral sciences will very frequently be able to set relevant [effect size] values without the proposed conventions, or set up conventions of their own which are suited to their area of inquiry." - Cohen, 1988, page 147.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
interpret_cohens_g(.02)
interpret_cohens_g(c(.3, .15))
Interpret Direction
interpret_direction(x)
x 
Numeric value. 
interpret_direction(.02)
interpret_direction(c(.5, .02))
Interpretation of Bayesian diagnostic indices, such as Effective Sample Size (ESS) and Rhat.
interpret_ess(ess, rules = "burkner2017")

interpret_rhat(rhat, rules = "vehtari2019")
ess 
Value or vector of Effective Sample Size (ESS) values. 
rules 
A character string (see Rules) or a custom set of 
rhat 
Value or vector of Rhat values. 
Bürkner, P. C. (2017) ("burkner2017"; default)
ESS < 1000 - Insufficient
ESS >= 1000 - Sufficient
Vehtari et al. (2019) ("vehtari2019"; default)
Rhat < 1.01 - Converged
Rhat >= 1.01 - Failed
Gelman & Rubin (1992) ("gelman1992")
Rhat < 1.1 - Converged
Rhat >= 1.1 - Failed
Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1-28.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2019). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.
interpret_ess(1001)
interpret_ess(c(852, 1200))

interpret_rhat(1.00)
interpret_rhat(c(1.5, 0.9))
Interpretation of indices of fit found in confirmatory analysis or structural equation modelling, such as RMSEA, CFI, NFI, IFI, etc.
interpret_gfi(x, rules = "byrne1994")

interpret_agfi(x, rules = "byrne1994")

interpret_nfi(x, rules = "byrne1994")

interpret_nnfi(x, rules = "byrne1994")

interpret_cfi(x, rules = "byrne1994")

interpret_rfi(x, rules = "default")

interpret_ifi(x, rules = "default")

interpret_pnfi(x, rules = "default")

interpret_rmsea(x, rules = "byrne1994")

interpret_srmr(x, rules = "byrne1994")

## S3 method for class 'lavaan'
interpret(x, ...)

## S3 method for class 'performance_lavaan'
interpret(x, ...)
x 
vector of values, or an object of class 
rules 
Can be the name of a set of rules (see below) or custom set of

... 
Currently not used. 
Chisq: The model Chi-squared assesses overall fit and the discrepancy between the sample and fitted covariance matrices. Its p-value should be > .05 (i.e., the hypothesis of a perfect fit cannot be rejected). However, it is quite sensitive to sample size.
GFI/AGFI: The (Adjusted) Goodness of Fit is the proportion of variance accounted for by the estimated population covariance. Analogous to R2. The GFI and the AGFI should be > .95 and > .90, respectively (Byrne, 1994; "byrne1994").
NFI/NNFI/TLI: The (Non) Normed Fit Index. An NFI of 0.95 indicates that the model of interest improves the fit by 95% relative to the null model. The NNFI (also called the Tucker Lewis index; TLI) is preferable for smaller samples. They should be > .90 (Byrne, 1994; "byrne1994") or > .95 (Schumacker & Lomax, 2004; "schumacker2004").
CFI: The Comparative Fit Index is a revised form of NFI. Not very sensitive to sample size (Fan, Thompson, & Wang, 1999). Compares the fit of a target model to the fit of an independent, or null, model. It should be > .96 (Hu & Bentler, 1999; "hu&bentler1999") or > .90 (Byrne, 1994; "byrne1994").
RFI: the Relative Fit Index, also known as RHO1, is not guaranteed to vary from 0 to 1. However, RFI close to 1 indicates a good fit.
IFI: the Incremental Fit Index (IFI) adjusts the Normed Fit Index (NFI) for sample size and degrees of freedom (Bollen, 1989). Over 0.90 is a good fit, but the index can exceed 1.
PNFI: the Parsimony-Adjusted Measures Index. There is no commonly agreed-upon cutoff value for an acceptable model for this index. Should be > 0.50.
RMSEA: The Root Mean Square Error of Approximation is a parsimony-adjusted index. Values closer to 0 represent a good fit. It should be < .08 (Awang, 2012; "awang2012") or < .05 (Byrne, 1994; "byrne1994"). The p-value printed with it tests the hypothesis that RMSEA is less than or equal to .05 (a cutoff sometimes used for good fit), and thus should not be significant.
RMR/SRMR: the (Standardized) Root Mean Square Residual represents the square-root of the difference between the residuals of the sample covariance matrix and the hypothesized model. As the RMR can sometimes be hard to interpret, it is better to use the SRMR. It should be < .08 (Byrne, 1994; "byrne1994").
See the documentation for fitmeasures().
For structural equation models (SEM), Kline (2015) suggests that at a minimum the following indices should be reported: the model chi-square, the RMSEA, the CFI and the SRMR.
When possible, it is recommended to report dynamic cutoffs of fit indices. See https://dynamicfit.app/cfa/.
Awang, Z. (2012). A handbook on SEM. Structural equation modeling.
Byrne, B. M. (1994). Structural equation modeling with EQS and EQS/Windows. Thousand Oaks, CA: Sage Publications.
Fan, X., B. Thompson, and L. Wang (1999). Effects of sample size, estimation method, and model specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56-83.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural equation modeling: a multidisciplinary journal, 6(1), 1-55.
Kline, R. B. (2015). Principles and practice of structural equation modeling. Guilford publications.
Schumacker, R. E., and Lomax, R. G. (2004). A beginner's guide to structural equation modeling, Second edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Tucker, L. R., and Lewis, C. (1973). The reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
interpret_gfi(c(.5, .99))
interpret_agfi(c(.5, .99))
interpret_nfi(c(.5, .99))
interpret_nnfi(c(.5, .99))
interpret_cfi(c(.5, .99))
interpret_rmsea(c(.07, .04))
interpret_srmr(c(.5, .99))
interpret_rfi(c(.5, .99))
interpret_ifi(c(.5, .99))
interpret_pnfi(c(.5, .99))

# Structural Equation Models (SEM)
structure <- " ind60 =~ x1 + x2 + x3
               dem60 =~ y1 + y2 + y3
               dem60 ~ ind60 "

model <- lavaan::sem(structure, data = lavaan::PoliticalDemocracy)

interpret(model)
The value of an ICC lies between 0 and 1, with 0 indicating no reliability among raters and 1 indicating perfect reliability.
interpret_icc(icc, rules = "koo2016", ...)
icc 
Value or vector of Intraclass Correlation Coefficient (ICC) values. 
rules 
Can be 
... 
Not used for now. 
Koo (2016) ("koo2016"; default)
ICC < 0.50 - Poor reliability
0.5 <= ICC < 0.75 - Moderate reliability
0.75 <= ICC < 0.9 - Good reliability
ICC >= 0.9 - Excellent reliability
Koo, T. K., and Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine, 15(2), 155-163.
interpret_icc(0.6)
interpret_icc(c(0.4, 0.8))
Interpret Kendall's Coefficient of Concordance W
interpret_kendalls_w(w, rules = "landis1977")
w 
Value or vector of Kendall's coefficient of concordance. 
rules 
Can be 
Landis & Koch (1977) ("landis1977"; default)
0.00 <= w < 0.20 - Slight agreement
0.20 <= w < 0.40 - Fair agreement
0.40 <= w < 0.60 - Moderate agreement
0.60 <= w < 0.80 - Substantial agreement
w >= 0.80 - Almost perfect agreement
Landis, J. R., & Koch G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33:159-74.
Interpret Odds Ratio
interpret_oddsratio(OR, rules = "chen2010", log = FALSE, ...)
OR 
Value or vector of (log) odds ratio values. 
rules 
Can be " 
log 
Are the provided values log odds ratios?
... 
Currently not used. 
Rules apply to OR as ratios, so an OR of 10 is as extreme as an OR of 0.1 (1/10).
Chen et al. (2010) ("chen2010"; default)
OR < 1.68 - Very small
1.68 <= OR < 3.47 - Small
3.47 <= OR < 6.71 - Medium
OR >= 6.71 - Large
Cohen (1988) ("cohen1988", based on the oddsratio_to_d() conversion, see interpret_cohens_d())
OR < 1.44 - Very small
1.44 <= OR < 2.48 - Small
2.48 <= OR < 4.27 - Medium
OR >= 4.27 - Large
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Chen, H., Cohen, P., & Chen, S. (2010). How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics - Simulation and Computation, 39(4), 860-864.
Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological methods, 8(4), 448.
interpret_oddsratio(1)
interpret_oddsratio(c(5, 2))
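The "cohen1988" thresholds for odds ratios can be traced back to Cohen's d thresholds (0.2, 0.5, 0.8). The sketch below assumes the logistic conversion log(OR) = d * pi / sqrt(3); d_to_or_sketch() is a hypothetical helper used only for this illustration.

```r
# Where the "cohen1988" OR thresholds come from: converting d thresholds to ORs.
# Assumes log(OR) = d * pi / sqrt(3); d_to_or_sketch() is a hypothetical helper.
d_to_or_sketch <- function(d) exp(d * pi / sqrt(3))

or_thresholds <- round(d_to_or_sketch(c(0.2, 0.5, 0.8)), 2)
or_thresholds
#> [1] 1.44 2.48 4.27

# Rules apply to ORs as ratios: fold values below 1 before comparing
or <- c(0.1, 10)
or_folded <- ifelse(or < 1, 1 / or, or)
or_folded
```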
Interpret ANOVA Effect Sizes
interpret_omega_squared(es, rules = "field2013", ...)

interpret_eta_squared(es, rules = "field2013", ...)

interpret_epsilon_squared(es, rules = "field2013", ...)

interpret_r2_semipartial(es, rules = "field2013", ...)
es 
Value or vector of (partial) eta / omega / epsilon squared or semipartial r squared values. 
rules 
Can be 
... 
Not used for now. 
Field (2013) ("field2013"; default)
ES < 0.01 - Very small
0.01 <= ES < 0.06 - Small
0.06 <= ES < 0.14 - Medium
ES >= 0.14 - Large
Cohen (1992) ("cohen1992") applicable to one-way anova, or to partial eta / omega / epsilon squared in multi-way anova.
ES < 0.02 - Very small
0.02 <= ES < 0.13 - Small
0.13 <= ES < 0.26 - Medium
ES >= 0.26 - Large
Field, A. (2013). Discovering statistics using IBM SPSS Statistics. Fourth Edition. Sage: London.
Cohen, J. (1992). A power primer. Psychological bulletin, 112(1), 155.
https://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize/
interpret_eta_squared(.02)
interpret_eta_squared(c(.5, .02), rules = "cohen1992")
Interpret p-Values
interpret_p(p, rules = "default")
p 
Value or vector of p-values.
rules 
Can be 
Default
p >= 0.05 - Not significant
p < 0.05 - Significant
Benjamin et al. (2018) ("rss")
p >= 0.05 - Not significant
0.005 <= p < 0.05 - Suggestive
p < 0.005 - Significant
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Cesarini, D. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6-10.
interpret_p(c(.5, .02, 0.001))
interpret_p(c(.5, .02, 0.001), rules = "rss")

stars <- rules(c(0.001, 0.01, 0.05, 0.1), c("***", "**", "*", "+", ""),
  right = FALSE, name = "stars"
)
interpret_p(c(.5, .02, 0.001), rules = stars)
Interpret Probability of Direction (pd)
interpret_pd(pd, rules = "default", ...)
pd 
Value or vector of probabilities of direction. 
rules 
Can be 
... 
Not directly used. 
Default (i.e., equivalent to p-values)
pd <= 0.975 - not significant
pd > 0.975 - significant
Makowski et al. (2019) ("makowski2019")
pd <= 0.95 - uncertain
pd > 0.95 - possibly existing
pd > 0.97 - likely existing
pd > 0.99 - probably existing
pd > 0.999 - certainly existing
Makowski, D., Ben-Shachar, M. S., Chen, S. H., and Lüdecke, D. (2019). Indices of effect existence and significance in the Bayesian framework. Frontiers in psychology, 10, 2767.
interpret_pd(.98)
interpret_pd(c(.96, .99), rules = "makowski2019")
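The default rule is described as equivalent to p-values because a pd maps onto a two-sided p-value as p = 2 * (1 - pd), so pd > 0.975 corresponds to p < .05. A minimal base R sketch of that mapping follows; pd_to_p_sketch() is a hypothetical helper written for this example.

```r
# Sketch of the link between pd and a two-sided p-value: p = 2 * (1 - pd).
# pd_to_p_sketch() is a hypothetical helper written for this example.
pd_to_p_sketch <- function(pd) 2 * (1 - pd)

pd <- c(0.95, 0.975, 0.99, 0.999)
pd_table <- data.frame(pd = pd, p_two_sided = pd_to_p_sketch(pd))
pd_table
```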
Interpret Correlation Coefficient
interpret_r(r, rules = "funder2019", ...)

interpret_phi(r, rules = "funder2019", ...)

interpret_cramers_v(r, rules = "funder2019", ...)

interpret_rank_biserial(r, rules = "funder2019", ...)

interpret_fei(r, rules = "funder2019", ...)
r 
Value or vector of correlation coefficient. 
rules 
Can be 
... 
Not directly used. 
Since Cohen's w does not have a fixed upper bound, for all but the simplest
of cases (2-by-2 or 1-by-2 tables), interpreting Cohen's w as a
correlation coefficient is inappropriate (Ben-Shachar et al., 2023; Cohen,
1988, p. 222). Please use cramers_v() or the like instead.
Rules apply to positive and negative r alike.
Funder & Ozer (2019) ("funder2019"; default)
r < 0.05 - Tiny
0.05 <= r < 0.1 - Very small
0.1 <= r < 0.2 - Small
0.2 <= r < 0.3 - Medium
0.3 <= r < 0.4 - Large
r >= 0.4 - Very large
Gignac & Szodorai (2016) ("gignac2016")
r < 0.1 - Very small
0.1 <= r < 0.2 - Small
0.2 <= r < 0.3 - Moderate
r >= 0.3 - Large
Cohen (1988) ("cohen1988")
r < 0.1 - Very small
0.1 <= r < 0.3 - Small
0.3 <= r < 0.5 - Moderate
r >= 0.5 - Large
Lovakov & Agadullina (2021) ("lovakov2021")
r < 0.12 - Very small
0.12 <= r < 0.24 - Small
0.24 <= r < 0.41 - Moderate
r >= 0.41 - Large
Evans (1996) ("evans1996")
r < 0.2 - Very weak
0.2 <= r < 0.4 - Weak
0.4 <= r < 0.6 - Moderate
0.6 <= r < 0.8 - Strong
r >= 0.8 - Very strong
As $\phi$ can be larger than 1, it is recommended to compute and interpret Cramer's V instead.
Lovakov, A., & Agadullina, E. R. (2021). Empirically Derived Guidelines for Effect Size Interpretation in Social Psychology. European Journal of Social Psychology.
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: sense and nonsense. Advances in Methods and Practices in Psychological Science.
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and individual differences, 102, 74-78.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Thomson Brooks/Cole Publishing Co.
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi-Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Page 88 of APA's 6th Edition.
interpret_r(.015)
interpret_r(c(.5, .02))
interpret_r(.3, rules = "lovakov2021")
Interpret Coefficient of Determination ($R^2$)
interpret_r2(r2, rules = "cohen1988")
r2 
Value or vector of 
rules 
Can be 
Cohen (1988) ("cohen1988"; default)
R2 < 0.02 - Very weak
0.02 <= R2 < 0.13 - Weak
0.13 <= R2 < 0.26 - Moderate
R2 >= 0.26 - Substantial
Falk & Miller (1992) ("falk1992")
R2 < 0.1 - Negligible
R2 >= 0.1 - Adequate
Chin, W. W. (1998) ("chin1998")
R2 < 0.19 - Very weak
0.19 <= R2 < 0.33 - Weak
0.33 <= R2 < 0.67 - Moderate
R2 >= 0.67 - Substantial
Hair et al. (2011) ("hair2011")
R2 < 0.25 - Very weak
0.25 <= R2 < 0.50 - Weak
0.50 <= R2 < 0.75 - Moderate
R2 >= 0.75 - Substantial
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Falk, R. F., & Miller, N. B. (1992). A primer for soft modeling. University of Akron Press.
Chin, W. W. (1998). The partial least squares approach to structural equation modeling. Modern methods for business research, 295(2), 295-336.
Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing theory and Practice, 19(2), 139-152.
interpret_r2(.02)
interpret_r2(c(.5, .02))
Interpretation of
interpret_rope(rope, ci = 0.9, rules = "default")
rope 
Value or vector of percentages in ROPE. 
ci 
The Credible Interval (CI) probability, corresponding to the proportion of HDI, that was used. Can be 
rules 
A character string (see details) or a custom set of 
Default
For CI < 1
Rope = 0 - Significant
0 < Rope < 1 - Undecided
Rope = 1 - Negligible
For CI = 1
Rope < 0.01 - Significant
0.01 < Rope < 0.025 - Probably significant
0.025 < Rope < 0.975 - Undecided
0.975 < Rope < 0.99 - Probably negligible
Rope > 0.99 - Negligible
BayestestR's reporting guidelines
interpret_rope(0, ci = 0.9)
interpret_rope(c(0.005, 0.99), ci = 1)
Interpret VIF index of multicollinearity.
interpret_vif(vif, rules = "default")
vif 
Value or vector of VIFs. 
rules 
Can be 
Default
VIF < 5 - Low
5 <= VIF < 10 - Moderate
VIF >= 10 - High
interpret_vif(c(1.4, 30.4))
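For context, the values being interpreted can be computed by hand. The sketch below assumes the usual definition VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing predictor j on the remaining predictors, and then applies the default thresholds listed above. vif_sketch() is a hypothetical helper, and the mtcars predictors are chosen only for illustration.

```r
# Sketch: compute VIFs by hand and label them with the default thresholds.
# Assumes VIF_j = 1 / (1 - R2_j); vif_sketch() is a hypothetical helper.
vif_sketch <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on all other predictors and take the R-squared
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_sketch(mtcars, c("wt", "hp", "disp", "cyl"))

# Left-closed intervals: [0, 5) low, [5, 10) moderate, [10, Inf) high
vif_table <- data.frame(
  vif = round(vifs, 2),
  label = cut(vifs,
    breaks = c(-Inf, 5, 10, Inf), right = FALSE,
    labels = c("Low", "Moderate", "High")
  )
)
vif_table
```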
For use by other functions and packages.
is_effectsize_name(x, ignore_case = TRUE)

get_effectsize_name(x, ignore_case = TRUE)

get_effectsize_label(
  x,
  ignore_case = TRUE,
  use_symbols = getOption("es.use_symbols", FALSE)
)
x 
A character, or a vector. 
ignore_case 
Should case of input be ignored? 
use_symbols 
Should proper symbols be printed ( 
Compute effect size indices for standardized difference between two normal
multivariate distributions or between one multivariate distribution and a
defined point. This is the standardized effect size for Hotelling's $T^2$
test (e.g., DescTools::HotellingsT2Test()). D is computed as:
$D = \sqrt{(\bar{X}_1-\bar{X}_2-\mu)^T \Sigma_p^{-1} (\bar{X}_1-\bar{X}_2-\mu)}$
Where $\bar{X}_i$ are the column means, $\Sigma_p$ is the pooled
covariance matrix, and $\mu$ is a vector of the null differences for each
variable. When there is only one variate, this formula reduces to Cohen's d.
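The formula can be checked by hand in base R. The sketch below is an illustration under stated assumptions rather than the package's implementation: mahalanobis_d_sketch() is a hypothetical helper, it covers only the two-sample case with a pooled covariance matrix and complete data, and it returns no confidence interval. With a single variate the result should equal the absolute value of Cohen's d computed with the pooled SD.

```r
# Sketch of the two-sample Mahalanobis D formula in base R.
# mahalanobis_d_sketch() is a hypothetical helper; it assumes a pooled
# covariance matrix and complete data.
mahalanobis_d_sketch <- function(x1, x2, mu = 0) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  # Pooled covariance: df-weighted average of the group covariance matrices
  s_pooled <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  delta <- colMeans(x1) - colMeans(x2) - mu
  sqrt(as.numeric(t(delta) %*% solve(s_pooled) %*% delta))
}

auto <- mtcars[mtcars$am == 0, ]
manual <- mtcars[mtcars$am == 1, ]

# Two variates
d_two <- mahalanobis_d_sketch(auto[, c("mpg", "hp")], manual[, c("mpg", "hp")])
d_two

# One variate: D equals the absolute value of Cohen's d (pooled SD)
d_one <- mahalanobis_d_sketch(auto[, "mpg", drop = FALSE], manual[, "mpg", drop = FALSE])
sd_pooled <- sqrt(
  ((nrow(auto) - 1) * var(auto$mpg) + (nrow(manual) - 1) * var(manual$mpg)) /
    (nrow(auto) + nrow(manual) - 2)
)
cohens_d_abs <- abs(mean(auto$mpg) - mean(manual$mpg)) / sd_pooled
all.equal(d_one, cohens_d_abs)
```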
mahalanobis_d(
  x,
  y = NULL,
  data = NULL,
  pooled_cov = TRUE,
  mu = 0,
  ci = 0.95,
  alternative = "greater",
  verbose = TRUE,
  ...
)
x, y
A data frame or matrix. Any incomplete observations (with 
data 
An optional data frame containing the variables. 
pooled_cov 
Should equal covariance be assumed? Currently only

mu 
A named list/vector of the true difference in means for each variable. Can also be a vector of length 1, which will be recycled. 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Not used. 
To specify x as a formula:
Two sample case: DV1 + DV2 ~ group or cbind(DV1, DV2) ~ group
One sample case: DV1 + DV2 ~ 1 or cbind(DV1, DV2) ~ 1
A data frame with the Mahalanobis_D and potentially its CI (CI_low and CI_high).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
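The pivot method described above can be sketched in base R with pt() and uniroot(). The inputs below (t_obs, df, and a one-sample design with n = df + 1) are hypothetical, and the search interval is simply chosen wide enough for this example; this is an illustration of the idea, not the package's internal code.

```r
# Hypothetical inputs for a one-sample t test
t_obs <- 2.5
df <- 30
n <- df + 1
alpha <- 0.05

# Find the ncp at which t_obs sits at cumulative probability `prob`
ncp_at <- function(prob) {
  uniroot(
    function(ncp) pt(t_obs, df = df, ncp = ncp) - prob,
    interval = c(-5, 10), # wide enough to bracket the root for these inputs
    tol = 1e-10
  )$root
}

ncp_low <- ncp_at(1 - alpha / 2) # t_obs is the .975 quantile
ncp_high <- ncp_at(alpha / 2)    # t_obs is the .025 quantile

# For a one-sample t test, ncp = d * sqrt(n), so d = ncp / sqrt(n)
d_ci <- c(ncp_low, ncp_high) / sqrt(n)
d_ci
```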
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
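The correspondence between CIs and p values can be checked numerically. The sketch below uses stats::t.test() on a small numeric vector (borrowed from the means_ratio() examples) rather than an effect size CI, so it only illustrates the general principle.

```r
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)

ci <- t.test(x, conf.level = 0.95)$conf.int

# Testing against a null value exactly on a CI bound gives p = alpha
p_at_bound <- t.test(x, mu = ci[1])$p.value

# Null values inside the interval give p > .05; values outside give p < .05
p_inside <- t.test(x, mu = mean(ci))$p.value
p_outside <- t.test(x, mu = ci[1] - 0.5)$p.value

c(at_bound = p_at_bound, inside = p_inside, outside = p_outside)
```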
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Del Giudice, M. (2017). Heterogeneity coefficients for Mahalanobis' D as a multivariate effect size. Multivariate Behavioral Research, 52(2), 216-221.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. National Institute of Science of India.
Reiser, B. (2001). Confidence intervals for the Mahalanobis distance. Communications in Statistics - Simulation and Computation, 30(1), 37-45.
stats::mahalanobis(), cov_pooled()
Other standardized differences: cohens_d(), means_ratio(), p_superiority(), rank_biserial(), repeated_measures_d()
## Two samples
mtcars_am0 <- subset(mtcars, am == 0, select = c(mpg, hp, cyl))
mtcars_am1 <- subset(mtcars, am == 1, select = c(mpg, hp, cyl))
mahalanobis_d(mtcars_am0, mtcars_am1)
# Or
mahalanobis_d(mpg + hp + cyl ~ am, data = mtcars)
mahalanobis_d(mpg + hp + cyl ~ am, data = mtcars, alternative = "two.sided")
# Different mu:
mahalanobis_d(mpg + hp + cyl ~ am, data = mtcars, mu = c(mpg = 4, hp = 15, cyl = 0))
# D is a multivariate d, so when only 1 variate is provided:
mahalanobis_d(hp ~ am, data = mtcars)
cohens_d(hp ~ am, data = mtcars)
## One sample
mahalanobis_d(mtcars[, c("mpg", "hp", "cyl")])
# Or
mahalanobis_d(mpg + hp + cyl ~ 1, data = mtcars, mu = c(mpg = 15, hp = 5, cyl = 3))
Computes the ratio of two means (also known as the "response ratio"; RR) of
variables on a ratio scale (with an absolute 0). Pair with any reported
stats::t.test().
means_ratio( x, y = NULL, data = NULL, paired = FALSE, adjust = TRUE, log = FALSE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ... )
x , y

A numeric vector, or a character name of one in 
data 
An optional data frame containing the variables. 
paired 
If 
adjust 
Should the effect size be adjusted for small-sample bias?
Defaults to 
log 
Should the log-ratio be returned? Defaults to 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
The Means Ratio ranges from 0 to $\infty$, with values smaller than 1
indicating that the second mean is larger than the first, values larger than
1 indicating that the second mean is smaller than the first, and values of 1
indicating that the means are equal.
A data frame with the effect size (Means_ratio or Means_ratio_adjusted) and their CIs (CI_low and CI_high).
Confidence intervals are estimated as described by Lajeunesse (2011 & 2015) using the log-ratio standard error assuming a normal distribution. By this method, the log is taken of the ratio of means, which makes this outcome measure symmetric around 0 and yields a corresponding sampling distribution that is closer to normality.
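The log-ratio approach can be sketched in base R as follows. This uses the unadjusted, first-order standard error for two independent groups (in the spirit of Hedges et al., 1999), not the small-sample corrections of Lajeunesse (2015), so the numbers need not match means_ratio() exactly. The function name log_rr_ci is hypothetical.

```r
# Hypothetical helper: unadjusted means ratio with a normal-theory CI
log_rr_ci <- function(x, y, level = 0.95) {
  log_rr <- log(mean(x) / mean(y))

  # First-order (delta-method) SE of the log ratio for independent groups
  se <- sqrt(
    var(x) / (length(x) * mean(x)^2) +
      var(y) / (length(y) * mean(y)^2)
  )

  z <- qnorm(1 - (1 - level) / 2)

  # Build the interval on the log scale, then back-transform
  exp(c(ratio = log_rr, CI_low = log_rr - z * se, CI_high = log_rr + z * se))
}

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
log_rr_ci(x, y)
```

Because the interval is built on the log scale, it is asymmetric around the ratio after back-transformation and can never go below 0.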
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
The small-sample bias-corrected response ratio reported from this function is derived from Lajeunesse (2015).
Lajeunesse, M. J. (2011). On the meta-analysis of response ratios for studies with correlated and multi-group designs. Ecology, 92(11), 2049-2055. doi:10.1890/11-0423.1
Lajeunesse, M. J. (2015). Bias and correction for the log response ratio in ecological meta-analysis. Ecology, 96(8), 2056-2063. doi:10.1890/14-2402.1
Hedges, L. V., Gurevitch, J., & Curtis, P. S. (1999). The meta-analysis of response ratios in experimental ecology. Ecology, 80(4), 1150-1156. doi:10.1890/0012-9658(1999)080[1150:TMAORR]2.0.CO;2
Other standardized differences: cohens_d(), mahalanobis_d(), p_superiority(), rank_biserial(), repeated_measures_d()
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
means_ratio(x, y)
means_ratio(x, y, adjust = FALSE)
means_ratio(x, y, log = TRUE)
# The ratio is scale invariant, making it a standardized effect size
means_ratio(3 * x, 3 * y)
Fictional data.
A 3-by-4 table, with a row for each major and a column for each type of music.
data("Music_preferences")
Music_preferences
#>       Pop Rock Jazz Classic
#> Psych 150  100  165     130
#> Econ   50   65   35      10
#> Law     2   55   40      25
Other effect size datasets: Music_preferences2, RCT_table, Smoking_FASD, food_class, hardlyworking, rouder2016, screening_test
Fictional data, with more extreme preferences than Music_preferences
A 3-by-4 table, with a row for each major and a column for each type of music.
data("Music_preferences2")
Music_preferences2
#>       Pop Rock Jazz Classic
#> Psych 151  130   12       7
#> Econ   77    6  111       4
#> Law     0    4    2     165
Other effect size datasets: Music_preferences, RCT_table, Smoking_FASD, food_class, hardlyworking, rouder2016, screening_test
Convert Between Odds and Probabilities
odds_to_probs(odds, log = FALSE, ...)
## S3 method for class 'data.frame'
odds_to_probs(odds, log = FALSE, select = NULL, exclude = NULL, ...)
probs_to_odds(probs, log = FALSE, ...)
## S3 method for class 'data.frame'
probs_to_odds(probs, log = FALSE, select = NULL, exclude = NULL, ...)
odds 
The Odds (or 
log 
Take in or output log odds (such as in logistic models). 
... 
Arguments passed to or from other methods. 
select 
When a data frame is passed, character or list of column names to be transformed. 
exclude 
When a data frame is passed, character or list of column names to be excluded from transformation. 
probs 
Probability values to convert. 
Converted index.
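The underlying arithmetic is short enough to write out. The two helpers below are hypothetical stand-ins for vector input only (they are not the package functions and do not handle data frames); they use the standard identities p = odds / (1 + odds) and odds = p / (1 - p).

```r
# Hypothetical stand-ins for the conversions (vector input only)
p_from_odds <- function(odds, log = FALSE) {
  if (log) odds <- exp(odds) # input is on the log-odds scale
  odds / (1 + odds)
}

odds_from_p <- function(probs, log = FALSE) {
  odds <- probs / (1 - probs)
  if (log) log(odds) else odds
}

p_from_odds(3)                # odds of 3 (3:1) correspond to p = 3/4
odds_from_p(0.95)
odds_from_p(0.95, log = TRUE) # same as stats::qlogis(0.95)
```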
Other convert between effect sizes: d_to_r(), diff_to_cles, eta2_to_f2(), oddsratio_to_riskratio(), w_to_fei()
odds_to_probs(3)
odds_to_probs(1.09, log = TRUE)
probs_to_odds(0.95)
probs_to_odds(0.95, log = TRUE)
Compute Odds Ratios, Risk Ratios, Cohen's h, Absolute Risk Reduction or
Number Needed to Treat. Report with any stats::chisq.test() or
stats::fisher.test().
Note that these are computed with each column representing the different
groups, and the first column representing the treatment group and the
second column baseline (or control). Effects are given as treatment / control.
If you wish to use rows as groups you must pass a transposed
table, or switch the x and y arguments.
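To make the column layout concrete, here is a minimal base R sketch with hypothetical counts. Treating the first row as the event of interest is an assumption of this sketch, and only point estimates are computed (no CIs).

```r
# Hypothetical counts; columns are groups (treatment first), rows are outcomes.
# Treating the first row as the event of interest is an assumption of this sketch.
tab <- matrix(
  c(30, 70, 10, 90),
  nrow = 2,
  dimnames = list(
    Outcome = c("Event", "No event"),
    Group = c("Treatment", "Control")
  )
)

risk <- tab["Event", ] / colSums(tab)
p1 <- risk[["Treatment"]]
p0 <- risk[["Control"]]

OR <- (p1 / (1 - p1)) / (p0 / (1 - p0)) # odds ratio, treatment / control
RR <- p1 / p0                            # risk ratio, treatment / control
ARR <- p1 - p0                           # absolute risk difference
NNT <- 1 / ARR                           # number needed to treat
h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p0)) # Cohen's h
```

Passing t(tab) instead would swap the roles of rows and columns, which is why the orientation of the table matters.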
oddsratio(x, y = NULL, ci = 0.95, alternative = "two.sided", log = FALSE, ...)
riskratio(x, y = NULL, ci = 0.95, alternative = "two.sided", log = FALSE, ...)
cohens_h(x, y = NULL, ci = 0.95, alternative = "two.sided", ...)
arr(x, y = NULL, ci = 0.95, alternative = "two.sided", ...)
nnt(x, y = NULL, ci = 0.95, alternative = "two.sided", ...)
x 
a numeric vector or matrix. 
y 
a numeric vector; ignored if 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
log 
Take in or output the log of the ratio (such as in logistic models), e.g. when the desired input or output are log odds ratios instead of odds ratios. 
... 
Ignored 
A data frame with the effect size (Odds_ratio, Risk_ratio (possibly with the prefix log_), Cohens_h, ARR, NNT) and its CIs (CI_low and CI_high).
Confidence intervals are estimated using the standard normal parametric method (see Katz et al., 1978; Szumilas, 2010).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Katz, D. J. S. M., Baptista, J., Azen, S. P., & Pike, M. C. (1978). Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics, 469-474.
Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian academy of child and adolescent psychiatry, 19(3), 227.
Other effect sizes for contingency table: cohens_g(), phi()
data("RCT_table")
RCT_table # note groups are COLUMNS
oddsratio(RCT_table)
oddsratio(RCT_table, alternative = "greater")
riskratio(RCT_table)
cohens_h(RCT_table)
arr(RCT_table)
nnt(RCT_table)
Convert Between Odds Ratios, Risk Ratios and Other Metrics of Change in Probabilities
oddsratio_to_riskratio(OR, p0, log = FALSE, verbose = TRUE, ...)
oddsratio_to_arr(OR, p0, log = FALSE, verbose = TRUE, ...)
oddsratio_to_nnt(OR, p0, log = FALSE, verbose = TRUE, ...)
logoddsratio_to_riskratio(logOR, p0, log = TRUE, verbose = TRUE, ...)
logoddsratio_to_arr(logOR, p0, log = TRUE, verbose = TRUE, ...)
logoddsratio_to_nnt(logOR, p0, log = TRUE, verbose = TRUE, ...)
riskratio_to_oddsratio(RR, p0, log = FALSE, verbose = TRUE, ...)
riskratio_to_arr(RR, p0, verbose = TRUE, ...)
riskratio_to_logoddsratio(RR, p0, log = TRUE, verbose = TRUE, ...)
riskratio_to_nnt(RR, p0, verbose = TRUE, ...)
arr_to_riskratio(ARR, p0, verbose = TRUE, ...)
arr_to_oddsratio(ARR, p0, log = FALSE, verbose = TRUE, ...)
arr_to_logoddsratio(ARR, p0, log = TRUE, verbose = TRUE, ...)
arr_to_nnt(ARR, ...)
nnt_to_oddsratio(NNT, p0, log = FALSE, verbose = TRUE, ...)
nnt_to_logoddsratio(NNT, p0, log = TRUE, verbose = TRUE, ...)
nnt_to_riskratio(NNT, p0, verbose = TRUE, ...)
nnt_to_arr(NNT, ...)
OR , logOR , RR , ARR , NNT

Oddsratio of 
p0 
Baseline risk 
log 
If:

verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to and from other methods. 
Converted index, or if OR/logOR is a logistic regression model, a
parameter table with the converted indices.
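For scalar inputs, the conversions follow from the definitions of odds and risk once the baseline risk p0 is known. The sketch below writes out the standard algebra in base R (the odds-ratio-to-risk-ratio step is the formula discussed by Grant, 2014); it is offered as an illustration and not as the package's code, and it does not cover the model-based input.

```r
p0 <- 0.4 # baseline risk
p1 <- 0.7 # risk in the treatment group

OR <- (p1 / (1 - p1)) / (p0 / (1 - p0))

# Odds ratio -> risk ratio, given the baseline risk
RR <- OR / (1 - p0 + p0 * OR)

# Risk ratio -> odds ratio (the inverse mapping)
OR_back <- RR * (1 - p0) / (1 - RR * p0)

# Risk ratio -> absolute risk difference and number needed to treat
ARR <- RR * p0 - p0
NNT <- 1 / ARR

c(OR = OR, RR = RR, ARR = ARR, NNT = NNT)
```

The same odds ratio maps to different risk ratios at different baseline risks, which is why p0 is required for every conversion that involves an odds ratio.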
Grant, R. L. (2014). Converting an odds ratio to a range of plausible relative risks for better communication of research findings. Bmj, 348, f7450.
oddsratio(), riskratio(), arr(), and nnt().
Other convert between effect sizes: d_to_r(), diff_to_cles, eta2_to_f2(), odds_to_probs(), w_to_fei()
p0 <- 0.4
p1 <- 0.7
(OR <- probs_to_odds(p1) / probs_to_odds(p0))
(RR <- p1 / p0)
(ARR <- p1 - p0)
(NNT <- arr_to_nnt(ARR))
riskratio_to_oddsratio(RR, p0 = p0)
oddsratio_to_riskratio(OR, p0 = p0)
riskratio_to_arr(RR, p0 = p0)
arr_to_oddsratio(nnt_to_arr(NNT), p0 = p0)
m <- glm(am ~ factor(cyl), data = mtcars, family = binomial())
oddsratio_to_riskratio(m, verbose = FALSE) # RR is relative to the intercept if p0 not provided
Cohen's $U_1$, $U_2$, and $U_3$, probability of superiority,
proportion of overlap, Wilcoxon-Mann-Whitney odds, and Vargha and Delaney's
A are common language effect sizes (CLESs). These are effect sizes that
represent differences between two (independent) distributions in
probabilistic terms (See details). Pair with any reported stats::t.test()
or stats::wilcox.test().
p_superiority(x, y = NULL, data = NULL, mu = 0, paired = FALSE, parametric = TRUE,
  ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)
cohens_u1(x, y = NULL, data = NULL, mu = 0, parametric = TRUE,
  ci = 0.95, alternative = "two.sided", iterations = 200, verbose = TRUE, ...)
cohens_u2(x, y = NULL, data = NULL, mu = 0, parametric = TRUE,
  ci = 0.95, alternative = "two.sided", iterations = 200, verbose = TRUE, ...)
cohens_u3(x, y = NULL, data = NULL, mu = 0, parametric = TRUE,
  ci = 0.95, alternative = "two.sided", iterations = 200, verbose = TRUE, ...)
p_overlap(x, y = NULL, data = NULL, mu = 0, parametric = TRUE,
  ci = 0.95, alternative = "two.sided", iterations = 200, verbose = TRUE, ...)
vd_a(x, y = NULL, data = NULL, mu = 0,
  ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)
wmw_odds(x, y = NULL, data = NULL, mu = 0, paired = FALSE,
  ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)
x , y

A numeric vector, or a character name of one in 
data 
An optional data frame containing the variables. 
mu 
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). 
paired 
If 
parametric 
Use parametric estimation (see 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
iterations 
The number of bootstrap replicates for computing confidence
intervals. Only applies when 
These measures of effect size present group differences in probabilistic terms:
Probability of superiority is the probability that, when sampling an observation from each of the groups at random, the observation from the second group will be larger than the sample from the first group. For the one-sample (or paired) case, it is the probability that the sample (or difference) is larger than mu. (Vargha and Delaney's A is an alias for the non-parametric probability of superiority.)
Cohen's $U_1$ is the proportion of the total of both distributions that does not overlap.
Cohen's $U_2$ is the proportion of one of the groups that exceeds the same proportion in the other group.
Cohen's $U_3$ is the proportion of the second group that is smaller than the median of the first group.
Overlap (OVL) is the proportional overlap between the distributions. (When parametric = FALSE, bayestestR::overlap() is used.)
Wilcoxon-Mann-Whitney odds are the odds of non-parametric superiority (via probs_to_odds()), that is the odds that, when sampling an observation from each of the groups at random, the observation from the second group will be larger than the sample from the first group.
Where $U_1$, $U_2$, and Overlap are agnostic to the direction of the difference between the groups, $U_3$ and probability of superiority are not.
The parametric version of these effects assumes normality of both populations and homoscedasticity. If those are not met, the non-parametric versions should be used.
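Under the normality and equal-variance assumptions, each of these quantities is a function of Cohen's d. The sketch below collects the standard normal-theory identities in one hypothetical helper (cles_from_d is not a package function). The sign of d determines the direction of $U_3$ and the probability of superiority, so which group counts as "first" should be checked against the package's own conventions.

```r
# Hypothetical helper: normal-theory CLES for a given Cohen's d
cles_from_d <- function(d) {
  u2 <- pnorm(abs(d) / 2)
  c(
    p_superiority = pnorm(d / sqrt(2)),
    U1 = (2 * u2 - 1) / u2,
    U2 = u2,
    U3 = pnorm(d),
    overlap = 2 * pnorm(-abs(d) / 2)
  )
}

cles_from_d(0)    # no difference between the groups
cles_from_d(0.8)
cles_from_d(-0.8) # U1, U2 and overlap are unchanged; U3 and p_superiority flip
```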
A data frame containing the common language effect sizes (and optionally their CIs).
For parametric CLES, the CIs are transformed CIs for Cohen's d (see
d_to_u3()). For non-parametric (parametric = FALSE) CLES, the CI of
Pr(superiority) is a transformed CI of the rank-biserial correlation
(rb_to_p_superiority()), while for all others, confidence intervals are
estimated using the bootstrap method (using the {boot} package).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
Some effect sizes are directionless: they do have a minimum value that would
be interpreted as "no effect", but they cannot cross it. For example, a null
value of Kendall's W is 0, indicating no difference between
groups, but it can never have a negative value. Same goes for
$U_2$ and Overlap: the null value of $U_2$ is
0.5, but it can never be smaller than 0.5; an Overlap of 1 means "full
overlap" (no difference), but it cannot be larger than 1.
When bootstrapping CIs for such effect sizes, the bounds of the CIs will
never cross (and often will never cover) the null. Therefore, these CIs
should not be used for statistical inference.
The see package contains relevant plotting functions. See the plotting vignette in the see package.
If mu is not 0, the effect size represents the difference between the
first shifted sample (by mu) and the second sample.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Routledge.
Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society, 48(3), 413-418.
Ruscio, J. (2008). A probability-based measure of effect size: robustness to base rates and other factors. Psychological methods, 13(1), 19-30.
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101-132.
O'Brien, R. G., & Castelloe, J. (2006, March). Exploiting the link between the Wilcoxon-Mann-Whitney test and a simple odds statistic. In Proceedings of the Thirty-first Annual SAS Users Group International Conference (pp. 209-31). Cary, NC: SAS Institute.
Agresti, A. (1980). Generalized odds ratios for ordinal data. Biometrics, 59-67.
Other standardized differences: cohens_d(), mahalanobis_d(), means_ratio(), rank_biserial(), repeated_measures_d()
Other rank-based effect sizes: rank_biserial(), rank_epsilon_squared()
cohens_u2(mpg ~ am, data = mtcars)
p_superiority(mpg ~ am, data = mtcars, parametric = FALSE)
wmw_odds(mpg ~ am, data = mtcars)
x <- c(1.83, 0.5, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.3)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
p_overlap(x, y)
p_overlap(y, x) # direction of effect does not matter
cohens_u3(x, y)
cohens_u3(y, x) # direction of effect does matter
$\phi$ and Other Contingency Tables Correlations
Compute phi ($\phi$), Cramer's V, Tschuprow's T, Cohen's w,
פ (Fei), Pearson's contingency coefficient for
contingency tables or goodness-of-fit. Pair with any reported
stats::chisq.test().
phi(x, y = NULL, adjust = TRUE, ci = 0.95, alternative = "greater", ...)
cramers_v(x, y = NULL, adjust = TRUE, ci = 0.95, alternative = "greater", ...)
tschuprows_t(x, y = NULL, adjust = TRUE, ci = 0.95, alternative = "greater", ...)
cohens_w(x, y = NULL, p = rep(1, length(x)), ci = 0.95, alternative = "greater", ...)
fei(x, p = rep(1, length(x)), ci = 0.95, alternative = "greater", ...)
pearsons_c(x, y = NULL, p = rep(1, length(x)), ci = 0.95, alternative = "greater", ...)
x 
a numeric vector or matrix. 
y 
a numeric vector; ignored if 
adjust 
Should the effect size be corrected for small-sample bias?
Defaults to 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
... 
Ignored. 
p 
a vector of probabilities of the same length as 
phi ($\phi$), Cramer's V, Tschuprow's T, Cohen's w, and Pearson's
C are effect sizes for tests of independence in 2D contingency tables. For
2-by-2 tables, phi, Cramer's V, Tschuprow's T, and Cohen's w are
identical, and are equal to the simple correlation between two dichotomous
variables, ranging between 0 (no dependence) and 1 (perfect dependence).
For larger tables, Cramer's V, Tschuprow's T or Pearson's C should be
used, as they are bounded between 0 and 1. (Cohen's w can also be used, but
since it is not bounded at 1 (can be larger) its interpretation is more
difficult.) For square tables, Cramer's V and Tschuprow's T give the same
results, but for non-square tables Tschuprow's T is more conservative:
while V will be 1 if either columns are fully dependent on rows (for each
column, there is only one non-0 cell) or rows are fully dependent on
columns, T will only be 1 if both are true.
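The relationships among these indices can be seen by computing them directly from the $\chi^2$ statistic. The sketch below uses the textbook (unadjusted) formulas in base R on the counts printed for the Music_preferences dataset; it gives point estimates only and may differ from the package output when adjust = TRUE.

```r
# Counts as printed for the Music_preferences dataset (3 majors x 4 genres)
tab <- matrix(
  c(150, 50, 2, 100, 65, 55, 165, 35, 40, 130, 10, 25),
  nrow = 3,
  dimnames = list(c("Psych", "Econ", "Law"), c("Pop", "Rock", "Jazz", "Classic"))
)

chisq <- unname(chisq.test(tab, correct = FALSE)$statistic)
n <- sum(tab)
r <- nrow(tab)
k <- ncol(tab)

cohens_w_est <- sqrt(chisq / n)
cramers_v_est <- sqrt(chisq / (n * min(r - 1, k - 1)))
tschuprows_t_est <- sqrt(chisq / (n * sqrt((r - 1) * (k - 1))))
pearsons_c_est <- sqrt(chisq / (chisq + n))

c(w = cohens_w_est, V = cramers_v_est, T = tschuprows_t_est, C = pearsons_c_est)
```

For this non-square table the denominators differ, so T is smaller than V, which in turn is smaller than w; for a 2-by-2 table all three denominators reduce to n.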
For goodness-of-fit in 1D tables Cohen's w, פ (Fei)
or Pearson's C can be used. Cohen's w has no upper bound (can be
arbitrarily large, depending on the expected distribution). Fei is an
adjusted Cohen's w, accounting for the expected distribution, making it
bounded between 0 and 1 (Ben-Shachar et al., 2023). Pearson's C is also bounded
between 0 and 1.
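A small worked case shows why the rescaling matters. The sketch below assumes the definition of Fei as $\sqrt{\chi^2 / (n (1/\min(p_E) - 1))}$, as given in Ben-Shachar et al. (2023); the data are hypothetical and chosen to be the most extreme possible departure from the expected distribution.

```r
# Hypothetical goodness-of-fit data: all observations fall in the rarest category
observed <- c(0, 0, 100)
p_expected <- c(0.5, 0.3, 0.2)

n <- sum(observed)
expected <- n * p_expected
chisq <- sum((observed - expected)^2 / expected)

# Cohen's w is unbounded; Fei rescales by the largest chi-square attainable
# under the expected distribution (assumed definition, see lead-in)
w <- sqrt(chisq / n)
fei_est <- sqrt(chisq / (n * (1 / min(p_expected) - 1)))

c(chisq = chisq, w = w, fei = fei_est)
```

In this extreme case w exceeds 1 while Fei reaches its upper bound of 1.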
To summarize, for correlation-like effect sizes, we recommend:
For a 2x2 table, use phi()
For larger tables, use cramers_v()
For goodness-of-fit, use fei()
A data frame with the effect size (Cramers_v, phi (possibly with
the suffix _adjusted), Cohens_w, Fei) and its CIs (CI_low and CI_high).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
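The correspondence between a confidence interval and p values can be checked numerically. The following base R sketch uses a one-sample t test on the paired differences in the built-in sleep data; the grid of candidate values is an arbitrary choice made for illustration.

```r
# Paired differences from the built-in sleep data (group 2 minus group 1).
x <- sleep$extra[sleep$group == 2] - sleep$extra[sleep$group == 1]
alpha <- 0.05

# 95% CI for the mean difference.
ci <- t.test(x, conf.level = 1 - alpha)$conf.int

# Test a grid of candidate parameter values against the same data.
candidates <- seq(-1, 4, by = 0.25)
p_values <- sapply(candidates, function(m0) t.test(x, mu = m0)$p.value)
inside_ci <- candidates >= ci[1] & candidates <= ci[2]

# Candidates with p > alpha are exactly those inside the CI.
data.frame(mu0 = candidates, p = round(p_values, 3), inside_ci = inside_ci)
```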
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi-Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Johnston, J. E., Berry, K. J., & Mielke Jr, P. W. (2006). Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and motor skills, 103(2), 412-414.
Rosenberg, M. S. (2010). A generalized formula for converting chi-square tests to effect sizes for meta-analysis. PloS one, 5(4), e10059.
chisq_to_phi() for details regarding estimation and CIs.
Other effect sizes for contingency table: cohens_g(), oddsratio()
## 2-by-2 tables ##
data("RCT_table")
RCT_table # note groups are COLUMNS

phi(RCT_table)
pearsons_c(RCT_table)

## Larger tables ##
data("Music_preferences")
Music_preferences

cramers_v(Music_preferences)
cohens_w(Music_preferences)
pearsons_c(Music_preferences)

## Goodness of fit ##
data("Smoking_FASD")
Smoking_FASD

fei(Smoking_FASD)
cohens_w(Smoking_FASD)
pearsons_c(Smoking_FASD)

# Use custom expected values:
fei(Smoking_FASD, p = c(0.015, 0.010, 0.975))
cohens_w(Smoking_FASD, p = c(0.015, 0.010, 0.975))
pearsons_c(Smoking_FASD, p = c(0.015, 0.010, 0.975))
{effectsize} Tables
Printing, formatting and plotting methods for effectsize tables.
## S3 method for class 'effectsize_table'
plot(x, ...)

## S3 method for class 'effectsize_table'
print(x, digits = 2, use_symbols = getOption("es.use_symbols", FALSE), ...)

## S3 method for class 'effectsize_table'
print_md(x, digits = 2, use_symbols = getOption("es.use_symbols", FALSE), ...)

## S3 method for class 'effectsize_table'
print_html(
  x,
  digits = 2,
  use_symbols = getOption("es.use_symbols", FALSE),
  ...
)

## S3 method for class 'effectsize_table'
format(
  x,
  digits = 2,
  output = c("text", "markdown", "html"),
  use_symbols = getOption("es.use_symbols", FALSE),
  ...
)

## S3 method for class 'effectsize_difference'
print(x, digits = 2, append_CLES = NULL, ...)
x 
Object to print. 
... 
Arguments passed to or from other functions. 
digits 
Number of digits for rounding or significant figures. May also
be 
use_symbols 
Should proper symbols be printed ( 
output 
Which output is the formatting intended for? Affects how title and footers are formatted. 
append_CLES 
Which Common Language Effect Sizes should be printed as well? Only applicable to Cohen's d and Hedges' g for independent samples of equal variance (pooled sd), or to the rank-biserial correlation for independent samples (see d_to_cles).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Semi-Partial Correlation Squared ($\Delta R^2$)
Compute the semi-partial (part) correlation squared (also known as
$\Delta R^2$). Currently, only lm() models are supported.
r2_semipartial(
  model,
  type = c("terms", "parameters"),
  ci = 0.95,
  alternative = "greater",
  ...
)
model 
An 
type 
Type, either 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
... 
Arguments passed to or from other methods. 
This is similar to the last column of the "Conditional Dominance Statistics"
section of the parameters::dominance_analysis() output. For each term, the
model is refit without the columns on the model matrix that correspond to
that term. The $R^2$ of this submodel is then subtracted from the $R^2$ of
the full model to yield the $\Delta R^2$. (For type = "parameters", this is
done for each column in the model matrix.)
Note that this is unlike parameters::dominance_analysis(), where term
deletion is done via the formula interface, and therefore may lead to
different results.
For other, non-lm() models, as well as more verbose information and
options, please see the documentation for parameters::dominance_analysis().
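The refit-and-subtract logic can be sketched in base R for the simplest case, a single numeric predictor. Here dropping the term through the formula and dropping its model-matrix column coincide, which need not hold for factors or interactions. The mtcars variables are an arbitrary choice for illustration, and this is not the package's implementation.

```r
# Full model and the submodel without the term `wt`.
full <- lm(mpg ~ wt + hp, data = mtcars)
reduced <- lm(mpg ~ hp, data = mtcars)

# Delta R^2: drop in R^2 when `wt` is removed.
delta_r2 <- summary(full)$r.squared - summary(reduced)$r.squared

# Equivalent definition: squared correlation between the outcome and the
# part of `wt` that is not explained by the other predictor(s).
wt_resid <- resid(lm(wt ~ hp, data = mtcars))
sr2 <- cor(mtcars$mpg, wt_resid)^2

c(delta_r2 = delta_r2, sr2 = sr2)
```

Both quantities agree, which is why the squared semi-partial correlation is also called $\Delta R^2$.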
A data frame with the effect size.
Confidence intervals are based on the normal approximation as provided by Alf
and Graf (1999). An adjustment to the lower bound of the CI is used, to
improve the coverage properties of the CIs, according to Algina et al. (2008):
If the F test associated with the $sr^2$ is significant (at the 1 - ci
level), but the lower bound of the CI is 0, it is set to a small value
(arbitrarily to a 10th of the estimated $sr^2$); if the F test is not
significant, the lower bound is set to 0. (Additionally, lower and upper
bound are "fixed" so that they cannot be smaller than 0 or larger than 1.)
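The lower-bound rule can be written out as a small function. This is a sketch only: adjust_sr2_ci() and its arguments are hypothetical names, not part of the package, and the order in which the bounds are "fixed" and then adjusted is an assumption.

```r
# Hypothetical helper (not part of the package) sketching the adjustment.
adjust_sr2_ci <- function(sr2, ci_low, ci_high, f_significant) {
  # "Fix" both bounds to the admissible range [0, 1].
  ci_low <- min(max(ci_low, 0), 1)
  ci_high <- min(max(ci_high, 0), 1)

  if (f_significant) {
    # Significant F test but a lower bound of 0: use a 10th of the estimate.
    if (ci_low == 0) ci_low <- sr2 / 10
  } else {
    # Non-significant F test: the lower bound is set to 0.
    ci_low <- 0
  }

  c(CI_low = ci_low, CI_high = ci_high)
}

# Significant F test, normal-approximation lower bound below 0:
adj_sig <- adjust_sr2_ci(sr2 = 0.04, ci_low = -0.01, ci_high = 0.09, f_significant = TRUE)
# Non-significant F test, lower bound above 0:
adj_ns <- adjust_sr2_ci(sr2 = 0.04, ci_low = 0.01, ci_high = 0.09, f_significant = FALSE)
adj_sig
adj_ns
```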
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Alf Jr, E. F., & Graf, R. G. (1999). Asymptotic confidence limits for the difference between two squared multiple correlations: A simplified approach. Psychological Methods, 4(1), 70-75. doi:10.1037/1082-989X.4.1.70
Algina, J., Keselman, H. J., & Penfield, R. D. (2008). Confidence intervals for the squared multiple semipartial correlation coefficient. Journal of Modern Applied Statistical Methods, 7(1), 2-10. doi:10.22237/jmasm/1209614460
eta_squared(), cohens_f() for comparing two models,
parameters::dominance_analysis() and parameters::standardize_parameters().
data("hardlyworking")

m <- lm(salary ~ factor(n_comps) + xtra_hours * seniority, data = hardlyworking)

r2_semipartial(m)

r2_semipartial(m, type = "parameters")

# Compare to `eta_squared()`
npk.aov <- lm(yield ~ N + P + K, npk)

# When predictors are orthogonal,
# eta_squared(partial = FALSE) gives the same effect size:
performance::check_collinearity(npk.aov)

eta_squared(npk.aov, partial = FALSE)

r2_semipartial(npk.aov)

# Compare to `dominance_analysis()`
m_full <- lm(salary ~ ., data = hardlyworking)

r2_semipartial(m_full)

# Compare to last column of "Conditional Dominance Statistics":
parameters::dominance_analysis(m_full)
Compute the rank-biserial correlation ($r_{rb}$) and Cliff's delta
($\delta$) effect sizes for non-parametric (rank sum) differences. These
effect sizes of dominance are closely related to the Common Language Effect
Sizes. Pair with any reported stats::wilcox.test().
rank_biserial(
  x,
  y = NULL,
  data = NULL,
  mu = 0,
  paired = FALSE,
  ci = 0.95,
  alternative = "two.sided",
  verbose = TRUE,
  ...
)

cliffs_delta(
  x,
  y = NULL,
  data = NULL,
  mu = 0,
  ci = 0.95,
  alternative = "two.sided",
  verbose = TRUE,
  ...
)
x , y

A numeric or ordered vector, or a character name of one in 
data 
An optional data frame containing the variables. 
mu 
a number indicating the value around which (a-)symmetry (for one-sample or paired samples) or shift (for independent samples) is to be estimated. See stats::wilcox.test.
paired 
If 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
The rank-biserial correlation is appropriate for non-parametric tests of
differences, both for the one-sample or paired-samples case, which would
normally be tested with Wilcoxon's Signed Rank Test (giving the
matched-pairs rank-biserial correlation), and for the two independent samples
case, which would normally be tested with Mann-Whitney's U Test (giving
Glass' rank-biserial correlation). See stats::wilcox.test. In both
cases, the correlation represents the difference between the proportion of
favorable and unfavorable pairs / signed ranks (Kerby, 2014). Values range
from -1 (complete dominance of the second sample: all values of the second
sample are larger than all the values of the first sample) to +1 (complete
dominance of the first sample: all values of the second sample are smaller
than all the values of the first sample).
Cliff's delta is an alias to the rank-biserial correlation in the two sample case.
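For two independent samples, the "favorable minus unfavorable pairs" description can be computed directly in base R. This is a minimal sketch with mu = 0 and no confidence interval; rank_biserial_sketch() is a hypothetical name, not a package function.

```r
# Hypothetical helper: simple difference formula (Kerby, 2014) for two
# independent samples. Tied pairs count as neither favorable nor unfavorable.
rank_biserial_sketch <- function(x, y) {
  diffs <- outer(x, y, FUN = "-") # all pairwise differences x_i - y_j
  favorable <- mean(diffs > 0)    # proportion of pairs where x is larger
  unfavorable <- mean(diffs < 0)  # proportion of pairs where y is larger
  favorable - unfavorable
}

x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]
rank_biserial_sketch(x, y)

# Complete dominance of the first / second sample:
rank_biserial_sketch(c(4, 5, 6), c(1, 2, 3))
rank_biserial_sketch(c(1, 2, 3), c(4, 5, 6))
```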
A data frame with the effect size r_rank_biserial and its CI (CI_low and CI_high).
When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. This results in an effect size of reduced magnitude. A correction has been applied for Kendall's W.
Confidence intervals for the rank-biserial correlation (and Cliff's delta) are estimated using the normal approximation (via Fisher's transformation).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Cureton, E. E. (1956). Rank-biserial correlation. Psychometrika, 21(3), 287-290.
Glass, G. V. (1965). A ranking variable analogue of biserial correlation: Implications for shortcut item analysis. Journal of Educational Measurement, 2(1), 91-95.
Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
King, B. M., & Minium, E. W. (2008). Statistical reasoning in the behavioral sciences. John Wiley & Sons Inc.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological bulletin, 114(3), 494.
Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size.
Other standardized differences: cohens_d(), mahalanobis_d(), means_ratio(), p_superiority(), repeated_measures_d()
Other rank-based effect sizes: p_superiority(), rank_epsilon_squared()
data(mtcars)
mtcars$am <- factor(mtcars$am)
mtcars$cyl <- factor(mtcars$cyl)

# Two Independent Samples
(rb <- rank_biserial(mpg ~ am, data = mtcars))
# Same as:
# rank_biserial("mpg", "am", data = mtcars)
# rank_biserial(mtcars$mpg[mtcars$am=="0"], mtcars$mpg[mtcars$am=="1"])
# cliffs_delta(mpg ~ am, data = mtcars)

# More options:
rank_biserial(mpg ~ am, data = mtcars, mu = 5)
print(rb, append_CLES = TRUE)

# One Sample
# from help("wilcox.test")
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
depression <- data.frame(first = x, second = y, change = y - x)

rank_biserial(change ~ 1, data = depression)

# same as:
# rank_biserial("change", data = depression)
# rank_biserial(mtcars$wt)

# More options:
rank_biserial(change ~ 1, data = depression, mu = 0.5)

# Paired Samples
(rb <- rank_biserial(Pair(first, second) ~ 1, data = depression))

# same as:
# rank_biserial(depression$first, depression$second, paired = TRUE)

interpret_rank_biserial(0.78)
interpret(rb, rules = "funder2019")
Compute rank epsilon squared ($E^2_R$) or rank eta squared ($\eta^2_H$)
(to accompany stats::kruskal.test()), and Kendall's W (to accompany
stats::friedman.test()) effect sizes for non-parametric (rank sum) one-way
ANOVAs.
rank_epsilon_squared(
  x,
  groups,
  data = NULL,
  ci = 0.95,
  alternative = "greater",
  iterations = 200,
  verbose = TRUE,
  ...
)

rank_eta_squared(
  x,
  groups,
  data = NULL,
  ci = 0.95,
  alternative = "greater",
  iterations = 200,
  verbose = TRUE,
  ...
)

kendalls_w(
  x,
  groups,
  blocks,
  data = NULL,
  blocks_on_rows = TRUE,
  ci = 0.95,
  alternative = "greater",
  iterations = 200,
  verbose = TRUE,
  ...
)
x 
Can be one of:

groups , blocks

A factor vector giving the group / block for the
corresponding elements of 
data 
An optional data frame containing the variables. 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
iterations 
The number of bootstrap replicates for computing confidence
intervals. Only applies when 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
blocks_on_rows 
Are blocks on rows ( 
The rank epsilon squared and rank eta squared are appropriate for
non-parametric tests of differences between 2 or more samples (a rank-based
ANOVA). See stats::kruskal.test. Values range from 0 to 1, with larger
values indicating larger differences between groups.
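Both indices can be derived from the Kruskal-Wallis H statistic. The sketch below uses the formulas reported by Tomczak and Tomczak (2014); that the package computes them in exactly this way is an assumption, and no confidence intervals are produced here.

```r
# Kruskal-Wallis test on a built-in dataset.
kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

H <- unname(kw$statistic)        # Kruskal-Wallis H
n <- nrow(mtcars)                # total number of observations
k <- nlevels(factor(mtcars$cyl)) # number of groups

# Rank epsilon squared: H / ((n^2 - 1) / (n + 1)), which simplifies to H / (n - 1).
e2_r <- H / ((n^2 - 1) / (n + 1))

# Rank eta squared: (H - k + 1) / (n - k).
eta2_h <- (H - k + 1) / (n - k)

c(rank_epsilon_squared = e2_r, rank_eta_squared = eta2_h)
```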
Kendall's W is appropriate for non-parametric tests of differences between
2 or more dependent samples (a rank-based rmANOVA), where each group (e.g.,
experimental condition) was measured for each block (e.g., subject). This
measure is also common as a measure of reliability of the rankings of the
groups between raters (blocks). See stats::friedman.test. Values range
from 0 to 1, with larger values indicating larger differences between groups
/ higher agreement between raters.
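Kendall's W can be computed by hand from within-block ranks. The sketch below uses the textbook formula without a tie correction (an assumption that is harmless here, since the data contain no ties) and checks it against the Friedman statistic. The data are the three blocks and two conditions used in the examples for kendalls_w().

```r
# Blocks on rows, groups (conditions) on columns.
y <- matrix(
  c(44.56, 28.22,
    24.00, 28.78,
    24.56, 18.78),
  nrow = 3, byrow = TRUE,
  dimnames = list(ID = c("L", "M", "H"), cond = c("A", "B"))
)

m <- nrow(y) # number of blocks (raters)
n <- ncol(y) # number of groups (ranked items)

# Rank the groups within each block, then sum the ranks per group.
ranks <- t(apply(y, 1, rank))
rank_sums <- colSums(ranks)

# W = 12 * S / (m^2 * (n^3 - n)), S = squared deviations of the rank sums.
S <- sum((rank_sums - mean(rank_sums))^2)
W <- 12 * S / (m^2 * (n^3 - n))

# Same value via the Friedman chi-squared statistic: W = chi2 / (m * (n - 1)).
chi2 <- unname(friedman.test(y)$statistic)
c(W = W, W_from_friedman = chi2 / (m * (n - 1)))
```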
A data frame with the effect size and its CI.
Confidence intervals for $E^2_R$, $\eta^2_H$, and Kendall's W are
estimated using the bootstrap method (using the {boot} package).
When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. This results in an effect size of reduced magnitude. A correction has been applied for Kendall's W.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
Some effect sizes are directionless: they do have a minimum value that would
be interpreted as "no effect", but they cannot cross it. For example, a null
value of Kendall's W is 0, indicating no difference between
groups, but it can never have a negative value. The same goes for
U2 and Overlap: the null value of $U_2$ is 0.5, but it can never be smaller
than 0.5; an Overlap of 1 means "full overlap" (no difference), but it cannot
be larger than 1.
When bootstrapping CIs for such effect sizes, the bounds of the CIs will
never cross (and often will never cover) the null. Therefore, these CIs
should not be used for statistical inference.
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Kendall, M.G. (1948) Rank correlation methods. London: Griffin.
Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in sport sciences, 1(21), 19-25.
Other rank-based effect sizes: p_superiority(), rank_biserial()
Other effect sizes for ANOVAs: eta_squared()
# Rank Eta/Epsilon Squared
# ========================
rank_eta_squared(mpg ~ cyl, data = mtcars)
rank_epsilon_squared(mpg ~ cyl, data = mtcars)

# Kendall's W
# ===========
dat <- data.frame(
  cond = c("A", "B", "A", "B", "A", "B"),
  ID = c("L", "L", "M", "M", "H", "H"),
  y = c(44.56, 28.22, 24, 28.78, 24.56, 18.78)
)
(W <- kendalls_w(y ~ cond | ID, data = dat, verbose = FALSE))

interpret_kendalls_w(0.11)
interpret(W, rules = "landis1977")
Fictional Results from a Workers' Randomized Control Trial
A 2-by-2 table, with a column for each group and a row for the diagnosis.
data("RCT_table")
RCT_table
#>            Group
#> Diagnosis   Treatment Control
#>   Sick             71      30
#>   Recovered        50     100
Other effect size datasets: Music_preferences, Music_preferences2, Smoking_FASD, food_class, hardlyworking, rouder2016, screening_test
Compute effect size indices for standardized mean differences in repeated
measures data. Pair with any reported stats::t.test(paired = TRUE).
In a repeated-measures design, the same subjects are measured in multiple
conditions or time points. Unlike the case of independent groups, there are
multiple sources of variation that can be used to standardize the
differences between the means of the conditions / times.
repeated_measures_d(
  x,
  y,
  data = NULL,
  mu = 0,
  method = c("rm", "av", "z", "b", "d", "r"),
  adjust = TRUE,
  ci = 0.95,
  alternative = "two.sided",
  verbose = TRUE,
  ...
)

rm_d(
  x,
  y,
  data = NULL,
  mu = 0,
  method = c("rm", "av", "z", "b", "d", "r"),
  adjust = TRUE,
  ci = 0.95,
  alternative = "two.sided",
  verbose = TRUE,
  ...
)
x , y

Paired numeric vectors, or names of ones in

data 
An optional data frame containing the variables. 
mu 
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). 
method 
Method of repeated measures standardized differences. See details. 
adjust 
Apply Hedges' small-sample bias correction? See 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
A data frame with the effect size and their CIs (CI_low and CI_high).
Unlike Cohen's d for independent groups, where standardization
naturally is done by the (pooled) population standard deviation (cf. Glass’s
$\Delta$), when measures across two conditions are dependent, there are
many more options for what error term to standardize by. Additionally, some
options allow for data to be replicated (many measurements per condition per
individual), others require a single observation per condition per individual
(aka, paired data; so replications are aggregated).
(It should be noted that all of these have awful and confusing notations.)
Standardize by...
Difference Score Variance: $d_{z}$ (Requires paired data) - This
is akin to computing difference scores for each individual and then
computing a one-sample Cohen's d (Cohen, 1988, pp. 48; see examples).
Within-Subject Variance: $d_{rm}$ (Requires paired data) - Cohen
suggested adjusting $d_{z}$ to estimate the "standard" between-subjects
d by a factor of $\sqrt{2(1-r)}$, where r is the Pearson correlation
between the paired measures (Cohen, 1988, pp. 48).
Control Variance: $d_{b}$ (aka Becker's d) (Requires paired
data) - Standardized by the variance of the control condition (or in a
pre-post-treatment setting, the pre-treatment condition). This is akin to
Glass' delta (glass_delta()) (Becker, 1988). Note that this is taken here as
the second condition (y).
Average Variance: $d_{av}$ (Requires paired data) - Instead of
standardizing by the variance of the control (or pre) condition,
Cumming suggests standardizing by the average variance of the two paired
conditions (Cumming, 2013, pp. 291).
All Variance: Just $d$ - This is the same as computing a standard
independent-groups Cohen's d (Cohen, 1988). Note that CIs do account for
the dependence, and so are typically more narrow (see examples).
Residual Variance: $d_{r}$ (Requires data with replications) -
Divide by the pooled variance after all individual differences have been
partialled out (i.e., the residual/level-1 variance in an ANOVA or MLM
setting). In between-subjects designs where each subject contributes a single
response, this is equivalent to classical Cohen’s d. Priors in the
BayesFactor package are defined on this scale (Rouder et al., 2012).
Note that for paired data, when the two conditions have equal variance,
$d_{rm}$, $d_{av}$, $d_{b}$ are equal to $d$.
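The standardizers for paired data can be compared side by side with a few lines of base R. This is a sketch of the point estimates only, following the verbal definitions above; it applies no small-sample correction and computes no CIs. rm_d_sketch() is a hypothetical helper, and the package's own results may differ (for example when adjust = TRUE).

```r
# Hypothetical helper: uncorrected point estimates for paired vectors x and y.
rm_d_sketch <- function(x, y) {
  diffs <- x - y
  m <- mean(diffs)
  r <- cor(x, y)
  d_z <- m / sd(diffs)
  c(
    d_z = d_z,                              # difference score variance
    d_rm = d_z * sqrt(2 * (1 - r)),         # within-subject variance
    d_av = m / sqrt((var(x) + var(y)) / 2), # average variance
    d_b = m / sd(y)                         # control (second condition) variance
  )
}

# Paired data from the built-in sleep dataset.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]
rm_d_sketch(x, y)

# With equal variances in both conditions, d_rm, d_av and d_b coincide.
x_eq <- c(1, 2, 3, 4, 5)
y_eq <- c(3, 2, 5, 4, 6)
d_eq <- rm_d_sketch(x_eq, y_eq)
d_eq
```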
Confidence intervals are estimated using the standard normal parametric method (see Algina & Keselman, 2003; Becker, 1988; Cooper et al., 2009; Hedges & Olkin, 1985; Pustejovsky et al., 2014).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
The see package contains relevant plotting functions. See the plotting vignette in the see package.
rm_d() is an alias for repeated_measures_d().
Algina, J., & Keselman, H. J. (2003). Approximate confidence intervals for effect sizes. Educational and Psychological Measurement, 63(4), 537-553.
Becker, B. J. (1988). Synthesizing standardized mean-change measures. British Journal of Mathematical and Statistical Psychology, 41(2), 257-278.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Cooper, H., Hedges, L., & Valentine, J. (2009). Handbook of research synthesis and meta-analysis. Russell Sage Foundation, New York.
Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Pustejovsky, J. E., Hedges, L. V., & Shadish, W. R. (2014). Design-comparable effect sizes in multiple baseline designs: A general modeling framework. Journal of Educational and Behavioral Statistics, 39(5), 368-393.
Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of mathematical psychology, 56(5), 356-374.
cohens_d(), and lmeInfo::g_mlm() and emmeans::effsize() for more flexible methods.
Other standardized differences: cohens_d(), mahalanobis_d(), means_ratio(), p_superiority(), rank_biserial()
# Paired data
data("sleep")
sleep2 <- reshape(sleep,
  direction = "wide",
  idvar = "ID", timevar = "group"
)

repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2)

# Same as:
# repeated_measures_d(sleep$extra[sleep$group==1],
#                     sleep$extra[sleep$group==2])
# repeated_measures_d(extra ~ group | ID, data = sleep)

# More options:
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, mu = 1)
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, alternative = "less")

# Other methods
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "av")
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "b")
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "d")
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "z", adjust = FALSE)

# d_z is the same as Cohen's d for one sample (of individual differences):
cohens_d(extra.1 - extra.2 ~ 1, data = sleep2)

# Repetition data
data("rouder2016")

# For rm, av, z, b, data is aggregated
repeated_measures_d(rt ~ cond | id, data = rouder2016)

# same as:
rouder2016_wide <- tapply(rouder2016[["rt"]], rouder2016[1:2], mean)
repeated_measures_d(rouder2016_wide[, 1], rouder2016_wide[, 2])

# For r or d, data is not aggregated:
repeated_measures_d(rt ~ cond | id, data = rouder2016, method = "r")
repeated_measures_d(rt ~ cond | id, data = rouder2016, method = "d", adjust = FALSE)

# d is the same as Cohen's d for two independent groups:
cohens_d(rt ~ cond, data = rouder2016, ci = NULL)
A dataset "with 25 people each observing 50 trials in 2 conditions",
published as effectSizePuzzler.txt
by Jeff Rouder on March 24, 2016
(http://jeffrouder.blogspot.com/2016/03/theeffectsizepuzzler.html).
The data is used in examples and tests of rm_d().
A data frame with 2500 rows and 3 variables:
id: participant: 1...25
cond: condition: 1,2
rt: response time in seconds
data("rouder2016")
head(rouder2016, n = 5)
#>   id cond    rt
#> 1  1    1 0.560
#> 2  1    1 0.930
#> 3  1    1 0.795
#> 4  1    1 0.615
#> 5  1    1 1.028
Other effect size datasets: Music_preferences, Music_preferences2, RCT_table, Smoking_FASD, food_class, hardlyworking, screening_test
Create a container for interpretation rules of thumb. Usually used in conjunction with interpret.
rules(values, labels = NULL, name = NULL, right = TRUE)

is.rules(x)
values 
Vector of reference values (edges defining categories or critical values). 
labels 
Labels associated with each category. If 
name 
Name of the set of rules (will be printed). 
right 
Logical, for threshold-type rules, indicating if the thresholds themselves should be included in the interval to the right (lower values) or in the interval to the left (higher values). 
x 
An arbitrary R object. 
rules(c(0.05), c("significant", "not significant"), right = FALSE)
rules(c(0.2, 0.5, 0.8), c("small", "medium", "large"))
rules(c("small" = 0.2, "medium" = 0.5), name = "Cohen's Rules")
A sample (simulated) dataset, used in tests and some examples.
A data frame with 1600 rows and 3 variables:
Diagnosis: Ground truth
Test1: Results given by the 1st test
Test2: Results given by the 2nd test
data("screening_test")
head(screening_test, n = 5)
#>   Diagnosis Test1 Test2
#> 1       Neg "Neg" "Neg"
#> 2       Neg "Neg" "Neg"
#> 3       Neg "Neg" "Neg"
#> 4       Neg "Neg" "Neg"
#> 5       Neg "Neg" "Neg"
Other effect size datasets: Music_preferences, Music_preferences2, RCT_table, Smoking_FASD, food_class, hardlyworking, rouder2016
The Pooled Standard Deviation is a weighted average of standard deviations for two or more groups, assumed to have equal variance. It represents the common deviation among the groups, around each of their respective means.
sd_pooled(x, y = NULL, data = NULL, verbose = TRUE, ...)

mad_pooled(x, y = NULL, data = NULL, constant = 1.4826, verbose = TRUE, ...)

cov_pooled(x, y = NULL, data = NULL, verbose = TRUE, ...)
x, y
A numeric vector, or a character name of one in 
data 
An optional data frame containing the variables. 
verbose 
Toggle warnings and messages on or off. 
... 
Arguments passed to or from other methods. When 
constant 
scale factor. 
The standard version is calculated as:
$\sqrt{\frac{\sum (x_i - \bar{x})^2}{n_1 + n_2 - 2}}$
The robust version is calculated as:
$1.4826 \times Median(\left\{\left|x - Median_x\right|,\,\left|y - Median_y\right|\right\})$
Numeric, the pooled standard deviation. For cov_pooled(), a matrix.
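As an illustration of the two formulas above, the following base R sketch computes a pooled standard deviation and a pooled MAD by hand for two groups. The helper names pooled_sd_sketch() and pooled_mad_sketch() are hypothetical and not part of the package; the sketch only follows the formulas as written and may differ in detail from sd_pooled() and mad_pooled().

```r
# Hypothetical helpers (not part of effectsize): pooled SD and pooled MAD
# for two groups, following the formulas given in this section.
pooled_sd_sketch <- function(x, y) {
  # Sum of squared deviations of each observation from its own group mean
  ss <- sum((x - mean(x))^2) + sum((y - mean(y))^2)
  sqrt(ss / (length(x) + length(y) - 2))
}

pooled_mad_sketch <- function(x, y, constant = 1.4826) {
  # Median of the absolute deviations from each group's own median
  constant * median(abs(c(x - median(x), y - median(y))))
}

x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

pooled_sd_sketch(x, y)
pooled_mad_sketch(x, y)

# With two groups, the pooled SD equals the residual standard deviation
# of a linear model with the grouping variable as the only predictor.
sigma(lm(mpg ~ am, data = mtcars))
```

The comparison with sigma() is a quick sanity check: both quantities divide the within-group sum of squares by n1 + n2 - 2.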
sd_pooled(mpg ~ am, data = mtcars)
mad_pooled(mtcars$mpg, factor(mtcars$am))

cov_pooled(mpg + hp + cyl ~ am, data = mtcars)
Fictional data.
A 1-by-3 table, with a column for each diagnosis.
data("Smoking_FASD")
Smoking_FASD
#>  FAS PFAS   TD
#>   17   11  640
Other effect size datasets: Music_preferences, Music_preferences2, RCT_table, food_class, hardlyworking, rouder2016, screening_test
These functions are convenience functions to convert t, z and F test
statistics to Cohen's d and partial r. These are useful in cases where
the data required to compute these are not easily available or their
computation is not straightforward (e.g., in linear mixed models, contrasts,
etc.).
See Effect Size from Test Statistics vignette.
t_to_d(t, df_error, paired = FALSE, ci = 0.95, alternative = "two.sided", ...)

z_to_d(z, n, paired = FALSE, ci = 0.95, alternative = "two.sided", ...)

F_to_d(
  f,
  df,
  df_error,
  paired = FALSE,
  ci = 0.95,
  alternative = "two.sided",
  ...
)

t_to_r(t, df_error, ci = 0.95, alternative = "two.sided", ...)

z_to_r(z, n, ci = 0.95, alternative = "two.sided", ...)

F_to_r(f, df, df_error, ci = 0.95, alternative = "two.sided", ...)
t, f, z
The t, the F or the z statistics. 
paired 
Should the estimate account for the t-value testing the difference between dependent means? 
ci 
Confidence Interval (CI) level 
alternative 
a character string specifying the alternative hypothesis;
Controls the type of CI returned: 
... 
Arguments passed to or from other methods. 
n 
The number of observations (the sample size). 
df, df_error
Degrees of freedom of numerator or of the error estimate (i.e., the residuals). 
These functions use the following formulae to approximate r and d:
$r_{partial} = t / \sqrt{t^2 + df_{error}}$
$r_{partial} = z / \sqrt{z^2 + N}$
$d = 2 * t / \sqrt{df_{error}}$
$d_z = t / \sqrt{df_{error}}$
$d = 2 * z / \sqrt{N}$
The resulting d effect size is an approximation to Cohen's d, and
assumes two equal group sizes. When possible, it is advised to directly
estimate Cohen's d, with cohens_d(), emmeans::eff_size(), or similar
functions.
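The approximation formulas above can be sketched in a few lines of base R. The helpers t_to_r_sketch() and t_to_d_sketch() are hypothetical names used only for illustration; they return point estimates without confidence intervals and are not the package's implementation.

```r
# Hypothetical helpers (not part of effectsize) that apply the
# approximation formulas listed above to a t statistic.
t_to_r_sketch <- function(t, df_error) {
  t / sqrt(t^2 + df_error)
}

t_to_d_sketch <- function(t, df_error, paired = FALSE) {
  if (paired) {
    t / sqrt(df_error)     # d_z, for a difference between dependent means
  } else {
    2 * t / sqrt(df_error) # assumes two equal group sizes
  }
}

res <- t.test(1:10, y = 7:20, var.equal = TRUE)
t_obs <- unname(res$statistic)
df_obs <- unname(res$parameter)

t_to_r_sketch(t_obs, df_obs)
t_to_d_sketch(t_obs, df_obs)
t_to_d_sketch(t_obs, df_obs, paired = TRUE)
```

Because the unpaired formula assumes equal group sizes, its result for this unbalanced example (10 vs. 14 observations) is only a rough approximation of Cohen's d.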
A data frame with the effect size(s) (r or d), and their CIs
(CI_low and CI_high).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
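A minimal base R sketch of the pivot method for a t statistic follows. The helper ncp_for_prob() is a hypothetical name, the search interval passed to uniroot() is an assumption chosen to be wide enough for this example, and the conversion to d reuses the equal-groups approximation d = 2 * t / sqrt(df_error); the package's own routine may differ in its details.

```r
# Hypothetical helper (not part of effectsize): find the noncentrality
# parameter for which the observed t sits at a given cumulative probability.
ncp_for_prob <- function(t, df, prob) {
  uniroot(
    function(ncp) pt(t, df = df, ncp = ncp) - prob,
    interval = c(-5, 10), # assumed search range for this example
    tol = 1e-10
  )$root
}

t_obs <- 2
df_error <- 50

# Lower bound: the observed t is the .975 quantile of the noncentral t.
# Upper bound: the observed t is the .025 quantile of the noncentral t.
ncp_low <- ncp_for_prob(t_obs, df_error, prob = 0.975)
ncp_high <- ncp_for_prob(t_obs, df_error, prob = 0.025)

# Convert the ncp bounds to the d metric with the same formula used for
# the point estimate.
c(
  CI_low = 2 * ncp_low / sqrt(df_error),
  CI_high = 2 * ncp_high / sqrt(df_error)
)
```

Root finding is used because the noncentral t distribution has no closed-form inverse with respect to its noncentrality parameter.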
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - $\alpha$)% confidence
interval contains all of the parameter values for which p > $\alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
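The correspondence between a CI and a p value can be checked numerically. The sketch below uses a one-sample t test on the sleep data with an arbitrary grid of candidate null values (the grid is an assumption made for illustration) and compares membership in the 95% CI with p > .05.

```r
# One-sample t test on the `sleep` data: compare p values with the 95% CI
x <- sleep$extra
ci <- t.test(x, conf.level = 0.95)$conf.int

# Candidate null values (an arbitrary grid chosen for illustration)
mu0 <- seq(-1, 4, by = 0.25)
p <- sapply(mu0, function(m) t.test(x, mu = m)$p.value)

inside_ci <- mu0 > ci[1] & mu0 < ci[2]

# Null values inside the CI are exactly those that are not rejected at .05
data.frame(mu0, p = round(p, 3), inside_ci, not_rejected = p > 0.05)
```

In the resulting table, the inside_ci and not_rejected columns agree for every candidate value.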
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see package contains relevant plotting functions. See the plotting vignette in the see package.
Friedman, H. (1982). Simplified determinations of statistical power, magnitude of effect and research sample sizes. Educational and Psychological Measurement, 42(2), 521-526. doi:10.1177/001316448204200214
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis (Vol. 59). Sage.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper and L.V. Hedges (Eds.). The handbook of research synthesis. New York: Russell Sage Foundation.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532-574.
Other effect size from test statistic: F_to_eta2(), chisq_to_phi()
## t Tests
res <- t.test(1:10, y = c(7:20), var.equal = TRUE)
t_to_d(t = res$statistic, res$parameter)
t_to_r(t = res$statistic, res$parameter)
t_to_r(t = res$statistic, res$parameter, alternative = "less")

res <- with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
t_to_d(t = res$statistic, res$parameter, paired = TRUE)
t_to_r(t = res$statistic, res$parameter)
t_to_r(t = res$statistic, res$parameter, alternative = "greater")

## Linear Regression
model <- lm(rating ~ complaints + critical, data = attitude)
(param_tab <- parameters::model_parameters(model))

(rs <- t_to_r(param_tab$t[2:3], param_tab$df_error[2:3]))

# How does this compare to actual partial correlations?
correlation::correlation(attitude,
  select = "rating",
  select2 = c("complaints", "critical"),
  partial = TRUE
)
Enables a conversion between different indices of effect size, such as Cohen's w to פ (Fei), and Cramer's V to Tschuprow's T.
w_to_fei(w, p)

w_to_v(w, nrow, ncol)

w_to_t(w, nrow, ncol)

w_to_c(w)

fei_to_w(fei, p)

v_to_w(v, nrow, ncol)

t_to_w(t, nrow, ncol)

c_to_w(c)

v_to_t(v, nrow, ncol)

t_to_v(t, nrow, ncol)
w, c, v, t, fei
Effect size to be converted 
p 
Vector of expected values. See 
nrow, ncol
The number of rows/columns in the contingency table. 
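To make the relationships between these indices concrete, here is a base R sketch built on their standard chi-squared definitions, with Cohen's w = sqrt(chisq / n) as the common starting point. All helper names ending in _sketch are hypothetical and not part of the package, and the formulas are stated as assumptions about the usual definitions rather than as the package's code.

```r
# Hypothetical helpers (not part of effectsize), assuming the standard
# chi-squared definitions of each index, with w = sqrt(chisq / n).
w_to_v_sketch <- function(w, nrow, ncol) w / sqrt(min(nrow, ncol) - 1)
v_to_w_sketch <- function(v, nrow, ncol) v * sqrt(min(nrow, ncol) - 1)
w_to_t_sketch <- function(w, nrow, ncol) w / ((nrow - 1) * (ncol - 1))^(1 / 4)
w_to_c_sketch <- function(w) w / sqrt(1 + w^2)
c_to_w_sketch <- function(pearsons_c) pearsons_c / sqrt(1 - pearsons_c^2)
# Fei rescales w by its maximum under the expected proportions p
w_to_fei_sketch <- function(w, p) w / sqrt(1 / min(p) - 1)

# Cramer's V -> Cohen's w -> Tschuprow's T for a 3-by-4 table
w <- v_to_w_sketch(0.80, nrow = 3, ncol = 4)
w_to_t_sketch(w, nrow = 3, ncol = 4)

# A round trip through Pearson's C recovers w
c_to_w_sketch(w_to_c_sketch(w))

# Goodness of fit: Cohen's w to Fei for given expected proportions
w_to_fei_sketch(0.11, p = c(0.015, 0.010, 0.975))
```

Routing every conversion through w keeps the number of formulas small: any pair of indices can be converted by going to w and back out.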
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi-Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Other convert between effect sizes: d_to_r(), diff_to_cles, eta2_to_f2(), odds_to_probs(), oddsratio_to_riskratio()
library(effectsize)

## 2D tables
## ---------
data("Music_preferences2")
Music_preferences2

cramers_v(Music_preferences2, adjust = FALSE)

v_to_t(0.80, 3, 4)

tschuprows_t(Music_preferences2)

## Goodness of fit
## ---------------
data("Smoking_FASD")
Smoking_FASD

cohens_w(Smoking_FASD, p = c(0.015, 0.010, 0.975))

w_to_fei(0.11, p = c(0.015, 0.010, 0.975))

fei(Smoking_FASD, p = c(0.015, 0.010, 0.975))

## Power analysis
## --------------
# See https://osf.io/cg64s/

p0 <- c(0.35, 0.65)
Fei <- 0.3

pwr::pwr.chisq.test(
  w = fei_to_w(Fei, p = p0),
  df = length(p0) - 1,
  sig.level = 0.01,
  power = 0.85
)