--- title: "Multilevel Correlations" output: rmarkdown::html_vignette: toc: true fig_width: 10.08 fig_height: 6 tags: [r, correlation, types] vignette: > %\VignetteIndexEntry{Multilevel Correlations} \usepackage[utf8]{inputenc} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console bibliography: bibliography.bib --- --- This vignette can be cited as: ```{r cite} citation("correlation") ``` --- ```{r, include=FALSE} library(knitr) options( knitr.kable.NA = "", digits = 2, out.width = "100%", message = FALSE, warning = FALSE, dpi = 450 ) if (!requireNamespace("ggplot2", quietly = TRUE) || !requireNamespace("BayesFactor", quietly = TRUE) || !requireNamespace("lme4", quietly = TRUE)) { knitr::opts_chunk$set(eval = FALSE) } ``` ## Data Imagine we have an experiment in which **10 individuals** completed a task with **100 trials**. For each of the 1000 trials (10 * 100) in total, we measured two things, **V1** and **V2**, and we are interested in **investigating the link between these two variables**. We will generate data using the [`simulate_simpson()`](https://easystats.github.io/bayestestR/reference/simulate_simpson.html) function from this package and look at its summary: ```{r} library(correlation) data <- simulate_simpson(n = 100, groups = 10) summary(data) ``` Now let's visualize the two variables: ```{r} library(ggplot2) ggplot(data, aes(x = V1, y = V2)) + geom_point() + geom_smooth(colour = "black", method = "lm", se = FALSE) + theme_classic() ``` That seems pretty straightforward! It seems like there is a **negative correlation** between V1 and V2. Let's test this. ## Simple correlation ```{r} correlation(data) ``` Indeed, there is a **strong, negative and significant correlation** between V1 and V2. Great, can we go ahead and **publish these results in _PNAS_**? ## The Simpson's Paradox Not so fast! Ever heard of the [**Simpson's Paradox**](https://en.wikipedia.org/wiki/Simpson%27s_paradox)? Let's colour our datapoints by group (by individuals): ```{r} library(ggplot2) ggplot(data, aes(x = V1, y = V2)) + geom_point(aes(colour = Group)) + geom_smooth(aes(colour = Group), method = "lm", se = FALSE) + geom_smooth(colour = "black", method = "lm", se = FALSE) + theme_classic() ``` Mmh, interesting. It seems like, for each subject, the relationship is different. The (global) negative trend seems to be an artifact of **differences between the groups** and could be spurious! **Multilevel *(as in multi-group)* ** correlations allow us to account for **differences between groups**. It is based on a partialization of the group, entered as a random effect in a mixed linear regression. You can compute them with the [**correlations**](https://github.com/easystats/correlation) package by setting the `multilevel` argument to `TRUE`. ```{r} correlation(data, multilevel = TRUE) ``` For completeness, let's also see if its Bayesian cousin agrees with it: ```{r} correlation(data, multilevel = TRUE, bayesian = TRUE) ``` **Dayum!** We were too hasty in our conclusions! Taking the group into account seems to be super important. _Note_: In this simple case where only two variables are of interest, it would be of course best to directly proceed using a mixed regression model instead of correlations. That being said, the latter can be useful for exploratory analysis, when multiple variables are of interest, or in combination with a network or structural approach.