| Title: | Estimation of Model-Based Predictions, Contrasts and Means |
|---|---|
| Description: | Implements a general interface for model-based estimations for a wide variety of models, used in the computation of marginal means, contrast analysis and predictions. For a list of supported models, see 'insight::supported_models()'. |
| Authors: | Dominique Makowski [aut, cre] (ORCID: <https://orcid.org/0000-0001-5375-9967>), Daniel Lüdecke [aut] (ORCID: <https://orcid.org/0000-0002-8895-3206>), Mattan S. Ben-Shachar [aut] (ORCID: <https://orcid.org/0000-0002-4287-4801>), Indrajeet Patil [aut] (ORCID: <https://orcid.org/0000-0003-1995-6531>), Rémi Thériault [aut] (ORCID: <https://orcid.org/0000-0003-4315-6788>) |
| Maintainer: | Dominique Makowski <[email protected]> |
| License: | GPL-3 |
| Version: | 0.15.0.3 |
| Built: | 2026-06-03 18:39:03 UTC |
| Source: | https://github.com/easystats/modelbased |
as.data.frame() method for modelbased objects. Can be used to return
a "raw" data frame without attributes and with standardized column names.
By default, the original column names are preserved, to avoid unexpected
changes, but this can be changed with the preserve_names argument.
## S3 method for class 'estimate_contrasts' as.data.frame( x, row.names = NULL, optional = FALSE, ..., stringsAsFactors = FALSE, use_responsename = FALSE, preserve_names = TRUE )## S3 method for class 'estimate_contrasts' as.data.frame( x, row.names = NULL, optional = FALSE, ..., stringsAsFactors = FALSE, use_responsename = FALSE, preserve_names = TRUE )
x |
An object returned by the different |
row.names |
|
optional |
logical. If |
... |
Arguments passed to the |
stringsAsFactors |
logical: should the character vector be converted to a factor? |
use_responsename |
Logical, if |
preserve_names |
Logical, if |
A data frame.
model <- lm(Petal.Length ~ Species, data = iris) out <- estimate_means(model, "Species") # default print(out) as.data.frame(out) as.data.frame(out, preserve_names = FALSE) as.data.frame(out, preserve_names = FALSE, use_responsename = TRUE)model <- lm(Petal.Length ~ Species, data = iris) out <- estimate_means(model, "Species") # default print(out) as.data.frame(out) as.data.frame(out, preserve_names = FALSE) as.data.frame(out, preserve_names = FALSE, use_responsename = TRUE)
A sample data set from a course about the analysis of factorial designs, by Mattan S. Ben-Shachar. See following link for more information: https://github.com/mattansb/Analysis-of-Factorial-Designs-foR-Psychologists
The data consists of five variables from 120 observations:
ID: A unique identifier for each participant
sex: The participant's sex
time: The time of day the participant was tested (morning, noon, or afternoon)
coffee: Group indicator, whether participant drank coffee or not
("coffee" or "control").
alertness: The participant's alertness score.
This function extracts the raw data points (i.e. the data
that was used to fit the model) and "averages" (i.e. "collapses") the
response variable over the levels of the grouping factor given in
collapse_by. Only works with mixed models.
collapse_by_group(grid, model, collapse_by = NULL, residuals = FALSE)collapse_by_group(grid, model, collapse_by = NULL, residuals = FALSE)
grid |
A data frame representing the data grid, or an object of class
|
model |
The model for which to compute partial residuals. The data grid
|
collapse_by |
Name of the (random effects) grouping factor. Data is collapsed by the levels of this factor. |
residuals |
Logical, if |
A data frame with raw data points, averaged over the levels of
the given grouping factor from the random effects. The group level of
the random effect is saved in the column "random".
data(efc, package = "modelbased") efc$e15relat <- as.factor(efc$e15relat) efc$c161sex <- as.factor(efc$c161sex) levels(efc$c161sex) <- c("male", "female") model <- lme4::lmer(neg_c_7 ~ c161sex + (1 | e15relat), data = efc) me <- estimate_means(model, "c161sex") head(efc) collapse_by_group(me, model, "e15relat")data(efc, package = "modelbased") efc$e15relat <- as.factor(efc$e15relat) efc$c161sex <- as.factor(efc$c161sex) levels(efc$c161sex) <- c("male", "female") model <- lme4::lmer(neg_c_7 ~ c161sex + (1 | e15relat), data = efc) me <- estimate_means(model, "c161sex") head(efc) collapse_by_group(me, model, "e15relat")
This function summarises the smooth term trend in terms of linear segments. Using the approximate derivative, it separates a non-linear vector into quasi-linear segments (in which the trend is either positive or negative). Each of this segment its characterized by its beginning, end, size (in proportion, relative to the total size) trend (the linear regression coefficient) and linearity (the R2 of the linear regression).
describe_nonlinear(data, ...) ## S3 method for class 'data.frame' describe_nonlinear(data, x = NULL, y = NULL, ...) estimate_smooth(data, ...)describe_nonlinear(data, ...) ## S3 method for class 'data.frame' describe_nonlinear(data, x = NULL, y = NULL, ...) estimate_smooth(data, ...)
data |
The data containing the link, as for instance obtained by
|
... |
Other arguments to be passed to or from. |
x, y
|
The name of the responses variable ( |
A data frame of linear description of non-linear terms.
# Create data data <- data.frame(x = rnorm(200)) data$y <- data$x^2 + rnorm(200, 0, 0.5) model <<- lm(y ~ poly(x, 2), data = data) link_data <- estimate_relation(model, length = 100) describe_nonlinear(link_data, x = "x")# Create data data <- data.frame(x = rnorm(200)) data$y <- data$x^2 + rnorm(200, 0, 0.5) model <<- lm(y ~ poly(x, 2), data = data) link_data <- estimate_relation(model, length = 100) describe_nonlinear(link_data, x = "x")
print() method for modelbased objects. Can be used to tweak the output
of tables.
## S3 method for class 'estimate_contrasts' display( object, select = NULL, include_grid = NULL, full_labels = NULL, format = "markdown", ... ) ## S3 method for class 'estimate_contrasts' format( x, format = NULL, select = getOption("modelbased_select", NULL), include_grid = getOption("modelbased_include_grid", FALSE), ... ) ## S3 method for class 'estimate_contrasts' print(x, select = NULL, include_grid = NULL, full_labels = NULL, ...)## S3 method for class 'estimate_contrasts' display( object, select = NULL, include_grid = NULL, full_labels = NULL, format = "markdown", ... ) ## S3 method for class 'estimate_contrasts' format( x, format = NULL, select = getOption("modelbased_select", NULL), include_grid = getOption("modelbased_include_grid", FALSE), ... ) ## S3 method for class 'estimate_contrasts' print(x, select = NULL, include_grid = NULL, full_labels = NULL, ...)
select |
Determines which columns are printed and the table layout. There are two options for this argument:
Using Note: glue-like syntax is still experimental in the case of more complex models (like mixed models) and may not return expected results. |
include_grid |
Logical, if |
full_labels |
Logical, if |
format |
String, indicating the output format. Can be |
... |
Arguments passed to |
x, object
|
An object returned by the different |
Invisibly returns x.
Columns and table layout can be customized using options():
modelbased_select: options(modelbased_select = <string>) will set a
default value for the select argument and can be used to define a custom
default layout for printing.
modelbased_include_grid: options(modelbased_include_grid = TRUE) will
set a default value for the include_grid argument and can be used to
include data grids in the output by default or not.
modelbased_full_labels: options(modelbased_full_labels = FALSE) will
remove redundant (duplicated) labels from rows.
easystats_display_format: options(easystats_display_format = <value>)
will set the default format for the display() methods. Can be one of
"markdown", "html", or "tt".
Use print_html() and print_md() to create tables in HTML or
markdown format, respectively.
model <- lm(Petal.Length ~ Species, data = iris) out <- estimate_means(model, "Species") # default print(out) # smaller set of columns print(out, select = "minimal") # remove redundant labels data(efc, package = "modelbased") efc <- datawizard::to_factor(efc, c("c161sex", "c172code", "e16sex")) levels(efc$c172code) <- c("low", "mid", "high") fit <- lm(neg_c_7 ~ c161sex * c172code * e16sex, data = efc) out <- estimate_means(fit, c("c161sex", "c172code", "e16sex")) print(out, full_labels = FALSE, select = "{estimate} ({se})")model <- lm(Petal.Length ~ Species, data = iris) out <- estimate_means(model, "Species") # default print(out) # smaller set of columns print(out, select = "minimal") # remove redundant labels data(efc, package = "modelbased") efc <- datawizard::to_factor(efc, c("c161sex", "c172code", "e16sex")) levels(efc$c172code) <- c("low", "mid", "high") fit <- lm(neg_c_7 ~ c161sex * c172code * e16sex, data = efc) out <- estimate_means(fit, c("c161sex", "c172code", "e16sex")) print(out, full_labels = FALSE, select = "{estimate} ({se})")
Selected variables from the EUROFAMCARE survey. Useful when testing on "real-life" data sets, including random missing values. This data set also has value and variable label attributes.
Run a contrast analysis by estimating the differences between each level of a
factor. See also other related functions such as estimate_means()
and estimate_slopes().
estimate_contrasts(model, ...) ## Default S3 method: estimate_contrasts( model, contrast = NULL, by = NULL, predict = NULL, ci = 0.95, comparison = "pairwise", estimate = NULL, p_adjust = "none", transform = NULL, post_process = NULL, keep_iterations = FALSE, effectsize = NULL, iterations = 200, es_type = NULL, backend = NULL, verbose = TRUE, ... )estimate_contrasts(model, ...) ## Default S3 method: estimate_contrasts( model, contrast = NULL, by = NULL, predict = NULL, ci = 0.95, comparison = "pairwise", estimate = NULL, p_adjust = "none", transform = NULL, post_process = NULL, keep_iterations = FALSE, effectsize = NULL, iterations = 200, es_type = NULL, backend = NULL, verbose = TRUE, ... )
model |
A statistical model. |
... |
Other arguments passed, for instance, to
|
contrast |
A character vector indicating the name of the variable(s) for
which to compute the contrasts, optionally including representative values or
levels at which contrasts are evaluated (e.g., |
by |
The (focal) predictor variable(s) at which to evaluate the desired
effect / mean / contrasts. Other predictors of the model that are not
included here will be collapsed and "averaged" over (the effect will be
estimated across them). |
predict |
Is passed to the
See also section Predictions on different scales. |
ci |
Confidence Interval (CI) level. Default to |
comparison |
Specify the type of contrasts or tests that should be carried out.
|
estimate |
The
You can set a default option for the Note following limitations:
|
p_adjust |
The p-values adjustment method for frequentist multiple
comparisons. Can be one of |
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
post_process |
Optional formula, character string or function (see
|
keep_iterations |
If |
effectsize |
Desired measure of standardized effect size, one of
|
iterations |
The number of bootstrap resamples to perform. |
es_type |
Specifies the type of effect-size measure to estimate when
using |
backend |
Whether to use Another difference is that You can set a default backend via |
verbose |
Use |
The estimate_slopes(), estimate_means() and estimate_contrasts()
functions are forming a group, as they are all based on marginal
estimations (estimations based on a model). All three are built on the
emmeans or marginaleffects package (depending on the backend
argument), so reading its documentation (for instance emmeans::emmeans(),
emmeans::emtrends() or this website) is
recommended to understand the idea behind these types of procedures.
Model-based predictions is the basis for all that follows. Indeed, the
first thing to understand is how models can be used to make predictions
(see estimate_relation()). This corresponds to the predicted response (or
"outcome variable") given specific predictor values of the predictors
(i.e., given a specific data configuration). This is why the concept of
the reference grid is so important for direct
predictions.
Marginal "means", obtained via estimate_means(), are an extension of
such predictions, allowing to "average" (collapse) some of the predictors,
to obtain the average response value at a specific predictors
configuration. This is typically used when some of the predictors of
interest are factors. Indeed, the parameters of the model will usually give
you the intercept value and then the "effect" of each factor level (how
different it is from the intercept). Marginal means can be used to directly
give you the mean value of the response variable at all the levels of a
factor. Moreover, it can also be used to control, or average over
predictors, which is useful in the case of multiple predictors with or
without interactions.
Marginal contrasts, obtained via estimate_contrasts(), are themselves
at extension of marginal means, in that they allow to investigate the
difference (i.e., the contrast) between the marginal means. This is, again,
often used to get all pairwise differences between all levels of a factor.
It works also for continuous predictors, for instance one could also be
interested in whether the difference at two extremes of a continuous
predictor is significant.
Finally, marginal effects, obtained via estimate_slopes(), are
different in that their focus is not values on the response variable, but
the model's parameters. The idea is to assess the effect of a predictor at
a specific configuration of the other predictors. This is relevant in the
case of interactions or non-linear relationships, when the effect of a
predictor variable changes depending on the other predictors. Moreover,
these effects can also be "averaged" over other predictors, to get for
instance the "general trend" of a predictor over different factor levels.
Example: Let's imagine the following model lm(y ~ condition * x) where
condition is a factor with 3 levels A, B and C and x a continuous
variable (like age for example). One idea is to see how this model performs,
and compare the actual response y to the one predicted by the model (using
estimate_expectation()). Another idea is evaluate the average mean at each of
the condition's levels (using estimate_means()), which can be useful to
visualize them. Another possibility is to evaluate the difference between
these levels (using estimate_contrasts()). Finally, one could also estimate
the effect of x averaged over all conditions, or instead within each
condition (using estimate_slopes()).
A data frame of estimated contrasts.
comparison = "pairwise": This method computes all possible unique
differences between pairs of levels of the focal predictor. For example, if
a factor has levels A, B, and C, it would compute A-B, A-C, and B-C.
comparison = "reference": This compares each level of the focal predictor
to a specified reference level (by default, the first level). For example,
if levels are A, B, C, and A is the reference, it computes B-A and C-A.
comparison = "sequential": This compares each level to the one
immediately following it in the factor's order. For levels A, B, C, it
would compute B-A and C-B.
comparison = "meandev": This contrasts each level's estimate against the
grand mean of all estimates for the focal predictor.
comparison = "meanotherdev": Similar to meandev, but each level's
estimate is compared against the mean of all other levels, excluding
itself.
comparison = "poly": These are used for ordered categorical variables to
test for linear, quadratic, cubic, etc., trends across the levels. They
assume equal spacing between levels.
comparison = "helmert": Contrast 2nd level to the first, 3rd to the
average of the first two, and so on. Each level (except the first) is
compared to the mean of the preceding levels. For levels A, B, C, it would
compute B-A and C-(A+B)/2.
comparison = "trt_vs_ctrl": This compares all levels (excluding the
first, which is typically the control) against the first level. It's often
used when comparing multiple treatment groups to a single control group.
To test multiple hypotheses jointly (usually used for factorial designs),
comparison can also be "joint". In this case, use the test argument
to specify which test should be conducted: "F" (default) or "Chi2".
comparison = "inequality" computes the absolute inequality of groups,
or in other words, the marginal effect inequality summary of categorical
predictors' overall effects, respectively, the comprehensive effect of an
independent variable across all outcome categories of a nominal or ordinal
dependent variable (total marginal effect, see Mize and Han, 2025). The
marginal effect inequality focuses on the heterogeneity of the effects of a
categorical independent variable. It helps understand how the effect of
the variable differs across its categories or levels. When the dependent
variable is categorical (e.g., logistic, ordinal or multinomial
regression), marginal effect inequality provides a holistic view of how an
independent variable affects a nominal or ordinal dependent variable. It
summarizes the overall impact (absolute inequality, or total marginal
effects) across all possible outcome categories.
comparison = "inequality_ratio" is comparable to
comparison = "inequality", but instead of calculating the absolute
inequality, it computes the relative inequality of groups. This is useful
to compare the relative effects of different predictors on the dependent
variable. It provides a measure of how much more or less inequality one
predictor has
compared to another.
comparison = "inequality_pairwise" computes pairwise differences of
absolute inequality measures, while "inequality_ratio_pairwise" computes
pairwise differences of relative inequality measures (ratios). Depending on
the sign, this measure indicates which of the predictors has a stronger
impact on the dependent variable in terms of inequalities.
Examples for analysing inequalities are shown in the related vignette.
Calculating contrasts between average slopes can tell us about the
"context" effects when modelling within- and between-effects. An example
for within- and between effects is described
in this vignette.
A context effect describes the additional influence that the social or
regional environment (e.g., place of residence) has on an individual,
independent of their personal characteristics. It demonstrates that people
with identical individual circumstances (such as the same income) face
different opportunities or risks depending on the environment in which they
live. This can be achieved by specifying both numeric predictors in the
contrast argument, e.g. contrast = c("x_within", "x_between"). It is
also possible to stratify context effects at different levels of another
variable using by. If pairwise comparisons of context effects (i.e., the
pairwise comparisons of the difference of between within- and
between-effects, or the difference of average slopes) at different levels
of another variable are required, add that variable to the contrast
argument instead, e.g. contrast = c("x_within", "x_between", "factor").
These contrasts can additionally be stratified by another variable using by
again, e.g. contrast = c("x_within", "x_between", "factor1"), by = "factor2".
Note that when average slopes are contrasted, the comparison argument has
no effect and is always set to "pairwise". See also 'Examples'.
By default, estimate_contrasts() reports no standardized effect size on
purpose. Should one request one, some things are to keep in mind. As the
authors of emmeans write, "There is substantial disagreement among
practitioners on what is the appropriate sigma to use in computing effect
sizes; or, indeed, whether any effect-size measure is appropriate for some
situations. The user is completely responsible for specifying appropriate
parameters (or for failing to do so)."
In particular, effect size method "boot" does not correct for covariates
in the model, so should probably only be used when there is just one
categorical predictor (with however many levels). Some believe that if there
are multiple predictors or any covariates, it is important to re-compute
sigma adding back in the response variance associated with the variables that
aren't part of the contrast.
effectsize = "emmeans" uses emmeans::eff_size with
sigma = stats::sigma(model), edf = stats::df.residual(model) and
method = "identity". This standardizes using the MSE (sigma). Some believe
this works when the contrasts are the only predictors in the model, but not
when there are covariates. The response variance accounted for by the
covariates should not be removed from the SD used to standardize. Otherwise,
d will be overestimated.
effectsize = "marginal" uses the following formula to compute effect
size: d_adj <- difference * (1- R2)/ sigma. This standardizes
using the response SD with only the between-groups variance on the focal
factor/contrast removed. This allows for groups to be equated on their
covariates, but creates an appropriate scale for standardizing the response.
effectsize = "boot" uses bootstrapping (defaults to a low value of
200) through bootES::bootES. Adjusts for contrasts, but not for covariates.
To define representative values for focal predictors (specified in by,
contrast, and slope), you can use several methods. These values are
internally generated by insight::get_datagrid(), so consult its
documentation for more details.
You can directly specify values as strings or lists for by, contrast,
and slope.
For numeric focal predictors, use examples like by = "gear = c(4, 8)",
by = list(gear = c(4, 8)) or by = "gear = 5:10"
For factor or character predictors, use by = "Species = c('setosa', 'virginica')"
or by = list(Species = c('setosa', 'virginica'))
You can use "shortcuts" within square brackets, such as by = "Sepal.Width = [sd]"
or by = "Sepal.Width = [fivenum]"
For numeric focal predictors, if no representative values are specified
(i.e., by = "gear" and not by = "gear = c(4, 8)"), length and
range control the number and type of representative values for the focal
predictors:
length determines how many equally spaced values are generated.
range specifies the type of values, like "range" or "sd".
length and range apply to all numeric focal predictors.
If you have multiple numeric predictors, length and range can accept
multiple elements, one for each predictor (see 'Examples').
For integer variables, only values that appear in the data will be included
in the data grid, independent from the length argument. This behaviour
can be changed by setting protect_integers = FALSE, which will then treat
integer variables as numerics (and possibly produce fractions).
See also this vignette
for some examples, and insight::get_datagrid() to learn more about how to
create data grids for predictors of interest.
The predict argument allows to generate predictions on different scales of
the response variable. The "link" option does not apply to all models, and
usually not to Gaussian models. "link" will leave the values on scale of
the linear predictors. "response" (or NULL) will transform them on scale
of the response variable. Thus for a logistic model, "link" will give
estimations expressed in log-odds (probabilities on logit scale) and
"response" in terms of probabilities.
To predict distributional parameters (called "dpar" in other packages), for
instance when using complex formulae in brms models, the predict argument
can take the value of the parameter you want to estimate, for instance
"sigma", "kappa", etc.
"response" and "inverse_link" both return predictions on the response
scale, however, "response" first calculates predictions on the response
scale for each observation and then aggregates them by groups or levels
defined in by. "inverse_link" first calculates predictions on the link
scale for each observation, then aggregates them by groups or levels defined
in by, and finally back-transforms the predictions to the response scale.
Both approaches have advantages and disadvantages. "response" usually
produces less biased predictions, but confidence intervals might be outside
reasonable bounds (i.e., for instance can be negative for count data). The
"inverse_link" approach is more robust in terms of confidence intervals,
but might produce biased predictions. However, you can try to set
bias_correction = TRUE, to adjust for this bias.
In particular for mixed models, using "response" is recommended, because
averaging across random effects groups is then more accurate.
Mize, T., & Han, B. (2025). Inequality and Total Effect Summary Measures for Nominal and Ordinal Variables. Sociological Science, 12, 115–157. doi:10.15195/v12.a7
Montiel Olea, J. L., and Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1–17. doi:10.1002/jae.2656
## Not run: # Basic usage -------------------------------- # -------------------------------------------- model <- lm(Sepal.Width ~ Species, data = iris) estimate_contrasts(model) # Dealing with interactions model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris) # By default: selects first factor estimate_contrasts(model) # Can also run contrasts between points of numeric, stratified by "Species" estimate_contrasts(model, contrast = "Petal.Width", by = "Species") # Or both estimate_contrasts(model, contrast = c("Species", "Petal.Width"), length = 2) # Or with custom specifications estimate_contrasts(model, contrast = c("Species", "Petal.Width = c(1, 2)")) # Or modulate it estimate_contrasts(model, by = "Petal.Width", length = 4) # Standardized differences estimated <- estimate_contrasts(lm(Sepal.Width ~ Species, data = iris)) standardize(estimated) # contrasts of slopes ------------------------ # -------------------------------------------- data(qol_cancer, package = "parameters") qol_cancer$ID <- as.numeric(qol_cancer$ID) qol_cancer$grp <- as.factor(ifelse(qol_cancer$ID < 100, "Group 1", "Group 2")) model <- lm(QoL ~ time * education * grp, data = qol_cancer) # "time" only has integer values and few values, so it's treated like a factor estimate_contrasts(model, "time", by = "education") # Setting `integer_as_continuous = TRUE` treats "time" as a continuous # variable. This allows us to compare its average slope rather than making # discrete pairwise comparisons at each time point. estimate_contrasts( model, contrast = "time", by = "education", integer_as_continuous = TRUE ) # pairwise comparisons for multiple groups estimate_contrasts( model, "time", by = c("education", "grp"), integer_as_continuous = TRUE ) # if we want pairwise comparisons only for one factor, but group by another, # we need the formula specification and define the grouping variable after # the vertical bar estimate_contrasts( model, "time", by = c("education", "grp"), comparison = ~pairwise | grp, integer_as_continuous = TRUE ) # custom factor contrasts - contrasts the average effects of two levels # against the remaining third level # --------------------------------------------------------------------- data(puppy_love, package = "modelbased") cond_tx <- cbind("no treatment" = c(1, 0, 0), "treatment" = c(0, 0.5, 0.5)) model <- lm(happiness ~ puppy_love * dose, data = puppy_love) estimate_slopes(model, "puppy_love", by = "dose", comparison = cond_tx) # Note: for the emmeans-backend, we need to use `estimate_contrasts()` for # the above example: cond_tx <- list(`no treatment` = c(1, 0, 0), treatment = c(0, 0.5, 0.5)) estimate_contrasts( model, contrast = "puppy_love", by = "dose", comparison = cond_tx, backend = "emmeans" ) # Other models (mixed, Bayesian, ...) -------- # -------------------------------------------- data <- iris data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B") model <- lme4::lmer(Sepal.Width ~ Species + (1 | Petal.Length_factor), data = data) estimate_contrasts(model) data <- mtcars data$cyl <- as.factor(data$cyl) data$am <- as.factor(data$am) model <- rstanarm::stan_glm(mpg ~ cyl * wt, data = data, refresh = 0) estimate_contrasts(model) estimate_contrasts(model, by = "wt", length = 4) model <- rstanarm::stan_glm( Sepal.Width ~ Species + Petal.Width + Petal.Length, data = iris, refresh = 0 ) estimate_contrasts(model, by = "Petal.Length = [sd]", test = "bf") # Context effects -------------------------------------------- # This is the difference of within- and between-effects, which # typically are two average slopes that are compared. It is # possible to calculate the context effect at different levels # of another variable. # ------------------------------------------------------------ data("qol_cancer", package = "parameters") qol_cancer <- datawizard::demean(qol_cancer, select = "phq4", by = "ID") model <- lme4::lmer( QoL ~ time * (phq4_within + phq4_between) + (1 + time | ID), data = qol_cancer ) # context effect (difference between within- and between-effect) # at each time point - we calculate the contrast of two average slopes # at different levels of "time" estimate_contrasts(model, c("phq4_within", "phq4_between"), by = "time") # is the trend of the context effect across time points statistically # significant? In this case, we just want the contrasts of the overall # average slopes (not stratfied nor contrasted by time). estimate_contrasts(model, c("phq4_within", "phq4_between")) # now we ask whether contexts effects are different for different educational # levels. We now need to model a 3-way interaction between time, education # and the centered phq4 variables. model <- lme4::lmer( QoL ~ time * education * (phq4_within + phq4_between) + (1 + time | ID), data = qol_cancer ) # how do time trends of context effects differ between education levels? estimate_contrasts(model, c("phq4_within", "phq4_between"), by = "education") # are differences in time trends of context effects statistically significant # between education levels? estimate_contrasts(model, c("phq4_within", "phq4_between", "education")) # Post-processing of multiple comparisons --------------------- # Caution! Don't expect this example to be meaningful! # It is # just to demonstrate the usage of the `post_process` argument. # ------------------------------------------------------------- data("qol_cancer", package = "parameters") model <- lme4::lmer(QoL ~ time * education + (1 + time | ID), data = qol_cancer) # contrasts (pairwise comparisons) by timepoints - the default estimate_contrasts(model, "education=c('low', 'mid')", by = "time") # contrasts (pairwise comparisons) by timepoints, the default for `comparison` # additionally, we compare the differences of these contrasts across timepoints # against the reference time point (contrasts at times 2 and 3 against # contrasts at time 1) estimate_contrasts(model, contrasts = "education=c('low', 'mid')", by = "time", post_process = ~reference ) # multiple post-processing steps - same as before, but calculates # poly-contrasts in addition to reference contrasts estimate_contrasts(model, contrasts = "education=c('low', 'mid')", by = "time", post_process = list(~reference, ~poly) ) ## End(Not run)## Not run: # Basic usage -------------------------------- # -------------------------------------------- model <- lm(Sepal.Width ~ Species, data = iris) estimate_contrasts(model) # Dealing with interactions model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris) # By default: selects first factor estimate_contrasts(model) # Can also run contrasts between points of numeric, stratified by "Species" estimate_contrasts(model, contrast = "Petal.Width", by = "Species") # Or both estimate_contrasts(model, contrast = c("Species", "Petal.Width"), length = 2) # Or with custom specifications estimate_contrasts(model, contrast = c("Species", "Petal.Width = c(1, 2)")) # Or modulate it estimate_contrasts(model, by = "Petal.Width", length = 4) # Standardized differences estimated <- estimate_contrasts(lm(Sepal.Width ~ Species, data = iris)) standardize(estimated) # contrasts of slopes ------------------------ # -------------------------------------------- data(qol_cancer, package = "parameters") qol_cancer$ID <- as.numeric(qol_cancer$ID) qol_cancer$grp <- as.factor(ifelse(qol_cancer$ID < 100, "Group 1", "Group 2")) model <- lm(QoL ~ time * education * grp, data = qol_cancer) # "time" only has integer values and few values, so it's treated like a factor estimate_contrasts(model, "time", by = "education") # Setting `integer_as_continuous = TRUE` treats "time" as a continuous # variable. This allows us to compare its average slope rather than making # discrete pairwise comparisons at each time point. estimate_contrasts( model, contrast = "time", by = "education", integer_as_continuous = TRUE ) # pairwise comparisons for multiple groups estimate_contrasts( model, "time", by = c("education", "grp"), integer_as_continuous = TRUE ) # if we want pairwise comparisons only for one factor, but group by another, # we need the formula specification and define the grouping variable after # the vertical bar estimate_contrasts( model, "time", by = c("education", "grp"), comparison = ~pairwise | grp, integer_as_continuous = TRUE ) # custom factor contrasts - contrasts the average effects of two levels # against the remaining third level # --------------------------------------------------------------------- data(puppy_love, package = "modelbased") cond_tx <- cbind("no treatment" = c(1, 0, 0), "treatment" = c(0, 0.5, 0.5)) model <- lm(happiness ~ puppy_love * dose, data = puppy_love) estimate_slopes(model, "puppy_love", by = "dose", comparison = cond_tx) # Note: for the emmeans-backend, we need to use `estimate_contrasts()` for # the above example: cond_tx <- list(`no treatment` = c(1, 0, 0), treatment = c(0, 0.5, 0.5)) estimate_contrasts( model, contrast = "puppy_love", by = "dose", comparison = cond_tx, backend = "emmeans" ) # Other models (mixed, Bayesian, ...) -------- # -------------------------------------------- data <- iris data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B") model <- lme4::lmer(Sepal.Width ~ Species + (1 | Petal.Length_factor), data = data) estimate_contrasts(model) data <- mtcars data$cyl <- as.factor(data$cyl) data$am <- as.factor(data$am) model <- rstanarm::stan_glm(mpg ~ cyl * wt, data = data, refresh = 0) estimate_contrasts(model) estimate_contrasts(model, by = "wt", length = 4) model <- rstanarm::stan_glm( Sepal.Width ~ Species + Petal.Width + Petal.Length, data = iris, refresh = 0 ) estimate_contrasts(model, by = "Petal.Length = [sd]", test = "bf") # Context effects -------------------------------------------- # This is the difference of within- and between-effects, which # typically are two average slopes that are compared. It is # possible to calculate the context effect at different levels # of another variable. # ------------------------------------------------------------ data("qol_cancer", package = "parameters") qol_cancer <- datawizard::demean(qol_cancer, select = "phq4", by = "ID") model <- lme4::lmer( QoL ~ time * (phq4_within + phq4_between) + (1 + time | ID), data = qol_cancer ) # context effect (difference between within- and between-effect) # at each time point - we calculate the contrast of two average slopes # at different levels of "time" estimate_contrasts(model, c("phq4_within", "phq4_between"), by = "time") # is the trend of the context effect across time points statistically # significant? In this case, we just want the contrasts of the overall # average slopes (not stratfied nor contrasted by time). estimate_contrasts(model, c("phq4_within", "phq4_between")) # now we ask whether contexts effects are different for different educational # levels. We now need to model a 3-way interaction between time, education # and the centered phq4 variables. model <- lme4::lmer( QoL ~ time * education * (phq4_within + phq4_between) + (1 + time | ID), data = qol_cancer ) # how do time trends of context effects differ between education levels? estimate_contrasts(model, c("phq4_within", "phq4_between"), by = "education") # are differences in time trends of context effects statistically significant # between education levels? estimate_contrasts(model, c("phq4_within", "phq4_between", "education")) # Post-processing of multiple comparisons --------------------- # Caution! Don't expect this example to be meaningful! # It is # just to demonstrate the usage of the `post_process` argument. # ------------------------------------------------------------- data("qol_cancer", package = "parameters") model <- lme4::lmer(QoL ~ time * education + (1 + time | ID), data = qol_cancer) # contrasts (pairwise comparisons) by timepoints - the default estimate_contrasts(model, "education=c('low', 'mid')", by = "time") # contrasts (pairwise comparisons) by timepoints, the default for `comparison` # additionally, we compare the differences of these contrasts across timepoints # against the reference time point (contrasts at times 2 and 3 against # contrasts at time 1) estimate_contrasts(model, contrasts = "education=c('low', 'mid')", by = "time", post_process = ~reference ) # multiple post-processing steps - same as before, but calculates # poly-contrasts in addition to reference contrasts estimate_contrasts(model, contrasts = "education=c('low', 'mid')", by = "time", post_process = list(~reference, ~poly) ) ## End(Not run)
After fitting a model, it is useful generate model-based estimates of the response variables for different combinations of predictor values. Such estimates can be used to make inferences about relationships between variables, to make predictions about individual cases, or to compare the predicted values against the observed data.
The modelbased package includes 4 "related" functions, that mostly differ in
their default arguments (in particular, data and predict):
estimate_prediction(data = NULL, predict = "prediction", ...)
estimate_expectation(data = NULL, predict = "expectation", ...)
estimate_relation(data = "grid", predict = "expectation", ...)
estimate_link(data = "grid", predict = "link", ...)
While they are all based on model-based predictions (using
insight::get_predicted()), they differ in terms of the type of
predictions they make by default. For instance, estimate_prediction() and
estimate_expectation() return predictions for the original data used to fit
the model, while estimate_relation() and estimate_link() return
predictions on a insight::get_datagrid(). Similarly, estimate_link
returns predictions on the link scale, while the others return predictions on
the response scale. Note that the relevance of these differences depends on
the model family (for instance, for linear models, estimate_relation is
equivalent to estimate_link(), since there is no difference between the
link-scale and the response scale).
Note that you can run plot() on
the output of these functions to get some visual insights (see the
plotting examples).
See the details section below for details about the different possibilities.
estimate_expectation( model, data = NULL, by = NULL, predict = "expectation", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... ) estimate_link( model, data = "grid", by = NULL, predict = "link", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... ) estimate_prediction( model, data = NULL, by = NULL, predict = "prediction", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... ) estimate_relation( model, data = "grid", by = NULL, predict = "expectation", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... )estimate_expectation( model, data = NULL, by = NULL, predict = "expectation", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... ) estimate_link( model, data = "grid", by = NULL, predict = "link", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... ) estimate_prediction( model, data = NULL, by = NULL, predict = "prediction", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... ) estimate_relation( model, data = "grid", by = NULL, predict = "expectation", ci = 0.95, transform = NULL, iterations = NULL, keep_iterations = FALSE, ... )
model |
A statistical model. |
data |
A data frame with model's predictors to estimate the response. If
|
by |
The predictor variable(s) at which to estimate the response. Other
predictors of the model that are not included here will be set to their mean
value (for numeric predictors), reference level (for factors) or mode (other
types). The |
predict |
This parameter controls what is predicted (and gets internally
passed to |
ci |
Confidence Interval (CI) level. Default to |
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
iterations |
For Bayesian models, this corresponds to the number of
posterior draws. If |
keep_iterations |
If |
... |
You can add all the additional control arguments from
|
A data frame of predicted values and uncertainty intervals, with
class "estimate_predicted". Methods for visualisation_recipe()
and plot() are available.
The most important way that various types of response estimates differ is in terms of what quantity is being estimated and the meaning of the uncertainty intervals. The major choices are expected values for uncertainty in the regression line and predicted values for uncertainty in the individual case predictions.
Expected values refer to the fitted regression line - the estimated average response value (i.e., the "expectation") for individuals with specific predictor values. For example, in a linear model y = 2 + 3x + 4z + e, the estimated average y for individuals with x = 1 and z = 2 is 11.
For expected values, uncertainty intervals refer to uncertainty in the estimated conditional average (where might the true regression line actually fall)? Uncertainty intervals for expected values are also called "confidence intervals".
Expected values and their uncertainty intervals are useful for describing the relationship between variables and for describing how precisely a model has been estimated.
For generalized linear models, expected values are reported on one of two scales:
The link scale refers to scale of the fitted regression line, after transformation by the link function. For example, for a logistic regression (logit binomial) model, the link scale gives expected log-odds. For a log-link Poisson model, the link scale gives the expected log-count.
The response scale refers to the original scale of the response variable (i.e., without any link function transformation). Expected values on the link scale are back-transformed to the original response variable metric (e.g., expected probabilities for binomial models, expected counts for Poisson models).
In contrast to expected values, predicted values refer to predictions for individual cases. Predicted values are also called "posterior predictions" or "posterior predictive draws".
For predicted values, uncertainty intervals refer to uncertainty in the individual response values for each case (where might any single case actually fall)? Uncertainty intervals for predicted values are also called "prediction intervals" or "posterior predictive intervals".
Predicted values and their uncertainty intervals are useful for forecasting the range of values that might be observed in new data, for making decisions about individual cases, and for checking if model predictions are reasonable ("posterior predictive checks").
Predicted values and intervals are always on the scale of the original response variable (not the link scale).
modelbased provides 4 functions for generating model-based response estimates and their uncertainty:
estimate_expectation():
Generates expected values (conditional average) on the response scale.
The uncertainty interval is a confidence interval.
By default, values are computed using the data used to fit the model.
estimate_link():
Generates expected values (conditional average) on the link scale.
The uncertainty interval is a confidence interval.
By default, values are computed using a reference grid spanning the
observed range of predictor values (see insight::get_datagrid()).
estimate_prediction():
Generates predicted values (for individual cases) on the response scale.
The uncertainty interval is a prediction interval.
By default, values are computed using the data used to fit the model.
estimate_relation():
Like estimate_expectation().
Useful for visualizing a model.
Generates expected values (conditional average) on the response scale.
The uncertainty interval is a confidence interval.
By default, values are computed using a reference grid spanning the
observed range of predictor values (see insight::get_datagrid()).
If the data = NULL, values are estimated using the data used to fit the
model. If data = "grid", values are computed using a reference grid
spanning the observed range of predictor values with
insight::get_datagrid(). This can be useful for model visualization. The
number of predictor values used for each variable can be controlled with the
length argument. data can also be a data frame containing columns with
names matching the model frame (see insight::get_data()). This can be used
to generate model predictions for specific combinations of predictor values.
For finite mixture models (currently, only the brms::mixture() family
from package brms is supported), use predict = "classification" with
data = NULL to predict the class membership for each observation (e.g.,
estimate_prediction(model, predict = "classification")). To return
predicted values stratified by class membership, use predict = "link"
(possibly in combination with data or by, e.g.
estimate_link(model, by = "predictor")). Other predict options will
return predicted values of the outcome for the full data, not stratified by
class membership.
These functions are built on top of insight::get_predicted() and correspond
to different specifications of its parameters. It may be useful to read its
documentation,
in particular the description of the predict argument for additional
details on the difference between expected vs. predicted values and link vs.
response scales.
Additional control parameters can be used to control results from
insight::get_datagrid() (when data = "grid") and from
insight::get_predicted() (the function used internally to compute
predictions).
For plotting, check the examples in visualisation_recipe(). Also check out
the Vignettes and README examples for
various examples, tutorials and usecases.
library(modelbased) # Linear Models model <- lm(mpg ~ wt, data = mtcars) # Get predicted and prediction interval (see insight::get_predicted) estimate_expectation(model) # Get expected values with confidence interval pred <- estimate_relation(model) pred # Visualisation (see visualisation_recipe()) plot(pred) # Standardize predictions pred <- estimate_relation(lm(mpg ~ wt + am, data = mtcars)) z <- standardize(pred, include_response = FALSE) z unstandardize(z, include_response = FALSE) # Logistic Models model <- glm(vs ~ wt, data = mtcars, family = "binomial") estimate_expectation(model) estimate_relation(model) # Mixed models data(mtcars) mtcars$gear <- as.factor(mtcars$gear) model <- glmmTMB::glmmTMB(mpg ~ wt + (1 | gear), data = mtcars) estimate_expectation(model) estimate_relation(model) # Predict random effects and calculate contrasts estim <- estimate_relation(model, by = "gear") estim estimate_contrasts(estim) # Bayesian models model <- suppressWarnings(rstanarm::stan_glm( mpg ~ wt, data = mtcars, refresh = 0, iter = 200 )) estimate_expectation(model) estimate_relation(model)library(modelbased) # Linear Models model <- lm(mpg ~ wt, data = mtcars) # Get predicted and prediction interval (see insight::get_predicted) estimate_expectation(model) # Get expected values with confidence interval pred <- estimate_relation(model) pred # Visualisation (see visualisation_recipe()) plot(pred) # Standardize predictions pred <- estimate_relation(lm(mpg ~ wt + am, data = mtcars)) z <- standardize(pred, include_response = FALSE) z unstandardize(z, include_response = FALSE) # Logistic Models model <- glm(vs ~ wt, data = mtcars, family = "binomial") estimate_expectation(model) estimate_relation(model) # Mixed models data(mtcars) mtcars$gear <- as.factor(mtcars$gear) model <- glmmTMB::glmmTMB(mpg ~ wt + (1 | gear), data = mtcars) estimate_expectation(model) estimate_relation(model) # Predict random effects and calculate contrasts estim <- estimate_relation(model, by = "gear") estim estimate_contrasts(estim) # Bayesian models model <- suppressWarnings(rstanarm::stan_glm( mpg ~ wt, data = mtcars, refresh = 0, iter = 200 )) estimate_expectation(model) estimate_relation(model)
Extract random parameters of each individual group in the context of mixed models, commonly referred to as BLUPs (Best Linear Unbiased Predictors). Can be reshaped to be of the same dimensions as the original data, which can be useful to add the random effects to the original data.
estimate_grouplevel(model, ...) ## Default S3 method: estimate_grouplevel(model, type = "random", ...) ## S3 method for class 'brmsfit' estimate_grouplevel( model, type = "random", dispersion = TRUE, test = NULL, diagnostic = NULL, ... ) reshape_grouplevel(x, ...) ## S3 method for class 'estimate_grouplevel' reshape_grouplevel(x, indices = "all", group = NULL, ...)estimate_grouplevel(model, ...) ## Default S3 method: estimate_grouplevel(model, type = "random", ...) ## S3 method for class 'brmsfit' estimate_grouplevel( model, type = "random", dispersion = TRUE, test = NULL, diagnostic = NULL, ... ) reshape_grouplevel(x, ...) ## S3 method for class 'estimate_grouplevel' reshape_grouplevel(x, indices = "all", group = NULL, ...)
model |
A mixed model with random effects. |
... |
Other arguments passed to |
type |
String, describing the type of estimates that should be returned.
Can be
|
dispersion, test, diagnostic
|
Arguments passed to
|
x |
The output of |
indices |
A character vector containing the indices (i.e., which columns) to extract (e.g., "Coefficient", "Median"). |
group |
The name of the random factor to select as string value (e.g.,
|
Unlike raw group means, BLUPs apply shrinkage: they are a compromise between the group estimate and the population estimate. This improves generalizability and prevents overfitting.
# lme4 model data(mtcars) model <- lme4::lmer(mpg ~ hp + (1 | carb), data = mtcars) random <- estimate_grouplevel(model) # Show group-specific effects random # Visualize random effects plot(random) # Reshape to wide data... reshaped <- reshape_grouplevel(random, group = "carb", indices = c("Coefficient", "SE")) # ...and can be easily combined with the original data alldata <- merge(mtcars, reshaped) # overall coefficients estimate_grouplevel(model, type = "total")# lme4 model data(mtcars) model <- lme4::lmer(mpg ~ hp + (1 | carb), data = mtcars) random <- estimate_grouplevel(model) # Show group-specific effects random # Visualize random effects plot(random) # Reshape to wide data... reshaped <- reshape_grouplevel(random, group = "carb", indices = c("Coefficient", "SE")) # ...and can be easily combined with the original data alldata <- merge(mtcars, reshaped) # overall coefficients estimate_grouplevel(model, type = "total")
Estimate average values of the response variable at each factor level or
at representative values, respectively at values defined in a "data grid" or
"reference grid". For plotting, check the examples in
visualisation_recipe(). See also other related functions such as
estimate_contrasts() and estimate_slopes().
estimate_means( model, by = "auto", predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, keep_iterations = FALSE, backend = NULL, verbose = TRUE, ... )estimate_means( model, by = "auto", predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, keep_iterations = FALSE, backend = NULL, verbose = TRUE, ... )
model |
A statistical model. |
by |
The (focal) predictor variable(s) at which to evaluate the desired
effect / mean / contrasts. Other predictors of the model that are not
included here will be collapsed and "averaged" over (the effect will be
estimated across them). |
predict |
Is passed to the
See also section Predictions on different scales. |
ci |
Confidence Interval (CI) level. Default to |
estimate |
The
You can set a default option for the Note following limitations:
|
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
keep_iterations |
If |
backend |
Whether to use Another difference is that You can set a default backend via |
verbose |
Use |
... |
Other arguments passed, for instance, to
|
The estimate_slopes(), estimate_means() and estimate_contrasts()
functions are forming a group, as they are all based on marginal
estimations (estimations based on a model). All three are built on the
emmeans or marginaleffects package (depending on the backend
argument), so reading its documentation (for instance emmeans::emmeans(),
emmeans::emtrends() or this website) is
recommended to understand the idea behind these types of procedures.
Model-based predictions is the basis for all that follows. Indeed, the
first thing to understand is how models can be used to make predictions
(see estimate_relation()). This corresponds to the predicted response (or
"outcome variable") given specific predictor values of the predictors
(i.e., given a specific data configuration). This is why the concept of
the reference grid is so important for direct
predictions.
Marginal "means", obtained via estimate_means(), are an extension of
such predictions, allowing to "average" (collapse) some of the predictors,
to obtain the average response value at a specific predictors
configuration. This is typically used when some of the predictors of
interest are factors. Indeed, the parameters of the model will usually give
you the intercept value and then the "effect" of each factor level (how
different it is from the intercept). Marginal means can be used to directly
give you the mean value of the response variable at all the levels of a
factor. Moreover, it can also be used to control, or average over
predictors, which is useful in the case of multiple predictors with or
without interactions.
Marginal contrasts, obtained via estimate_contrasts(), are themselves
at extension of marginal means, in that they allow to investigate the
difference (i.e., the contrast) between the marginal means. This is, again,
often used to get all pairwise differences between all levels of a factor.
It works also for continuous predictors, for instance one could also be
interested in whether the difference at two extremes of a continuous
predictor is significant.
Finally, marginal effects, obtained via estimate_slopes(), are
different in that their focus is not values on the response variable, but
the model's parameters. The idea is to assess the effect of a predictor at
a specific configuration of the other predictors. This is relevant in the
case of interactions or non-linear relationships, when the effect of a
predictor variable changes depending on the other predictors. Moreover,
these effects can also be "averaged" over other predictors, to get for
instance the "general trend" of a predictor over different factor levels.
Example: Let's imagine the following model lm(y ~ condition * x) where
condition is a factor with 3 levels A, B and C and x a continuous
variable (like age for example). One idea is to see how this model performs,
and compare the actual response y to the one predicted by the model (using
estimate_expectation()). Another idea is evaluate the average mean at each of
the condition's levels (using estimate_means()), which can be useful to
visualize them. Another possibility is to evaluate the difference between
these levels (using estimate_contrasts()). Finally, one could also estimate
the effect of x averaged over all conditions, or instead within each
condition (using estimate_slopes()).
A data frame of estimated marginal means.
To define representative values for focal predictors (specified in by,
contrast, and slope), you can use several methods. These values are
internally generated by insight::get_datagrid(), so consult its
documentation for more details.
You can directly specify values as strings or lists for by, contrast,
and slope.
For numeric focal predictors, use examples like by = "gear = c(4, 8)",
by = list(gear = c(4, 8)) or by = "gear = 5:10"
For factor or character predictors, use by = "Species = c('setosa', 'virginica')"
or by = list(Species = c('setosa', 'virginica'))
You can use "shortcuts" within square brackets, such as by = "Sepal.Width = [sd]"
or by = "Sepal.Width = [fivenum]"
For numeric focal predictors, if no representative values are specified
(i.e., by = "gear" and not by = "gear = c(4, 8)"), length and
range control the number and type of representative values for the focal
predictors:
length determines how many equally spaced values are generated.
range specifies the type of values, like "range" or "sd".
length and range apply to all numeric focal predictors.
If you have multiple numeric predictors, length and range can accept
multiple elements, one for each predictor (see 'Examples').
For integer variables, only values that appear in the data will be included
in the data grid, independent from the length argument. This behaviour
can be changed by setting protect_integers = FALSE, which will then treat
integer variables as numerics (and possibly produce fractions).
See also this vignette
for some examples, and insight::get_datagrid() to learn more about how to
create data grids for predictors of interest.
The predict argument allows to generate predictions on different scales of
the response variable. The "link" option does not apply to all models, and
usually not to Gaussian models. "link" will leave the values on scale of
the linear predictors. "response" (or NULL) will transform them on scale
of the response variable. Thus for a logistic model, "link" will give
estimations expressed in log-odds (probabilities on logit scale) and
"response" in terms of probabilities.
To predict distributional parameters (called "dpar" in other packages), for
instance when using complex formulae in brms models, the predict argument
can take the value of the parameter you want to estimate, for instance
"sigma", "kappa", etc.
"response" and "inverse_link" both return predictions on the response
scale, however, "response" first calculates predictions on the response
scale for each observation and then aggregates them by groups or levels
defined in by. "inverse_link" first calculates predictions on the link
scale for each observation, then aggregates them by groups or levels defined
in by, and finally back-transforms the predictions to the response scale.
Both approaches have advantages and disadvantages. "response" usually
produces less biased predictions, but confidence intervals might be outside
reasonable bounds (i.e., for instance can be negative for count data). The
"inverse_link" approach is more robust in terms of confidence intervals,
but might produce biased predictions. However, you can try to set
bias_correction = TRUE, to adjust for this bias.
In particular for mixed models, using "response" is recommended, because
averaging across random effects groups is then more accurate.
For finite mixture models (currently, only the brms::mixture() family
from package brms is supported), use predict = "link" to return predicted
values stratified by class membership. To predict the class membership, use
predict = "classification". See also
this vignette.
There are two ways of performing equivalence tests with modelbased.
Using the marginaleffects machinery
The first is by specifying the equivalence argument. It takes a numeric
vector of length two, defining the lower and upper bounds of the region of
equivalence (ROPE). The output then includes an additional column
p_Equivalence. A high p-value (non-significant result) means we reject the
assumption of practical equivalence (and that a minimal important difference
can be assumed, or that the estimate of the predicted value, slope or
contrast is likely outside the ROPE).
Using the equivalence_test() function
The second option is to use the parameters::equivalence_test.lm()
function from the parameters package on the output of
estimate_means(), estimate_slopes() or estimate_contrasts(). This
method is more flexible and implements different "rules" to calculate
practical equivalence. Furthermore, the rule decisions of accepting,
rejecting, or undecided regarding the null hypothesis of the equivalence
test are also provided. Thus, resulting p-values may differ from those
p-values returned when using the equivalence argument.
The output from equivalence_test() returns a column SGPV, the "second
generation p-value", which is equivalent to the p (Equivalence) column when
using the equivalence argument. It is basically representative of the ROPE coverage
from the confidence interval of the estimate (i.e. the proportion of the
confidence intervals that lies within the region of practical equivalence).
modelbased_backend: options(modelbased_backend = <string>) will set a
default value for the backend argument and can be used to set the package
used by default to calculate marginal means. Can be "marginaleffects" or
"emmeans".
modelbased_estimate: options(modelbased_estimate = <string>) will
set a default value for the estimate argument.
modelbased_integer: options(modelbased_integer = <value>) will set the
minimum number of unique values in an integer predictor to treat that
predictor as a "discrete integer" or as continuous. If the integer has more
than modelbased_integer unique values, it is treated as continuous. Set
to TRUE to always treat integer predictors as continuous.
Chatton, A. and Rohrer, J.M. 2024. The Causal Cookbook: Recipes for Propensity Scores, G-Computation, and Doubly Robust Standardization. Advances in Methods and Practices in Psychological Science. 2024;7(1). doi:10.1177/25152459241236149
Dickerman, Barbra A., and Miguel A. Hernán. 2020. Counterfactual Prediction Is Not Only for Causal Inference. European Journal of Epidemiology 35 (7): 615–17. doi:10.1007/s10654-020-00659-8
Heiss, A. (2022). Marginal and conditional effects for GLMMs with marginaleffects. Andrew Heiss. doi:10.59350/xwnfm-x1827
library(modelbased) # Frequentist models # ------------------- model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris) estimate_means(model) # the `length` argument is passed to `insight::get_datagrid()` and modulates # the number of representative values to return for numeric predictors estimate_means(model, by = c("Species", "Sepal.Width"), length = 2) # an alternative way to setup your data grid is specify the values directly estimate_means(model, by = c("Species", "Sepal.Width = c(2, 4)")) # or use one of the many predefined "tokens" that help you creating a useful # data grid - to learn more about creating data grids, see help in # `?insight::get_datagrid`. estimate_means(model, by = c("Species", "Sepal.Width = [fivenum]")) ## Not run: # same for factors: filter by specific levels estimate_means(model, by = "Species = c('versicolor', 'setosa')") estimate_means(model, by = c("Species", "Sepal.Width = 0")) # estimate marginal average of response at values for numeric predictor estimate_means(model, by = "Sepal.Width", length = 5) estimate_means(model, by = "Sepal.Width = c(2, 4)") # or provide the definition of the data grid as list estimate_means( model, by = list(Sepal.Width = c(2, 4), Species = c("versicolor", "setosa")) ) # equivalence test: the null-hypothesis is that the estimate is outside # the equivalence bounds [-4.5, 4.5] estimate_means(model, by = "Species", equivalence = c(-4.5, 4.5)) # Methods that can be applied to it: means <- estimate_means(model, by = c("Species", "Sepal.Width = 0")) plot(means) # which runs visualisation_recipe() standardize(means) # grids for numeric predictors, combine range and length model <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris) # create a "grid": value range for first numeric predictor, mean +/-1 SD # for remaining numeric predictors. estimate_means(model, c("Sepal.Width", "Petal.Length"), range = "grid") # range from minimum to maximum spread over four values, # and mean +/- 1 SD (a total of three values) estimate_means( model, by = c("Sepal.Width", "Petal.Length"), range = c("range", "sd"), length = c(4, 3) ) data <- iris data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B") model <- lme4::lmer( Petal.Length ~ Sepal.Width + Species + (1 | Petal.Length_factor), data = data ) estimate_means(model) estimate_means(model, by = "Sepal.Width", length = 3) ## End(Not run)library(modelbased) # Frequentist models # ------------------- model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris) estimate_means(model) # the `length` argument is passed to `insight::get_datagrid()` and modulates # the number of representative values to return for numeric predictors estimate_means(model, by = c("Species", "Sepal.Width"), length = 2) # an alternative way to setup your data grid is specify the values directly estimate_means(model, by = c("Species", "Sepal.Width = c(2, 4)")) # or use one of the many predefined "tokens" that help you creating a useful # data grid - to learn more about creating data grids, see help in # `?insight::get_datagrid`. estimate_means(model, by = c("Species", "Sepal.Width = [fivenum]")) ## Not run: # same for factors: filter by specific levels estimate_means(model, by = "Species = c('versicolor', 'setosa')") estimate_means(model, by = c("Species", "Sepal.Width = 0")) # estimate marginal average of response at values for numeric predictor estimate_means(model, by = "Sepal.Width", length = 5) estimate_means(model, by = "Sepal.Width = c(2, 4)") # or provide the definition of the data grid as list estimate_means( model, by = list(Sepal.Width = c(2, 4), Species = c("versicolor", "setosa")) ) # equivalence test: the null-hypothesis is that the estimate is outside # the equivalence bounds [-4.5, 4.5] estimate_means(model, by = "Species", equivalence = c(-4.5, 4.5)) # Methods that can be applied to it: means <- estimate_means(model, by = c("Species", "Sepal.Width = 0")) plot(means) # which runs visualisation_recipe() standardize(means) # grids for numeric predictors, combine range and length model <- lm(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris) # create a "grid": value range for first numeric predictor, mean +/-1 SD # for remaining numeric predictors. estimate_means(model, c("Sepal.Width", "Petal.Length"), range = "grid") # range from minimum to maximum spread over four values, # and mean +/- 1 SD (a total of three values) estimate_means( model, by = c("Sepal.Width", "Petal.Length"), range = c("range", "sd"), length = c(4, 3) ) data <- iris data$Petal.Length_factor <- ifelse(data$Petal.Length < 4.2, "A", "B") model <- lme4::lmer( Petal.Length ~ Sepal.Width + Species + (1 | Petal.Length_factor), data = data ) estimate_means(model) estimate_means(model, by = "Sepal.Width", length = 3) ## End(Not run)
Estimate the slopes (i.e., the coefficient) of a predictor over or within different factor levels, or alongside a numeric variable. In other words, to assess the effect of a predictor at specific configurations data. It corresponds to the derivative and can be useful to understand where a predictor has a significant role when interactions or non-linear relationships are present.
Other related functions based on marginal estimations includes
estimate_contrasts() and estimate_means().
See the Details section below, and don't forget to also check out the Vignettes and README examples for various examples, tutorials and use cases.
estimate_slopes( model, slope = NULL, by = NULL, predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, p_adjust = "none", keep_iterations = FALSE, backend = NULL, verbose = TRUE, trend = NULL, ... )estimate_slopes( model, slope = NULL, by = NULL, predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, p_adjust = "none", keep_iterations = FALSE, backend = NULL, verbose = TRUE, trend = NULL, ... )
model |
A statistical model. |
slope, trend
|
A character indicating the name of the variable for which
to compute the slopes. To get marginal effects at specific values, use
|
by |
The (focal) predictor variable(s) at which to evaluate the desired
effect / mean / contrasts. Other predictors of the model that are not
included here will be collapsed and "averaged" over (the effect will be
estimated across them). |
predict |
Is passed to the
See also section Predictions on different scales. |
ci |
Confidence Interval (CI) level. Default to |
estimate |
The
You can set a default option for the Note following limitations:
|
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
p_adjust |
The p-values adjustment method for frequentist multiple
comparisons. For |
keep_iterations |
If |
backend |
Whether to use Another difference is that You can set a default backend via |
verbose |
Use |
... |
Other arguments passed, for instance, to
|
The estimate_slopes(), estimate_means() and estimate_contrasts()
functions are forming a group, as they are all based on marginal
estimations (estimations based on a model). All three are built on the
emmeans or marginaleffects package (depending on the backend
argument), so reading its documentation (for instance emmeans::emmeans(),
emmeans::emtrends() or this website) is
recommended to understand the idea behind these types of procedures.
Model-based predictions is the basis for all that follows. Indeed, the
first thing to understand is how models can be used to make predictions
(see estimate_relation()). This corresponds to the predicted response (or
"outcome variable") given specific predictor values of the predictors
(i.e., given a specific data configuration). This is why the concept of
the reference grid is so important for direct
predictions.
Marginal "means", obtained via estimate_means(), are an extension of
such predictions, allowing to "average" (collapse) some of the predictors,
to obtain the average response value at a specific predictors
configuration. This is typically used when some of the predictors of
interest are factors. Indeed, the parameters of the model will usually give
you the intercept value and then the "effect" of each factor level (how
different it is from the intercept). Marginal means can be used to directly
give you the mean value of the response variable at all the levels of a
factor. Moreover, it can also be used to control, or average over
predictors, which is useful in the case of multiple predictors with or
without interactions.
Marginal contrasts, obtained via estimate_contrasts(), are themselves
at extension of marginal means, in that they allow to investigate the
difference (i.e., the contrast) between the marginal means. This is, again,
often used to get all pairwise differences between all levels of a factor.
It works also for continuous predictors, for instance one could also be
interested in whether the difference at two extremes of a continuous
predictor is significant.
Finally, marginal effects, obtained via estimate_slopes(), are
different in that their focus is not values on the response variable, but
the model's parameters. The idea is to assess the effect of a predictor at
a specific configuration of the other predictors. This is relevant in the
case of interactions or non-linear relationships, when the effect of a
predictor variable changes depending on the other predictors. Moreover,
these effects can also be "averaged" over other predictors, to get for
instance the "general trend" of a predictor over different factor levels.
Example: Let's imagine the following model lm(y ~ condition * x) where
condition is a factor with 3 levels A, B and C and x a continuous
variable (like age for example). One idea is to see how this model performs,
and compare the actual response y to the one predicted by the model (using
estimate_expectation()). Another idea is evaluate the average mean at each of
the condition's levels (using estimate_means()), which can be useful to
visualize them. Another possibility is to evaluate the difference between
these levels (using estimate_contrasts()). Finally, one could also estimate
the effect of x averaged over all conditions, or instead within each
condition (using estimate_slopes()).
A data.frame of class estimate_slopes.
To define representative values for focal predictors (specified in by,
contrast, and slope), you can use several methods. These values are
internally generated by insight::get_datagrid(), so consult its
documentation for more details.
You can directly specify values as strings or lists for by, contrast,
and slope.
For numeric focal predictors, use examples like by = "gear = c(4, 8)",
by = list(gear = c(4, 8)) or by = "gear = 5:10"
For factor or character predictors, use by = "Species = c('setosa', 'virginica')"
or by = list(Species = c('setosa', 'virginica'))
You can use "shortcuts" within square brackets, such as by = "Sepal.Width = [sd]"
or by = "Sepal.Width = [fivenum]"
For numeric focal predictors, if no representative values are specified
(i.e., by = "gear" and not by = "gear = c(4, 8)"), length and
range control the number and type of representative values for the focal
predictors:
length determines how many equally spaced values are generated.
range specifies the type of values, like "range" or "sd".
length and range apply to all numeric focal predictors.
If you have multiple numeric predictors, length and range can accept
multiple elements, one for each predictor (see 'Examples').
For integer variables, only values that appear in the data will be included
in the data grid, independent from the length argument. This behaviour
can be changed by setting protect_integers = FALSE, which will then treat
integer variables as numerics (and possibly produce fractions).
See also this vignette
for some examples, and insight::get_datagrid() to learn more about how to
create data grids for predictors of interest.
Montiel Olea, J. L., and Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1–17. doi:10.1002/jae.2656
library(ggplot2) # Get an idea of the data ggplot(iris, aes(x = Petal.Length, y = Sepal.Width)) + geom_point(aes(color = Species)) + geom_smooth(color = "black", se = FALSE) + geom_smooth(aes(color = Species), linetype = "dotted", se = FALSE) + geom_smooth(aes(color = Species), method = "lm", se = FALSE) # Model it model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) # Compute the marginal effect of Petal.Length at each level of Species slopes <- estimate_slopes(model, slope = "Petal.Length", by = "Species") slopes # What is the *average* slope of Petal.Length? This can be calculated by # taking the average of the slopes across all Species, using `comparison`. # We pass a function to `comparison` that calculates the mean of the slopes. estimate_slopes( model, slope = "Petal.Length", by = "Species", comparison = ~I(mean(x)) ) ## Not run: # Plot it plot(slopes) standardize(slopes) model <- mgcv::gam(Sepal.Width ~ s(Petal.Length), data = iris) slopes <- estimate_slopes(model, by = "Petal.Length", length = 50) summary(slopes) plot(slopes) model <- mgcv::gam(Sepal.Width ~ s(Petal.Length, by = Species), data = iris) slopes <- estimate_slopes(model, slope = "Petal.Length", by = c("Petal.Length", "Species"), length = 20 ) summary(slopes) plot(slopes) # marginal effects, grouped by Species, at different values of Petal.Length estimate_slopes(model, slope = "Petal.Length", by = c("Petal.Length", "Species"), length = 10 ) # marginal effects at different values of Petal.Length estimate_slopes(model, slope = "Petal.Length", by = "Petal.Length", length = 10) # marginal effects at very specific values of Petal.Length estimate_slopes(model, slope = "Petal.Length", by = "Petal.Length=c(1, 3, 5)") # average marginal effects of Petal.Length, # just for the trend within a certain range estimate_slopes(model, slope = "Petal.Length=seq(2, 4, 0.01)") ## End(Not run) ## Not run: # marginal effects with different `estimate` options data(penguins, package = "datasets") penguins$long_bill <- factor(datawizard::categorize(penguins$bill_len), labels = c("short", "long")) m <- glm(long_bill ~ sex + species + island * bill_dep, data = penguins, family = "binomial") # the emmeans default estimate_slopes(m, "bill_dep", by = "island") emmeans::emtrends(m, "island", var = "bill_dep", regrid = "response") # the marginaleffects default estimate_slopes(m, "bill_dep", by = "island", estimate = "average") marginaleffects::avg_slopes(m, variables = "bill_dep", by = "island") ## End(Not run)library(ggplot2) # Get an idea of the data ggplot(iris, aes(x = Petal.Length, y = Sepal.Width)) + geom_point(aes(color = Species)) + geom_smooth(color = "black", se = FALSE) + geom_smooth(aes(color = Species), linetype = "dotted", se = FALSE) + geom_smooth(aes(color = Species), method = "lm", se = FALSE) # Model it model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) # Compute the marginal effect of Petal.Length at each level of Species slopes <- estimate_slopes(model, slope = "Petal.Length", by = "Species") slopes # What is the *average* slope of Petal.Length? This can be calculated by # taking the average of the slopes across all Species, using `comparison`. # We pass a function to `comparison` that calculates the mean of the slopes. estimate_slopes( model, slope = "Petal.Length", by = "Species", comparison = ~I(mean(x)) ) ## Not run: # Plot it plot(slopes) standardize(slopes) model <- mgcv::gam(Sepal.Width ~ s(Petal.Length), data = iris) slopes <- estimate_slopes(model, by = "Petal.Length", length = 50) summary(slopes) plot(slopes) model <- mgcv::gam(Sepal.Width ~ s(Petal.Length, by = Species), data = iris) slopes <- estimate_slopes(model, slope = "Petal.Length", by = c("Petal.Length", "Species"), length = 20 ) summary(slopes) plot(slopes) # marginal effects, grouped by Species, at different values of Petal.Length estimate_slopes(model, slope = "Petal.Length", by = c("Petal.Length", "Species"), length = 10 ) # marginal effects at different values of Petal.Length estimate_slopes(model, slope = "Petal.Length", by = "Petal.Length", length = 10) # marginal effects at very specific values of Petal.Length estimate_slopes(model, slope = "Petal.Length", by = "Petal.Length=c(1, 3, 5)") # average marginal effects of Petal.Length, # just for the trend within a certain range estimate_slopes(model, slope = "Petal.Length=seq(2, 4, 0.01)") ## End(Not run) ## Not run: # marginal effects with different `estimate` options data(penguins, package = "datasets") penguins$long_bill <- factor(datawizard::categorize(penguins$bill_len), labels = c("short", "long")) m <- glm(long_bill ~ sex + species + island * bill_dep, data = penguins, family = "binomial") # the emmeans default estimate_slopes(m, "bill_dep", by = "island") emmeans::emtrends(m, "island", var = "bill_dep", regrid = "response") # the marginaleffects default estimate_slopes(m, "bill_dep", by = "island", estimate = "average") marginaleffects::avg_slopes(m, variables = "bill_dep", by = "island") ## End(Not run)
A sample data set, used in tests and some examples. Useful for demonstrating count models (with or without zero-inflation component). It consists of nine variables from 250 observations.
These functions are convenient wrappers around the emmeans and the
marginaleffects packages. They are mostly available for developers who want
to leverage a unified API for getting model-based estimates, and regular users
should use the estimate_* set of functions.
The get_emmeans(), get_emcontrasts() and get_emtrends() functions are
wrappers around emmeans::emmeans() and emmeans::emtrends().
get_emcontrasts( model, contrast = NULL, by = NULL, predict = NULL, comparison = "pairwise", keep_iterations = FALSE, verbose = TRUE, ... ) get_emmeans( model, by = "auto", predict = NULL, keep_iterations = FALSE, verbose = TRUE, ... ) get_emtrends( model, trend = NULL, by = NULL, predict = NULL, keep_iterations = FALSE, verbose = TRUE, ... ) get_marginalcontrasts( model, contrast = NULL, by = NULL, predict = NULL, ci = 0.95, comparison = "pairwise", estimate = NULL, transform = NULL, post_process = NULL, p_adjust = "none", keep_iterations = FALSE, verbose = TRUE, ... ) get_marginalmeans( model, by = "auto", predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, keep_iterations = FALSE, verbose = TRUE, ... ) get_marginaltrends( model, trend = NULL, by = NULL, predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, p_adjust = "none", keep_iterations = FALSE, verbose = TRUE, ... )get_emcontrasts( model, contrast = NULL, by = NULL, predict = NULL, comparison = "pairwise", keep_iterations = FALSE, verbose = TRUE, ... ) get_emmeans( model, by = "auto", predict = NULL, keep_iterations = FALSE, verbose = TRUE, ... ) get_emtrends( model, trend = NULL, by = NULL, predict = NULL, keep_iterations = FALSE, verbose = TRUE, ... ) get_marginalcontrasts( model, contrast = NULL, by = NULL, predict = NULL, ci = 0.95, comparison = "pairwise", estimate = NULL, transform = NULL, post_process = NULL, p_adjust = "none", keep_iterations = FALSE, verbose = TRUE, ... ) get_marginalmeans( model, by = "auto", predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, keep_iterations = FALSE, verbose = TRUE, ... ) get_marginaltrends( model, trend = NULL, by = NULL, predict = NULL, ci = 0.95, estimate = NULL, transform = NULL, p_adjust = "none", keep_iterations = FALSE, verbose = TRUE, ... )
model |
A statistical model. |
contrast |
A character vector indicating the name of the variable(s) for
which to compute the contrasts, optionally including representative values or
levels at which contrasts are evaluated (e.g., |
by |
The (focal) predictor variable(s) at which to evaluate the desired
effect / mean / contrasts. Other predictors of the model that are not
included here will be collapsed and "averaged" over (the effect will be
estimated across them). |
predict |
Is passed to the
See also section Predictions on different scales. |
comparison |
Specify the type of contrasts or tests that should be carried out.
|
keep_iterations |
If |
verbose |
Use |
... |
Other arguments passed, for instance, to
|
trend |
A character indicating the name of the variable for which
to compute the slopes. To get marginal effects at specific values, use
|
ci |
Confidence Interval (CI) level. Default to |
estimate |
The
You can set a default option for the Note following limitations:
|
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
post_process |
Optional formula, character string or function (see
|
p_adjust |
The p-values adjustment method for frequentist multiple
comparisons. For |
# Basic usage model <- lm(Sepal.Width ~ Species, data = iris) get_emcontrasts(model) ## Not run: # Dealing with interactions model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris) # By default: selects first factor get_emcontrasts(model) # Or both get_emcontrasts(model, contrast = c("Species", "Petal.Width"), length = 2) # Or with custom specifications get_emcontrasts(model, contrast = c("Species", "Petal.Width=c(1, 2)")) # Or modulate it get_emcontrasts(model, by = "Petal.Width", length = 4) ## End(Not run) model <- lm(Sepal.Length ~ Species + Petal.Width, data = iris) # By default, 'by' is set to "Species" get_emmeans(model) ## Not run: # Overall mean (close to 'mean(iris$Sepal.Length)') get_emmeans(model, by = NULL) # One can estimate marginal means at several values of a 'modulate' variable get_emmeans(model, by = "Petal.Width", length = 3) # Interactions model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_emmeans(model) get_emmeans(model, by = c("Species", "Petal.Length"), length = 2) get_emmeans(model, by = c("Species", "Petal.Length = c(1, 3, 5)"), length = 2) ## End(Not run) ## Not run: model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_emtrends(model) get_emtrends(model, by = "Species") get_emtrends(model, by = "Petal.Length") get_emtrends(model, by = c("Species", "Petal.Length")) ## End(Not run) model <- lm(Petal.Length ~ poly(Sepal.Width, 4), data = iris) get_emtrends(model) get_emtrends(model, by = "Sepal.Width") model <- lm(Sepal.Length ~ Species + Petal.Width, data = iris) # By default, 'by' is set to "Species" get_marginalmeans(model) # Overall mean (close to 'mean(iris$Sepal.Length)') get_marginalmeans(model, by = NULL) ## Not run: # One can estimate marginal means at several values of a 'modulate' variable get_marginalmeans(model, by = "Petal.Width", length = 3) # Interactions model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_marginalmeans(model) get_marginalmeans(model, by = c("Species", "Petal.Length"), length = 2) get_marginalmeans(model, by = c("Species", "Petal.Length = c(1, 3, 5)"), length = 2) ## End(Not run) model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_marginaltrends(model, trend = "Petal.Length", by = "Species") get_marginaltrends(model, trend = "Petal.Length", by = "Petal.Length") get_marginaltrends(model, trend = "Petal.Length", by = c("Species", "Petal.Length"))# Basic usage model <- lm(Sepal.Width ~ Species, data = iris) get_emcontrasts(model) ## Not run: # Dealing with interactions model <- lm(Sepal.Width ~ Species * Petal.Width, data = iris) # By default: selects first factor get_emcontrasts(model) # Or both get_emcontrasts(model, contrast = c("Species", "Petal.Width"), length = 2) # Or with custom specifications get_emcontrasts(model, contrast = c("Species", "Petal.Width=c(1, 2)")) # Or modulate it get_emcontrasts(model, by = "Petal.Width", length = 4) ## End(Not run) model <- lm(Sepal.Length ~ Species + Petal.Width, data = iris) # By default, 'by' is set to "Species" get_emmeans(model) ## Not run: # Overall mean (close to 'mean(iris$Sepal.Length)') get_emmeans(model, by = NULL) # One can estimate marginal means at several values of a 'modulate' variable get_emmeans(model, by = "Petal.Width", length = 3) # Interactions model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_emmeans(model) get_emmeans(model, by = c("Species", "Petal.Length"), length = 2) get_emmeans(model, by = c("Species", "Petal.Length = c(1, 3, 5)"), length = 2) ## End(Not run) ## Not run: model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_emtrends(model) get_emtrends(model, by = "Species") get_emtrends(model, by = "Petal.Length") get_emtrends(model, by = c("Species", "Petal.Length")) ## End(Not run) model <- lm(Petal.Length ~ poly(Sepal.Width, 4), data = iris) get_emtrends(model) get_emtrends(model, by = "Sepal.Width") model <- lm(Sepal.Length ~ Species + Petal.Width, data = iris) # By default, 'by' is set to "Species" get_marginalmeans(model) # Overall mean (close to 'mean(iris$Sepal.Length)') get_marginalmeans(model, by = NULL) ## Not run: # One can estimate marginal means at several values of a 'modulate' variable get_marginalmeans(model, by = "Petal.Width", length = 3) # Interactions model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_marginalmeans(model) get_marginalmeans(model, by = c("Species", "Petal.Length"), length = 2) get_marginalmeans(model, by = c("Species", "Petal.Length = c(1, 3, 5)"), length = 2) ## End(Not run) model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) get_marginaltrends(model, trend = "Petal.Length", by = "Species") get_marginaltrends(model, trend = "Petal.Length", by = "Petal.Length") get_marginaltrends(model, trend = "Petal.Length", by = c("Species", "Petal.Length"))
Global options from the modelbased package
For calculating marginal means
options(modelbased_backend = <string>) will set a default value for the
backend argument and can be used to set the package used by default to
calculate marginal means. Can be "marginaleffects" or "emmeans".
options(modelbased_estimate = <string>) will set a default value for the
estimate argument, which modulates the type of target population
predictions refer to.
options(modelbased_integer = <value>) will set the minimum number of
unique values in an integer predictor to treat that predictor as a
"discrete integer" or as continuous. If the integer has more than
modelbased_integer unique values, it is treated as continuous. Set to
TRUE to always treat integer predictors as continuous.
For printing
options(modelbased_select = <string>) will set a default value for the
select argument and can be used to define a custom default layout for
printing.
options(modelbased_include_grid = TRUE) will set a default value for the
include_grid argument and can be used to include data grids in the output
by default or not.
options(modelbased_full_labels = FALSE) will remove redundant
(duplicated) labels from rows.
options(easystats_display_format = <value>) will set the default format
for the display() methods. Can be one of "markdown", "html", or
"tt". See display.estimate_contrasts() for details.
For plotting
options(modelbased_join_dots = <logical>) will set a default value for
the join_dots.
options(modelbased_numeric_as_discrete = <number>) will set a default
value for the modelbased_numeric_as_discrete argument. Can also be
FALSE.
options(modelbased_ribbon_alpha = <number>) will set a default value for
the alpha argument of the ribbon geom. Should be a number between 0
and 1.
options(modelbased_tinyplot_dodge = <number>) will set a default value
for the dodge argument (spacing between geoms) when using
tinyplot::plt(). Should be a number between 0 and 1.
Most modelbased objects can be visualized using either the plot()
function, which internally calls the visualisation_recipe() function and
relies on {ggplot2}. There is also a tinyplot() method, which uses the
{tinyplot} package and relies on the core R graphic system. See the
examples below for more information and examples on how to create and
customize plots.
The plotting works by mapping any predictors from the by argument to the
x-axis, colors, alpha (transparency) and facets. Thus, the appearance of the
plot depends on the order of the variables that you specify in the by
argument. For instance, the plots corresponding to
estimate_relation(model, by=c("Species", "Sepal.Length")) and
estimate_relation(model, by=c("Sepal.Length", "Species")) will look
different.
The automated plotting is primarily meant for convenient visual checks, but
for publication-ready figures, we recommend re-creating the figures using the
{ggplot2} package directly.
## S3 method for class 'estimate_predicted' plot(x, ...) ## S3 method for class 'estimate_means' plot(x, ...) ## S3 method for class 'estimate_means' tinyplot( x, type = NULL, dodge = NULL, show_data = FALSE, collapse_group = NULL, numeric_as_discrete = NULL, ... ) ## S3 method for class 'estimate_predicted' visualisation_recipe( x, show_data = FALSE, show_residuals = FALSE, collapse_group = NULL, point = NULL, line = NULL, pointrange = NULL, ribbon = NULL, facet = NULL, grid = NULL, join_dots = NULL, numeric_as_discrete = NULL, ... ) ## S3 method for class 'estimate_slopes' visualisation_recipe( x, line = NULL, pointrange = NULL, ribbon = NULL, facet = NULL, grid = NULL, ... ) ## S3 method for class 'estimate_grouplevel' visualisation_recipe( x, line = NULL, pointrange = NULL, ribbon = NULL, facet = NULL, grid = NULL, ... )## S3 method for class 'estimate_predicted' plot(x, ...) ## S3 method for class 'estimate_means' plot(x, ...) ## S3 method for class 'estimate_means' tinyplot( x, type = NULL, dodge = NULL, show_data = FALSE, collapse_group = NULL, numeric_as_discrete = NULL, ... ) ## S3 method for class 'estimate_predicted' visualisation_recipe( x, show_data = FALSE, show_residuals = FALSE, collapse_group = NULL, point = NULL, line = NULL, pointrange = NULL, ribbon = NULL, facet = NULL, grid = NULL, join_dots = NULL, numeric_as_discrete = NULL, ... ) ## S3 method for class 'estimate_slopes' visualisation_recipe( x, line = NULL, pointrange = NULL, ribbon = NULL, facet = NULL, grid = NULL, ... ) ## S3 method for class 'estimate_grouplevel' visualisation_recipe( x, line = NULL, pointrange = NULL, ribbon = NULL, facet = NULL, grid = NULL, ... )
x |
A modelbased object. |
... |
Arguments passed from |
type |
The type of |
dodge |
Dodge value for grouped plots. If |
show_data |
Logical, if |
collapse_group |
This argument only takes effect when either |
numeric_as_discrete |
Maximum number of unique values in a numeric
predictor to treat that predictor as discrete. Defaults to |
show_residuals |
Logical, if |
point, line, pointrange, ribbon, facet, grid
|
Additional aesthetics and parameters for the geoms (see customization example). |
join_dots |
Logical, if |
There are two options to remove the confidence bands or errors bars
from the plot. To remove error bars, simply set the pointrange geom to
point, e.g. plot(..., pointrange = list(geom = "point")). To remove the
confidence bands from line geoms, use ribbon = "none".
An object of class visualisation_recipe that describes the layers
used to create a plot based on {ggplot2}. The related plot() method is in
the {see} package.
Some arguments for plot() can get global defaults using options():
modelbased_join_dots: options(modelbased_join_dots = <logical>) will
set a default value for the join_dots.
modelbased_numeric_as_discrete: options(modelbased_numeric_as_discrete = <number>)
will set a default value for the modelbased_numeric_as_discrete argument.
Can also be FALSE.
modelbased_ribbon_alpha: options(modelbased_ribbon_alpha = <number>)
will set a default value for the alpha argument of the ribbon geom.
Should be a number between 0 and 1.
modelbased_tinyplot_dodge: options(modelbased_tinyplot_dodge = <number>)
will set a default value for the dodge argument (spacing between geoms)
when using tinyplot::plt(). Should be a number between 0 and 1.
# ============================================== # tinyplot # ============================================== library(tinyplot) data(efc, package = "modelbased") efc <- datawizard::to_factor(efc, c("e16sex", "c172code", "e42dep")) m <- lm(neg_c_7 ~ e16sex + c172code + barthtot, data = efc) em <- estimate_means(m, "c172code") plt(em) # pass additional tinyplot arguments for customization, e.g. plt(em, theme = "classic") plt(em, theme = "classic", flip = TRUE) # etc. # Aside: use tinyplot::tinytheme() to set a persistent theme tinytheme("classic") # continuous variable example em <- estimate_means(m, "barthtot") plt(em) # grouped example m <- lm(neg_c_7 ~ e16sex * c172code + e42dep, data = efc) em <- estimate_means(m, c("e16sex", "c172code")) plt(em) # use plt_add (alias tinyplot_add) to add layers plt_add(type = "l", lty = 2) # Reset to default theme tinytheme() library(ggplot2) library(see) # ============================================== # estimate_relation, estimate_expectation, ... # ============================================== # Simple Model --------------- x <- estimate_relation(lm(mpg ~ wt, data = mtcars)) layers <- visualisation_recipe(x) layers plot(layers) # visualization_recipe() is called implicitly when you call plot() plot(estimate_relation(lm(mpg ~ qsec, data = mtcars))) ## Not run: # It can be used in a pipe workflow lm(mpg ~ qsec, data = mtcars) |> estimate_relation(ci = c(0.5, 0.8, 0.9)) |> plot() # Customize aesthetics ---------- plot(x, point = list(color = "red", alpha = 0.6, size = 3), line = list(color = "blue", size = 3), ribbon = list(fill = "green", alpha = 0.7) ) + theme_minimal() + labs(title = "Relationship between MPG and WT") # Customize raw data ------------- plot(x, point = list(geom = "density_2d_filled"), line = list(color = "white")) + scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) + theme(legend.position = "none") # Single predictors examples ----------- plot(estimate_relation(lm(Sepal.Length ~ Species, data = iris))) # 2-ways interaction ------------ # Numeric * numeric x <- estimate_relation(lm(mpg ~ wt * qsec, data = mtcars)) plot(x) # Numeric * factor x <- estimate_relation(lm(Sepal.Width ~ Sepal.Length * Species, data = iris)) plot(x) # ============================================== # estimate_means # ============================================== # Simple Model --------------- x <- estimate_means(lm(Sepal.Width ~ Species, data = iris), by = "Species") layers <- visualisation_recipe(x) layers plot(layers) # Customize aesthetics layers <- visualisation_recipe(x, point = list(width = 0.03, color = "red"), pointrange = list(size = 2, linewidth = 2), line = list(linetype = "dashed", color = "blue") ) plot(layers) # Two levels --------------- data <- mtcars data$cyl <- as.factor(data$cyl) model <- lm(mpg ~ cyl * wt, data = data) x <- estimate_means(model, by = c("cyl", "wt")) plot(x) # GLMs --------------------- data <- data.frame(vs = mtcars$vs, cyl = as.factor(mtcars$cyl)) x <- estimate_means(glm(vs ~ cyl, data = data, family = "binomial"), by = c("cyl")) plot(x) # ============================================== # Adding original data to the plot # ============================================== data(efc, package = "modelbased") efc$e15relat <- as.factor(efc$e15relat) efc$c161sex <- as.factor(efc$c161sex) levels(efc$c161sex) <- c("male", "female") model <- lme4::lmer(neg_c_7 ~ c161sex + (1 | e15relat), data = efc) me <- estimate_means(model, "c161sex") plot(me, show_data = TRUE) # data points: collapse by / average over random effects groups ------- plot(me, show_data = TRUE, collapse_group = "e15relat") ## End(Not run) # ============================================== # estimate_slopes # ============================================== model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) x <- estimate_slopes(model, trend = "Petal.Length", by = "Species") layers <- visualisation_recipe(x) layers plot(layers) ## Not run: # Customize aesthetics and add horizontal line and theme layers <- visualisation_recipe(x, pointrange = list(size = 2, linewidth = 2)) plot(layers) + geom_hline(yintercept = 0, linetype = "dashed", color = "red") + theme_minimal() + labs(y = "Effect of Petal.Length", title = "Marginal Effects") model <- lm(Petal.Length ~ poly(Sepal.Width, 4), data = iris) x <- estimate_slopes(model, trend = "Sepal.Width", by = "Sepal.Width", length = 20) plot(visualisation_recipe(x)) model <- lm(Petal.Length ~ Species * poly(Sepal.Width, 3), data = iris) x <- estimate_slopes(model, trend = "Sepal.Width", by = c("Sepal.Width", "Species")) plot(visualisation_recipe(x)) ## End(Not run) # ============================================== # estimate_grouplevel # ============================================== ## Not run: data <- lme4::sleepstudy data <- rbind(data, data) data$Newfactor <- rep(c("A", "B", "C", "D")) # 1 random intercept model <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = data) x <- estimate_grouplevel(model) layers <- visualisation_recipe(x) layers plot(layers) # 2 random intercepts model <- lme4::lmer(Reaction ~ Days + (1 | Subject) + (1 | Newfactor), data = data) x <- estimate_grouplevel(model) plot(x) + geom_hline(yintercept = 0, linetype = "dashed") + theme_minimal() # Note: we need to use hline instead of vline because the axes is flipped model <- lme4::lmer(Reaction ~ Days + (1 + Days | Subject) + (1 | Newfactor), data = data) x <- estimate_grouplevel(model) plot(x) ## End(Not run)# ============================================== # tinyplot # ============================================== library(tinyplot) data(efc, package = "modelbased") efc <- datawizard::to_factor(efc, c("e16sex", "c172code", "e42dep")) m <- lm(neg_c_7 ~ e16sex + c172code + barthtot, data = efc) em <- estimate_means(m, "c172code") plt(em) # pass additional tinyplot arguments for customization, e.g. plt(em, theme = "classic") plt(em, theme = "classic", flip = TRUE) # etc. # Aside: use tinyplot::tinytheme() to set a persistent theme tinytheme("classic") # continuous variable example em <- estimate_means(m, "barthtot") plt(em) # grouped example m <- lm(neg_c_7 ~ e16sex * c172code + e42dep, data = efc) em <- estimate_means(m, c("e16sex", "c172code")) plt(em) # use plt_add (alias tinyplot_add) to add layers plt_add(type = "l", lty = 2) # Reset to default theme tinytheme() library(ggplot2) library(see) # ============================================== # estimate_relation, estimate_expectation, ... # ============================================== # Simple Model --------------- x <- estimate_relation(lm(mpg ~ wt, data = mtcars)) layers <- visualisation_recipe(x) layers plot(layers) # visualization_recipe() is called implicitly when you call plot() plot(estimate_relation(lm(mpg ~ qsec, data = mtcars))) ## Not run: # It can be used in a pipe workflow lm(mpg ~ qsec, data = mtcars) |> estimate_relation(ci = c(0.5, 0.8, 0.9)) |> plot() # Customize aesthetics ---------- plot(x, point = list(color = "red", alpha = 0.6, size = 3), line = list(color = "blue", size = 3), ribbon = list(fill = "green", alpha = 0.7) ) + theme_minimal() + labs(title = "Relationship between MPG and WT") # Customize raw data ------------- plot(x, point = list(geom = "density_2d_filled"), line = list(color = "white")) + scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) + theme(legend.position = "none") # Single predictors examples ----------- plot(estimate_relation(lm(Sepal.Length ~ Species, data = iris))) # 2-ways interaction ------------ # Numeric * numeric x <- estimate_relation(lm(mpg ~ wt * qsec, data = mtcars)) plot(x) # Numeric * factor x <- estimate_relation(lm(Sepal.Width ~ Sepal.Length * Species, data = iris)) plot(x) # ============================================== # estimate_means # ============================================== # Simple Model --------------- x <- estimate_means(lm(Sepal.Width ~ Species, data = iris), by = "Species") layers <- visualisation_recipe(x) layers plot(layers) # Customize aesthetics layers <- visualisation_recipe(x, point = list(width = 0.03, color = "red"), pointrange = list(size = 2, linewidth = 2), line = list(linetype = "dashed", color = "blue") ) plot(layers) # Two levels --------------- data <- mtcars data$cyl <- as.factor(data$cyl) model <- lm(mpg ~ cyl * wt, data = data) x <- estimate_means(model, by = c("cyl", "wt")) plot(x) # GLMs --------------------- data <- data.frame(vs = mtcars$vs, cyl = as.factor(mtcars$cyl)) x <- estimate_means(glm(vs ~ cyl, data = data, family = "binomial"), by = c("cyl")) plot(x) # ============================================== # Adding original data to the plot # ============================================== data(efc, package = "modelbased") efc$e15relat <- as.factor(efc$e15relat) efc$c161sex <- as.factor(efc$c161sex) levels(efc$c161sex) <- c("male", "female") model <- lme4::lmer(neg_c_7 ~ c161sex + (1 | e15relat), data = efc) me <- estimate_means(model, "c161sex") plot(me, show_data = TRUE) # data points: collapse by / average over random effects groups ------- plot(me, show_data = TRUE, collapse_group = "e15relat") ## End(Not run) # ============================================== # estimate_slopes # ============================================== model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris) x <- estimate_slopes(model, trend = "Petal.Length", by = "Species") layers <- visualisation_recipe(x) layers plot(layers) ## Not run: # Customize aesthetics and add horizontal line and theme layers <- visualisation_recipe(x, pointrange = list(size = 2, linewidth = 2)) plot(layers) + geom_hline(yintercept = 0, linetype = "dashed", color = "red") + theme_minimal() + labs(y = "Effect of Petal.Length", title = "Marginal Effects") model <- lm(Petal.Length ~ poly(Sepal.Width, 4), data = iris) x <- estimate_slopes(model, trend = "Sepal.Width", by = "Sepal.Width", length = 20) plot(visualisation_recipe(x)) model <- lm(Petal.Length ~ Species * poly(Sepal.Width, 3), data = iris) x <- estimate_slopes(model, trend = "Sepal.Width", by = c("Sepal.Width", "Species")) plot(visualisation_recipe(x)) ## End(Not run) # ============================================== # estimate_grouplevel # ============================================== ## Not run: data <- lme4::sleepstudy data <- rbind(data, data) data$Newfactor <- rep(c("A", "B", "C", "D")) # 1 random intercept model <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = data) x <- estimate_grouplevel(model) layers <- visualisation_recipe(x) layers plot(layers) # 2 random intercepts model <- lme4::lmer(Reaction ~ Days + (1 | Subject) + (1 | Newfactor), data = data) x <- estimate_grouplevel(model) plot(x) + geom_hline(yintercept = 0, linetype = "dashed") + theme_minimal() # Note: we need to use hline instead of vline because the axes is flipped model <- lme4::lmer(Reaction ~ Days + (1 + Days | Subject) + (1 | Newfactor), data = data) x <- estimate_grouplevel(model) plot(x) ## End(Not run)
estimate_contrasts()
This function "pools" (i.e. combines) multiple estimate_contrasts objects,
returned by estimate_contrasts(), in a similar fashion as mice::pool().
pool_contrasts(x, ...)pool_contrasts(x, ...)
x |
A list of |
... |
Currently not used. |
Averaging of parameters follows Rubin's rules (Rubin, 1987, p. 76).
A data frame with pooled comparisons or contrasts of predictions.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
data("nhanes2", package = "mice") imp <- mice::mice(nhanes2, printFlag = FALSE) comparisons <- lapply(1:5, function(i) { m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i)) estimate_contrasts(m, "age") }) pool_contrasts(comparisons)data("nhanes2", package = "mice") imp <- mice::mice(nhanes2, printFlag = FALSE) comparisons <- lapply(1:5, function(i) { m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i)) estimate_contrasts(m, "age") }) pool_contrasts(comparisons)
This function "pools" (i.e. combines) multiple estimate_means objects, in
a similar fashion as mice::pool().
pool_predictions(x, transform = NULL, ...) pool_slopes(x, transform = NULL, ...)pool_predictions(x, transform = NULL, ...) pool_slopes(x, transform = NULL, ...)
x |
A list of |
transform |
A function applied to predictions and confidence intervals
to (back-) transform results, which can be useful in case the regression
model has a transformed response variable (e.g., |
... |
Currently not used. |
Averaging of parameters follows Rubin's rules (Rubin, 1987, p. 76).
Pooling is applied to the predicted values and based on the standard errors
as they are calculated in the estimate_means or estimate_predicted
objects provided in x. For objects of class estimate_means, the predicted
values are on the response scale by default, and standard errors are
calculated using the delta method. Then, pooling estimates and calculating
standard errors for the pooled estimates based ob Rubin's rule is carried
out. There is no back-transformation to the link-scale of predicted values
before applying Rubin's rule.
A data frame with pooled predictions.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
# example for multiple imputed datasets data("nhanes2", package = "mice") imp <- mice::mice(nhanes2, printFlag = FALSE) # estimated marginal means predictions <- lapply(1:5, function(i) { m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i)) estimate_means(m, "age") }) pool_predictions(predictions) # estimated slopes (marginal effects) slopes <- lapply(1:5, function(i) { m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i)) estimate_slopes(m, "chl") }) pool_slopes(slopes)# example for multiple imputed datasets data("nhanes2", package = "mice") imp <- mice::mice(nhanes2, printFlag = FALSE) # estimated marginal means predictions <- lapply(1:5, function(i) { m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i)) estimate_means(m, "age") }) pool_predictions(predictions) # estimated slopes (marginal effects) slopes <- lapply(1:5, function(i) { m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i)) estimate_slopes(m, "chl") }) pool_slopes(slopes)
Fictitious data related to whether puppy therapy works when you
adjust for a person’s love of puppies, taken from the {discovr} package
(Field 2025)
Following variables are included in the dataset:
'id“: Participant id
dose: Treatment group to which the participant was randomly assigned (No
puppies (control), 15 minutes of puppy therapy, 30 minutes of puppy
therapy)
happiness: Self-reported happiness from 0 (as unhappy as I can possibly
imagine being) to 10 (as happy as I can possibly imagine being)
puppy_love: Self-reported love of puppies from 0 (I am a weird person who
hates puppies, please be deeply suspicious of me) to 7 (puppies are the
best thing ever, one day I might marry one)
For further details, see ?discovr::puppy_love.
Field, A. P. (2025). Discovering statistics using R and RStudio (2nd ed.). London: Sage.
This function computes partial residuals based on a data grid,
where the data grid is usually a data frame from all combinations of factor
variables or certain values of numeric vectors. This data grid is usually used
as newdata argument in predict(), and can be created with
insight::get_datagrid().
residualize_over_grid(grid, model, ...) ## S3 method for class 'data.frame' residualize_over_grid(grid, model, predictor_name, ...)residualize_over_grid(grid, model, ...) ## S3 method for class 'data.frame' residualize_over_grid(grid, model, predictor_name, ...)
grid |
A data frame representing the data grid, or an object of class
|
model |
The model for which to compute partial residuals. The data grid
|
... |
Currently not used. |
predictor_name |
The name of the focal predictor, for which partial residuals are computed. |
A data frame with residuals for the focal predictor.
For generalized linear models (glms), residualized scores are computed as
inv.link(link(Y) + r) where Y are the predicted values on the response
scale, and r are the working residuals.
For (generalized) linear mixed models, the random effect are also partialled out.
Fox J, Weisberg S. Visualizing Fit and Lack of Fit in Complex Regression Models with Predictor Effect Plots and Partial Residuals. Journal of Statistical Software 2018;87.
set.seed(1234) x1 <- rnorm(200) x2 <- rnorm(200) # quadratic relationship y <- 2 * x1 + x1^2 + 4 * x2 + rnorm(200) d <- data.frame(x1, x2, y) model <- lm(y ~ x1 + x2, data = d) pr <- estimate_means(model, c("x1", "x2")) head(residualize_over_grid(pr, model))set.seed(1234) x1 <- rnorm(200) x2 <- rnorm(200) # quadratic relationship y <- 2 * x1 + x1^2 + 4 * x2 + rnorm(200) d <- data.frame(x1, x2, y) model <- lm(y ~ x1 + x2, data = d) pr <- estimate_means(model, c("x1", "x2")) head(residualize_over_grid(pr, model))
Smoothing a vector or a time series. For data.frames, the function will smooth all numeric variables stratified by factor levels (i.e., will smooth within each factor level combination).
smoothing(x, method = "loess", strength = 0.25, ...)smoothing(x, method = "loess", strength = 0.25, ...)
x |
A numeric vector. |
method |
Can be "loess" (default) or "smooth". A loess smoothing can be slow. |
strength |
This argument only applies when |
... |
Arguments passed to or from other methods. |
A smoothed vector or data frame.
x <- sin(seq(0, 4 * pi, length.out = 100)) + rnorm(100, 0, 0.2) plot(x, type = "l") lines(smoothing(x, method = "smooth"), type = "l", col = "blue") lines(smoothing(x, method = "loess"), type = "l", col = "red") x <- sin(seq(0, 4 * pi, length.out = 10000)) + rnorm(10000, 0, 0.2) plot(x, type = "l") lines(smoothing(x, method = "smooth"), type = "l", col = "blue") lines(smoothing(x, method = "loess"), type = "l", col = "red")x <- sin(seq(0, 4 * pi, length.out = 100)) + rnorm(100, 0, 0.2) plot(x, type = "l") lines(smoothing(x, method = "smooth"), type = "l", col = "blue") lines(smoothing(x, method = "loess"), type = "l", col = "red") x <- sin(seq(0, 4 * pi, length.out = 10000)) + rnorm(10000, 0, 0.2) plot(x, type = "l") lines(smoothing(x, method = "smooth"), type = "l", col = "blue") lines(smoothing(x, method = "loess"), type = "l", col = "red")
Find zero crossings of a vector, i.e., indices when the numeric variable crosses 0. It is useful for finding the points where a function changes by looking at the zero crossings of its derivative.
zero_crossings(x) find_inversions(x)zero_crossings(x) find_inversions(x)
x |
A numeric vector. |
Vector of zero crossings or points of inversion.
Based on the uniroot.all function from the rootSolve package.
x <- sin(seq(0, 4 * pi, length.out = 100)) # plot(x, type = "b") modelbased::zero_crossings(x) modelbased::find_inversions(x)x <- sin(seq(0, 4 * pi, length.out = 100)) # plot(x, type = "b") modelbased::zero_crossings(x) modelbased::find_inversions(x)