Homework 06

Author

Alec L. Robitaille

Published

April 11, 2024

Setup

targets::tar_source('R')

Question 1

Conduct a prior predictive simulation for the Reedfrog model. By this I mean to simulate the prior distribution of tank survival probabilities. Start by using this prior:

\[\alpha_{j} \sim Normal(\bar{\alpha}, \sigma)\] \[\bar{\alpha} \sim Normal(0, 1)\] \[\sigma \sim Exponential(1)\]

Be sure to transform the alpha j values to the probability scale for plotting and summary. How does increasing the width of the prior on sigma change the prior distribution of alpha j? You might try Exponential(10) and Exponential(0.1) for example.

Prior predictive simulation

# Load data
tar_load(reedfrogs)

tar_load(h06_q01_exp_1_brms_sample_prior)
h06_q01_exp_1_brms_sample_prior$prior

          prior     class      coef group resp dpar nlpar lb ub       source
   normal(0, 1) Intercept                                               user
 exponential(1)        sd                                  0            user
 exponential(1)        sd            tank                  0    (vectorized)
 exponential(1)        sd Intercept  tank                  0    (vectorized)

mcmc_areas(
    h06_q01_exp_1_brms_sample_prior,
    regex_pars = 'r_tank', 
    transformations = inv_logit
)

tar_load(h06_q01_exp_0pt1_brms_sample_prior)
h06_q01_exp_0pt1_brms_sample_prior$prior

            prior     class      coef group resp dpar nlpar lb ub       source
     normal(0, 1) Intercept                                               user
 exponential(0.1)        sd                                  0            user
 exponential(0.1)        sd            tank                  0    (vectorized)
 exponential(0.1)        sd Intercept  tank                  0    (vectorized)

mcmc_areas(
    h06_q01_exp_0pt1_brms_sample_prior,
    regex_pars = 'r_tank',
    transformations = inv_logit
)

tar_load(h06_q01_exp_10_brms_sample_prior)
h06_q01_exp_10_brms_sample_prior$prior

           prior     class      coef group resp dpar nlpar lb ub       source
    normal(0, 1) Intercept                                               user
 exponential(10)        sd                                  0            user
 exponential(10)        sd            tank                  0    (vectorized)
 exponential(10)        sd Intercept  tank                  0    (vectorized)

mcmc_areas(
    h06_q01_exp_10_brms_sample_prior, 
    regex_pars = 'r_tank', 
    transformations = inv_logit
)

Question 2

Revisit the Reedfrog survival data, data(reedfrogs). Start with the varying effects model from the book and lecture. Then modify it to estimate the causal effects of the treatment variables pred and size, including how size might modify the effect of predation. An easy approach is to estimate an effect for each combination of pred and size. Justify your model with a DAG of this experiment

Estimand

What is the causal effect of predation and size on survival?

Scientific model

coords <- data.frame(
    name = c('D', 'G', 'P', 'T', 'S'),
    x =    c(1,    2,   3,   1,    2),
    y =    c(0,    0,   0,   1,  1)
)

dag <- dagify(
    S ~ D + G + P + T,
  coords = coords,
    outcome = 'S',
    exposure = c('G', 'P')
)
ggdag(dag, seed = 2, layout = 'auto') + theme_dag()

48 tanks (T) of reedfrogs
treatments: density (D), size (G), predation (P)
outcome: survival (S)

Adjustment set

ggdag_adjustment_set(dag, effect = 'direct') + theme_dag()

Analyze the data

# Load model
tar_load(h06_q02_brms_sample)
h06_q02_brms_sample

 Family: binomial 
  Links: mu = logit 
Formula: surv | trials(density) ~ 1 + pred * size + (1 | tank) 
   Data: h06_q02_brms_data (Number of observations: 48) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~tank (Number of levels: 48) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.60      0.11     0.39     0.84 1.00     2366     3084

Regression Coefficients:
                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              2.42      0.26     1.92     2.95 1.00     3325     3205
predpred              -2.74      0.32    -3.37    -2.09 1.00     2578     2881
sizesmall              0.16      0.35    -0.52     0.86 1.00     2709     3141
predpred:sizesmall     0.56      0.43    -0.31     1.40 1.00     2404     2819

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# Estimated marginal effects
marg_eff <- emmeans(h06_q02_brms_sample, ~ size*pred, regrid = 'response')
marg_eff

 size  pred  prob lower.HPD upper.HPD
 big   no   0.918     0.876     0.953
 small no   0.929     0.892     0.961
 big   pred 0.418     0.319     0.524
 small pred 0.598     0.496     0.697

Point estimate displayed: median 
HPD interval probability: 0.95

plot(marg_eff) + labs(x = 'Survival', y = 'Size : Predation')

Question 3

3-OPTIONAL CHALLENGE. Return to the Trolley data, data(Trolley), from Chapter 12. Define and fit a varying intercepts model for these data. By this I mean to add an intercept parameter for the individual participants to the linear model. Cluster the varying intercepts on individual participants, as indicated by the unique values in the id variable. Include action, intention, and contact as treatment effects of interest. Compare the varying intercepts model and a model that ignores individuals. What is the impact of individual variation in these data?

Estimand

What is the causal effect of action, intention and contact on survey responses, considering the variation of individuals?

Scientific model

coords <- data.frame(
    name = c('X', 'R', 'S', 'E', 'Y', 'G',  'P'),
    x =    c(1,    2,   3,   1,    2,   3,  2),
    y =    c(0,    0,   0,   -2,  -1,  -2,  -3)
)

dag <- dagify(
    R ~ X + S + E + Y + G,
    E ~ Y + G,
  coords = coords
)
ggdag(dag, seed = 2, layout = 'auto') + theme_dag()

R: response to the trolley story
X: action, intention, contact
S: story
E: education
Y: age
G: gender

Analyze the data

tar_load(trolley)

tar_load(h06_q03_no_ind_var_brms_sample)
tar_load(h06_q03_ind_var_brms_sample)

h06_q03_no_ind_var_brms_sample

 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ action + intention + contact 
   Data: h06_q03_no_ind_var_brms_data (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -2.83      0.05    -2.92    -2.74 1.00     2852     2513
Intercept[2]    -2.15      0.04    -2.24    -2.07 1.00     3311     2649
Intercept[3]    -1.57      0.04    -1.65    -1.49 1.00     3371     2981
Intercept[4]    -0.55      0.04    -0.62    -0.48 1.00     3692     3232
Intercept[5]     0.12      0.04     0.05     0.19 1.00     3963     3357
Intercept[6]     1.02      0.04     0.95     1.10 1.00     4207     3415
action          -0.71      0.04    -0.79    -0.62 1.00     3790     2873
intention       -0.72      0.04    -0.79    -0.65 1.00     4356     3247
contact         -0.96      0.05    -1.06    -0.86 1.00     4065     2910

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

h06_q03_ind_var_brms_sample

 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: response ~ action + intention + contact + (1 | id) 
   Data: h06_q03_ind_var_brms_data (Number of observations: 9930) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~id (Number of levels: 331) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.89      0.08     1.73     2.05 1.00      654     1167

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -3.95      0.12    -4.17    -3.73 1.01      373      838
Intercept[2]    -3.03      0.11    -3.25    -2.82 1.01      354      742
Intercept[3]    -2.25      0.11    -2.46    -2.03 1.01      345      737
Intercept[4]    -0.78      0.11    -0.99    -0.57 1.01      336      754
Intercept[5]     0.26      0.11     0.04     0.47 1.01      334      838
Intercept[6]     1.66      0.11     1.44     1.88 1.01      357      825
action          -0.96      0.04    -1.04    -0.87 1.00     6949     3184
intention       -0.95      0.04    -1.03    -0.88 1.00     7202     3024
contact         -1.27      0.05    -1.38    -1.17 1.00     6878     3113

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

tidy_selected_vars <- function(m, vars) {
    draws <- tidy_draws(m) |>
        gather_variables()
    setDT(draws)
    draws[.variable %in% vars]
}
sel_vars <- c('b_action', 'b_intention', 'b_contact')

no_ind_var <- tidy_selected_vars(h06_q03_no_ind_var_brms_sample, sel_vars)
no_ind_var[, type := 'No individual variation']

ind_var <- tidy_selected_vars(h06_q03_ind_var_brms_sample, sel_vars)
ind_var[, type := 'Varying intercepts for each individual']

ggplot(rbindlist(list(no_ind_var, ind_var))) + 
    stat_halfeye(position = 'dodge', 
                             aes(inv_logit(.value),
                                    type,
                                    fill = type),
                             alpha = 0.2) + 
    guides(fill = 'none') + 
    labs(x = 'inv_logit(parameter estimate)', y = '') + 
    xlim(0, 1) + 
    facet_wrap(~.variable, ncol = 1)

Parameter estimates for intention, action and contact for the model that includes varying intercepts for each individual are consistently lower than the model that ignores individual variation.