Homework 05

Author

Alec L. Robitaille

Published

April 28, 2023

source('R/packages.R')

Data

# Data
source('R/data_grants.R')

# discipline
# gender
# applications
# awards

DT_grants <- data_grants()

Question 1

The data in data(NWOGrants) are outcomes for scientific funding applications for the Netherlands Organization for Scientific Research (NWO) from 2010–2012 (see van der Lee and Ellemers doi:10.1073/pnas.1510159112). These data have a structure similar to the UCBAdmit data discussed in Chapter 11 and in lecture. There are applications and each has an associated gender (of the lead researcher). But instead of departments, there are disciplines. Draw a DAG for this sample. Then use the backdoor criterion and a binomial GLM to estimate the TOTAL causal effect of gender on grant awards

coords <- data.frame(
    name = c('G', 'D', 'A'),
    x =    c(1,    2,   3),
    y =    c(0,    1,   0)
)

dagify(
    A ~ G + D,
    D ~ G + A,
    coords = coords
) |> ggdag(seed = 2) + theme_dag()

Total effect of gender on award has two paths: through discipline and directly to award.

Prior predictive simulation

get_prior(
    awards | trials(applications) ~ gender,
    family = 'binomial',
    data = DT_grants
)
                prior     class    coef group resp dpar nlpar lb ub
               (flat)         b                                    
               (flat)         b genderm                            
 student_t(3, 0, 2.5) Intercept                                    
       source
      default
 (vectorized)
      default
tar_load(m_h05_q01_prior)
m_h05_q01_prior$priors
NULL
mcmc_areas(m_h05_q01_prior, regex_pars = 'b')

Posterior

tar_load(m_h05_q01)
m_h05_q01
 Family: binomial 
  Links: mu = logit 
Formula: awards | trials(applications) ~ gender 
   Data: DT_grants (Number of observations: 18) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.74      0.08    -1.89    -1.59 1.00     2211     2503
genderm       0.20      0.10     0.01     0.40 1.00     2691     2575

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
mcmc_areas(m_h05_q01, regex_pars = 'b')

Estimated marginal means

emmean_total_gender <- emmeans(m_h05_q01, ~ gender, regrid = 'response')
emmean_total_gender
 gender  prob lower.HPD upper.HPD
 f      0.150     0.131     0.170
 m      0.177     0.158     0.194

Point estimate displayed: median 
HPD interval probability: 0.95 
plot(emmean_total_gender, level = .87) + xlim(0, 1) + xlab('award')

Contrast

contrast(emmean_total_gender, method = 'pairwise')
 contrast estimate lower.HPD upper.HPD
 f - m     -0.0273   -0.0544  -0.00148

Point estimate displayed: median 
HPD interval probability: 0.95 

Question 2

Now estimate the DIRECT causal effect of gender on grant awards. Use the same DAG as above to justify one or more binomial models. Compute the average direct causal effect of gender, weighting each discipline in proportion to the number of applications in the sample. Refer to the marginal effect example in Lecture 9 for help.

Direct effect of gender on award requires conditioning on discipline.

Posterior

tar_load(m_h05_q02)
m_h05_q02
 Family: binomial 
  Links: mu = logit 
Formula: awards | trials(applications) ~ gender * discipline 
   Data: DT_grants (Number of observations: 18) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                                     Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept                               -1.39      0.17    -1.72    -1.07 1.00
genderm                                  0.15      0.18    -0.21     0.50 1.00
disciplineEarthDlifesciences            -0.26      0.25    -0.76     0.21 1.00
disciplineHumanities                    -0.09      0.22    -0.54     0.32 1.00
disciplineInterdisciplinary             -0.06      0.26    -0.56     0.45 1.00
disciplineMedicalsciences               -0.56      0.22    -1.00    -0.15 1.00
disciplinePhysicalsciences               0.03      0.29    -0.53     0.59 1.00
disciplinePhysics                        0.11      0.35    -0.57     0.80 1.00
disciplineSocialsciences                -0.59      0.20    -0.98    -0.20 1.00
disciplineTechnicalsciences             -0.05      0.27    -0.58     0.46 1.00
genderm:disciplineEarthDlifesciences     0.32      0.28    -0.23     0.88 1.00
genderm:disciplineHumanities            -0.40      0.26    -0.93     0.12 1.00
genderm:disciplineInterdisciplinary     -0.56      0.33    -1.21     0.08 1.00
genderm:disciplineMedicalsciences        0.31      0.26    -0.18     0.81 1.00
genderm:disciplinePhysicalsciences      -0.19      0.31    -0.80     0.42 1.00
genderm:disciplinePhysics                0.09      0.37    -0.61     0.81 1.00
genderm:disciplineSocialsciences         0.11      0.24    -0.36     0.57 1.00
genderm:disciplineTechnicalsciences     -0.33      0.29    -0.89     0.25 1.00
                                     Bulk_ESS Tail_ESS
Intercept                                3049     3085
genderm                                  3420     3139
disciplineEarthDlifesciences             4020     2646
disciplineHumanities                     3883     3121
disciplineInterdisciplinary              4820     3325
disciplineMedicalsciences                3890     3053
disciplinePhysicalsciences               4429     2894
disciplinePhysics                        4788     2639
disciplineSocialsciences                 3646     2997
disciplineTechnicalsciences              4453     3311
genderm:disciplineEarthDlifesciences     5047     3279
genderm:disciplineHumanities             4354     2747
genderm:disciplineInterdisciplinary      5120     2880
genderm:disciplineMedicalsciences        4444     3457
genderm:disciplinePhysicalsciences       4971     3365
genderm:disciplinePhysics                4965     2604
genderm:disciplineSocialsciences         4069     3360
genderm:disciplineTechnicalsciences      4916     3102

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
mcmc_areas(m_h05_q02, regex_pars = 'b')

Estimated marginal means

emmean_direct_gender <- emmeans(m_h05_q02, ~ gender * discipline, regrid = 'response')
emmean_direct_gender
 gender discipline           prob lower.HPD upper.HPD
 f      Chemical sciences   0.199    0.1506     0.252
 m      Chemical sciences   0.224    0.1658     0.286
 f      Earth/life sciences 0.161    0.1076     0.218
 m      Earth/life sciences 0.234    0.1746     0.295
 f      Humanities          0.184    0.1343     0.237
 m      Humanities          0.150    0.1082     0.197
 f      Interdisciplinary   0.191    0.1243     0.264
 m      Interdisciplinary   0.135    0.0833     0.199
 f      Medical sciences    0.124    0.0924     0.163
 m      Medical sciences    0.182    0.1441     0.232
 f      Physical sciences   0.205    0.1239     0.296
 m      Physical sciences   0.197    0.1359     0.262
 f      Physics             0.217    0.1069     0.340
 m      Physics             0.260    0.1662     0.358
 f      Social sciences     0.121    0.0949     0.153
 m      Social sciences     0.152    0.1174     0.183
 f      Technical sciences  0.192    0.1223     0.269
 m      Technical sciences  0.165    0.1187     0.217

Point estimate displayed: median 
HPD interval probability: 0.95 
plot(emmean_direct_gender, level = .87) + xlim(0, 1) + labs(x = 'award', y = 'discipline')

Contrast

Emmeans contrast

cont <- contrast(emmean_direct_gender, by = 'discipline')
contrast(emmean_direct_gender, by = 'discipline') |> plot()

Prediction contrast preserving entire posterior

predicted_f <- copy(DT_grants)[, gender := 'f'] |> add_linpred_draws(transform =TRUE, m_h05_q02, value = 'pred_f')
predicted_m <- copy(DT_grants)[, gender := 'm'] |> add_linpred_draws(transform =TRUE, m_h05_q02, value = 'pred_m')

setDT(predicted_f)
predicted_f[, gender := NULL]
setDT(predicted_m)
predicted_m[, gender := NULL]
m <- predicted_f[predicted_m, on = intersect(colnames(predicted_f), colnames(predicted_m))]

m[, diff_fm := pred_f - pred_m]


ggplot(m) + 
    geom_vline(xintercept = 0) + 
    stat_halfeye(aes(diff_fm, discipline)) + 
    labs(x = 'Difference (female - male) in predicted award rate by discipline')

ggplot(m) + 
    geom_vline(xintercept = 0) + 
    stat_halfeye(aes(diff_fm)) + 
    labs(x = 'Average difference (female - male) in predicted award rate across discplines')

Question 3

Considering the total effect (problem 1) and direct effect (problem 2) of gender, what causes contribute to the average difference between women and men in award rate in this sample? It is not necessary to say whether or not there is evidence of discrimination or the presence or absence of unobserved confounds (which are likely!). Simply explain how the direct effects you have estimated make sense (or not) of the total effect.

Gender level differences in number of applications to each discipline. Some disciplines, eg. social sciences and medical sciences, have lower overall award rates and female applicants tend to apply to these more than male applicants.

DT_grants[, .(discipline, prop_app = applications / sum(applications), awards), gender]
    gender          discipline    prop_app awards
    <fctr>              <fctr>       <num>  <int>
 1:      m   Chemical sciences 0.050764526     22
 2:      m   Physical sciences 0.082568807     26
 3:      m             Physics 0.040978593     18
 4:      m          Humanities 0.140672783     33
 5:      m  Technical sciences 0.115596330     30
 6:      m   Interdisciplinary 0.064220183     12
 7:      m Earth/life sciences 0.095412844     38
 8:      m     Social sciences 0.259938838     65
 9:      m    Medical sciences 0.149847095     46
10:      f   Chemical sciences 0.032828283     10
11:      f   Physical sciences 0.032828283      9
12:      f             Physics 0.007575758      2
13:      f          Humanities 0.139730640     32
14:      f  Technical sciences 0.052188552     13
15:      f   Interdisciplinary 0.065656566     17
16:      f Earth/life sciences 0.106060606     18
17:      f     Social sciences 0.344276094     47
18:      f    Medical sciences 0.218855219     29
ggplot(DT_grants[, .(awards, applications), .(gender, discipline)],
    aes(awards / applications, discipline)) + 
    geom_line() + 
    geom_point(aes(color = gender), size = 3)  +
    scale_color_scico_d(end = .7)

DT_grants[, award_by_disc := sum(awards), discipline]
DT_grants[, prop_award_by_disc := sum(awards) / sum(applications), discipline]

ggplot(DT_grants[, .(prop_award_by_disc, discipline, prop_app = applications / sum(applications), awards), gender],
    aes(prop_app, prop_award_by_disc)) + 
    geom_line(aes(group = discipline)) +
    geom_point(aes(shape = gender, color = discipline), size = 3) + 
    labs(x = 'Proportion of applications to discipline',
         y = 'Proportion of applications awarded by discipline')