Lecture 09 Notes

Author

Alec L. Robitaille

Published

March 21, 2024

Generalized linear models

Linear models: expected value is an additive combination of parameters

\[Y_{i} \sim Normal(\mu_{i}, \sigma)\] \[\mu_{i} = \alpha + \beta_{X} X_{i} + \beta_{Z} Z_{i}\]

Generalized linear models: expected value is some function of an additive combination of parameters

\[Y_{i} \sim Bernoulli(p_{i})\] \[f(p_{i}) = \alpha + \beta_{X} X_{i} + \beta_{Z} Z_{i}\]

  • \(Y_{i}\): 0/1 outcome
  • \(p_{i}\): probability of event
  • \(f(p_{i})\): f() maps probability scale to linear model scale
  • \(\alpha + \beta_{{X}} X_{i} + \beta_{Z} Z_{i})\): can be any real value

In generalized linear models, all parameters interact because the outcome is bounded. Gaussian models are the only type of model where this is not true since the outcome is unbounded.

Distributions

Distributions are matched to the constraints on observed variables. The link function is matched to this distribution.

  • Exponential distribution: time to event that has a constant rate. Greater than 0, single parameter \(\lambda\) that defines the rate.
  • Binomial distribution: count events bounded in an observation window. eg. some number of trials
  • Poisson distribution: count events with constant rate. This is the count distribution of the exponentially distributed events. We don’t know the maximum or number of coin tosses.
  • Gamma distribution: sum up exponential processes.
  • Normal distribution: Gamma distribution with large mean converges to the Normal distribution.

Distribution you want to use to model an observed variable is governed by the constraints on observation. Eg. you can’t have negative counts. You can’t use some test to decide if your data are “Normal”.

Modeling events

Like tide prediction engines (analog computers), the gears inside the machine bear no resemblance to the observable predictions that they output

  • Events are discrete, unordered outcomes
  • Observations are counts of events
  • Estimates are on the probability or log-odds/logit scale

These estimates are bounded, eg. the probability of an event is restricted to 0-1.

Two options, that are equivalent for inference:

  • Logistic regression: binary [0,1] outcome and logit link, with Bernoulli distribution
  • Binomial regression: count [0, N] outcome and logit link, with Binomial distribution. Binomial variables are sums of Bernoulli variables.

Logistic priors

The logit link compresses parameter distributions. Anything above +4 = almost always, anything below -4 = almost never.

Example: UC Berkeley Admissions

  • Graduate school applications for 1973 UC Berkeley
  • Stratified by department and gender of application

See more here:

Traag, V.A. and Waltman, L., 2022. Causal foundations of bias, disparity and fairness. arXiv preprint arXiv:2207.13665.

1. Estimand

coords <- data.frame(
    name = c('G', 'D', 'A', 'G_star', 'R', 'U'),
    x =    c(1,    2,   3,   2,         4,   4),
    y =    c(0,    1,   0,  -1,        -1,   1)
)

Is there gender discrimination in graduate admissions?

dagify(
    A ~ G,
  coords = coords
) |> ggdag(seed = 2, layout = 'auto') + theme_dag()

  • G: gender
  • A: admission
dagify(
    D ~ G,
  A ~ D,
  coords = coords
) |> ggdag(seed = 2, layout = 'auto') + theme_dag()

  • D: department

Typically, department is considered a mediating variable. Departments differ in their number of applicants, number of accepted students, potential differences in applicants and acceptance by gender, etc

What does the “causal effect of gender” mean?

Really this is another variable - the perceived gender G*.

dagify(
    A ~ R + G_star + D,
    G_star ~ G,
    D ~ G,
  coords = coords
) |> ggdag(seed = 2, layout = 'auto') + theme_dag()

Which path represents “discrimination”?

  • direct discrimination: status-based discrimination from G -> A. Requires strong assumptions.
  • structural discrimination: indirect through department G -> D -> A. Requires strong assumptions.
  • total discrimination: G -> A and G -> D -> A, in other words: what an individuals experience. Requires mild assumptions.

These three types of discrimination require different estimators.

2. Generative model

  • Set number of applicants
  • Sample gender
  • Sampled with preference for different departments by gender (indirect effect)
  • Define acceptance rate for departments
  • Simulate acceptance

Changing the acceptance rate where acceptance differs by department and gender, pattern of differences in acceptance rates is the same in this simulated data. This illustrates the fundamental problem of determining if discrimination is occurring.

Generative model could be greatly improved:

  • Admission rates depend upon size of applicant pool and the distribution of qualifications
  • Admission rates are then based on ranking applicants to a set number of spaces

3. Statistical models

Total causal effect of G

\(A_{i} \sim Bernoulli(p_{i})\)

\(logit(p_{i}) = \alpha[G_{i}]\)

\(\alpha = [\alpha_{1}, \alpha_{2}]\)

Direct causal effect of G

\(A_{i} \sim Bernoulli(p_{i})\)

\(logit(p_{i}) = \alpha[G_{i}, D_{i}]\)

\(\alpha = \begin{bmatrix} \alpha_{1, 1} & \alpha_{1, 2} \\ \alpha_{2, 1} & \alpha_{2, 2}\end{bmatrix}\)

4. Validate the model

Use inverse logit function to transform variables back on probability scale. Determine if known parameters are recovered by the model.

5. Analyze the data

To use a binomial distribution, aggregate the long format data into acceptance sums for each gender and department. This is equivalent to using the original data with a Bernoulli distribution.

  • inspect trace plots and trank plots to ensure chains are well mixed
  • extract samples from the posterior
  • convert the entire posterior distribution back to the prediction scale with the inverse logit
  • compute contrasts for total effect (across genders) and direct effects (within departments)

Post stratification

Back to the question: what’s the average direct effect of gender across departments?

This depends on the perception of gender on the admission officer

dagify(
    A ~ G_star + D,
    G_star ~ G,
    D ~ G,
  coords = coords
) |> ggdag(seed = 2, layout = 'auto') + theme_dag()

To calculate the causal effect of G*, we must average (marginalize) over the departments. Process: simulate as if all applicants are women, then simulate as if all applicants are men. Then compute the contrasts.

Post stratification is re-weighting estimates for a specific target population. Eg. at a different university, the distributions would differ, and we could predict how the consequences of intervention differ

Survival analysis

How long did an event take to happen? Time-to-event.

Cannot ignore censored cases, where event never happened.

  • Left-censored: don’t know when time started
  • Right-censored: observation ended before event

Example: cat adoptions

Adoption rates of black and non-black cats

Events: adopted, or something else (death, escape, censored)

Outcome variables: days to event. Appropriate distributions are exponential and gamma. Exponential arises from a single part that needs to fail before the so-called machine dies, whereas the Gamma distribution requires multiple parts to fail.

For simplest situation, time to adoption, this represents it:

\(D_{i} \sim Exponential(\lambda_{i})\)

\(p(D_{i} | \lambda_{i}) = \lambda_{i} exp(-\lambda_{i} D_{i})\)

But what about the censored cats?

Event happened = cumulative distribution, probability of event happening up to time x

Event didn’t happen = complementary cumulative distribution, probability event hasn’t occurred up to time x

\(D_{i} = 1 \sim Exponential(\lambda_{i})\)

\(D_{i} = 0 \sim Exponential-CCDF(\lambda_{i})\) (not-yet-adoptions)

\(\lambda_{i} = 1 / \mu_{i}\)

\(log \mu_{i} = \alpha_{CID[i]}\)

  • \(D_{i} = 1\): observed adoptions
  • \(D_{i} = 0\): not yet adoptions