Lecture 19 Notes

Author

Alec L. Robitaille

Published

June 17, 2024

Scientific models

Linear models are additive combinations of variables. They can be used for many applications, and the fit is often okay but not ideal. Using linear models is like fitting a square peg in a round hole: it can work, but we give up scientific information we could use to better inform our statistical analysis.

LMs, GLMs, GLMMs are flexible machines for describing associations. Without an external causal model, linear models have no causal interpretation.

Example: height

The weight of humans is expected to scale with height. As a simple scientific model, represent the human body as a cylinder.

Starting from the equation for the volume of a cylinder:

\(V = \pi r^{2} h\)

Represent the radius as a proportion \(p\) of height:

\(V = \pi (ph)^{2} h\)

Finally, the weight depends on the density \(k\) of the cylinder:

\(W = kV = k \pi (ph)^{2} h\)

And simplified, the formula for the expected weight is

\(W = k \pi p^{2}h^{3}\)

This is not a regression model, it is a scientific model that arises from the physical structure of human bodies (assuming human bodies are well approximated by a cylinder).

Statistical model

\(W_{i} \sim Distribution(\mu_{i})\)

\(\mu_{i} = k \pi p^{2} h_{i}^{3}\)

\(p \sim Distribution(\ldots)\)

\(k \sim Distribution(\ldots)\)

How to set priors? Choose measurement scales and simulate.

Units:

  • \(\mu_{i}\): kg
  • \(k\): \(kg/cm^{3}\)
  • \(h^{3}\): \(cm^{3}\)

Measurement scales are social constructs, and it can be easier to divide them out. We can do that by rescaling, e.g. by the mean height and weight, making the data dimensionless. After dividing by the mean, any value greater than 1 is above the mean and any value less than 1 is below it.
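As a minimal sketch of this rescaling, assuming the Howell1 data from the rethinking package (the data_Howell() helper used below presumably does something similar, but its code is not shown here):

data(Howell1, package = 'rethinking')

DT <- Howell1
# divide by the mean to make both variables dimensionless
DT$scale_weight_div_mean <- DT$weight / mean(DT$weight)
DT$scale_height_div_mean <- DT$height / mean(DT$height)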

p is a proportion, therefore between 0 and 1; in this case p < 0.5, because the radius of a body in this population is less than half of its height.

k is a density, a positive real number (here expected to be greater than 1 after rescaling). The Beta distribution is the natural choice for a proportion like p, while an Exponential prior suits a positive real like k.

Weight is a positive real whose variance scales with the mean. Growth is multiplicative, so the log-normal is a natural choice.

The LogNormal is parameterized using \(\mu\) and \(\sigma\): \(\mu\) is the mean of the log, not the mean of the observed values. Since our formula corresponds to weight on the natural scale, we exponentiate \(\mu_{i}\) to match it (equivalently, \(\mu_{i}\) is the log of the formula).
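Concretely, for a log-normal variable:

\(W \sim LogNormal(\mu, \sigma) \implies median(W) = e^{\mu}, \quad E[W] = e^{\mu + \sigma^{2}/2}\)

so \(e^{\mu}\) is the median of \(W\) on the natural scale, while \(\mu\) is the mean of \(log(W)\).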

\(W_{i} \sim LogNormal(\mu_{i}, \sigma)\)

\(exp(\mu_{i}) = k \pi p^{2} h_{i}^{3}\)

\(p \sim Beta(25, 50)\)

\(k \sim Exponential(0.5)\)

\(\sigma \sim Exponential(1)\)

library(ggplot2)

# visualize the Beta(25, 50) prior for p
ggplot() +
    stat_function(fun = dbeta, args = list(shape1 = 25, shape2 = 50)) +
    xlim(0, 1) +
    labs(x = 'p', y = 'dbeta(x, 25, 50)')

# visualize the Exponential(0.5) prior for k
ggplot() +
    stat_function(fun = dexp, args = list(rate = 0.5)) +
    xlim(0, 5) +
    labs(x = 'k', y = 'dexp(x, 0.5)')
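Beyond the marginal priors, we can simulate from the joint prior to check the implied weights on the dimensionless scale, following the advice above to choose measurement scales and simulate. A minimal sketch; the range of scaled heights is an assumption:

# prior predictive simulation on the dimensionless scale
n <- 1e3
p <- rbeta(n, 25, 50)        # radius as a proportion of height
k <- rexp(n, 0.5)            # rescaled density
sigma <- rexp(n, 1)
h <- runif(n, 0.5, 1.3)      # heights relative to the mean height (assumed range)
w <- rlnorm(n, meanlog = log(k * pi * p^2 * h^3), sdlog = sigma)

ggplot(data.frame(h, w), aes(h, w)) +
    geom_point(alpha = 0.3) +
    labs(x = 'Height (scaled)', y = 'Weight (scaled)')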

DT <- data_Howell()

# non-linear formula: exp(mu) = k * pi * p^2 * h^3, so mu is the log of the formula
form <- bf(
    scale_weight_div_mean ~ log(k * 3.1415 * p ^ 2 * scale_height_div_mean ^ 3),
    p ~ 1,
    k ~ 1,
    nl = TRUE
)

get_prior(form, data = DT, family = 'lognormal')
                prior class      coef group resp dpar nlpar lb ub       source
 student_t(3, 0, 2.5) sigma                                  0         default
               (flat)     b                               k            default
               (flat)     b Intercept                     k       (vectorized)
               (flat)     b                               p            default
               (flat)     b Intercept                     p       (vectorized)
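The fitted model itself is loaded from a targets pipeline below. A minimal sketch of how it might be fit directly, with user priors matching the table that follows (the actual pipeline code is not shown here):

library(brms)

priors <- c(
    prior(beta(25, 50), class = b, nlpar = 'p', lb = 0, ub = 1),
    prior(exponential(0.5), class = b, nlpar = 'k', lb = 0),
    prior(exponential(1), class = sigma)
)

m_l19_nl_howell <- brm(form, data = DT, family = 'lognormal', prior = priors)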
tar_load(m_l19_nl_howell)
m_l19_nl_howell$prior
            prior class      coef group resp dpar nlpar lb ub       source
 exponential(0.5)     b                               k  0            user
 exponential(0.5)     b Intercept                     k  0    (vectorized)
     beta(25, 50)     b                               p  0  1         user
     beta(25, 50)     b Intercept                     p  0  1 (vectorized)
   exponential(1) sigma                                  0            user
m_l19_nl_howell
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: scale_weight_div_mean ~ log(k * 3.1415 * p^2 * scale_height_div_mean^3) 
         p ~ 1
         k ~ 1
   Data: data_Howell() (Number of observations: 544) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
p_Intercept     0.34      0.05     0.25     0.45 1.00      950     1248
k_Intercept     2.76      0.92     1.51     4.96 1.00      949     1264

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.21      0.01     0.19     0.22 1.00     1358     1280

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
pred <- add_predicted_draws(
    m_l19_nl_howell, 
    # posterior predictions over a grid of scaled heights
    newdata = data.frame(scale_height_div_mean = seq(0, 1.3, 0.01))
)

ggplot(pred, aes(scale_height_div_mean, .prediction)) + 
    stat_lineribbon() + 
    geom_point(aes(scale_height_div_mean, scale_weight_div_mean), data = DT) + 
    scale_fill_grey(start = 0.8, end = 0.5) + 
    labs(x = 'Height (scaled)', y = 'Weight (scaled)')

Help translating this model from rethinking to brms came from the brms translation of Statistical Rethinking, 2nd edition.

Models built on scientific structure can be easier to tune than geocentric (purely descriptive) models, because prediction errors are more closely tied to the problem at hand.

The posterior shows a negative relationship between k and p: as k increases, p decreases along a curve. The model has identified a deterministic function relating the two parameters in our formula, which could be used directly instead of estimating both parameters.

# joint posterior of k and p
ggplot(as_draws_df(m_l19_nl_howell)) + 
    geom_point(aes(b_k_Intercept, b_p_Intercept)) + 
    labs(x = 'k', y = 'p')

Taking an example individual with scaled height = 1 and scaled weight = 1, we can solve for k.

\(\mu_{i} = k \pi p^{2} h^{3}\)

\((1) = k \pi p^{2} (1)^{3}\)

\(k = \frac{1}{\pi p^{2}}\)

This is the relationship shown in the graph above. As a check, plugging in the posterior mean \(p \approx 0.34\) gives \(k = 1 / (\pi \cdot 0.34^{2}) \approx 2.75\), matching the estimate for \(k\) above.
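Equivalently, \(p = \sqrt{1 / (\pi k)}\); overlaying this curve on the posterior draws confirms the relationship (a sketch):

ggplot(as_draws_df(m_l19_nl_howell)) + 
    geom_point(aes(b_k_Intercept, b_p_Intercept)) + 
    geom_function(fun = function(k) sqrt(1 / (pi * k))) +
    labs(x = 'k', y = 'p')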

Further, note that \(k\) and \(p\) only ever appear together through the product \(k p^{2}\). Writing \(\theta = k p^{2}\), the equation for the example individual becomes:

\((1) = \pi \theta (1)^{3}\)

and therefore it must be that

\(\theta = \pi^{-1}\)

Given this, the deterministic part of the model has no free parameters; it is dimensionless. Restructuring the model without \(k\) and \(p\), the model returns the same predictions and, as a bonus, fits much more efficiently.
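Again the fitted object is loaded from a targets pipeline; a minimal sketch of the restructured model, consistent with the formula and prior table below:

m_l19_nl_howell_no_dim <- brm(
    scale_weight_div_mean ~ log(scale_height_div_mean ^ 3),
    data = DT,
    family = 'lognormal',
    prior = prior(exponential(1), class = sigma)
)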

tar_load(m_l19_nl_howell_no_dim)
m_l19_nl_howell_no_dim$prior
                  prior     class                       coef group resp dpar
                 (flat)         b                                           
                 (flat)         b logscale_height_div_meanE3                
 student_t(3, 0.1, 2.5) Intercept                                           
         exponential(1)     sigma                                           
 nlpar lb ub       source
                  default
             (vectorized)
                  default
        0            user
m_l19_nl_howell_no_dim
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: scale_weight_div_mean ~ log(scale_height_div_mean^3) 
   Data: data_Howell() (Number of observations: 544) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                     -0.07      0.01    -0.08    -0.06 1.00     5244
logscale_height_div_meanE3     0.77      0.01     0.76     0.79 1.00     3663
                           Tail_ESS
Intercept                      3333
logscale_height_div_meanE3     3196

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.13      0.00     0.12     0.13 1.00     3079     2750

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
pred_no_dim <- add_predicted_draws(
    m_l19_nl_howell_no_dim, 
    newdata = data.frame(scale_height_div_mean = seq(0, 1.3, 0.01))
)

ggplot(pred_no_dim, aes(scale_height_div_mean, .prediction)) + 
    stat_lineribbon() + 
    geom_point(aes(scale_height_div_mean, scale_weight_div_mean), data = DT) + 
    scale_fill_grey(start = 0.8, end = 0.5) + 
    labs(x = 'Height (scaled)', y = 'Weight (scaled)')

State-based models

What we want is a latent state, but what we have are only the emissions, or behaviours, of that latent variable. A misguided approach would be to analyse these data as if the observed behaviour were simply a categorical variable, disregarding the difference between the behaviour and the underlying latent state.

This family of models is common in movement, learning, population dynamics, international relations, and family planning.

Example: social conformity

Do children copy the majority?

We can’t see the strategy selected; we can only see the choice.

Majority choice is consistent with many strategies:

  • random color would follow majority 1/3 the time
  • random demonstrator would follow majority 3/4 the time
  • random demonstration would follow majority 1/2 the time

The total evidence for majority choice is exaggerated, because multiple processes beyond a true majority preference also lead to the majority choice.

Strategy space:

  1. Majority
  2. Minority
  3. Maverick (unchosen)
  4. Random color (ignore demonstrators)
  5. Follow first (copy the first demonstrator)

This is a state-based model. There are three possible observations: unchosen, majority, or minority.

\(Y_{i} \sim Categorical(\theta)\)

\(\theta_{j} = \sum_{s=1}^{5} p_{s} Pr(Y = j | S = s)\)

\(p \sim Dirichlet([4, 4, 4, 4, 4])\)

  • \(\theta_{j}\) is the probability of choice j
  • the sum over \(s\) averages over the five strategies
  • \(p_{s}\) is the prior probability of strategy s
  • \(Pr(Y = j | S = s)\) is the probability of choice j assuming strategy s
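To make the mixture concrete, a sketch computing \(\theta\) for a given strategy mix. The majority, minority, maverick, and random colour rows follow from the definitions above; the follow-first row is a hypothetical placeholder, since its probabilities depend on the experimental design:

# Pr(choice | strategy): rows are strategies, columns are choices
P <- rbind(
    majority     = c(majority = 1,   minority = 0,    unchosen = 0),
    minority     = c(0, 1, 0),
    maverick     = c(0, 0, 1),
    random_color = c(1/3, 1/3, 1/3),
    follow_first = c(0.5, 0.25, 0.25)  # hypothetical values
)

# strategy probabilities, e.g. the prior mean of Dirichlet([4, 4, 4, 4, 4])
p_s <- rep(1/5, 5)

# theta_j = sum over strategies of p_s * Pr(Y = j | S = s)
theta <- colSums(p_s * P)
theta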

Example: population dynamics

Latent states can be time-varying. Example: ecological dynamics, tracking the numbers of different species over time.

Estimand: how do different species interact? How do interactions influence population dynamics?

Lotka-Volterra equations specify the change in the number of individuals over time.

\(\frac{dH}{dt} = H_{t} b_{H} - H_{t} L_{t} m_{H}\)

\(\frac{dL}{dt} = L_{t} H_{t} b_{L} - L_{t} m_{L}\)

  • \(L_{t} m_{H}\): the survival of hares depends on lynx
  • \(H_{t} b_{L}\): the birth rate of lynx depends on hares

Model

\(h_{t} \sim LogNormal(log(p_{H} H_{t}), \sigma_{H})\)

\(l_{t} \sim LogNormal(log(p_{L} L_{t}), \sigma_{L})\)

where \(h_{t}\) and \(l_{t}\) are the observed counts, and \(p_{H}\) and \(p_{L}\) are the proportions of the true populations \(H_{t}\) and \(L_{t}\) that are observed.

Without causal model, little hope to understand ecological interventions. Note this is an example of using the built in ODE solver in Stan.