Lecture 20 Notes
Stargazing
Fortune-telling frameworks (e.g. horoscopes, tarot cards, linear models) have to be vague: derived from vague facts, the advice has to be vague too. They are also often granted an exaggerated importance.
Cannot offload subjective responsibility onto an objective procedure
There is a tendency to focus on the parts that are mathematical and objective: the quality of the data analysis.
Other things are also important:
- quality of theory
- quality of data
- quality of code and procedures
- code documentation
- reporting
Planning
Goal setting
- estimands defined at the beginning
Theory building
- Which assumptions will we make to construct an appropriate estimator?
- causal model
Levels of theory building
- Heuristic causal model (DAGs)
- Structural causal models (synthetic functions that specify, in precise mathematical form, the relationships between variables; see the sketch below)
- Dynamic models (eg. ODEs)
- Agent-based models (the most fine-grained approach)
These all specify or imply algebraic systems that can be analysed for their implications.
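As a minimal sketch (the variable names and coefficients here are hypothetical, not from the lecture), a structural causal model can be written as a small generative program in which each variable is an explicit function of its parents:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_scm(n, beta_zx=1.0, beta_zy=1.0, beta_xy=0.5):
    """Structural causal model with arrows Z -> X, Z -> Y, X -> Y.

    Each variable is a function of its parents plus exogenous
    noise -- the 'synthetic functions' mentioned above.
    """
    Z = rng.normal(0.0, 1.0, n)                              # common cause
    X = beta_zx * Z + rng.normal(0.0, 1.0, n)                # treatment
    Y = beta_xy * X + beta_zy * Z + rng.normal(0.0, 1.0, n)  # outcome
    return Z, X, Y
```

Because every arrow is an explicit function, the system's implications (for example, the confounding path X <- Z -> Y) can be derived algebraically or checked by simulation.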
Best way to learn models is to read models
DAGs
Heuristic causal models specify (see the example after this list):
- Treatment and outcome
- Other causes
- Other effects
- Unobserved causes
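A minimal way to write these four ingredients down, with hypothetical variable names (treatment X, outcome Y, common cause Z, downstream effect W, unobserved cause U):

```python
# Heuristic causal model as a DAG edge list.
dag = {
    "X": ["Y"],       # treatment -> outcome
    "Z": ["X", "Y"],  # other cause (here a common cause of both)
    "Y": ["W"],       # W is a further effect of the outcome
    "U": ["Y"],       # unobserved cause (usually drawn dashed or circled)
}
```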
Justified sampling plan
Justified analysis plan
Documentation
Open software and data formats
Working
- Express theory as probabilistic program
- Prove the planned analysis could work (conditional on assumptions)
- Test pipeline on synthetic data (see the sketch after this list)
- Run pipeline on empirical data
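A minimal sketch of the synthetic-data test, reusing the hypothetical confounded system above: simulate data with a known effect, run the planned estimator (here plain least squares adjusting for the confounder, as a stand-in for the real analysis), and check that the truth is recovered before touching empirical data:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# 1. Synthetic data with a known causal effect of X on Y.
n, true_effect = 10_000, 0.5
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = true_effect * X + Z + rng.normal(size=n)

# 2. The planned analysis: least squares adjusting for the
#    confounder Z, which the DAG says is required.
design = np.column_stack([np.ones(n), X, Z])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)

# 3. The pipeline must recover the truth on synthetic data.
assert abs(coef[1] - true_effect) < 0.05, coef
print(f"estimated effect {coef[1]:.3f} vs truth {true_effect}")
```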
Control
- version control (Git)
- history
- backup
- accountability
Incremental testing
- build things iteratively
- test each piece (see the example after this list)
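For instance (function and test entirely hypothetical), each small piece of the pipeline gets its own check before being wired into the whole:

```python
import numpy as np

def standardize(x):
    """One small, independently testable pipeline piece."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Test the piece in isolation.
z = standardize([1.0, 2.0, 3.0])
assert abs(z.mean()) < 1e-12
assert abs(z.std() - 1.0) < 1e-12
```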
Documentation
- comment everything
- for you and for others
Review
- at least two people should look at each thing you do
- explain the code to someone (rubber ducky)
Reporting
Describing methods
- math-stats notation of the statistical model (software independent; see the example after this list)
- explanation of how math-stats model provides estimand
- algorithm used to produce estimate
- diagnostics, code tests
- cite software packages
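For example, a software-independent statement of a simple regression model might look like this (the model and priors are illustrative only):

```latex
\begin{align*}
y_i    &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i  &= \alpha + \beta x_i \\
\alpha &\sim \mathrm{Normal}(0, 1) \\
\beta  &\sim \mathrm{Normal}(0, 0.5) \\
\sigma &\sim \mathrm{Exponential}(1)
\end{align*}
```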
Justifying priors
“Priors were chosen through prior predictive simulation so that pre-data predictions span the range of scientifically plausible outcomes. In the results, we explicitly compared the posterior distribution to the prior, so that the impact of the sample is obvious.”
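A minimal sketch of such a prior predictive simulation, using the illustrative priors above: draw parameters from the priors, simulate outcomes, and check that the pre-data predictions span the plausible range:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Draw parameters from the priors of the illustrative model above.
n_sims = 1_000
alpha = rng.normal(0.0, 1.0, n_sims)
beta = rng.normal(0.0, 0.5, n_sims)
sigma = rng.exponential(1.0, n_sims)

# Simulate outcomes over a grid of predictor values, before any data.
x = np.linspace(-2.0, 2.0, 50)
y_sim = (alpha[:, None] + beta[:, None] * x
         + rng.normal(0.0, sigma[:, None], (n_sims, x.size)))

# Do the pre-data predictions span the scientifically plausible range?
print("prior predictive 5%-95% range:", np.percentile(y_sim, [5, 95]))
```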
Responding to reviewers: shift the discussion from statistics to the causal and scientific models.
Point readers to a primer paper on Bayesian statistics in your field.
Describing data
Report not just the sample size but the structure of the data: how many observations of how many units?
At which level (across or within clusters) are variables measured?
Missing values
Describing results
The focus of the results is typically the estimands, presented as marginal causal effects.
Warn against causal interpretation of control variables (Table 2 fallacy)
Prefer sample realizations > densities > intervals (see the sketch below).
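A sketch of that ordering with hypothetical posterior draws: show the estimand as a distribution of realizations (a causal contrast), and report an interval only as a summary at the end:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Hypothetical posterior draws for the treatment effect; in a real
# analysis these come from the fitted model, not from rng.normal.
beta_post = rng.normal(0.5, 0.1, 2_000)

# Marginal causal contrast: outcome under do(X = 1) minus do(X = 0).
contrast = beta_post * (1 - 0)

# Realizations first, interval summary last.
print("sample realizations:", np.round(contrast[:5], 2))
print("89% interval:", np.round(np.percentile(contrast, [5.5, 94.5]), 2))
```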
Making decisions
Academic research: communicate uncertainty, conditional on sample and models
Industry, applied research: what should we do, given uncertainty, conditional on sample and models?
Bayesian decision theory:
- State costs and benefits of outcomes
- Compute posterior benefits of hypothetical policy choices (interventions)
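A minimal sketch of both steps, with made-up costs and benefits: for each hypothetical policy, average its net benefit over the posterior draws of the causal effect and take the choice with the highest posterior expectation:

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Hypothetical posterior draws of the intervention's causal effect.
effect_post = rng.normal(0.5, 0.2, 2_000)

# 1. State costs and benefits (made-up units).
benefit_per_unit_effect = 100.0
cost = {"do_nothing": 0.0, "intervene": 30.0}

# 2. Posterior expected net benefit of each hypothetical policy.
utility = {
    "do_nothing": 0.0,
    "intervene": np.mean(benefit_per_unit_effect * effect_post)
                 - cost["intervene"],
}
print(utility, "-> choose", max(utility, key=utility.get))
```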
Horoscopes for research
Fixes:
- No statistics without associated causal model
- Prove that your code works in principle
- Share as much as possible
- Beware of proxies for research quality