Extending targets

Files

{targets} can track external files for changes. This includes input data files as well as outputs like saved plots or tables. To signal to {targets} that a target represents an external file to track, set the argument ‘format’ to ‘file’. The target's command must return the path to the file.

_targets.R

# ...

c(
  tar_target(
    path_file,
    'path/to/some-file.csv',
    format = 'file'
  ),
  tar_target(
    read_file,
    read.csv(path_file)
  )
)

graph LR
  style Legend fill:#FFFFFF00,stroke:#000000;
  style Graph fill:#FFFFFF00,stroke:#000000;
  subgraph Legend
    x2db1ec7a48f65a9b(["Outdated"]):::outdated
    xd03d7c7dd2ddda2b(["Regular target"]):::none
  end
  subgraph Graph
    direction LR
    xdf1dccc52212d8fc(["path_file"]):::outdated --> xba6fe59610bb07a8(["read_file"]):::outdated
    
  end
  classDef outdated stroke:#000000,color:#000000,fill:#78B7C5;
  classDef none stroke:#000000,color:#000000,fill:#94a4ac;
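The same format also works for output files. As a minimal sketch, assuming a hypothetical helper save_table() and a hypothetical output path (neither is part of the workshop code), a target that writes a table returns the path of the file it created so that {targets} can track it:

```r
library(targets)

# Hypothetical helper: write a table to `path` and return the path.
# A target with format = 'file' must return the path(s) of its file(s).
save_table <- function(data, path) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  write.csv(data, path, row.names = FALSE)
  path
}

c(
  tar_target(
    path_table,
    save_table(head(mtcars), 'output/some-table.csv'),
    format = 'file'
  )
)
```

If the file at the returned path is later changed or deleted, the target should show up as outdated the next time the pipeline is checked.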

Alternatively, we can use the tar_file_read() function (https://docs.ropensci.org/tarchetypes/reference/tar_file_read.html) from the {tarchetypes} package, which is our first example of a “target factory”. Target factories are functions that produce multiple targets from minimal input. They can help simplify our pipelines by generating complex sets of targets from standard inputs.

The tar_file_read() function combines two steps: tracking the path to the input file, and reading the input file with a custom read function. It creates one target for the file path (named with the suffix _file) and one for the data read from it. Note how the output network of targets looks the same!

_targets.R

# ...

c(
  tar_file_read(
    file,
    'path/to/some-file.csv',
    # !!.x marks where the tracked file path is inserted
    read_csv(file = !!.x)
  )
)

graph LR
  style Legend fill:#FFFFFF00,stroke:#000000;
  style Graph fill:#FFFFFF00,stroke:#000000;
  subgraph Legend
    x2db1ec7a48f65a9b(["Outdated"]):::outdated
    xd03d7c7dd2ddda2b(["Regular target"]):::none
  end
  subgraph Graph
    direction LR
    xe35dfa49f50d9903(["file_file"]):::outdated --> xd6afb00b098fa2c2(["file"]):::outdated
    
  end
  classDef outdated stroke:#000000,color:#000000,fill:#78B7C5;
  classDef none stroke:#000000,color:#000000,fill:#94a4ac;

Targetopia

Let’s look at other examples of target factories in the “targetopia” group of packages:

Iterating

As we mentioned in the functions resources, there are plenty of options in R for iterating, including:

  • the apply family
  • {purrr}
  • {dplyr} with group_by and mutate
  • {data.table}’s by

{targets} offers a new approach that integrates into our pipeline seamlessly and extends our functions to run over groups of rows in a data.frame. There are two steps:

  1. Use the {tarchetypes} function tar_group_by() to define the variables to group the data.frame on.
  2. Use “dynamic branching” with the tar_target() argument pattern.

For example:

_targets.R

#...
c(
    # Group the mtcars data.frame by the values in the column "cyl"
    tar_group_by(
        group_counts,
        mtcars,
        cyl
    ),

    # Averages by group: each branch receives the rows of one group
    tar_target(
        mean_mpg,
        mean(group_counts$mpg),
        pattern = map(group_counts)
    ),

    # Summarize all the cars (no pattern, so all rows are used)
    tar_target(
        summarize_cars,
        summary(group_counts)
    )
)

In tar_group_by(), we pass the data.frame to the command argument, and the following arguments are the column names to group on (in this case cyl). Downstream targets that want to iterate over the groups of rows then use the argument pattern with the function map(). The target corresponding to the grouped data.frame is passed to map().

If you want to use the full, ungrouped dataset again downstream, simply refer to the target without the pattern argument and {targets} will automatically combine the data.frame again.
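As a rough analogy in base R (not how {targets} implements branching internally), grouping, branching, and combining can be pictured like this. The object names are made up for illustration:

```r
# Split mtcars into one data.frame per value of cyl, similar in spirit
# to the groups that tar_group_by() defines
groups <- split(mtcars, mtcars$cyl)

# "Branch": run the same command on each group separately
mean_mpg_by_group <- vapply(groups, function(g) mean(g$mpg), numeric(1))
print(mean_mpg_by_group)

# "Combine": bind the groups back together to recover all rows
all_cars <- do.call(rbind, groups)
print(nrow(all_cars) == nrow(mtcars))
```

The difference in a pipeline is that each branch is stored and tracked separately, so only the branches whose group of rows changed need to be rerun.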

Another approach that Alec likes to use with {data.table} is:

_targets.R

# ...

# Variables
# Split by: within which column or set of columns (eg. c(id, yr))
#  do we want to split our analysis?
split_by <- c('id', 'species')

# Targets
c(
    tar_target(
        split_data,
        data[, tar_group := .GRP, by = split_by],
        iteration = 'group'
    ),
    tar_target(
        split_key,
        unique(split_data[, .SD, .SDcols = c(split_by, 'tar_group')])
    )

)
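To see what the tar_group column contains, here is a small sketch on a made-up data.table. The toy_data object and its values are hypothetical and only stand in for real data:

```r
library(data.table)

# Hypothetical toy data, only for illustration
toy_data <- data.table(
    id = c('a', 'a', 'b', 'b', 'b'),
    species = c('x', 'x', 'x', 'y', 'y'),
    value = 1:5
)
split_by <- c('id', 'species')

# .GRP is a group counter: each unique combination of id and species
# gets its own integer, stored in the column tar_group
toy_data[, tar_group := .GRP, by = split_by]
print(toy_data)

# One row per group, linking each tar_group value back to its id and species
toy_key <- unique(toy_data[, .SD, .SDcols = c(split_by, 'tar_group')])
print(toy_key)
```

With iteration = 'group', {targets} uses the tar_group column to define the branches, so a downstream target can iterate over the groups with pattern = map(split_data).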

Read more:

These are examples of “dynamic branching”, but also see “static branching” functions:

iteration = 'list'

The iteration argument for tar_target() is used to change how {targets} splits and combines branches. Setting it to 'list' is required when you want to return objects such as plots, models or matrices that cannot be combined into a single vector or data.frame in an obvious way.

For example:

_targets.R

#...
c(
    # Group the mtcars data.frame by the values in the column "cyl"
    tar_group_by(
        group_counts,
        mtcars,
        cyl
    ),

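    # Histogram of mpg for each group
    tar_target(
        plot_cars,
        ggplot(group_counts) + geom_histogram(aes(mpg)),
        pattern = map(group_counts),
        iteration = 'list'
    ),

    # Matrix of correlations between variables for each group
    # (cyl and tar_group are constant within a group, so they are left out)
    tar_target(
        correlation_metrics,
        cor(group_counts[, c('mpg', 'disp', 'hp', 'wt')]),
        pattern = map(group_counts),
        iteration = 'list'
    )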
)

See more details for argument iteration in ?tar_target: https://docs.ropensci.org/targets/reference/tar_target.html
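As a rough base R illustration of why a list is the safer container for such results: keeping one correlation matrix per group in a list preserves each matrix, while a naive flattening step loses the structure. Here unlist() is only a stand-in for a lossy combine step and is not the function {targets} uses:

```r
# One data.frame per value of cyl, keeping a few numeric columns
groups <- split(mtcars[, c('mpg', 'wt', 'hp')], mtcars$cyl)

# One correlation matrix per group, stored as elements of a list
cor_by_group <- lapply(groups, cor)
print(is.matrix(cor_by_group[[1]]))
print(dim(cor_by_group[[1]]))

# Flattening the list drops the matrix structure entirely
flat <- unlist(cor_by_group)
print(is.matrix(flat))
print(length(flat))
```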

Bonus

tar_option_set

Look into the arguments provided by tar_option_set(). The full reference page is: https://docs.ropensci.org/targets/reference/tar_option_set.html

  • format argument
    • for data.table objects you can use {qs} and the format 'qs' to retain the data.table class
  • error argument to define what {targets} should do when it hits an error
  • workspace_on_error argument to optionally save a workspace file for each target that throws an error
  • cue argument
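As a sketch of how these options might be set near the top of _targets.R, the values below are illustrative choices, not recommendations from the workshop. The 'qs' format assumes the supporting serialization package is installed:

```r
library(targets)

tar_option_set(
  # Store targets with qs (illustrative choice, e.g. for data.table objects)
  format = 'qs',
  # Keep running the remaining targets when one target errors
  error = 'continue',
  # Save a workspace file for each target that throws an error
  workspace_on_error = TRUE,
  # Rules for deciding when a target is outdated
  cue = tar_cue(mode = 'thorough')
)
```

Options set this way apply to every target defined afterwards, and individual targets can still override them through their own arguments, for example format in tar_target().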

If you have already finished all the previous exercises and have extra time, or are working on your own after the workshop, try this bonus exercise:

  • write a function that processes the ice data
  • look at the relationship between adult and chick counts, and the ice data and weather data. How might these two be influencing adult and chick counts?