Using targets

Learning Goals
  • Execute a workflow in {targets} that reads in data, performs a function, and saves an output.
  • Recognize the value of workflows for reducing mental load and improving efficiency.

Now that we have set up our _targets.R script in the previous exercises and added our first few targets to the pipeline, let’s take a look at using {targets}. It is a bit different from the usual R approach where we make edits, then rerun everything, wait for results, notice issues, rerun from scratch…

The following functions allow us to interact with the pipeline’s definition, metadata and results. They inspect or run the pipeline rather than define it, so we shouldn’t put them inside the _targets.R script or in our R/ directory. The pipeline itself will often result in a rendered document or manuscript, a number of saved plots or an output data file.

In our case, let’s make a new script called explore.R in the root of the project’s directory. Then we can keep track of our commands like tar_meta(), tar_visnetwork() and tar_read(), and easily rerun them when we make changes to the pipeline.

Instruction: make an empty script called explore.R in the root of the project’s directory. At the top, load the packages and/or the functions with source() or tar_source().

explore.R

library(targets)
tar_source('R')

tar_visnetwork()

Before we run our pipeline, we can check that everything looks good in the dependency graph with tar_visnetwork(). By default, tar_visnetwork() shows objects, functions and targets. You can adjust this behaviour to only show the targets using the argument targets_only set to TRUE.

Another argument is especially useful when you have many targets: names. To show only selected targets, you can use {tidyselect} helpers like starts_with(). The names argument comes up in many other {targets} functions including tar_meta() and tar_load().

Related functions to tar_visnetwork() include:

  • tar_glimpse() is a faster version of tar_visnetwork() that doesn’t check the pipeline’s metadata to see whether targets and functions are outdated
  • tar_mermaid() returns a Mermaid.js diagram representing the pipeline
  • tar_network() returns a network of edges and nodes representing the pipeline
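As a minimal sketch of the last two functions, the code below builds a toy two-target pipeline and inspects its graph as data rather than as an interactive widget. The target names raw and total are made up for illustration and are not part of the workshop pipeline. tar_dir() runs everything in a temporary directory, so the project's own _targets.R is left alone.

```r
library(targets)

# tar_dir() runs the code in a temporary directory, so this toy pipeline
# never touches the workshop project's _targets.R or _targets/ store.
tar_dir({
  # Write a small _targets.R (hypothetical targets: raw, total).
  tar_script({
    library(targets)
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(total, sum(raw))
    )
  }, ask = FALSE)

  # List with two data frames: vertices (nodes) and edges (dependencies).
  net <- tar_network(targets_only = TRUE)

  # Character vector holding the lines of a Mermaid.js diagram.
  diagram <- tar_mermaid(targets_only = TRUE)
})

net$vertices
net$edges
writeLines(diagram)
```

The Mermaid text can be pasted into any document or tool that renders Mermaid diagrams, which may be handy when an interactive widget is not an option.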

Exercise: tar_visnetwork()

Objective

Visualize your {targets} workflow.

Instruction: run tar_visnetwork() from your explore.R script to see the dependency graph defined in your _targets.R script. Try out the targets_only and names arguments too.

Hint

explore.R

library(targets)
tar_source('R')

tar_visnetwork()

tar_visnetwork(targets_only = TRUE)

# names must be passed by name: the first positional argument is targets_only
tar_visnetwork(names = ends_with('counts'))

tar_make()

Run the {targets} pipeline with tar_make(). The pipeline is run in a new external R process, which means that tar_make() doesn’t impact your current workspace and, more importantly, is not influenced by your current workspace. This totally isolated environment is how {targets} offers us a truly reproducible approach - only things defined in the _targets.R script (or sourced within it) are considered.
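To make the isolation concrete, here is a small sketch under stated assumptions: a toy pipeline with one made-up target (sees_session_object) that simply asks whether an object from the interactive session is visible while the pipeline runs. The object name session_only is also hypothetical. The code runs inside tar_dir(), so it does not interfere with the workshop project.

```r
library(targets)

# An object that exists only in the current interactive session.
session_only <- 100

tar_dir({
  tar_script({
    library(targets)
    list(
      # exists() is evaluated in the external R process started by tar_make().
      tar_target(sees_session_object, exists("session_only"))
    )
  }, ask = FALSE)

  tar_make()
  seen_in_pipeline <- tar_read(sees_session_object)
})

# The session can see the object ...
exists("session_only")
# ... but the pipeline should report that it could not.
seen_in_pipeline
```

Anything the pipeline needs therefore has to be created in _targets.R or in a file sourced from it, not left lying around in the console.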

Exercise: tar_make()

Objective

Run your {targets} workflow.

Instruction: run tar_make() from your explore.R script.

Hint

explore.R

library(targets)
tar_source('R')

tar_make()

tar_read(), tar_load()

To look at results from the {targets} pipeline, use tar_read() and tar_load(). tar_read() returns a target’s value without assigning it in your environment, while tar_load() loads the target directly into your environment as an object with the same name as the target.
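The difference is easiest to see side by side. The sketch below uses a toy pipeline with made-up targets (raw and total) inside tar_dir(), so it runs independently of the workshop project.

```r
library(targets)

tar_dir({
  tar_script({
    library(targets)
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(total, sum(raw))
    )
  }, ask = FALSE)
  tar_make()

  # tar_read() returns the value; we choose the object name ourselves.
  my_total <- tar_read(total)

  # tar_load() creates objects named after the targets (here: raw and total).
  tar_load(c(raw, total))
})

my_total
raw
total
```

tar_read() tends to suit quick looks and piping into other functions, while tar_load() is convenient when several targets are needed at once under their own names.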

Exercise: tar_read(), tar_load()

Objective

Load completed targets from your workflow.

Instruction: run tar_read() and tar_load() from your explore.R script to read/load the targets defined in your {targets} pipeline.

Hint

explore.R

library(targets)
tar_source('R')

tar_read(prep_counts)

tar_load(sums)

tar_read(plotted)

tar_meta()

The metadata related to the {targets} pipeline is accessible using the function tar_meta().
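The full metadata table is wide, so it can help to pick out a few columns. The following sketch, again on a toy pipeline with made-up targets (raw and total), uses the fields and targets_only arguments of tar_meta() to keep only the run time, stored size, warnings and errors of each target.

```r
library(targets)

tar_dir({
  tar_script({
    library(targets)
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(total, sum(raw))
    )
  }, ask = FALSE)
  tar_make()

  # Keep only a few metadata columns, and only rows that describe targets
  # (functions and other global objects are dropped by targets_only = TRUE).
  meta <- tar_meta(
    fields = c(seconds, bytes, warnings, error),
    targets_only = TRUE
  )
})

meta
```

Sorting or filtering this table is one way to find slow targets or targets that produced warnings.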

Exercise: tar_meta()

Objective

Explore the metadata from your workflow.

Instruction: run tar_meta() from your explore.R script to look at the metadata associated with your {targets} pipeline. Also try using the names argument to filter the metadata returned.

Note: we sometimes find it helpful to use the View() function to open the metadata as a table in RStudio.

Hint

explore.R

library(targets)
tar_source('R')

tar_meta()

tar_meta(ends_with('counts'))

View(tar_meta())

Edits, rerun, edits, …

Now that we have a {targets} pipeline, the way we work is a bit different from the interactive R workflow we might be used to. Everything is defined in the pipeline, so our process becomes: edit our functions and our _targets.R pipeline, then rerun tar_make(). {targets} tracks all functions and objects (and can optionally track external files too) and only reruns a target when its own code or something upstream of it has changed. This is the other main way that {targets} makes our lives easier. We no longer need to mentally keep track of versions of scripts, output files, figures, etc. - just run tar_make() to rerun anything outdated (or check what is outdated with tar_visnetwork()).
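A text-only way to check what is outdated is tar_outdated(), which returns the names of the targets that would rerun. The sketch below is a toy example under stated assumptions: the function double_it() and the targets raw, doubled and total are hypothetical, and the "edit" is simulated by rewriting the temporary _targets.R with a slightly different function body.

```r
library(targets)

tar_dir({
  # First version of the toy pipeline.
  tar_script({
    library(targets)
    double_it <- function(x) x * 2
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(doubled, double_it(raw)),
      tar_target(total, sum(doubled))
    )
  }, ask = FALSE)
  tar_make()

  # Right after a successful run nothing should be outdated.
  before_edit <- tar_outdated()

  # Simulate an edit to double_it(); the targets themselves are unchanged.
  tar_script({
    library(targets)
    double_it <- function(x) (x * 2) + 0
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(doubled, double_it(raw)),
      tar_target(total, sum(doubled))
    )
  }, ask = FALSE)

  # Targets that depend on double_it(), directly or indirectly, are flagged.
  after_edit <- tar_outdated()
})

before_edit
after_edit
```

Note that raw is not flagged, because nothing it depends on changed. Running tar_make() again would then rebuild only what is needed.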

Exercise: practice the {targets} approach

Objective

Update and rerun your {targets} workflow.

Instruction: make sure your {targets} pipeline is up to date with tar_visnetwork(). Then, make an edit to the function sum_counts(). Rerun the {targets} pipeline. Next, make an edit to the function filter_islands(). Rerun with tar_make(). Lastly, make an edit to prepare_csv() and rerun. Note which targets were rerun with each change and discuss with your neighbours.

Bonus:

  • make a change to prepare_csv() that doesn’t actually change the returned object (e.g. by adding an irrelevant intermediate step). Does this force the {targets} pipeline to rerun?

Forcing reruns

The following functions can be used to force {targets} to rerun targets.

  • tar_invalidate() deletes the metadata record associated with a target, so the target will rerun even if it would otherwise be up to date. However, the output file stored by the earlier run still exists (and can be found using tar_path_target())
  • tar_delete() deletes individual target output values
  • tar_destroy() deletes the whole _targets/ data store (caution)
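The difference between the first two functions can be sketched on a toy pipeline. The targets raw, doubled and total are made up for illustration, and everything runs in a temporary directory via tar_dir(). tar_destroy() is left out here, since it removes the whole data store.

```r
library(targets)

tar_dir({
  tar_script({
    library(targets)
    list(
      tar_target(raw, c(1, 2, 3)),
      tar_target(doubled, raw * 2),
      tar_target(total, sum(doubled))
    )
  }, ask = FALSE)
  tar_make()

  # tar_invalidate() removes the metadata record but leaves the stored value.
  tar_invalidate(doubled)
  doubled_file_kept <- file.exists(tar_path_target(doubled))

  # tar_delete() removes the stored value of a target from the data store.
  tar_delete(total)
  total_file_kept <- file.exists(tar_path_target(total))

  # Targets that the next tar_make() would rerun.
  stale <- tar_outdated()
})

doubled_file_kept
total_file_kept
stale
```

Because these calls change the data store, they are best kept out of _targets.R and run deliberately from a script such as explore.R or from the console.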

Bonus

Make more targets!

  • use the prepare_csv() function with the weather timeseries
  • make a target for something that isn’t one of our custom functions, e.g. summary() (targets don’t need to call custom functions; any R command can be used)
  • use tar_invalidate() to rerun a part of the example {targets} workflow
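As a starting point for the second bonus item, here is a minimal sketch of a target whose command is a plain base R call. The targets values and values_summary are hypothetical stand-ins; in the workshop pipeline the input would be one of the existing targets instead.

```r
library(targets)

tar_dir({
  tar_script({
    library(targets)
    list(
      tar_target(values, c(2, 4, 6, 8)),
      # The command is an ordinary call to base R's summary();
      # no custom function is involved.
      tar_target(values_summary, summary(values))
    )
  }, ask = FALSE)
  tar_make()

  values_summary <- tar_read(values_summary)
})

values_summary
```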