[Pipeline graph: path_file --> read_file; both targets are outdated]
Extending targets
Files
{targets} can track external files for changes. This includes input data files as well as outputs like saved plots or tables. To signal to {targets} that a target represents an external file to track, set the argument format to 'file'. The command of that target must then return the path to the file.
_targets.R
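A minimal sketch of such a pair of targets. The target names path_file and read_file follow the pipeline graph; the CSV path, the use of base read.csv() and the variable name file_targets are assumptions for illustration only.

```r
library(targets)

# Hypothetical example: the file path is a placeholder, not a real dataset
file_targets <- c(
  # Track the file itself: with format = 'file' the command returns the path
  tar_target(
    path_file,
    'path/to/some-file.csv',
    format = 'file'
  ),
  # Read the tracked file; this target depends on path_file
  tar_target(
    read_file,
    read.csv(path_file)
  )
)

# _targets.R must end with the list of targets
file_targets
```

With this setup, a change to the contents of the file should mark path_file, and therefore read_file, as outdated.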
Alternatively, we can use the tar_file_read() function (https://docs.ropensci.org/tarchetypes/reference/tar_file_read.html) from the {tarchetypes} package, which is our first example of a “target factory”. Target factories are functions that produce multiple targets from minimal input. They can help simplify our pipelines by generating complex sets of targets in a standard way.
The tar_file_read() function handles both tracking the path to the input file and reading the input file with a custom read function. Note how the output network of targets looks the same!
_targets.R
# ...
c(
  tar_file_read(
    file,
    'path/to/some-file.csv',
    read_csv(file = !!.x)
  )
)
[Pipeline graph: file_file --> file; both targets are outdated]
Targetopia
Let’s look at other examples of target factories in the “targetopia” group of packages:
Iterating
Like we mentioned in the functions resources, there are plenty of options in R for iterating, including:
- the apply family
- {purrr}
- {dplyr} with group_by and mutate
- {data.table}’s by
{targets} offers a new approach that integrates into our pipeline seamlessly and extends our functions to run over groups of rows in a data.frame. There are two steps:
- Use the {tarchetypes} function tar_group_by() to define the variables to group the data.frame on.
- Use “dynamic branching” with the tar_target() argument pattern.
For example:
_targets.R
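A minimal sketch of these two steps, assuming the built-in mtcars data.frame grouped by cyl. The target names cars_grouped and mean_mpg and the variable name grouped_targets are hypothetical.

```r
library(targets)
library(tarchetypes)

grouped_targets <- c(
  # Step 1: group the rows of mtcars by the values in the column cyl
  tar_group_by(
    cars_grouped,
    mtcars,
    cyl
  ),
  # Step 2: dynamic branching, one branch per group of rows
  tar_target(
    mean_mpg,
    data.frame(
      cyl = cars_grouped$cyl[1],
      mean_mpg = mean(cars_grouped$mpg)
    ),
    pattern = map(cars_grouped)
  )
)

# _targets.R must end with the list of targets
grouped_targets
```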
We pass the data.frame to the command argument, and the following arguments are the column names to group on (in this case cyl). Then downstream targets that want to iterate over the groups of rows use the argument pattern with the function map(). The target corresponding to the grouped data.frame is passed to map().
To use the full, ungrouped dataset again downstream, simply refer to the target without the pattern argument and {targets} will automatically combine the groups back into a single data.frame.
Another approach that Alec likes to use with {data.table} is:
_targets.R
# ...
# Variables
# Split by: within which column or set of columns (e.g. c('id', 'yr'))
#  do we want to split our analysis?
split_by <- c('id', 'species')

# Targets
c(
  # Add the tar_group column that {targets} uses to define the groups
  tar_target(
    split_data,
    data[, tar_group := .GRP, by = split_by],
    iteration = 'group'
  ),
  # Lookup table linking each tar_group to its values of split_by
  tar_target(
    split_key,
    unique(split_data[, .SD, .SDcols = c(split_by, 'tar_group')])
  )
)
Read more:
- tar_group_by(): https://docs.ropensci.org/tarchetypes/reference/tar_group_by.html
- tar_group_count(): https://docs.ropensci.org/tarchetypes/reference/tar_group_count.html
- tar_group(): https://docs.ropensci.org/targets/reference/tar_group.html
These are examples of “dynamic branching”, but also see the “static branching” functions:
- tar_map(): https://docs.ropensci.org/tarchetypes/reference/tar_map.html
- tar_rep(): https://docs.ropensci.org/tarchetypes/reference/tar_rep.html
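For comparison, a minimal sketch of static branching with tar_map(), assuming the built-in mtcars data. The names cyl_value, subset_cars and static_targets are hypothetical. Unlike dynamic branching, the set of targets is fixed when the pipeline is defined, with one target created per entry in values.

```r
library(targets)
library(tarchetypes)

# One subset_cars target is generated for each value of cyl_value
static_targets <- tar_map(
  values = list(cyl_value = c(4, 6, 8)),
  tar_target(
    subset_cars,
    mtcars[mtcars$cyl == cyl_value, ]
  )
)

# _targets.R must end with the list of targets
static_targets
```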
iteration = 'list'
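The iteration argument of tar_target() changes how {targets} splits and combines branches. Setting it to 'list' is required when you want to return objects like plots, models or matrices that cannot be combined in an obvious way, since the default ('vector') tries to combine the branches into a single vector or data.frame.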
For example:
_targets.R
# ...
c(
  # Group the mtcars data.frame by the values in the column "cyl"
  tar_group_by(
    group_counts,
    mtcars,
    cyl
  ),
  # Histogram of mpg for each group
  tar_target(
    plot_cars,
    ggplot(group_counts) + geom_histogram(aes(mpg)),
    pattern = map(group_counts),
    iteration = 'list'
  ),
  # Matrix of correlations between variables for each group
  tar_target(
    correlation_metrics,
    cor(group_counts),
    pattern = map(group_counts),
    iteration = 'list'
  )
)
See more details for the argument iteration in ?tar_target: https://docs.ropensci.org/targets/reference/tar_target.html
Bonus
tar_option_set
Look into the arguments provided by tar_option_set(). Here is a link to the full reference page: https://docs.ropensci.org/targets/reference/tar_option_set.html
- format argument
  - for data.table objects you can use {qs} and format 'qs' to retain the data.table class
- error argument to define what {targets} should do when it hits an error
- workspace_on_error argument to optionally save a workspace file for each target that throws an error
- cue argument
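A minimal sketch of setting these options near the top of _targets.R. The specific values are assumptions chosen for illustration, not recommendations, and format 'qs' requires the corresponding serialization package to be installed when the pipeline runs.

```r
library(targets)

tar_option_set(
  # Store targets with qs, which retains the data.table class
  format = 'qs',
  # Keep running the other targets when one target errors
  error = 'continue',
  # Save a workspace file for each target that throws an error
  workspace_on_error = TRUE,
  # Rules for when a target reruns; mode is 'thorough', 'always' or 'never'
  cue = tar_cue(mode = 'thorough')
)
```

A saved workspace can then be inspected with tar_workspace(), passing the name of the target that failed.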
If you have already finished all the previous exercises and have extra time, or want to continue on your own after the workshop, try this bonus exercise:
- write a function that processes the ice data
- look at the relationship between adult and chick counts, and the ice data and weather data. How might these two be influencing adult and chick counts?