Checks

Checks in functions ensure that inputs and arguments are as expected, handle errors, and provide informative error messages to users. In this workshop, we focus on writing checks with stopifnot() because it is simple and succinct. As of R 4.0.0, stopifnot() also accepts a custom error message for each check, which helps us communicate clearly with our users (and ourselves).

stopifnot() takes one or more logical statements that you expect to be TRUE. If any of them is not TRUE, stopifnot() throws an error.
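As a minimal sketch of this behavior (the function name double_it is hypothetical and only used for illustration):

```r
# Hypothetical helper, for illustration only
double_it <- function(x) {
  # Throw an error unless x is numeric
  stopifnot(is.numeric(x))
  x * 2
}

# The check passes, so the function runs as usual
double_it(2)

# The check fails, so an error is thrown; try() keeps the script running
result <- try(double_it('a'), silent = TRUE)
inherits(result, 'try-error')
```

Wrapping the failing call in try() is only there so the example can run from top to bottom. In an interactive session, calling double_it('a') directly would simply stop with an error.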

To build logical statements, you may find the following list of functions useful:

  • class checks
    • is.integer, is.numeric, is.data.frame, is.character
  • nulls, NAs, NaNs
    • is.null, is.na, is.nan
  • dimensions
    • length, nrow, ncol, dim
  • equality, comparisons
    • == (equals)
    • >, <, >=, <= (greater than, less than, greater than or equal to, less than or equal to)
    • identical
  • negation
    • !
  • directories, files
    • file.exists
    • dir
    • basename, dirname
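To see how a few of these functions behave, here is a short sketch using a made-up vector x. Note that some functions return one value per element, so they may need to be wrapped in any() or all() to produce a single TRUE or FALSE.

```r
# Made-up example vector with one missing value
x <- c(1.5, NA, 3)

# Class check: is x numeric?
is.numeric(x)

# Dimensions: does x have exactly three elements?
length(x) == 3

# Negation: x is not NULL
!is.null(x)

# is.na() is vectorised, so it returns one value per element
is.na(x)

# Wrap it with any() to get a single value: are there no NAs in x?
!any(is.na(x))

# identical() always returns a single TRUE or FALSE
identical(length(x), 3L)
```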

Exercises: function checks

Logical statements

Instruction: write a series of logical statements for the following objects, then consult the solution to check your work. Focus only on the logical statements for now; we’ll use stopifnot() in the following exercises.

Number

  • check if x is a numeric, integer, double
  • check if x is of length 1
  • check if x is greater than 0
x <- 10
Solution:
is.numeric(x)
[1] TRUE
is.integer(x)
[1] FALSE
is.double(x)
[1] TRUE
length(x) == 1
[1] TRUE
x > 0
[1] TRUE

data.frame

  • check if DF is a data.frame, list, matrix
  • check if DF has two columns and three rows
  • check if DF’s numbers column is an integer, double, numeric
DF <- data.frame(colors = c('red', 'green', 'blue'), numbers = c(42.1, 2L, 10))
Solution:
is.data.frame(DF)
[1] TRUE
is.list(DF)
[1] TRUE
is.matrix(DF)
[1] FALSE
ncol(DF) == 2
[1] TRUE
nrow(DF) == 3
[1] TRUE
is.integer(DF$numbers)
[1] FALSE
is.double(DF$numbers)
[1] TRUE
is.numeric(DF$numbers)
[1] TRUE

plot_xy()

First, write a function to plot a data.frame, providing column names for the data to be plotted on the x and y axes. Use the example data that we prepared with our prepare_csv() function.

Follow the steps from our approach to developing functions:

  1. Setup the function
    • Make a function script in the R/ directory named plot_xy.R
    • Write the function’s skeleton (name, arguments, curly braces)
  2. Setup the test script
    • Make a corresponding test script in the tests/ directory named test_plot_xy.R.
    • Load any required packages (library(package))
    • Source the function (source('R/function.R'))
    • Load example data and/or arguments for the function

Add a new section ‘Development’ in the test script (tests/test_plot_xy.R) to develop the body of your function. Since we have already built a function for preparing CSV files, use it in your test script, then plot the result with your choice of base R plotting or {ggplot2}. Once it works, add the code to your function plot_xy() and test!

path_counts <- file.path('raw-data', 'adelie-adult-chick-counts.csv')
prep_counts <- prepare_csv(path_counts)

Hints: base R

This function has one step: taking an input data.frame and generating a plot.

The function’s arguments should include the data.frame, as well as the column names for the X and Y axes.

Pass the data.frame’s columns to the plot function’s x and y arguments with the [[ syntax, e.g. DF[[x_column]]
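One possible sketch of the base R approach is below. The argument names (DF, x_col, y_col) and the example data are assumptions made for illustration; your own function may differ.

```r
# Sketch of plot_xy() with base R; argument names are assumptions
plot_xy <- function(DF, x_col, y_col) {
  # [[ extracts a column using its quoted name
  plot(x = DF[[x_col]], y = DF[[y_col]], xlab = x_col, ylab = y_col)
}

# Made-up example data, standing in for the prepared counts
example_DF <- data.frame(year = c(2001, 2002, 2003), count = c(12, 15, 9))
plot_xy(example_DF, x_col = 'year', y_col = 'count')
```

Passing x_col and y_col to xlab and ylab is optional; without them, the axis labels would show the full DF[[x_col]] expression.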
Hints: ggplot2

This function has one step: taking an input data.frame and generating a plot.

The function’s arguments should include the data.frame, as well as the column names for the X and Y axes.

There are two main options:

  • Pass the data.frame to ggplot’s data argument then use the .data[[col]] syntax to wrap the x_col and y_col arguments (quoted column names) within aes.

  • Pass the data.frame to ggplot’s data argument then use the {{ }} “embrace operator” to wrap the x_col and y_col arguments (unquoted column names) within aes.

See the ?aes help page and the example with the “embrace operator”, and more details here: https://ggplot2.tidyverse.org/articles/ggplot2-in-packages.html#using-aes-and-vars-in-a-package-function
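The two options could look something like the sketch below. The function names plot_xy_quoted and plot_xy_unquoted, the example data, and the choice of geom_point() are all assumptions made for illustration.

```r
library(ggplot2)

# Option 1: quoted column names with the .data pronoun
plot_xy_quoted <- function(DF, x_col, y_col) {
  ggplot(data = DF, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_point()
}

# Option 2: unquoted column names with the embrace operator
plot_xy_unquoted <- function(DF, x_col, y_col) {
  ggplot(data = DF, aes(x = {{ x_col }}, y = {{ y_col }})) +
    geom_point()
}

# Made-up example data, standing in for the prepared counts
example_DF <- data.frame(year = c(2001, 2002, 2003), count = c(12, 15, 9))

# Note the difference in how the column names are passed
p1 <- plot_xy_quoted(example_DF, x_col = 'year', y_col = 'count')
p2 <- plot_xy_unquoted(example_DF, x_col = year, y_col = count)
```

Whichever option you choose, it is probably simplest to pick one and use it consistently, since users need to know whether to quote the column names.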


Writing checks for plot_xy()

Instruction: add the following checks to your plot_xy() function and test them in your test script (tests/test_plot_xy.R).

  • check if the columns provided to the x_col and y_col arguments exist in the data.frame (quoted column name arguments)
  • check if the plot generated is of class ggplot before returning (ggplot2)
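As a rough sketch of where these checks could sit, assuming the quoted column name version of plot_xy() built with ggplot2 (the argument names, geom_point() and the example data are assumptions):

```r
library(ggplot2)

plot_xy <- function(DF, x_col, y_col) {
  # Input checks: both column names must exist in the data.frame
  stopifnot(x_col %in% colnames(DF))
  stopifnot(y_col %in% colnames(DF))

  p <- ggplot(DF, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_point()

  # Output check: confirm the result is a ggplot before returning it
  stopifnot(inherits(p, 'ggplot'))
  p
}

# Made-up example data
example_DF <- data.frame(year = c(2001, 2002, 2003), count = c(12, 15, 9))

# Passes the checks
p <- plot_xy(example_DF, x_col = 'year', y_col = 'count')

# Fails the first check because there is no column named 'month'
result <- try(plot_xy(example_DF, x_col = 'month', y_col = 'count'), silent = TRUE)
inherits(result, 'try-error')
```

Input checks go at the top of the function so it fails early, and the output check goes just before the object is returned.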
Hints: unquoted column names with ggplot2

If you opted to use unquoted column names, it might be trickier to check if column names exist in the data.frame. However, this check is already covered by ggplot() anyway.

In case you wanted to add this, on top of the internal ggplot2 check, try something like this:

as_name(enquo(x_col)) %in% colnames(DF)

These are {rlang} functions, part of the “tidy eval tools” and examples of metaprogramming. The Advanced R book has a detailed section on this concept.
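To see this in action outside of a plotting function, here is a small sketch. The helper name has_column and the example data are hypothetical, used only to show how enquo() and as_name() work together.

```r
library(rlang)

# Hypothetical helper: TRUE if the unquoted column name exists in DF
has_column <- function(DF, x_col) {
  # enquo() captures the unquoted argument without evaluating it,
  # as_name() converts the captured name to a character string
  as_name(enquo(x_col)) %in% colnames(DF)
}

# Made-up example data
example_DF <- data.frame(year = c(2001, 2002), count = c(12, 15))

has_column(example_DF, year)
has_column(example_DF, month)
```

The first call should return TRUE and the second FALSE, since example_DF has no column named month.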


Bonus

Writing checks for prepare_csv()

Instruction: add the following checks to your prepare_csv() function and test them in your test script (tests/test_prepare_csv.R).

  • check if the path points to a file that exists
  • before returning the object, check that it is a data.frame
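A sketch of how these two checks might be placed is below. The body of prepare_csv() is not shown in this section, so the function here is a stand-in named prepare_csv_sketch, with read.csv() replacing the real preparation steps.

```r
# Stand-in for prepare_csv(); read.csv() replaces the real preparation steps
prepare_csv_sketch <- function(path) {
  # Input check: the path must point to a file that exists
  stopifnot(file.exists(path))

  DF <- read.csv(path)

  # Output check: confirm the object is a data.frame before returning it
  stopifnot(is.data.frame(DF))
  DF
}

# Try it with a temporary CSV file
tmp <- tempfile(fileext = '.csv')
write.csv(data.frame(a = 1:3), tmp, row.names = FALSE)
prepared <- prepare_csv_sketch(tmp)

# A path that does not exist fails the first check
result <- try(prepare_csv_sketch(file.path(tempdir(), 'no-such-file.csv')), silent = TRUE)
inherits(result, 'try-error')
```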

More informative errors with stopifnot()

Instruction: Write more informative errors for your stopifnot() checks using the following syntax: stopifnot("error message" = logical_statement). Think about your user (either someone else or future you) - what would help them understand and resolve this error?

For example,

stopifnot("x is not a numeric" = is.numeric(x))
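To see the message your user would get, you can trigger the error on purpose. This sketch assumes R 4.0.0 or later and uses tryCatch() only so the message can be captured and inspected.

```r
# A value that will fail the check
x <- 'ten'

# Capture the error message instead of stopping the script
result <- tryCatch(
  stopifnot("x is not a numeric" = is.numeric(x)),
  error = function(e) conditionMessage(e)
)

# The custom message replaces the default one
result
```

Without the name, the default message would report the failed expression itself (is.numeric(x) is not TRUE), which is often less helpful to the user.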

Add a color_col argument to plot_xy()

Instruction: using the same approach you used for x_col and y_col, add a color_col argument to plot_xy().