Functions

R for Data Science

Links:

Functions

Why write functions?

  • “Improving reach as a data scientist”
  • “Automate common tasks”
  • Three big advantages
    1. Give a function an evocative name that makes your code easier to understand
    2. As requirements change, only need to update code in one place instead of many
    3. Eliminate chance of incremental mistakes when copy pasting
  • Other advantages
    • Easier to apply over groups or chunks in data, eg. by individual with lapply/map/data.table’s by

When to write functions?

  • When you’ve copy pasted more than twice

How to write functions?

  • Develop code for accomplishing the task
  • Analyse the code
    • How many inputs
    • Look for duplication, eg. calculating the mean multiple times
      • Save as intermediate value
  • List inputs
  • List arguments
  • Name function (verb)
    • snake_case
    • Common prefix eg input_select, input_checkbox, input_text
    • Don’t override existing functions (see conflicted later)
  • Name arguments (nouns)
  • Place developed code in body of the function
  • Check results with eg. NAs, missing values, known results

Conditionals

  • if else (else if)
  • || && combine multiple logical expressions and short-circuit when || sees the first true and && sees the first false
  • | & are vectorized and cannot be directly used in an if unless you also use any / all

Checking values

  • stopifnot
  • if else then stop

Return

  • always last thing
  • otherwise explicit return()

Environment

  • lexical scoping
    • if variable isnt available in the functions environment, will look in environment where function was defined
    • in a simple case, eg f(x) {x + y}, this means y, but also { and +
    • if + is reassigned it will be used, meaning you can override a function from eg. base R
  • https://adv-r.hadley.nz/environments.html

Iteration

Note: we’ll use targets’ dynamic and static branching instead for this workshop

Iterating over a single list/vector/etc

  • for loops
  • purrr::map

Multiple inputs

  • purrr::map2

Loop for side effects

  • purrr::walk

Managing errors

  • purrr::safely
  • try
  • purrr::possibly
  • purrr::quietly

Conditional execution

Advanced R

Functions

Three components of a function

  • arguments (formals()): arguments used to control the function
  • body (body()): code inside the function
  • environment (environment()): determins how function finds values associated with names
    • implicit based on where you defined the function

Functions are objects

Primitive functions

  • only found in the base package
  • eg. sum, [
  • they are the exceptions to above, for primitive functions the following return NULL formals(), body(), environment()
  • primitive functions indicated by typeof(f) is “builtin” or “special”

Anonymous functions are used without first assigning them

do.call if you have arguments already in a list

Combining function calls

  • nesting f(g(x)) is concise, good for short sequences but hard to read (right to left, inside out)
  • intermediate objects, requires naming each intermediate object, useful when independent objects are useful otherwise not
  • pipes let you run and function and then then next one and then the next one (…) in a chain

Lexical scoping

  • name masking

Functional programming

R is at its heart a functional language

Functional languages have:

  1. first-class functions that behave like any other data structure. you can assign to variables, store them in lists, pass them as arguments, etc
  2. functions that are pure. pure functions satisfy two properties - 1) outputs depend only on inputs where rerunning the function with the same inputs will yield the same results, and 2) functions have no side-effects like changing global variables, writing to disk, displaying to the screen

For R, functions are ideally either very pure, or very impure (plotting, saving, etc)

Functionals are functions that take a function as input and returns a vector as output

  • eg. lapply, apply, tapply, purrr::map
  • commonly used as an alternative to for loops
  • (lots of great background)

Function factories

  • eg. power1 <- function(exp) function(x) x ^ exp, square <- power1(2)
  • (beyond scope of workshop)

Function operators

Control flow

Choices

  • if (condition) true_action
  • if (condition) true_action else false_action
  • also else if
  • condition must be length 1
  • switch (https://adv-r.hadley.nz/control-flow.html#switch)
    • more compact than a bunch of if else if else

Loops

  • for
    • caution 1:length if length == 0, instead use seq_along, seq.int, etc.
  • while
  • repeat

Environments

Software Carpentry: Programming with R

Functions

Defining a function

  • R automatically returns whichever variable is on the last line of the body of the function

Composing functions

  • compose functions by combining eg. two functions into a new function that uses both
  • nest function calls by passing the output of one function directly as input to the next (also see pipes)

Testing

  • use simple example data instead of your actual data where you might not know what the output value should be

Error handling

  • add errors and warnings to ensure the inputs and arguments are appropriate eg. the expected class
  • errors and warnings help inform a user about the function’s expectations

Defining defaults

  • R uses the position of arguments, typically only recommended with a smaller number of arguments
  • with a larger number of arguments, use named arguments to ensure arguments are correctly passed
  • set a default value for an argument by passing it with = in the function definition

Making Choices

  • make decisions using logical comparisons
    • >, <
  • use logical comparisons with if else to define conditional statements
  • && and ||

Loops in R

  • vectorized functions
  • vector recycling
  • for or apply?

The Call Stack

Details on variable scoping

Software Carpentry: R for Reproducible Scientific Analysis

Control Flow

  • if, else control flow in R as conditional statements
  • ifelse is vectorized
  • any returns TRUE if there is at least one TRUE in the vector
  • all returns TRUE if all are TRUE in the vector

Functions

Also see above Software Carpentry: Programming with R’s section on Functions (some similar content)

Defensive programming

  • defensive programming can involve including check conditions and throwing errors when something is not as expected
  • use stopifnot() to detect where a conditional statement is FALSE and return an error

R Packages

Testing

Why is formal testing worth the trouble?

  • the informal approach is load the function, experiment with it in console or scrap script, repeat
  • an informal approach works fine in the moment, but when you return to the function in the future, say to add a new argument or test a new data type, you have no record of the tests you ran the first time
  • benefits to writing formal tests
    • fewer bugs since you are explicit about how the code should work
    • better code structure because if you find it hard to write tests for your functions, it might be that the design of your functions is not ideal
    • when you discover a bug, write a test that replicates the bug - then making the test pass becomes the concrete goal in fixing the bug
    • when functions are well covered by tests (robust), you can be confident making changes without accidentally breaking something

Introduction testthat

Test mechanics and workflow

Expectations

  • testing for
    • equality
    • errors
    • match
    • length
    • s3 class, s4 class, type

What They Forgot to Teach You

Naming

https://speakerdeck.com/jennybc/how-to-name-files

Efficient R

Efficient programming

Communication with the user:

  • fatal errors: stop()
  • warnings: warning()
  • informative output: message()
  • invisible returns (eg. plots): invisible()

Efficient optimisation

Efficient base R

purrr

Jenny Bryan’s purrr tutorial