Functions
R for Data Science
Links:
Functions
Why write functions?
- “Improving reach as a data scientist”
- “Automate common tasks”
- Three big advantages
- Give a function an evocative name that makes your code easier to understand
- As requirements change, only need to update code in one place instead of many
- Eliminate chance of incremental mistakes when copy pasting
- Other advantages
- Easier to apply over groups or chunks in data, eg. by individual with lapply/map/data.table’s by
When to write functions?
- When you’ve copy pasted more than twice
How to write functions?
- Develop code for accomplishing the task
- Analyse the code
- How many inputs
- Look for duplication, eg. calculating the mean multiple times
- Save as intermediate value
- List inputs
- List arguments
- Name function (verb)
snake_case
- Common prefix eg
input_select
,input_checkbox
,input_text
- Don’t override existing functions (see
conflicted
later)
- Name arguments (nouns)
- Place developed code in body of the function
- Check results with eg. NAs, missing values, known results
Conditionals
- if else (else if)
||
&&
combine multiple logical expressions and short-circuit when||
sees the first true and&&
sees the first false|
&
are vectorized and cannot be directly used in an if unless you also use any / all
Checking values
stopifnot
if
else
thenstop
Return
- always last thing
- otherwise explicit
return()
Environment
- lexical scoping
- if variable isnt available in the functions environment, will look in environment where function was defined
- in a simple case, eg f(x) {x + y}, this means
y
, but also{
and+
- if
+
is reassigned it will be used, meaning you can override a function from eg. base R
- https://adv-r.hadley.nz/environments.html
Iteration
Note: we’ll use targets’ dynamic and static branching instead for this workshop
Iterating over a single list/vector/etc
for
loopspurrr::map
Multiple inputs
purrr::map2
Loop for side effects
purrr::walk
Managing errors
purrr::safely
try
purrr::possibly
purrr::quietly
Conditional execution
Advanced R
Functions
Three components of a function
- arguments (
formals()
): arguments used to control the function - body (
body()
): code inside the function - environment (
environment()
): determins how function finds values associated with names- implicit based on where you defined the function
Functions are objects
Primitive functions
- only found in the base package
- eg. sum,
[
- they are the exceptions to above, for primitive functions the following return NULL
formals()
,body()
,environment()
- primitive functions indicated by typeof(f) is “builtin” or “special”
Anonymous functions are used without first assigning them
do.call
if you have arguments already in a list
Combining function calls
- nesting
f(g(x))
is concise, good for short sequences but hard to read (right to left, inside out) - intermediate objects, requires naming each intermediate object, useful when independent objects are useful otherwise not
- pipes let you run and function and then then next one and then the next one (…) in a chain
Lexical scoping
- name masking
Functional programming
R is at its heart a functional language
Functional languages have:
- first-class functions that behave like any other data structure. you can assign to variables, store them in lists, pass them as arguments, etc
- functions that are pure. pure functions satisfy two properties - 1) outputs depend only on inputs where rerunning the function with the same inputs will yield the same results, and 2) functions have no side-effects like changing global variables, writing to disk, displaying to the screen
For R, functions are ideally either very pure, or very impure (plotting, saving, etc)
Functionals are functions that take a function as input and returns a vector as output
- eg. lapply, apply, tapply, purrr::map
- commonly used as an alternative to for loops
- (lots of great background)
Function factories
- eg. power1 <- function(exp) function(x) x ^ exp, square <- power1(2)
- (beyond scope of workshop)
Function operators
Control flow
Choices
if
(condition) true_actionif
(condition) true_actionelse
false_action- also
else if
- condition must be length 1
switch
(https://adv-r.hadley.nz/control-flow.html#switch)- more compact than a bunch of
if
else if
else
- more compact than a bunch of
Loops
for
- caution
1:length
iflength == 0
, instead useseq_along
,seq.int
, etc.
- caution
while
repeat
Environments
Software Carpentry: Programming with R
Functions
Defining a function
- R automatically returns whichever variable is on the last line of the body of the function
Composing functions
- compose functions by combining eg. two functions into a new function that uses both
- nest function calls by passing the output of one function directly as input to the next (also see pipes)
Testing
- use simple example data instead of your actual data where you might not know what the output value should be
Error handling
- add errors and warnings to ensure the inputs and arguments are appropriate eg. the expected class
- errors and warnings help inform a user about the function’s expectations
Defining defaults
- R uses the position of arguments, typically only recommended with a smaller number of arguments
- with a larger number of arguments, use named arguments to ensure arguments are correctly passed
- set a default value for an argument by passing it with = in the function definition
Making Choices
- make decisions using logical comparisons
>
,<
- use logical comparisons with
if
else
to define conditional statements &&
and||
Loops in R
- vectorized functions
- vector recycling
for
orapply
?
The Call Stack
Details on variable scoping
Software Carpentry: R for Reproducible Scientific Analysis
Control Flow
if
,else
control flow in R as conditional statementsifelse
is vectorizedany
returns TRUE if there is at least one TRUE in the vectorall
returns TRUE if all are TRUE in the vector
Functions
Also see above Software Carpentry: Programming with R’s section on Functions (some similar content)
Defensive programming
- defensive programming can involve including check conditions and throwing errors when something is not as expected
- use
stopifnot()
to detect where a conditional statement is FALSE and return an error
R Packages
Testing
Why is formal testing worth the trouble?
- the informal approach is load the function, experiment with it in console or scrap script, repeat
- an informal approach works fine in the moment, but when you return to the function in the future, say to add a new argument or test a new data type, you have no record of the tests you ran the first time
- benefits to writing formal tests
- fewer bugs since you are explicit about how the code should work
- better code structure because if you find it hard to write tests for your functions, it might be that the design of your functions is not ideal
- when you discover a bug, write a test that replicates the bug - then making the test pass becomes the concrete goal in fixing the bug
- when functions are well covered by tests (robust), you can be confident making changes without accidentally breaking something
Introduction testthat
Test mechanics and workflow
Expectations
- testing for
- equality
- errors
- match
- length
- s3 class, s4 class, type
What They Forgot to Teach You
Naming
Efficient R
Efficient programming
- fatal errors:
stop()
- warnings:
warning()
- informative output:
message()
- invisible returns (eg. plots):
invisible()