Projects

File Structure

Good file structure allows you to manage all the components of your (often large) projects, while making sharing easier and reducing the risk of accidentally deleting or altering important files. Keeping your raw data files in their own folder (e.g., input/ or raw/) makes it harder to mix them up with intermediate files down the line.

Software Carpentry’s R for Reproducible Scientific Analysis lists these best practices for file structure and data management:

  1. Treat raw data as read-only
  2. Store data cleaning scripts in a separate folder and create a second “read-only” data folder to hold the “cleaned” data sets
  3. Treat generated output as disposable
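One way to enforce the first practice is to strip write permission from the raw data files once they are in place. The following is a minimal Python sketch of that idea, not a prescribed workflow; the folder name `raw` and the file `survey.csv` are made up for the demo, which runs in a temporary directory.

```python
import stat
import tempfile
from pathlib import Path


def make_read_only(folder: Path) -> list[Path]:
    """Remove write permission from every file under `folder`."""
    locked = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Clear the write bits for owner, group, and others.
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            locked.append(path)
    return locked


# Demo in a throwaway directory; "raw" and "survey.csv" are hypothetical names.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "survey.csv").write_text("id,value\n1,3.2\n")

    locked = make_read_only(raw)
    # Collect any file that still has the owner write bit set (should be none).
    still_writable = [p for p in locked if p.stat().st_mode & stat.S_IWUSR]

    # Restore write permission so the temporary directory can be cleaned up
    # on every platform.
    for p in locked:
        p.chmod(p.stat().st_mode | stat.S_IWUSR)
```

File permissions are a guard against accidents rather than real protection: anyone who owns the files can switch the write bit back on.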

Efficient R Programming suggests a directory layout along the lines of the following to keep things tidy:

project
└───input/
└───output/
└───R/
└───graphics/
└───README.md

project
└───data/
    └───derived/
    └───raw-data/
└───R/
└───script/
└───graphics/
└───README.md
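A layout like this can be created by a short script, so every new project starts from the same skeleton. The sketch below is one possible way to do it in Python; the folder names are copied from the layout with the data/ sub-folders, and the function name `scaffold` is an assumption for illustration.

```python
import tempfile
from pathlib import Path

# Folder names taken from the layout that splits data/ into derived/ and raw-data/.
FOLDERS = ["data/derived", "data/raw-data", "R", "script", "graphics"]


def scaffold(root: Path) -> None:
    """Create the project folders and an empty README.md if they are missing."""
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    (root / "README.md").touch(exist_ok=True)


# Demo in a throwaway directory so nothing is written to your real projects.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    scaffold(project)
    scaffold(project)  # running it again is harmless
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
```

Because existing folders and files are left alone, the script can be re-run on a project that is already under way.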

Good Enough Practices in Scientific Computing suggests similar file structure and data management practices.

README

  • A README file can act as a type of metadata (see below): it helps people use your data, scripts, etc.
  • A README needs to meet some basic requirements to make your work usable (highlighted in our Think/Pair/Share exercise)

ARDC Metadata Guide: In order to use data, we need to know:

  • how the data is structured and what it describes

  • how to read it (e.g. column headings and units)

  • methodological information such as instrument settings and calibrations, reagents used, or survey questions

  • exactly what we are allowed to do with the data, through rights metadata such as licensing

  • how to acknowledge the original creators by citing the data
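The points above map naturally onto sections of a README. As a rough sketch only (the section names and the `readme_skeleton` helper are assumptions for illustration, not a template from any of the guides cited here), a few lines of Python can generate a starting file with one prompt per point:

```python
# Section headings and prompts are illustrative; adapt them to your project.
SECTIONS = [
    ("Structure", "How the data is organised and what it describes."),
    ("Reading the data", "Column headings and units."),
    ("Methods", "Instrument settings, calibrations, reagents, or survey questions."),
    ("License", "What others are allowed to do with the data."),
    ("Citation", "How to acknowledge the original creators."),
]


def readme_skeleton(title: str) -> str:
    """Return Markdown text with a title and one TODO section per metadata point."""
    lines = [f"# {title}", ""]
    for heading, prompt in SECTIONS:
        lines += [f"## {heading}", "", f"TODO: {prompt}", ""]
    return "\n".join(lines)


skeleton = readme_skeleton("My project")
print(skeleton)
```

The output can be saved as README.md at the top of the project folder and filled in as the project develops.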

Reproducible Quantitative Methods: Metadata is required for open data. By making a data reuse plan, we can ensure that our data remains usable by other people into the future.

Metadata should warn users about problems and inconsistencies in the data and provide checks to confirm that the data is behaving as expected (White et al., 2013).
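Checks of this kind can be driven directly by the metadata. The sketch below assumes a small, hypothetical data dictionary (the columns `site_id` and `temp_c` and their ranges are invented for the example) and reports rows that do not match it; it illustrates the idea and is not the procedure from White et al.

```python
import csv
import io

# Hypothetical data dictionary: column name -> (unit, minimum, maximum).
# A minimum of None means the column is not range-checked.
DATA_DICTIONARY = {
    "site_id": ("none", None, None),
    "temp_c": ("degrees Celsius", -50.0, 60.0),
}


def check_rows(text: str) -> list[str]:
    """Return human-readable problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(DATA_DICTIONARY) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        for column, (_unit, low, high) in DATA_DICTIONARY.items():
            if low is None:
                continue
            try:
                value = float(row[column])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {column} is not a number")
                continue
            if not low <= value <= high:
                problems.append(
                    f"line {line_no}: {column}={value} outside [{low}, {high}]"
                )
    return problems


# Made-up sample with one out-of-range value and one non-numeric value.
sample = "site_id,temp_c\nA,21.5\nB,999\nC,n/a\n"
problems = check_rows(sample)
for problem in problems:
    print(problem)
```

Keeping the expected columns, units, and ranges in one place means the same dictionary can document the data in the README and test it in code.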

Cornell University's best practices guide provides a README template that is free to adapt, alter, and use.

Examples:

RStudio Projects

Using an RStudio Project makes sharing your data/code with others (and your future self) SO MUCH EASIER! One of the main issues with sharing code is changing working directories, missing files, and so on. An RStudio Project takes care of most of this for you: opening the project sets the working directory to the project folder, so as long as your scripts use paths relative to that folder, you can copy the folder wherever you need it without anything breaking.
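The reason this works is that every path is resolved against the project folder rather than against one machine's directory layout. The Python sketch below illustrates that idea only; it is not how RStudio is implemented, and the helper `find_project_root` and the file names `project.Rproj` and `input/survey.csv` are assumptions for the demo.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Walk upward from `start` until a folder containing a *.Rproj file is found."""
    for folder in [start, *start.parents]:
        if any(folder.glob("*.Rproj")):
            return folder
    raise FileNotFoundError(f"no .Rproj file found at or above {start}")


# Demo in a throwaway directory standing in for wherever the project was copied.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "project"
    (project / "R").mkdir(parents=True)
    (project / "project.Rproj").touch()

    # Starting from a sub-folder, locate the project and build a data path from it.
    root = find_project_root(project / "R")
    data_path = root / "input" / "survey.csv"
    relative = data_path.relative_to(root).as_posix()
```

The absolute location of the project differs on every machine, but the path relative to the project root stays the same, which is what keeps shared code from breaking.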

Software Carpentry’s R for Reproducible Scientific Analysis and Efficient R Programming both discuss the importance of RStudio Projects in more detail and explain how to set them up.