Projects
File Structure
Good file structure allows you to manage all the components of your (often large) projects, while facilitating easy sharing and reducing the risk of accidentally deleting or altering important files. Keeping your raw data files in their own folder (e.g., input/ or raw/) makes it harder to mix up these files with intermediate ones down the line.
Software Carpentry’s R for Reproducible Scientific Analysis:
Best practices for file structure/data management include:
- Treat raw data as read-only
- Store data cleaning scripts in a separate folder and create a second “read-only” data folder to hold the “cleaned” data sets
- Treat generated output as disposable
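One way to enforce the "read-only" idea from the list above is to remove write permission from raw files at the file-system level. The snippet below is a minimal sketch, not a prescribed method: the folder name `readonly-demo` and the file `measurements.csv` are hypothetical and are created in a temporary directory purely for illustration.

```r
# Hypothetical raw data file, created in a temporary folder for illustration
raw_dir <- file.path(tempdir(), "readonly-demo", "input")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
raw_file <- file.path(raw_dir, "measurements.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          raw_file, row.names = FALSE)

# Remove write permission for everyone.
# On Windows, Sys.chmod() can only toggle the read-only attribute.
Sys.chmod(raw_file, mode = "0444", use_umask = FALSE)

# Reading still works; cleaning scripts should save results elsewhere
raw <- read.csv(raw_file)
```

Permissions like this guard against accidental edits only; anyone with sufficient rights can still change them back.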
Efficient R Programming suggests a sub-directory layout like the one below to keep things tidy:
project
└───input/
└───output/
└───R/
└───graphics/
└───README.md
An alternative layout that separates raw and derived data:
project
└───data/
└───derived/
└───raw-data/
└───R/
└───script/
└───graphics/
└───README.md
Good Enough Practices in Scientific Computing suggests a similar file structure and similar data management practices.
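A layout like the first one above can be created from R rather than by hand. The following is a minimal sketch under stated assumptions: `create_project_skeleton()` is a hypothetical helper written for this example (it is not part of any package), and the demo builds the folders in a temporary directory.

```r
# Hypothetical helper: create the folders and README of a project skeleton
create_project_skeleton <- function(root) {
  dirs <- file.path(root, c("input", "output", "R", "graphics"))
  for (d in dirs) {
    # recursive = TRUE also creates the project root if it is missing
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  readme <- file.path(root, "README.md")
  if (!file.exists(readme)) {
    # Never overwrite an existing README
    writeLines(c(paste("#", basename(root)), "", "Describe the project here."),
               readme)
  }
  invisible(dirs)
}

# Example: build the skeleton in a temporary location
project_root <- file.path(tempdir(), "project")
create_project_skeleton(project_root)
list.files(project_root)
```

Running the helper a second time is harmless: existing folders are left alone and an existing README is kept.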
README
- A README file can act as a type of metadata (see below): it helps people use your data, scripts, etc.
- A README must meet some basic requirements to make your work usable (highlighted in our Think/Pair/Share exercise)
ARDC Metadata Guide: In order to use data, we need to know:
- how the data is structured
- what it describes
- how to read it (e.g., column headings and units)
- methodological information such as instrument settings and calibrations, reagents used, or survey questions
- exactly what we are allowed to do with the data, through rights metadata such as licensing
- how to acknowledge the original creators by citing the data
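The points above can be turned into a README skeleton that is filled in per project. This is only a sketch: the section headings are hypothetical wording derived from the list above, not an official template, and the file is written to a temporary directory for illustration.

```r
# Hypothetical section headings, paraphrasing the checklist above
readme_sections <- c(
  "Data structure",
  "What the data describe",
  "How to read the data (column headings, units)",
  "Methods (instrument settings, calibrations, reagents, survey questions)",
  "Licence and permitted use",
  "How to cite"
)

# Build a markdown skeleton: a title, then one placeholder section per heading
section_lines <- unlist(lapply(
  readme_sections,
  function(s) c(paste("##", s), "", "TODO", "")
))
readme_lines <- c("# Project title", "", section_lines)

readme_path <- file.path(tempdir(), "README.md")
writeLines(readme_lines, readme_path)
```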
Reproducible Quantitative Methods: Metadata is required for open data. By making a data reuse plan, we can ensure that our data remains usable by other people into the future.
Metadata should warn users about problems or inconsistencies in the data and provide checks to make sure the data is behaving as expected (White et al., 2013).
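Checks of this kind can be shipped as a small script next to the data. The example below is a minimal sketch, not the approach described by White et al.: the data frame `survey`, its columns, and the plausible temperature range are all hypothetical.

```r
# Hypothetical cleaned data set; column names and valid ranges are assumptions
survey <- data.frame(
  site   = c("A", "B", "C"),
  temp_c = c(12.4, 15.1, 13.8)
)

# Stop with an informative message as soon as one check fails
check_survey <- function(df) {
  if (!all(c("site", "temp_c") %in% names(df))) {
    stop("expected columns are missing")
  }
  if (!is.numeric(df$temp_c)) {
    stop("temp_c must be numeric")
  }
  if (anyNA(df$temp_c)) {
    stop("temp_c contains missing values")
  }
  if (any(df$temp_c < -50 | df$temp_c > 60)) {
    stop("temp_c is outside the plausible range")
  }
  invisible(TRUE)
}

check_survey(survey)
```

Users who receive the data can rerun the same function to confirm that nothing was corrupted or altered along the way.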
Cornell University's best practices provide a README template that you are free to adapt, alter, and use.
RStudio Projects
Using an RStudio Project makes sharing your data/code with others (and your future self) SO MUCH EASIER! One of the main issues with sharing code is changing working directories, missing files, etc. An RStudio Project largely solves this for you: opening the project sets the working directory to the project folder, so paths written relative to that folder keep working. As long as your scripts use relative paths, you can copy and paste the folder wherever you need it without anything breaking.
Software Carpentry’s R for Reproducible Scientific Analysis and Efficient R Programming both discuss the importance of RStudio Projects in more detail and explain how to set them up.
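The difference between a fragile and a portable path can be sketched as follows. Both paths are hypothetical, and the example assumes the script is run from the project root, which is the case when the project is opened through its .Rproj file.

```r
# Fragile: an absolute path that only exists on one machine (hypothetical)
# dat <- read.csv("C:/Users/me/Documents/project/input/measurements.csv")

# Portable: a path relative to the project root
data_path <- file.path("input", "measurements.csv")

# Only read the file if it is actually there, so the sketch runs anywhere
if (file.exists(data_path)) {
  dat <- read.csv(data_path)
}
```

Building paths with `file.path()` instead of pasting strings together also avoids having to think about path separators on different operating systems.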