ProjectTemplate for R - First Impressions

eric | Jan. 21, 2020, 6:10 p.m.

Order, order, order! If you have done any major R project, you quickly reach the point where it is hard to keep everything in order: your scripts, your data, your output, your tests. If you have done several major R projects, you know how hard it is to keep a similar structure and workflow between projects. The end result can be projects that are hardly reproducible, because each project lacks order internally and whatever order it does have differs from one project to the next.

There are some resources available - free and open-source - that can help you keep order in your project and keep a similar structure and workflow in every project you do. One such resource is ProjectTemplate [1].

How It Works

ProjectTemplate helps you automate the grind in your project by organizing your files in predefined folders, loading all the R packages you need, loading your data sets into memory, and pre-processing (munging) your data into a form suitable for analysis. The workings of the package are easily configured in a single configuration file. In addition to automating the drudgery, the ProjectTemplate team wants to promote better coding and analysis practices by [1]:

  • Curating the best R packages.
  • Providing simple tools for keeping a log of your work.
  • Providing template code for:
    • Data diagnostics
    • Data munging
    • Code profiling
    • Unit testing


ProjectTemplate creates a series of folders in the project. Folders of particular importance are:

data - Both the original and the processed data
src - All R scripts created for the project
graphs - All graphical output
reports - All project reports
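
A minimal sketch of how such a folder structure could be generated, assuming ProjectTemplate is installed: create.project() is the package's function for setting up a new project, while the project name demo-project and the use of a temporary directory are illustrative choices, not part of the original post.

```r
# Only run the sketch when ProjectTemplate is available.
if (requireNamespace("ProjectTemplate", quietly = TRUE)) {
  # Work in a temporary directory so nothing is created among your own files.
  old_wd <- setwd(tempdir())

  # 'demo-project' is a hypothetical project name.
  ProjectTemplate::create.project("demo-project")

  # List the top-level folders and files that were generated.
  print(list.files("demo-project"))

  # Return to the directory we started in.
  setwd(old_wd)
}
```

Browsing the generated folders, and the README.md in each, is a quick way to get a feel for the layout before moving an existing project into it.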

In each directory, there is a README.md file that explains what the directory is for. For example, the README.md for the src folder looks like this:

Here you'll store your final statistical analysis scripts. You should add the following piece of code to the start of each analysis script: `library('ProjectTemplate'); load.project()`. You should also do your best to ensure that any code that's shared between the analyses in `src` is moved into the `munge` directory; if you do that, you can execute all of the analyses in the `src` directory in parallel. A future release of ProjectTemplate will provide tools to automatically execute every individual analysis from `src` in parallel.

This is helpful for ensuring a consistent use of the template throughout the project and between different projects.
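
As a rough sketch of what shared pre-processing code in munge might look like, the hypothetical script below cleans a data frame once so that every analysis in src can rely on the result. The file name, the survey data frame, and its columns are assumptions made for illustration; in a real project, load.project() would already have loaded the data from the data directory before the munge scripts run.

```r
# munge/01-clean-survey.R (hypothetical file name)

# Stand-in for a data set that load.project() would normally have read
# from data/; defined here only so the sketch runs on its own.
survey <- data.frame(
  id = 1:4,
  score = c("3", "5", NA, "4"),
  stringsAsFactors = FALSE
)

# Convert the score column from text to numbers.
survey$score <- as.numeric(survey$score)

# Drop rows without a score so downstream scripts need no NA handling.
survey <- survey[!is.na(survey$score), ]

print(survey)
```

Keeping this kind of cleaning in one place means the analysis scripts in src can start from the same prepared data instead of each repeating the conversion.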

Getting Started

To load the project, you'll first need to setwd() into the root directory of your project. If you have not installed ProjectTemplate yet, run install.packages('ProjectTemplate') once. Next, you need to run the following two lines of R code:

library('ProjectTemplate')
load.project()

After you enter the second line of code, you'll see a series of automated messages as ProjectTemplate goes about doing its work. This work involves:

  • Reading in the global configuration file contained in config.
  • Loading any R packages listed in the configuration file.
  • Reading in any datasets stored in data or cache.
  • Pre-processing your data using the scripts in munge.

The workings of the package are set in the configuration file config/global.dcf. A typical setup might look like this:

version: 0.8
data_loading: TRUE
data_loading_header: TRUE
data_ignore:
cache_loading: TRUE
recursive_loading: FALSE
munging: TRUE
logging: FALSE
logging_level: INFO
load_libraries: TRUE
libraries: reshape, plyr, dplyr, ggplot2, stringr, lubridate, Hmisc
as_factors: TRUE
data_tables: FALSE
attach_internal_libraries: FALSE
cache_loaded_data: TRUE
sticky_variables: NONE
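
The configuration file consists of plain key: value lines in the DCF (Debian control file) format, which base R can read with read.dcf(). As a minimal sketch of how to inspect such settings yourself, the example below writes a shortened, illustrative config to a temporary file and reads it back. The temporary file and the reduced set of keys are assumptions for the example, and this is not necessarily how ProjectTemplate parses the file internally.

```r
# Write a shortened, illustrative config to a temporary file
# (a real project keeps this in config/global.dcf).
config_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "version: 0.8",
  "data_loading: TRUE",
  "munging: TRUE",
  "load_libraries: TRUE",
  "libraries: reshape, plyr, dplyr, ggplot2",
  "as_factors: TRUE"
), config_path)

# read.dcf() returns a character matrix with one column per field.
config <- read.dcf(config_path)

# All values arrive as strings, so convert the ones you need.
munging_enabled <- as.logical(config[1, "munging"])
libraries <- strsplit(config[1, "libraries"], ",\\s*")[[1]]

print(munging_enabled)
print(libraries)
```

Reading the file this way can be handy for checking which libraries and options a project will use before you call load.project().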

This setup loads the data, runs all the scripts in the munge folder, and loads all the libraries listed above every time you call load.project(). For more details on how to get started, have a look at the ProjectTemplate Getting Started Guide [2]. It is one of the better-written "getting started" guides I've seen.

Preliminary Conclusion

ProjectTemplate is a good aid for making your R projects reproducible. It helps maintain good order and a similar structure across projects. The same goes, perhaps more importantly, for workflow. Note that I have only used it for a couple of relatively simple projects so far, and I will write a more extensive review as soon as I gain more experience with it.

References

[1] The ProjectTemplate Website - http://projecttemplate.net/ - Downloaded January 20, 2018.

[2] ProjectTemplate Getting Started Guide - http://projecttemplate.net/getting_started.html - Downloaded January 20, 2018.  
