eric | Jan. 21, 2020, 6:10 p.m.
Order, Order, Order! If you have done any major R-project, you quickly get to the point where it is hard to keep everything ordered - your scripts, your data, your output, your tests... If you have done several major R-projects, you know how hard it is to keep a similar structure and workflow between projects. The end result can be that projects are hardly reproducible because you confuse others with a lack of order in each project, and a different order - to the degree that you have any order - from one project to the next.
There are free, open-source resources that can help you keep order in a project and keep a similar structure and workflow across every project you do. One such resource is ProjectTemplate.
ProjectTemplate helps you automate the drudgery in your project: it organizes your files into predefined folders, loads all the R packages you need, loads your data sets into memory, and pre-processes (munges) your data into a form suitable for analysis. The workings of the package are configured in a single configuration file. Beyond automating drudgery, the ProjectTemplate team wants to promote better coding and analysis practices.
ProjectTemplate creates a series of folders in the project. Folders of particular importance are:
data - Both original and eventual processed data
src - All R-scripts created for the project
figure - All graphical output
reports - All project reports
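As a minimal sketch of how these folders come into being, the snippet below creates a new project and lists its top-level directories. The project name 'letters' is only a placeholder, and the exact set of folders depends on the ProjectTemplate version you have installed.

```r
# install.packages('ProjectTemplate')  # once, if the package is not installed yet
library('ProjectTemplate')

# 'letters' is a hypothetical project name; the folder is created
# inside the current working directory.
create.project('letters')

# List the top-level folders that the template generated.
project_dirs <- list.dirs('letters', recursive = FALSE, full.names = FALSE)
print(project_dirs)
```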
In each directory, there is a README.md file that explains what the directory is for. For example, the README.md for the src folder looks like this:
Here you'll store your final statistical analysis scripts. You should add the following piece of code to the start of each analysis script: `library('ProjectTemplate'); load.project()`. You should also do your best to ensure that any code that's shared between the analyses in `src` is moved into the `munge` directory; if you do that, you can execute all of the analyses in the `src` directory in parallel. A future release of ProjectTemplate will provide tools to automatically execute every individual analysis from `src` in parallel.
This is helpful for ensuring a consistent use of the template throughout the project and between different projects.
To load the project, you'll first need to setwd() into the root directory of your project. Next, you need to run the following two lines of R code:
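As the README excerpt for the src folder indicates, the two lines are library('ProjectTemplate') and load.project(). The sketch below wraps the second call in a small check, added here purely for illustration and not part of ProjectTemplate, so that the snippet fails gently when it is run outside a project root:

```r
# Line 1: attach the package.
library('ProjectTemplate')

# Line 2: load the project. load.project() expects the working directory
# to be the project root, which is where config/global.dcf lives, so the
# check below (an illustrative addition) tests for that file first.
if (file.exists(file.path('config', 'global.dcf'))) {
  load.project()
} else {
  message('Not in a ProjectTemplate project root; call setwd() first.')
}
```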
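After you enter the second line of code, you'll see a series of automated messages as ProjectTemplate goes about doing its work. This work involves reading the configuration file, loading the R packages listed there, loading your data sets into memory, and running the munging scripts.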
The workings of the package are set in the configuration file /config/global.dcf. A typical setup might look like this:
version: 0.8
data_loading: TRUE
data_loading_header: TRUE
data_ignore:
cache_loading: TRUE
recursive_loading: FALSE
munging: TRUE
logging: FALSE
logging_level: INFO
load_libraries: TRUE
libraries: reshape, plyr, dplyr, ggplot2, stringr, lubridate, Hmisc
as_factors: TRUE
data_tables: FALSE
attach_internal_libraries: FALSE
cache_loaded_data: TRUE
sticky_variables: NONE
This setup loads the data, goes through all the scripts in the munge folder, and loads all the libraries defined above, every time you call load.project(). Have a look at the Getting Started Guide from ProjectTemplate for more details on how to get started. The guide is one of the better-written getting-started guides I've seen.
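If you want to check what a configuration file sets before you call load.project(), you can read it with base R's read.dcf(). The sketch below is an illustration rather than anything ProjectTemplate requires: it writes a shortened copy of the setup above to a temporary file so that it runs anywhere, and the variable names are my own. In a real project you would point it at config/global.dcf instead.

```r
# Write a shortened copy of the configuration to a temporary file so the
# example runs anywhere; in a real project use 'config/global.dcf'.
dcf_path <- tempfile(fileext = '.dcf')
writeLines(c(
  'version: 0.8',
  'data_loading: TRUE',
  'munging: TRUE',
  'load_libraries: TRUE',
  'libraries: reshape, plyr, dplyr, ggplot2, stringr, lubridate, Hmisc'
), dcf_path)

# read.dcf() returns a character matrix with one column per field.
settings <- read.dcf(dcf_path)

# Split the comma-separated library list into a character vector.
libs <- strsplit(settings[1, 'libraries'], ',\\s*')[[1]]
print(libs)

# All values are read as strings, so convert before using them as logicals.
munging_on <- as.logical(settings[1, 'munging'])
print(munging_on)
```

Because every value comes back as a string, flags such as munging need an explicit as.logical() before you can use them in a condition.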
ProjectTemplate is a good aid for making sure that your R-projects are reproducible. It helps maintain good order and a similar structure across projects. The same goes, perhaps more importantly, for workflow. Note that I have only used it for a couple of relatively simple projects so far, and I will write an extensive review as soon as I gain more experience with it.
The ProjectTemplate Website - http://projecttemplate.net/ - Downloaded January 20, 2018.
ProjectTemplate Getting Started Guide - http://projecttemplate.net/getting_started.html - Downloaded January 20, 2018.