Literate Statistical Programming From The Command Line

eric | May 7, 2021, 4:19 p.m.

Literate statistical programming is a useful way to keep text, code, data, and output in one document. If you have a Linux desktop and your distribution has proper support for R, chances are you will want to work with R from the command line.

Literate Programming

Literate programming means that text and code live in one place, in a logical order. Data and results are automatically updated to reflect external changes, and because the code is live, it is easy to test while building the document. The main downsides are that mixing text and code can make a document harder to read, and that processing the document can be slow.
 

Literate programming works well for manuals, short to medium-length technical documents, tutorials, reports, data pre-processing documents and data summaries. It does not work particularly well for long research articles and reports, complex or time-consuming computations, or documents that require very precise formatting.

The knitr Package

knitr is a package for literate programming. It is a powerful tool for integrating code and text in a simple document format. knitr supports R Markdown, LaTeX and HTML as documentation languages. The package can export to a range of formats, including PDF and HTML [1].
 
knitr is an integral part of RStudio, and publishing with knitr there is really easy. However, there are at least two reasons why you may want to use it from the command line: you are using a Linux distribution that makes installing RStudio difficult, or you just like working from the command line. In my case, it is a bit of both. Note that you can work with R from the command line on all platforms.

Installing knitr

Before installing the knitr package from the R terminal, we need to prepare the ground by installing the necessary Linux libraries. From the command line on Debian, Ubuntu, Parrot and similar:

$ sudo apt install pandoc pandoc-citeproc

In addition, for PDF output you need LaTeX, which is contained in the TeX Live package. To install TeX Live via the package manager:

$ sudo apt install texlive

On most distributions, the necessary libraries are available in the package manager. Pandoc is in the Debian, Ubuntu, Slackware, Arch, Fedora, NixOS, openSUSE, and Gentoo repositories. If you cannot find it in the manager, go to the Pandoc installation page for instructions on how to proceed.
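If you are unsure whether the tools are already installed, you can check from R itself. The following is a minimal sketch using base R's Sys.which(); looking for pdflatex as the LaTeX engine is my assumption, since TeX Live ships several engines:

```r
# Look up external tools on the PATH; an empty string means "not found".
tools <- c("pandoc", "pdflatex")  # pdflatex is an assumed choice of engine
found <- Sys.which(tools)

for (tool in tools) {
  status <- if (nzchar(found[[tool]])) found[[tool]] else "not found"
  cat(sprintf("%-10s %s\n", tool, status))
}
```

Each line of the output shows either the full path of the tool or "not found".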

Install and load the knitr package from the R command line:  

> install.packages("knitr")

> library(knitr)
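When the installation is part of a script that is run repeatedly, it can be convenient to install only when the package is missing. A minimal sketch, where ensure_package is a hypothetical helper name and not part of knitr; in a non-interactive session install.packages may additionally need a repos argument that points at a CRAN mirror:

```r
# Install a package only when it is missing, then report whether it can be loaded.
# `installer` is a parameter so the function can be tried without network access.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Typical use: ensure_package("knitr"), followed by library(knitr).
# "stats" ships with R, so nothing is installed by this call.
ensure_package("stats")
```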

Writing Documents

Documents are written in a format called R Markdown, with the file extension .Rmd. The format mixes regular markdown with chunks of R code that are executed when the document is processed. Chunks are included like this:

# Markdown Title

Any markdown text

```{r}

data <- read.csv("thedata.csv", header = TRUE, sep = ",")

```

So any R code can be included by preceding it with three grave accents followed by "{r}" and closing it with three grave accents. The braces matter: a fence opened with a plain "r" is only displayed as highlighted code and is not executed by knitr. RStudio has created an R Markdown Cheat Sheet [2] that makes it easier to remember all the possibilities in the combination of knitr and Markdown.
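The chunk header can carry more than the language. Written as {r label, option=value}, it gives the chunk a name and sets knitr options such as echo=FALSE, which hides the code but keeps its output. As a sketch, the following writes such a file from R; the file name chunk_demo.Rmd and the chunk label summary_chunk are made up for illustration:

```r
# Build a tiny R Markdown file from R. The fence is assembled with strrep()
# only to keep literal runs of three backticks out of this listing.
fence <- strrep("`", 3)

rmd_lines <- c(
  "# Markdown Title",
  "",
  "Any markdown text",
  "",
  paste0(fence, "{r summary_chunk, echo=FALSE}"),  # label plus one option
  "summary(cars$speed)",                           # cars is a built-in dataset
  fence
)

rmd_file <- file.path(tempdir(), "chunk_demo.Rmd")  # hypothetical file name
writeLines(rmd_lines, rmd_file)

# Show the file exactly as knitr would read it.
cat(readLines(rmd_file), sep = "\n")
```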

Generating Output

To execute the code in the Rmd-file, and to see the resulting output, we need to load the knitr library, set the working directory to the directory containing the Rmd-file we want to run (or include the whole path from our current working directory to the file in question), and execute the knitr-package's knit-command on the file:

> library(knitr)

> setwd(<working directory>)

> knit("filename.Rmd")

The code above will generate a markdown file in the working directory, showing both the code and the results of executing it. So what does the result look like? Let's run a complete example that shows both the code and the resulting report. The R Markdown file could look like this:

# Data From MT-Cars

Example of literate programming with the MT-cars data. Above we have added a title with markdown, here we are adding some text, and below there is a horizontal line.

***

## Introduction

Above there is a subtitle. Next, we will add an R code chunk. In order to get and process the data, we need the following libraries:

```{r get_libraries}

library(dplyr)

```

This code chunk loads the dplyr library.

## Data Processing

Let's do some work with the MT-cars data and display the result. First, we subset the data to only include cars with more than 6 cylinders, and then we display the result.

```{r subsetting_data}

mtcars_subset <- subset(mtcars, mtcars$cyl > 6)

mtcars_subset

```

## Graphics

Graphics can easily be included in the report. Using the base barplot function:

```{r bar_chart_mtcars, fig.cap = "Figure 1: MPG for cars with more than 6 cylinders", fig.width=10, fig.height=8}

par(mar=c(9,4,0,0),las=2)

barplot(mtcars_subset$mpg, names.arg=row.names(mtcars_subset), col=rainbow(15), ylab = "Miles per gallon", ylim = c(0,1.25*max(mtcars_subset$mpg)))

```
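The code inside the chunks is ordinary R, so it can be tried at the R prompt before knitting. A quick check of the data step from the example, using only base R and the built-in mtcars data:

```r
# mtcars ships with R, so no file needs to be read.
mtcars_subset <- subset(mtcars, cyl > 6)

nrow(mtcars_subset)        # number of cars kept
unique(mtcars_subset$cyl)  # cylinder counts present in the subset
range(mtcars_subset$mpg)   # helps to judge the ylim used in the bar chart
```

In mtcars the only cylinder count above 6 is 8, so the subset holds the 8-cylinder cars.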

To generate the markdown, we need to do the following:

> library(knitr)

> setwd("directory with the Rmarkdown file")

> knit("cars.Rmd")

processing file: cars.Rmd
  |.........                                                        |  14%
  ordinary text without R code

  |...................                                              |  29%
label: get_libraries
  |............................                                     |  43%
  ordinary text without R code

  |.....................................                            |  57%
label: subsetting_data
  |..............................................                   |  71%
  ordinary text without R code

  |........................................................         |  86%
label: bar_chart_mtcars (with options)
List of 3
 $ fig.cap   : chr "Figure 1: MPG for cars with more than 6 cylinders"
 $ fig.width : num 10
 $ fig.height: num 4

  |.................................................................| 100%
  ordinary text without R code

output file: cars.md

[1] "cars.md"

Have a look at the result here. You can see that you have a report that contains both the code and its results. If you want HTML instead of markdown as output from the R Markdown file, you can use the convenience function knit2html [4]:

> knit2html("cars.Rmd")

> browseURL("cars.html")

It is a bit confusing that the output from knit2html says "output file: cars.md", just as for regular knit, but if you use the browseURL command in R or look in the working directory, you will see that a cars.html file has been created. Click here for a sample output.
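Since the point of this post is to work from the command line, the knitting steps can also be wrapped in a small script and run non-interactively with Rscript. This is a sketch under my own assumptions: the script name knit_file.R and the function knit_file are hypothetical, and the choice to change into the file's directory is mine, made so that the output lands next to the source.

```r
# knit_file.R (hypothetical script name)
# Usage from a shell: Rscript knit_file.R path/to/report.Rmd

knit_file <- function(input) {
  if (!file.exists(input)) {
    stop("No such file: ", input)
  }
  # Work inside the file's directory so the output lands next to the source,
  # and restore the previous working directory afterwards.
  old_wd <- setwd(dirname(input))
  on.exit(setwd(old_wd), add = TRUE)
  knitr::knit(basename(input), quiet = TRUE)  # returns the output file name
}

# Only act when a file name was passed on the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1) {
  output <- knit_file(args[[1]])
  message("Wrote ", output)
}
```

Run without arguments, the script does nothing, which also makes it safe to source() from an interactive session in order to get the function.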

Note: You can also see that the result for the barplot isn't exactly the same as on the graphics device on your client. There are a few tricks to making good graphics from Rmarkdown-files with knitr, and there is a separate graphics guide in the documentation to help you along.
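One way to get more predictable figures is to set the figure options once for the whole document instead of per chunk. A minimal sketch with knitr's opts_chunk object; the values are illustrative and not recommendations from the knitr documentation, and in an R Markdown file these lines would normally sit in a first chunk:

```r
library(knitr)

# Defaults applied to every chunk that follows; values here are illustrative.
opts_chunk$set(
  fig.width  = 7,          # figure width in inches
  fig.height = 5,          # figure height in inches
  dpi        = 150,        # resolution for bitmap devices
  fig.path   = "figures/"  # prefix for the generated image files
)

# Read a setting back to confirm it was applied.
opts_chunk$get("fig.width")
```

Options given in an individual chunk header still override these defaults for that chunk.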

Control Generation With Parameters

You can gain more control over the publishing process by using parameters in the head of the R Markdown file combined with the render() command. Example of a header that sets the document title and selects HTML output:

---

title: Analysis of Data From Severe Weather Events

output: html_document

---

To achieve this, we need to load the rmarkdown package and use its render command:

> library(rmarkdown)

> render("cars.Rmd")

You can output to a wide range of file formats using the output parameter. Commonly used output formats and what they create:

  • html_document - HTML
  • pdf_document - PDF (requires TeX)
  • word_document - Microsoft Word (.docx)
  • odt_document - OpenDocument Text
  • rtf_document - Rich Text Format
  • md_document - Markdown
  • github_document - GitHub compatible markdown
  • ioslides_presentation - ioslides HTML slides
  • slidy_presentation - slidy HTML slides
  • beamer_presentation - Beamer PDF slides (requires TeX)

You can find these options and all sub-options, as well as more explanations on parameter use, in the cheat sheet [2].
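The format does not have to be fixed in the header. It can also be passed to render() at call time through its output_format argument, which is handy when the same source should produce several kinds of output. A sketch, assuming the rmarkdown package and pandoc are installed; render_report is a hypothetical wrapper name:

```r
# Render one file with the format and destination spelled out.
# Assumes the rmarkdown package and pandoc are installed.
render_report <- function(input,
                          format = "html_document",
                          out_dir = dirname(input)) {
  rmarkdown::render(
    input,
    output_format = format,   # one of the format names listed above
    output_dir    = out_dir,  # where the finished document is written
    quiet         = TRUE
  )
}

# Example calls, commented out because they need an existing cars.Rmd:
# render_report("cars.Rmd")                   # HTML output
# render_report("cars.Rmd", "word_document")  # same source, .docx output
```

From a shell, the plain call can be issued without opening an R session: Rscript -e 'rmarkdown::render("cars.Rmd")'.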

Conclusion

Literate programming can help create reports that are easy to read and easy to maintain. knitr is a particularly useful package for such programming. It is easy to get started with knitr, both within RStudio and on the command line. Moreover, there is a wide range of possibilities within the package, so many that the package's creator has written a whole book on the subject [5].

References

  1. knitr - The documentation page for the knitr package - Visited February 2, 2018.
  2. R Markdown Cheat Sheet, RStudio - Downloaded February 2, 2018.
  3. Making use of external R code in knitr and R markdown - Visited February 1, 2018.
  4. knit2html - R Documentation - Visited February 5, 2018.
  5. Dynamic Documents with R and knitr, Second Edition, by Yihui Xie (the creator of knitr) - Visited February 4, 2018.
