eric | May 7, 2021, 4:19 p.m.
Literate statistical programming is a useful way to combine text, code, data, and output in one document. If you have a Linux desktop and your distribution has proper support for R, chances are you will want to work with R from the command line.
Literate programming works well for manuals, short to medium-length technical documents, tutorials, reports, data pre-processing documents and data summaries. It does not work particularly well for long research articles and reports, complex or time-consuming computations, or documents that require very precise formatting.
Before installing the package knitr from the R terminal, we need to prepare the ground by installing the necessary Linux-libraries. From the command line on Debian, Ubuntu, Parrot and similar:
$ sudo apt install pandoc pandoc-citeproc
In addition, for PDF output you need LaTeX, which is contained in the TeX Live package. To install TeX Live via the package manager:
$ sudo apt-get install texlive
On most distributions, the necessary libraries are available in the package manager. Pandoc is in the Debian, Ubuntu, Slackware, Arch, Fedora, NixOS, openSUSE, and Gentoo repositories. If you cannot find it in the package manager, go to the Pandoc installation page for instructions on how to proceed.
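If you are unsure whether the tools ended up on your PATH, you can check from within R. The following is a small sketch using only base R. It assumes that `pandoc` and `pdflatex` are the executables your setup relies on; `pdflatex` in particular is an assumption, since your TeX Live installation may use a different engine.

```r
# Look up the executables on the PATH; an empty string means "not found".
# The tool names are assumptions, adjust them to your own setup.
tools <- Sys.which(c("pandoc", "pdflatex"))

for (name in names(tools)) {
  path <- tools[[name]]
  cat(sprintf("%-10s %s\n", name, if (nzchar(path)) path else "not found"))
}
```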
Install and load the knitr package from the R command line:
> install.packages("knitr")
> library(knitr)
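If you run the same setup on several machines, it may be convenient to install only what is missing. A minimal sketch: `missing_packages` is a hypothetical helper name, and the package list simply names the two packages used in this article.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("knitr", "rmarkdown"))

# Only install when run by hand, so that scripts never download silently.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```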
Documents are written in a format called Rmarkdown with the file extension Rmd. This is a format that allows a mix of regular markdown with chunks of R code. The code can be executed. Chunks are included like this:
# Markdown Title
Any markdown text
```{r}
data <- read.csv("thedata.csv", header = TRUE, sep = ",")
```
So any R code can be included by preceding it with three grave accents followed by "{r}" and closing it with three grave accents. The braces matter: a fence that opens with a plain "r" is only syntax-highlighted, not executed. RStudio has created an R Markdown Cheat Sheet [2] that makes it easier to remember all the possibilities in the combination of knitr and Markdown.
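To see a chunk being executed without creating any files, you can knit a document held in memory. This is only a sketch: the chunk label `addition` and the variable names are made up for illustration, and the code does nothing if knitr is not installed.

```r
# Build a tiny R Markdown document in memory. The fence is assembled with
# strrep() only so that this example can itself sit inside a code block.
fence <- strrep("`", 3)
rmd <- c(
  "# Markdown Title",
  "",
  "Any markdown text",
  "",
  paste0(fence, "{r addition}"),
  "1 + 1",
  fence
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # With the text argument, knit() returns the markdown as a character string.
  md <- knitr::knit(text = rmd, quiet = TRUE)
  cat(md, sep = "\n")
}
```

The printed markdown should contain both the original code and its evaluated result.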
To execute the code in the Rmd-file, and to see the resulting output, we need to load the knitr library, set the working directory to the directory containing the Rmd-file we want to run (or include the whole path from our current working directory to the file in question), and execute the knitr-package's knit-command on the file:
> library(knitr)
> setwd("<working directory>")
> knit("filename.Rmd")
The code above will generate a markdown file in the working directory, showing both the code and the results of executing it. So what does the result look like? Let's run a complete example that shows both the code and the resulting report. The Rmarkdown file could look like this:
# Data From MT-Cars
Example of literate programming with the MT-cars data. Above we have added a title with markdown, here we are adding some text, and below there is a horizontal line.
***
## Introduction
Above there is a subtitle. Next, we will add an R code chunk. In order to get and process the data, we need the following libraries:
```{r get_libraries}
library(dplyr)
```
This code chunk loads the dplyr library.
## Data Processing
Let's do some work with the MT-cars data and display the result. First, we subset the data to only include cars with more than 6 cylinders, and then we display the result.
```{r subsetting_data}
mtcars_subset <- subset(mtcars, mtcars$cyl > 6)
mtcars_subset
```
## Graphics
Graphics can easily be included in the report. Using the base barplot function:
```{r bar_chart_mtcars, fig.cap = "Figure 1: MPG for cars with more than 6 cylinders", fig.width=10, fig.height=8}
par(mar=c(9,4,0,0),las=2)
barplot(mtcars_subset$mpg, names.arg=row.names(mtcars_subset), col=rainbow(15), ylab = "Miles per gallon", ylim = c(0,1.25*max(mtcars_subset$mpg)))
```
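Before knitting, it can be worth running the chunk code once in a plain R session to check that it behaves as expected. A small sketch using only base R and the built-in mtcars data; the checks themselves are just an illustration:

```r
# Same subset as in the example document: cars with more than 6 cylinders.
mtcars_subset <- subset(mtcars, cyl > 6)

# Quick sanity checks before the code goes into a report.
stopifnot(all(mtcars_subset$cyl > 6))
cat("Cars in subset:", nrow(mtcars_subset), "\n")
print(head(mtcars_subset[, c("mpg", "cyl")], 3))
```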
To generate the markdown, we need to do the following:
> library(knitr)
> setwd("directory with the Rmarkdown file")
> knit("cars.Rmd")
processing file: cars.Rmd
  |.........                                                        |  14%
  ordinary text without R code
  |...................                                              |  29%
label: get_libraries
  |............................                                     |  43%
  ordinary text without R code
  |.....................................                            |  57%
label: subsetting_data
  |..............................................                   |  71%
  ordinary text without R code
  |........................................................         |  86%
label: bar_chart_mtcars (with options)
List of 3
 $ fig.cap   : chr "Figure 1: MPG for cars with more than 6 cylinders"
 $ fig.width : num 10
 $ fig.height: num 4
  |.................................................................| 100%
  ordinary text without R code
output file: cars.md
[1] "cars.md"
Have a look at the result here. You can see that you have a report that contains the code and the results of the code. If you want HTML instead of markdown as output from the Rmarkdown-file, you can use the convenience function knit2html [4].
> knit2html("cars.Rmd")
> browseURL("cars.html")
It is a bit confusing that the result from knit2html says "output file: cars.md" as for regular knit, but if you use the browse-command in R or look at the working directory, you will see that a cars.html-file has been created. Click here for a sample output.
Note: You can also see that the result for the barplot isn't exactly the same as on the graphics device on your client. There are a few tricks to making good graphics from Rmarkdown-files with knitr, and there is a separate graphics guide in the documentation to help you along.
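One approach that may help is to set figure options once for the whole document instead of repeating them in every chunk. A minimal sketch, assuming you place it in a setup chunk near the top of the Rmd file. The width and height mirror the example document, while the `dpi` and `dev` values are illustrative assumptions, not recommendations taken from the knitr documentation.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Defaults applied to every chunk that does not override them.
  knitr::opts_chunk$set(
    fig.width  = 10,     # figure width in inches
    fig.height = 8,      # figure height in inches
    dpi        = 150,    # resolution for bitmap output (assumed value)
    dev        = "png"   # graphics device (assumed value)
  )
  # Show the options that are now in effect.
  str(knitr::opts_chunk$get(c("fig.width", "fig.height", "dpi", "dev")))
}
```

Options given in an individual chunk header still take precedence over these defaults.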
You can gain more control over the publishing process by using parameters in the head of the Rmarkdown file combined with the render()-command. Example of a header that generates a title in the document and outputs it to html:
---
title: Analysis of Data From Severe Weather Events
output: html_document
---
To achieve this, we need to load the Rmarkdown-library and use the render-command:
> library(rmarkdown)
> render("cars.Rmd")
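The format given in the header can also be overridden when calling render(). A hedged sketch: `render_report` is a hypothetical helper name, and the call at the bottom only runs if cars.Rmd exists in the working directory and the rmarkdown package is installed.

```r
# Hypothetical helper: render an Rmd file to a chosen format and directory.
render_report <- function(input, format = "html_document",
                          out_dir = dirname(input)) {
  if (!file.exists(input)) {
    stop("Input file not found: ", input)
  }
  # output_format and output_dir are arguments of rmarkdown::render().
  rmarkdown::render(input, output_format = format, output_dir = out_dir)
}

if (file.exists("cars.Rmd") && requireNamespace("rmarkdown", quietly = TRUE)) {
  out <- render_report("cars.Rmd", format = "html_document")
  cat("Wrote", out, "\n")
}
```

Wrapping the call this way keeps the file check and the output location in one place, which can be handy when the same report is rendered from a script.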
You can output to a wide range of file formats using the output parameter, for example html_document, pdf_document, or word_document.
You can find these options and all sub-options, as well as more explanations on parameter use, in the cheat sheet. [2]
Literate programming can help create reports that are easy to read and easy to maintain. knitr is a particularly useful package for such programming. It is easy to get started with knitr, both within RStudio and on the command line. Moreover, there is a wide range of possibilities within the package, so many that the package's creator has written a whole book on the subject. [5]