R Programming: Goodbye recode(), Hello case_match()

eric | Sept. 24, 2024, 10:51 a.m.

In R programming, both `recode()` and `case_match()` replace or reassign values in a vector. However, `recode()` has been superseded, as of dplyr 1.1.0, in favor of `case_match()`, which covers the most common uses of `recode()` with a cleaner interface. This is a breakdown of what each function does and the key differences between them.

recode() Function

What it Does

The `recode()` function in R (part of the **dplyr** package) allows you to replace specific values in a vector with other values. It works well for categorical variables or factors where certain values need to be mapped to new values.

Usage

> library(dplyr)
> x <- c("A", "B", "C", "A", "C", "B")
> recode(x, A = "Alfa Romeo", B = "Bugatti", C = "Caterham")
[1] "Alfa Romeo" "Bugatti"    "Caterham"   "Alfa Romeo" "Caterham"   "Bugatti"  
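
As a small illustration of the factor case mentioned above, here is a sketch that recodes a factor instead of a character vector. The factor `f` is invented for this example. With a factor input, `recode()` relabels the levels and the result stays a factor.

```r
library(dplyr)

# Hypothetical factor, made up for this illustration
f <- factor(c("A", "B", "A"))

# recode() relabels the matching levels and keeps the result a factor
f_new <- recode(f, A = "Alfa Romeo", B = "Bugatti")
f_new
```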

case_match() Function

What it Does

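The `case_match()` function is also part of the dplyr package, but offers more flexibility than `recode()`. It compares each element of a vector against the values listed in each rule, similar to a simple `CASE x WHEN value THEN result` expression in SQL. A single rule can list several values, `NA` can be matched explicitly, and `.default` supplies a fallback. It matches values only; ranges and logical conditions are the job of `case_when()`.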

Usage

> library(dplyr)
> x <- c(1, 2, 3, 4, 5, 6, 7)
> case_match(x,
+            1 ~ "One",
+            2 ~ "Two",
+            3 ~ "Three",
+            4 ~ "Four",
+            .default = "Other")
[1] "One"   "Two"   "Three" "Four"  "Other" "Other" "Other"
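
The example above maps one value per rule. As a rough sketch of the other features, the snippet below puts several values on the left-hand side of a rule and matches `NA` explicitly. The vector `grades` and its labels are invented for illustration.

```r
library(dplyr)

# Hypothetical input, made up for this illustration
grades <- c("a", "b", "c", "d", NA)

labels <- case_match(
  grades,
  c("a", "b") ~ "pass with merit",  # several old values, one new value
  "c"         ~ "pass",
  NA          ~ "not graded",       # NA is matched like any other value
  .default    = "fail"              # everything else
)
labels
```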

Key Differences

1. Flexibility:

  - recode() works well for direct value-to-value replacements, with each old value written as its own named argument.

  - case_match() is more flexible because one rule can match several values at once (e.g., `c("a", "b") ~ 1`), `NA` can be matched explicitly, and a default case is set with `.default`. Neither function evaluates ranges or logical conditions; use case_when() for those.

2. Syntax:

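  - recode() uses named arguments of the form `old = new`. Because the old value is an argument name, anything that is not a syntactic name (such as a number) has to be wrapped in backticks.

  - case_match() follows the pattern `old_values ~ new_value`. The old values are ordinary R values rather than argument names, which keeps the code cleaner and more readable, especially when several values map to the same replacement.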
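
To make the syntax difference concrete, here is a minimal side-by-side sketch using numeric codes. The vector `codes` and the labels are invented for illustration.

```r
library(dplyr)

# Hypothetical numeric codes, made up for this illustration
codes <- c(1, 2, 3, 2)

# recode(): old values are argument names, so numbers need backticks
old_way <- recode(codes, `1` = "low", `2` = "medium", `3` = "high")

# case_match(): old values are written as plain numbers
new_way <- case_match(codes, 1 ~ "low", 2 ~ "medium", 3 ~ "high")

old_way
new_way
```

Both calls should return the same labels; only the way the old values are written differs.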

3. Default Values:

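  - recode() has `.default` and `.missing` arguments, but its behavior without them depends on types. If `.default` is not supplied, unmatched values remain unchanged when the replacements have the same type as the input, and become NA otherwise.

  - case_match() includes `.default` to assign a value for cases that do not match any rule. Without it, unmatched values become NA; to keep them unchanged, pass the input itself, as in `.default = x`.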
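
The default behavior is easy to get wrong, so here is a short sketch of the four common situations. The vector `cars` reuses the article's car names but is otherwise made up, with "Z" as a deliberately unmatched value.

```r
library(dplyr)

# Hypothetical input; "Z" has no replacement in any call below
cars <- c("A", "B", "Z")

# recode(), no .default, same type in and out: "Z" is left unchanged
r_keep <- recode(cars, A = "Alfa Romeo", B = "Bugatti")

# recode() with an explicit .default
r_other <- recode(cars, A = "Alfa Romeo", B = "Bugatti", .default = "Other")

# case_match(), no .default: "Z" becomes NA
m_na <- case_match(cars, "A" ~ "Alfa Romeo", "B" ~ "Bugatti")

# case_match() with .default = cars: "Z" is left unchanged
m_keep <- case_match(cars, "A" ~ "Alfa Romeo", "B" ~ "Bugatti", .default = cars)
```

When moving code from recode() to case_match(), adding `.default = x` is what preserves the old pass-through behavior.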

Summary

recode() is simple but limited to one-to-one value mappings written as named arguments. case_match() covers the same ground with a formula interface that also accepts several values per rule, explicit NA matching, and a `.default` fallback. Because case_match() provides a more robust and readable way to match and recode values, it is now recommended over recode().

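Note that for creating new variables based on a single logical vector, if_else() is recommended. For criteria beyond the capabilities of case_match(), such as ranges or arbitrary logical conditions, check out case_when(). The two calls below are equivalent, but case_match() avoids repeating `x %in%` in every rule: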
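library(dplyr)

x <- c("a", "b", "c", "d", "e")

# case_when(): every rule spells out its own logical condition
case_when(
  x %in% c("a", "b") ~ 1,
  x %in% "c" ~ 2,
  x %in% c("d", "e") ~ 3
)

# case_match(): x is named once, then compared against the listed values
case_match(
  x,
  c("a", "b") ~ 1,
  "c" ~ 2,
  c("d", "e") ~ 3
)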
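
For the cases that case_match() does not cover, here is a rough sketch of if_else() on a single logical test and case_when() on numeric ranges. The vector `speed`, the thresholds, and the labels are all invented for illustration.

```r
library(dplyr)

# Hypothetical top speeds in km/h, made up for this illustration
speed <- c(80, 150, 320, NA)

# if_else(): one logical test, with an explicit value for NA
is_fast <- if_else(speed > 200, "fast", "not fast", missing = "unknown")

# case_when(): ranges are logical conditions, checked in order
band <- case_when(
  speed < 100  ~ "slow",
  speed < 250  ~ "quick",
  speed >= 250 ~ "very fast",
  .default = "unknown"   # NA speeds match no condition
)

is_fast
band
```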
