5 Generalisable Code

When writing code, you should consider how to make it as generalisable as possible. That is, you should write your code so that you and others can take what you have written and reuse it for different purposes. In practice, this means writing code as functions within R.

A function is a set of commands that are bundled together so that they can all be repeated with a single line of code. A function will generally accept different inputs, or parameters, which allows it to be applied in different situations. Within R, functions can be grouped together in packages.
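As an illustration, the function below bundles a small calculation so that it can be repeated with a single call. The inputs are supplied as parameters rather than written into the body. The name `percentage_change()` and its parameters are hypothetical and were chosen only for this sketch.

```r
# Hypothetical example: percentage change between two values.
# `old` and `new` are parameters, so the same code works for any inputs.
percentage_change <- function(old, new, digits = 1) {
  # Guard against division by zero, which would otherwise give Inf or NaN
  if (any(old == 0)) {
    stop("`old` must not contain zeros")
  }
  round((new - old) / old * 100, digits = digits)
}

# The same single line can now be repeated with different inputs
percentage_change(200, 250)               # returns 25
percentage_change(c(80, 50), c(60, 55))   # returns c(-25, 10)
```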

This means structuring your code, as far as possible, around the following three approaches, listed from least to most preferred:

  1. Custom functions
  2. Custom packages
  3. Multi-use packages (e.g. mojrap)

Keep the number of standalone, one-off processing steps in your code to a minimum. Design your process to use functions wherever possible, and write those functions so that they also work on datasets other than the ones in your RAP process.
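A minimal sketch of this idea, assuming two small illustrative data frames: rather than repeating the same grouping and summing lines for each dataset, the processing is written once as a function, and the dataset-specific column names are passed in as parameters. The function `summarise_by_group()` and the example data are hypothetical, and the sketch uses only base R.

```r
# Hypothetical example: one summary function applied to several datasets,
# instead of copying the same processing lines for each one.
summarise_by_group <- function(data, group_col, value_col) {
  # Column names arrive as strings, so nothing about a particular dataset
  # is hard-coded inside the function
  totals <- tapply(data[[value_col]], data[[group_col]], sum)
  data.frame(group = names(totals), total = as.vector(totals))
}

# Two illustrative data frames with different column names
courts  <- data.frame(region = c("North", "South", "North"), cases = c(10, 5, 7))
prisons <- data.frame(area = c("East", "West", "West"), receptions = c(3, 4, 6))

summarise_by_group(courts, "region", "cases")
summarise_by_group(prisons, "area", "receptions")
```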

When creating functions, make them as generalisable as possible. Avoid including any code that applies only to your data, and do not hard-code dataset-specific variables such as column names. Instead, pass any references to data in through the function's parameters so that the function can work on other datasets. For example, code that recodes variables should refer to a lookup table rather than hard-coding the recodes within the function. The function can then be reused in another context simply by supplying a new lookup table.
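A minimal sketch of a lookup-driven recode in base R. The function `recode_with_lookup()`, its `from` and `to` parameters and the offence codes are hypothetical. The point is that the mapping is held in a data frame passed in as an argument, so reusing the function elsewhere only requires a different lookup table.

```r
# Hypothetical lookup-driven recode: the mapping lives in a data frame that
# is passed in as a parameter, not in if/else statements inside the function.
recode_with_lookup <- function(data, col, lookup, from = "code", to = "label") {
  # match() finds the position of each value in the lookup's `from` column
  idx <- match(data[[col]], lookup[[from]])
  # Stop if any value has no lookup entry, rather than silently returning NA
  if (anyNA(idx)) {
    missing <- unique(data[[col]][is.na(idx)])
    stop("No lookup entry for: ", paste(missing, collapse = ", "))
  }
  data[[col]] <- lookup[[to]][idx]
  data
}

# Illustrative lookup table and data
offence_lookup <- data.frame(
  code  = c("V", "T", "D"),
  label = c("Violence", "Theft", "Drugs")
)
cases <- data.frame(id = 1:4, offence = c("T", "V", "T", "D"))

recode_with_lookup(cases, "offence", offence_lookup)
```

Failing loudly on unmatched values is a design choice: it makes gaps in the lookup table visible rather than letting them flow through the rest of the process as missing values.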

These functions should be built into packages, either custom packages for your particular RAP, or packages such as mojrap which work across multiple processes.
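To give a sense of what a function looks like once it lives in a package, the sketch below shows a hypothetical helper as it might be saved in a package's `R/` folder, documented with roxygen2 comments. The package, the file name and `round_to_base()` are assumptions for illustration and are not part of mojrap. `usethis::create_package()` and `devtools::document()` are commonly used to create the package skeleton and to generate the help pages and NAMESPACE entries from these comments.

```r
# File: R/round_to_base.R in a hypothetical package.
# The #' lines are roxygen2 documentation; @export marks the function
# for export in the package NAMESPACE when the docs are generated.

#' Round values to the nearest multiple of a base
#'
#' @param x A numeric vector.
#' @param base The multiple to round to. Defaults to 5.
#' @return A numeric vector the same length as `x`.
#' @export
#' @examples
#' round_to_base(c(3, 12, 18))
round_to_base <- function(x, base = 5) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric")
  }
  # Note: base R's round() rounds exact halves to the nearest even number,
  # e.g. round(2.5) is 2
  base * round(x / base)
}
```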

The mojrap package is intended to serve as the repository for functions that can be used across publications within MOJ. It is also good practice to standardise your code by using commonly used packages, many of them part of the tidyverse, such as:

  • dplyr
  • tidyverse
  • openxlsx
  • ggplot2

When writing your RAPped code in R, it is good practice to call functions from external packages using the package name and a double colon. For example, when using dplyr's filter function, write dplyr::filter rather than just filter. This makes the packages required to run your code obvious to anyone reproducing it. It also avoids ambiguity where two packages export functions with the same name, as dplyr and stats both do with filter. Finally, it makes your functions easier to move into a package later on, where this style of referencing other packages' functions is commonly recommended.
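The sketch below shows the double-colon style on an illustrative data frame. stats and utils ship with R, so those lines run anywhere. The dplyr::filter call from the text is wrapped in requireNamespace(), which checks that dplyr is installed without attaching it, so the example still runs on a machine without dplyr.

```r
# Illustrative data
monthly <- data.frame(month = 1:6, receptions = c(120, 95, 130, 110, 105, 140))

# Namespaced calls make each function's source package explicit
stats::median(monthly$receptions)   # returns 115
utils::head(monthly, 2)

# The dplyr call from the text, run only if dplyr is installed.
# dplyr::filter is unambiguous even though stats also has a filter()
if (requireNamespace("dplyr", quietly = TRUE)) {
  busy_months <- dplyr::filter(monthly, receptions > 100)
  print(busy_months)
}
```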