6 Generalisable Code

Your code should consist of as few lines as possible while remaining easily understandable. To help with this, make it as generalisable as possible so that you and others can reuse what you have written for several different purposes. In practice, this means writing code as functions, an MoJ level 2 RAP component.

A function is a set of commands bundled together so that they can all be run with a single line of code. A function should accept different inputs, or parameters, which allow it to be applied in different situations. Your functions should not contain code that only applies to your data, nor hard-code any dataset-specific variables. Any references to data should be passed in through the function's parameters so that the function can work on other datasets. For example, code that recodes variables should refer to a lookup table rather than hard-coding the recodes within the function. Your function can then be used in other contexts simply by supplying a new lookup table.
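To illustrate the lookup-table approach, here is a small function that recodes a column using values held in a separate table. The function name `recode_with_lookup`, its arguments and the example data are hypothetical, and the sketch uses base R's `match()` so that it runs without extra packages; it is one way of applying the principle, not a prescribed MoJ implementation.

```r
# Hypothetical helper: recode one column of a data frame using a lookup table,
# so no dataset-specific values are hard-coded inside the function.
recode_with_lookup <- function(data, column, lookup, from = "code", to = "label") {
  # Check that the named columns actually exist before recoding
  stopifnot(column %in% names(data), all(c(from, to) %in% names(lookup)))
  # Position of each value in the lookup's "from" column (NA if not found)
  matched <- match(data[[column]], lookup[[from]])
  # Replace the column with the corresponding "to" values
  data[[column]] <- lookup[[to]][matched]
  data
}

# Illustrative data only
offences <- data.frame(id = 1:3, offence_code = c("A", "B", "A"),
                       stringsAsFactors = FALSE)
offence_lookup <- data.frame(code = c("A", "B"), label = c("Theft", "Fraud"),
                             stringsAsFactors = FALSE)

recoded <- recode_with_lookup(offences, "offence_code", offence_lookup)
```

To apply different recodes, or to recode a different dataset, you would pass in a different data frame, column name or lookup table; the function body itself stays the same.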

Placing functions within packages, an MoJ level 3 RAP component, brings further advantages. Some of these are described in the introduction to the Developing R packages course.

In general, your code should be written to maximise the use of functions that are (in order of preference):

  1. Within multi-use (across project) packages, whether internal or external, and that are well maintained by others
  2. Within customised packages for your particular project (the maintenance burden resting with you/your team)
  3. Custom made but not within a package for your particular project (the maintenance burden resting with you/your team)

Examples of well-maintained, multi-use packages that are commonly used with tidy data, and that help standardise your code, include:

  • dplyr
  • tidyverse
  • openxlsx
  • ggplot2

An example of an MoJ multi-use package is Mojrap, which is intended to serve as the repository for functions that can be used across publications within MoJ. More information about internal packages can be found at MoJ R packages.

When using functions from packages in R, it is good practice to prefix them with the package name and a double colon. For example, when using dplyr’s filter function, write dplyr::filter rather than just filter. Whenever your code is reproduced, the packages required to run it are then obvious from the functions used within the code. This also makes it easier to move your functions into a package later on. For information about writing functions within packages, see the Developing R packages section on making functions work within a package.
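A minimal sketch of the double-colon style, assuming dplyr is installed (it does not need to be loaded with `library()`). The data frame and its columns are made up for illustration.

```r
# Illustrative data only
sentences <- data.frame(
  court = c("Crown", "Magistrates", "Crown"),
  months = c(24, 3, 12),
  stringsAsFactors = FALSE
)

# Namespace-qualified call: it is clear this function comes from dplyr,
# and it works without library(dplyr)
crown_only <- dplyr::filter(sentences, court == "Crown")

# Without the prefix, in a session where dplyr is not attached, a bare
# filter() call would find stats::filter(), a time-series function that
# does something entirely different.
```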