1 Introduction to reproducibility and RAP

The purpose of this manual is to set out the MoJ approach to reproducibility and RAP, which stands for “Reproducible Analytical Pipelines”, and to provide or signpost suitable guidance so that the approach can be realised. The last letter in RAP could just as easily stand for “Process” as “Pipelines”. The default position is that all MoJ coding projects should be done in a reproducible manner. This enables quality and good, efficient analytical quality assurance (AQA). For more information about the benefits of RAP, see this Analytical Function article on RAP.

Reproducible: Meaning that at any point in the future we should be able to reproduce everything that we have done today. No more receiving a query about a publication from five years ago and having to search around to find out how a figure was calculated.

Pipeline (or Process): Meaning an end-to-end system that is as close as possible to a ‘one click’ process (minimising manual steps) for turning source data into useful analytical outputs. As far as possible, this process should be ‘future-proofed’, or at least built to be easy to maintain and able to take account of new developments (e.g. package updates or data changes).
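As a rough illustration of what a ‘one click’ process can mean in code, the sketch below chains each stage behind a single entry point, so that no manual step sits between the source data and the output. It is a minimal example under stated assumptions, not an MoJ template: the function names, the column names (region, count) and the CSV layout are all hypothetical.

```python
import csv


def load_source(path):
    """Read the raw source CSV into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Drop rows with a blank count and convert the remaining counts to integers."""
    return [
        {"region": r["region"].strip(), "count": int(r["count"])}
        for r in rows
        if r["count"].strip() != ""
    ]


def summarise(rows):
    """Total the counts for each region, ordered by region name."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["count"]
    return [{"region": k, "total": v} for k, v in sorted(totals.items())]


def write_output(rows, path):
    """Write the summary rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "total"])
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(source_path, output_path):
    """Single entry point: source data in, analytical output out."""
    rows = load_source(source_path)
    summary = summarise(clean(rows))
    write_output(summary, output_path)
    return summary
```

Calling run_pipeline("source.csv", "summary.csv") (hypothetical file names) reruns every step in the same order each time. Because each stage is a small function, a change in the source data or a package update should only require changes in one place.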

The MoJ approach to reproducibility and RAP involves the following:

  • To be considered fully reproducible, an MoJ RAP project should include all three levels of the key RAP components in MoJ coding projects.

  • All MoJ coding projects (whether or not they are considered RAP) should include the level one components to be considered reproducible.

  • For statistical publications and other similar processes, priority should be given to producing harmonised, understandable data files as a primary RAP output. Additional outputs that meet user needs, such as publication text, charts and tables, can then be built from these files, along with other uses of the data, such as data visualisations.
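The last point above, treating a harmonised data file as the primary output and deriving everything else from it, can be sketched as follows. This is an illustrative example only: the field names (year, category, value), the helper functions and the values are assumptions made up for the sketch, not real statistics or an MoJ standard.

```python
import csv
import io

# Hypothetical column layout for the harmonised (tidy) data file.
FIELDS = ["year", "category", "value"]


def write_harmonised(records):
    """Serialise tidy records (one row per year and category) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_harmonised(csv_text):
    """Load the harmonised file back, restoring numeric types."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"year": int(r["year"]), "category": r["category"], "value": int(r["value"])}
        for r in reader
    ]


def publication_table(records):
    """Pivot to one entry per category, with a value for each year."""
    table = {}
    for r in records:
        table.setdefault(r["category"], {})[r["year"]] = r["value"]
    return table


def headline_text(records, year):
    """Generate a sentence of publication text from the same data."""
    total = sum(r["value"] for r in records if r["year"] == year)
    return f"In {year}, the total across all categories was {total}."


# Made-up illustrative values, not real statistics.
records = [
    {"year": 2022, "category": "A", "value": 10},
    {"year": 2022, "category": "B", "value": 20},
    {"year": 2023, "category": "A", "value": 15},
    {"year": 2023, "category": "B", "value": 25},
]

# Every downstream output is built from the harmonised file, not from raw data.
harmonised = read_harmonised(write_harmonised(records))
print(publication_table(harmonised))
print(headline_text(harmonised, 2023))
```

Because the table and the text both read from the same harmonised file, a correction to the data flows through to every output without editing each one by hand.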

For the production of statistical publications, the diagram below illustrates how a full analytical pipeline might look at a high level: