2 RAP Structure
To develop a pipeline like the one above, it helps to structure your RAP project in the following ways.
2.1 Use one repo for each endpoint in your process
As in the diagram above, define a set of endpoints or outputs that your process will produce. This could include:
- Cleaned datasets for internal use
- Publishable datasets (these may be the same as those above)
- Publication outputs (Charts, Tables, Publication text)
- Other outputs (eg. MI packs, briefing packs, data visualisation tools)
This makes it easier to reuse outputs for multiple purposes, rather than having to extract them from the middle of a larger process. It also lets other users who want to adapt your code or outputs find the section of code they need, without having to understand the whole codebase.
As in the diagram above, the first repository should aim to turn the data into a format that can be used to produce the broadest possible range of other outputs, each of which is then created in its own repository. You should also decide at which point in the process to apply disclosure control. For example, the first repo might create a full, unredacted dataset for internal use. You may then want to include a further stage that aggregates the data so that small numbers, which could identify individuals, are not published.
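As a rough illustration of such an aggregation stage, the sketch below counts records by a set of grouping columns and blanks out any count below a threshold. The function name `aggregate_and_suppress()`, the example columns and the default threshold of 5 are assumptions for illustration only; use the rules set by your own disclosure control policy. It applies primary suppression only, and does not stop suppressed cells being recovered from row or column totals (secondary suppression).

```r
# Hypothetical helper: aggregate record-level data and suppress small counts.
# The default threshold of 5 is an illustrative assumption, not a fixed rule.
aggregate_and_suppress <- function(data, group_cols, threshold = 5) {
  # Count the records in each combination of the grouping columns
  counts <- aggregate(
    list(n = rep(1L, nrow(data))),
    by = data[group_cols],
    FUN = sum
  )
  # Replace counts below the threshold with NA so they cannot be published
  counts$n[counts$n < threshold] <- NA_integer_
  counts
}

# Example with made-up record-level data
records <- data.frame(
  region  = c(rep("North", 7), rep("South", 2)),
  offence = c(rep("Theft", 7), rep("Fraud", 2))
)
aggregate_and_suppress(records, c("region", "offence"))
```

In this example, the North/Theft count of 7 is kept and the South/Fraud count of 2 is replaced with `NA`. Keeping this step in its own stage means the unredacted dataset from the first repo can still be used internally.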
2.2 Common structure to a RAP repo
A RAP repository should have:
- An ‘inputs’ folder, for any inputs required by the process. This should only be used for items such as lookup tables and templates; datasets should be stored in Amazon S3.
- An ‘outputs’ folder, for all outputs generated by the process (eg. HTML, PDF and PNG files).
- A ‘processing’ folder containing R scripts or Rmd files used in the process.
- A ‘functions’ folder within the processing folder, containing any functions created for the process.
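The folder layout above can be set up with a few lines of base R. The sketch below uses two hypothetical helpers, `create_rap_skeleton()` and `load_rap_functions()`, which are not part of mojrap or any other package. The second one sources every script in `processing/functions` so that the processing scripts can call the process-specific functions.

```r
# Hypothetical helpers for the RAP folder layout (base R only).

# Create inputs/, outputs/, processing/ and processing/functions/ under `path`.
create_rap_skeleton <- function(path = ".") {
  dirs <- file.path(path, c("inputs", "outputs", "processing",
                            file.path("processing", "functions")))
  for (d in dirs) {
    # recursive = TRUE also creates `path` itself if it does not exist yet
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Source every .R file in processing/functions into the global environment,
# so that processing scripts can use the process-specific functions.
load_rap_functions <- function(path = ".") {
  fn_files <- list.files(file.path(path, "processing", "functions"),
                         pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in fn_files) {
    source(f, local = globalenv())
  }
  invisible(fn_files)
}
```

A processing script might then start with `load_rap_functions()` when run from the repo root. Git does not track empty folders, so you may need to add a placeholder file (such as a short README) to each folder before committing the skeleton.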
A RAP repository should not have:
- Stored datasets
- General functions. These should be stored in a package – ideally mojrap, or a process-specific package if necessary.
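One way to reduce the risk of committing datasets by accident is to list common data file extensions in the repo's `.gitignore`. The helper below, `ignore_data_files()`, is a hypothetical sketch, and its extension list is an assumption to adjust for your process. The final `!inputs/**` pattern re-includes everything in the ‘inputs’ folder so that lookup tables and templates can still be committed.

```r
# Hypothetical helper: add data-file patterns to a repo's .gitignore.
# The default patterns are illustrative assumptions; order matters, because
# the negation "!inputs/**" must come after the patterns it overrides.
ignore_data_files <- function(path = ".",
                              patterns = c("*.csv", "*.xlsx", "*.rds",
                                           "*.RData", "*.parquet",
                                           "!inputs/**")) {
  gitignore <- file.path(path, ".gitignore")
  existing <- if (file.exists(gitignore)) {
    readLines(gitignore, warn = FALSE)
  } else {
    character(0)
  }
  # Only add patterns that are not already present
  new_lines <- setdiff(patterns, existing)
  if (length(new_lines) > 0) {
    # Rewrite the whole file so new patterns always start on their own line
    writeLines(c(existing, new_lines), gitignore)
  }
  invisible(new_lines)
}
```

Running it a second time adds nothing, because patterns already in the file are skipped.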
Note that this structure only applies to a RAP repo that works as a series of scripts run manually. You may wish to create your repo as a package, in which case you should adopt the required structure for a package.