Chapter 4 Reproducible
We want our code to be reproducible so that:
- it can be used by others (both for collaboration and to allow effective review and accountability);
- it keeps working over time (protected from external changes);
- it can be easily reused by others in their own projects.
There are a number of steps that we can take to ensure that our code is as reproducible as possible.
4.1 Manage project dependencies
Your project will depend on a number of external factors, such as software or packages. These dependencies may mean that your project won’t work on others’ machines, or that it stops working on your own machine at a later date (e.g. as external packages are updated over time). To ensure that this doesn’t become an issue for your project, you should use a dependency management tool.
Dependency management tools
Language | Tools |
---|---|
R | We recommend using Conda. Alternatives are packrat and renv. |
Python | We recommend using Conda. |
JavaScript | Include third-party library dependencies in the project as .js files. |
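To illustrate what pinning dependencies buys you, the following is a minimal Python sketch that compares installed package versions against a set of pinned versions and reports any drift. It is not a substitute for a dependency management tool such as Conda; the function name `check_pinned_versions` and the example pins are hypothetical, and only the standard library's `importlib.metadata` is used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List


def check_pinned_versions(pins: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no drift."""
    problems = []
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is missing from the current environment entirely.
            problems.append(f"{name}: not installed (expected {expected})")
            continue
        if installed != expected:
            # The package exists but at a different version than the one pinned.
            problems.append(f"{name}: installed {installed}, expected {expected}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    for problem in check_pinned_versions({"pandas": "2.2.0"}):
        print(problem)
```

Running a check like this at the start of an analysis makes version drift visible early, rather than surfacing later as a subtle change in results.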
Include a git hash
If practical, the output of your code should include the git hash of the code that produced it. This makes the analysis more reproducible, because there is no ambiguity about the specific code that was used to generate it.
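In Python, one possible way to do this is to call the `git` command line through `subprocess`. The sketch below assumes that `git` is installed and that the code is run from inside a git repository; the function names `get_git_hash` and `with_git_hash` and the footer wording are hypothetical.

```python
import subprocess


def get_git_hash(repo_dir: str = ".") -> str:
    """Return the commit hash of HEAD, or "unknown" if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory is missing, or it is not a repository.
        return "unknown"
    return result.stdout.strip()


def with_git_hash(report_text: str, repo_dir: str = ".") -> str:
    """Append a footer recording which commit produced the output."""
    return f"{report_text}\n\nGenerated from git commit: {get_git_hash(repo_dir)}\n"


if __name__ == "__main__":
    print(with_git_hash("Analysis results go here."))
```

Falling back to "unknown" rather than raising keeps the analysis runnable outside a repository, though a stricter project might prefer to fail loudly instead.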
R
You can access the git hash using either of the following code snippets.
or
4.2 Format
If the output is a report, the write-up should be fully reproducible, or as close to it as possible.
- Avoid workflows that require manually copying and pasting results between documents.
- For Python, consider using Jupyter notebooks. For R, use `rmarkdown`.
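The underlying idea is that numbers in the write-up are computed and inserted by code rather than typed or pasted in. As a minimal sketch of that idea outside a notebook, the example below fills a text template from computed values; the template wording, the `render_report` name, and the sample data are all hypothetical.

```python
from string import Template
from typing import Sequence

# Hypothetical report sentence; the placeholders are filled in from computed values.
REPORT_TEMPLATE = Template("Mean response time: $mean_ms ms across $n requests.")


def render_report(values: Sequence[float]) -> str:
    """Compute summary statistics and insert them directly into the report text."""
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    mean = sum(values) / n
    return REPORT_TEMPLATE.substitute(mean_ms=f"{mean:.1f}", n=n)


if __name__ == "__main__":
    # Illustrative data only.
    print(render_report([10.0, 20.0, 30.0]))
```

If the input data changes, re-running the code regenerates the sentence, so the report cannot silently fall out of step with the analysis.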
4.3 Optimize for change
- Don’t try to solve every conceivable problem up-front; instead, focus on making your code easy to change when needed.
- Don’t optimize prematurely: choose clarity over performance unless there is a serious performance issue that needs to be addressed.
- Change can come in several forms, including hardware: your code will eventually be run on a colleague’s machine or a server somewhere. Without over-complicating things, write your code with this in mind. For example, use relative paths (e.g. `./file_in_the_project_directory.R` rather than `/Users/my_username/development/my_project/file_in_the_project_directory.R`).
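The same advice about relative paths applies in Python. A minimal sketch using `pathlib`, assuming the code is run from the project directory; the `project_path` helper and the `data/input.csv` file name are hypothetical.

```python
from pathlib import Path
from typing import Optional


def project_path(*parts: str, root: Optional[Path] = None) -> Path:
    """Build a path inside the project without hard-coding a user-specific prefix.

    `root` defaults to the current working directory, which is assumed to be
    the project directory.
    """
    base = root if root is not None else Path.cwd()
    return base.joinpath(*parts)


if __name__ == "__main__":
    # Resolves against wherever the project happens to live on this machine.
    print(project_path("data", "input.csv"))
```

Because nothing in the code mentions a particular user's home directory, the project can be moved to a colleague's machine or a server without editing paths.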