2 Analytical Platform
2.1 Introduction
To gain an overview of the Analytical Platform (AP), watch this 2-3 minute introductory video. Please be aware that it is a few years old, so some things have changed. In addition to the information in this chapter, you can:
- Attend, or work through on your own, the Introduction: Using R on the Analytical Platform - see the internal R training section.
- Read the Analytical Platform User Guidance, which provides more technical details.
2.2 Summary of key terms
It will help you to be familiar with the following key terms:
- Analytical Platform (AP): A data analysis environment providing modern tools and key datasets for MoJ analysts. AP contains training documents, resources, and access to various analytical software such as RStudio and Jupyter.
- Control Panel: The place from which you navigate to RStudio, Jupyter, S3 buckets, etc.
- RStudio: Development environment for writing R code and R Shiny apps.
- JupyterLab: Development environment for writing Python code, including Python notebooks.
- Git: Version control software that enables multiple people to make separate changes at the same time.
- GitHub: A web-based interface that uses Git and on which you publish and share your version-controlled code. You use Git locally (e.g. using RStudio) to track versions of your code, and then submit those changes to GitHub.
- GitHub Repositories (Repos): Broadly similar to setting up a project folder on a DOM1 shared drive to save work and share with others. Files in a GitHub repo represent the definitive version of the project. Everyone who works on the project makes contributions from their own personal versions.
- Amazon S3: A web-based cloud storage platform for storing data. Access to Amazon S3 buckets can be managed.
- Slack: Collaboration tool where you can get technical support for Analytical Platform tools such as R, Python and Git. You can share knowledge, submit admin requests and communicate quickly with other AP users.
2.3 Getting set up
Follow the steps in the Getting Started section of the Analytical Platform User Guidance. You need to:
- Set up a Slack account.
- Set up a GitHub account with two-factor authentication.
- Set up an Analytical Platform account.
- Set up RStudio to use R and/or JupyterLab to use Python.
You can learn more about GitHub by attending, or working through on your own, the Introduction to Git/GitHub - see the internal Git/GitHub training section.
If you need to use Athena databases for SQL (in R or Athena) on the Analytical Platform, follow the additional instructions in the “Training Requirements” section of the Introduction to SQL training repository.
2.4 Managing data
Once you are set up on the Analytical Platform, read about the following data management and handling topics:
- How data are held on the Analytical Platform and how to find the data you need. You can read about the three different data storage options (Amazon S3, curated databases and home directories).
- Working with Amazon S3, data FAQ, the Data Uploader tool and interacting with Amazon S3 via the Analytical Platform.
- Information governance procedures to be followed.
- Data retention policies, including the cases in which deleting data means they are permanently deleted.