Section 2 Demographics

This section gives details of variable naming and labelling conventions for demographic characteristics. This is primarily aimed at variables for inclusion in datasets as defined in Section 1, but the same terminology and categorisation should be used in publication text and tables where appropriate.

The principles here aim to mirror the GSS Harmonised Principles for Demographics as much as possible. However, there are some differences where MoJ requirements or data limitations require an alternative approach.

Where there is no particular guidance listed below, producers should aim to follow the GSS Harmonised Principles for Demographics.

2.1 Sex

It is GSS policy that statistics should always be made available disaggregated by sex. This variable should therefore be included in all datasets.

2.1.1 Sex

The variable sex should be used to refer to the sex of a person.

The preferred values of ‘Male’ and ‘Female’ should be used in most cases.

Guidance for reporting on gender identity where a simple male/female split is inadequate is currently under consideration.

2.1.2 Reporting guidance

For the purpose of consistency, avoid the use of the word ‘gender’ when referring to sex.

If you need to refer to groups within statistical commentary, try to use ‘Male’ or ‘Female’ as an adjective (ie. do not refer to ‘females’ but to ‘female prisoners’ or ‘female defendants’). This makes clear which population is being reported on. It also avoids confusion where a population may include people aged both under and over 18, as words like ‘men’ and ‘women’ carry connotations about age.

In tables, use ‘All persons’ to refer to the total of the Male and Female rows, in line with the GSS Harmonised Principles.
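As a minimal sketch, assuming counts by sex are held in a plain Python dictionary keyed by ‘Male’ and ‘Female’ (the function name add_all_persons is hypothetical, not part of any MoJ tooling), the ‘All persons’ total row could be derived like this:

```python
def add_all_persons(counts: dict) -> dict:
    """Return a copy of counts by sex with an 'All persons' row appended.

    'All persons' is the total of the 'Male' and 'Female' rows; a missing
    row is treated as zero. Any other keys are kept but not added to the total.
    """
    result = dict(counts)
    result["All persons"] = counts.get("Male", 0) + counts.get("Female", 0)
    return result


print(add_all_persons({"Male": 120, "Female": 45}))
# {'Male': 120, 'Female': 45, 'All persons': 165}
```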

2.2 Age

2.2.1 Age

The variable name age should be used for a variable referring to a single year of age.

The variable should be numeric, with only non-negative integer values allowed (ie. whole numbers, including 0).

Zero (0) should be used to signify an age of less than 12 months.
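The rules for age can be checked with a short validation function. This is a sketch only: validate_age and age_from_months are hypothetical names, and age_from_months assumes a source system that records age in completed months.

```python
def validate_age(value) -> int:
    """Return value if it is a valid single year of age, otherwise raise.

    Valid ages are whole numbers of completed years, 0 or above; 0 means
    an age of less than 12 months.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"age must be an integer, got {value!r}")
    if value < 0:
        raise ValueError(f"age must not be negative, got {value}")
    return value


def age_from_months(months: int) -> int:
    """Convert an age in completed months to completed years (0 if under 12 months)."""
    return validate_age(months // 12)
```

Only plain Python int values pass the type check; data read through other libraries may need converting first.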

2.2.2 Age Band

Single years of age should be provided where possible. Where this is not possible, most likely due to concerns about disclosure, this can be replaced by variables which band age ranges together.

The variable name age_band should be used for a variable referring to a range of ages which have been grouped together. Only one age banding variable should be included in a dataset, preferably that with the narrowest banding.

These variables should be constructed as strings, with ‘to’ as the separator between the minimum and maximum values of a range, or ‘and over’ to indicate a range where only the minimum value is specified (eg. ‘10 to 15’, ‘75 and over’). This is preferred to ‘-’ or ‘+’ (eg. 10-15, 75+), as these may be mistaken for mathematical operators.

Avoid using ‘under’ (eg. ‘Under 18’): when values are sorted alphabetically, this places the value at the bottom of a list, when it should be at the top. Instead, describe it as a range, including zero where necessary (eg. ‘0 to 17’).
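A minimal sketch of building age_band labels to these conventions (format_age_band is a hypothetical helper; returning a bare single year such as ‘0’ when the minimum and maximum are equal follows the ‘0’ category used in the example further down):

```python
from typing import Optional


def format_age_band(minimum: int, maximum: Optional[int] = None) -> str:
    """Build an age_band label using 'to' and 'and over' instead of '-' or '+'.

    maximum=None gives an open-ended band; minimum == maximum gives a
    single-year label such as '0'.
    """
    if minimum < 0:
        raise ValueError("minimum age must not be negative")
    if maximum is None:
        return f"{minimum} and over"   # eg. '75 and over', not '75+'
    if maximum < minimum:
        raise ValueError("maximum age must not be below minimum age")
    if maximum == minimum:
        return str(minimum)
    return f"{minimum} to {maximum}"   # eg. '10 to 15', not '10-15'


print(format_age_band(0, 17))  # 0 to 17 (rather than 'Under 18')
print(sorted(["21 and over", "Under 18", "18 to 20"]))
# ['18 to 20', '21 and over', 'Under 18']: 'Under 18' sorts last
```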

The Preferred age bands below should be used wherever possible. These have been chosen to match as closely as possible with GSS Harmonised Principles while also maintaining cut-off points which have relevance within the Justice System. Bands from the Alternative list can be used where additional grouping is required.

Where necessary, these bands can be grouped together or broken down further, although producers should try to use boundaries that are coterminous with those in the Preferred or Alternative list in order to maintain a degree of comparability between publications.

Example

Producers who want a finer breakdown of data for those aged under 10 could use the categories ‘0’, ‘1 to 4’ and ‘5 to 9’ because these combine to form an existing band (‘0 to 9’). Producers should avoid using a breakdown such as ‘0 to 4’ and ‘5 to 10’ because the ‘5 to 10’ band crosses over a boundary in the preferred banding.

Similarly, producers can use a ‘60 and over’ age band if necessary because it is produced by combining age bands within the preferred bandings.

There will be particular cases where it is necessary to provide data according to an incompatible age banding (eg. where a law is applicable to those over/under a certain age which falls in the middle of one of the preferred bands). In these cases, appropriate reporting is the primary concern and producers will need to deviate from the harmonised age bands, although they should seek to align as much as possible with higher or lower age bands.

The list of preferred and alternative age bands is below:

Preferred     Alternative
0 to 9        0 to 14
10 to 14      0 to 17
15 to 17      15 to 20
18 to 20      18 to 24
21 to 24      60 to 69
25 to 29      18 and over
30 to 39      21 and over
40 to 49      25 and over
50 to 59      60 and over
60 to 64      65 and over
65 to 69      70 and over
70 to 74
75 and over
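The Preferred bands lend themselves to a lookup on their lower bounds. The sketch below (hypothetical names preferred_age_band and is_compatible_band) assigns a single year of age to its Preferred band, and checks whether one custom band either sits inside a single Preferred band or is an exact union of consecutive Preferred bands, which is the coterminous-boundary rule described above. Every boundary in the Alternative list is also a Preferred boundary, so checking against the Preferred list alone is enough. It checks one band at a time and does not check that a full banding covers all ages without overlap.

```python
import bisect
from typing import Optional

# MoJ Preferred age bands as (lower bound, label) pairs, in ascending order.
PREFERRED_BANDS = [
    (0, "0 to 9"), (10, "10 to 14"), (15, "15 to 17"), (18, "18 to 20"),
    (21, "21 to 24"), (25, "25 to 29"), (30, "30 to 39"), (40, "40 to 49"),
    (50, "50 to 59"), (60, "60 to 64"), (65, "65 to 69"), (70, "70 to 74"),
    (75, "75 and over"),
]
LOWER_BOUNDS = [lower for lower, _ in PREFERRED_BANDS]


def band_index(age: int) -> int:
    """Position in PREFERRED_BANDS of the band containing a single year of age."""
    if age < 0:
        raise ValueError(f"age must not be negative, got {age}")
    # bisect_right returns the insertion point after any equal bound,
    # so subtracting 1 gives the last band whose lower bound is <= age
    return bisect.bisect_right(LOWER_BOUNDS, age) - 1


def preferred_age_band(age: int) -> str:
    """Return the Preferred age_band label for a single year of age."""
    return PREFERRED_BANDS[band_index(age)][1]


def is_compatible_band(minimum: int, maximum: Optional[int] = None) -> bool:
    """True if a custom band sits inside one Preferred band, or is an exact
    union of consecutive Preferred bands. maximum=None means 'and over'."""
    if maximum is None:
        # an open band must start on a Preferred boundary or inside '75 and over'
        return minimum in LOWER_BOUNDS or band_index(minimum) == len(LOWER_BOUNDS) - 1
    inside_one_band = band_index(minimum) == band_index(maximum)
    union_of_bands = minimum in LOWER_BOUNDS and (maximum + 1) in LOWER_BOUNDS
    return inside_one_band or union_of_bands


print([preferred_age_band(a) for a in (0, 17, 18, 74, 90)])
# ['0 to 9', '15 to 17', '18 to 20', '70 to 74', '75 and over']
```

With the earlier example, is_compatible_band(5, 9) returns True, while is_compatible_band(5, 10) returns False because ‘5 to 10’ crosses the boundary at 10; is_compatible_band(60) returns True for ‘60 and over’.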

2.2.2.1 Differences from GSS Harmonised Principles

The MoJ preferred age bands are most similar to the GSS Harmonised Principle E age banding. In general, the MoJ bands are more aggregated, but the main difference lies in the bands covering ages 15 to 24.

This is largely due to Youth Justice services covering those aged 15 to 17, meaning this age group requires its own age band within MoJ, but is not a recognised age band within any of the GSS Harmonised Principles for age. Many MoJ publications also currently provide an age cutoff at 21, so this has been maintained in this guidance for consistency within publications.

MoJ Bands     GSS Bands
15 to 17      15-19
18 to 20      20-24
21 to 24

2.2.3 Age Group

The variable name age_group should be used for a variable that describes a range of ages without giving any numerical values. The possible values and their corresponding age bands are below:

age_group     age_band
Juveniles     0 to 17
Young adults  18 to 20
Adults        21 and over
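A sketch of deriving age_group, either from a single year of age or from an age_band label (the function names are hypothetical). Deriving it from a band's lower bound only works for bands that do not straddle 18 or 21: every Preferred band meets this, but some Alternative bands, such as ‘15 to 20’, do not.

```python
def age_group(age: int) -> str:
    """Map a single year of age to its age_group value."""
    if age < 0:
        raise ValueError(f"age must not be negative, got {age}")
    if age <= 17:
        return "Juveniles"     # age_band '0 to 17'
    if age <= 20:
        return "Young adults"  # age_band '18 to 20'
    return "Adults"            # age_band '21 and over'


def age_group_for_band(age_band: str) -> str:
    """Map an age_band label such as '25 to 29' or '75 and over' to age_group.

    Only valid for bands that do not straddle 18 or 21, so that the
    band's lower bound alone decides the group.
    """
    lower_bound = int(age_band.split()[0])
    return age_group(lower_bound)
```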

2.3 Marital Status

Standards covering marital or same-sex civil partnership status or living arrangements are problematic for MoJ due to the nature of some statistical subject matter (eg. those in prison) and the impact this has on terms such as ‘cohabiting’ or ‘separated’. As a result, the term ‘Legally registered partnership’ is preferred, and roughly corresponds to the GSS Harmonised Principles.

Where other measures of marital status or living arrangements are available, these can be included in data, but variables providing corresponding values for Partnership Status should also be included for comparability with other datasets.

2.3.1 Partnership Status

The variables partnership_status and partnership_type should be used to categorise this concept, with partnership_type being a subcategory of partnership_status. Both are character variables which can hold the following values:

partnership_status                       partnership_type  Notes
In a legally registered partnership                        Married or registered in a same-sex civil partnership
Not in a legally registered partnership  Single            Never married or formed a same-sex civil partnership
Not in a legally registered partnership  Divorced          Includes dissolved same-sex civil partnerships
Not in a legally registered partnership  Widowed           Includes surviving partners from a same-sex civil partnership
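A dataset holding both variables can be checked for consistency as sketched below. The table does not list a separate partnership_type value under ‘In a legally registered partnership’, so this hypothetical is_consistent function only rejects Single, Divorced and Widowed under that status; accepting a missing (None) partnership_type is also an assumption.

```python
from typing import Optional

IN_PARTNERSHIP = "In a legally registered partnership"
NOT_IN_PARTNERSHIP = "Not in a legally registered partnership"

# partnership_type values listed under 'Not in a legally registered partnership'
TYPES_NOT_IN_PARTNERSHIP = {"Single", "Divorced", "Widowed"}


def is_consistent(partnership_status: str, partnership_type: Optional[str]) -> bool:
    """Check that partnership_type is a plausible subcategory of partnership_status.

    Assumption: a missing partnership_type (None) is accepted for either status.
    """
    if partnership_status == NOT_IN_PARTNERSHIP:
        return partnership_type is None or partnership_type in TYPES_NOT_IN_PARTNERSHIP
    if partnership_status == IN_PARTNERSHIP:
        # Single, Divorced and Widowed can never sit under this status
        return partnership_type not in TYPES_NOT_IN_PARTNERSHIP
    return False  # not one of the two permitted partnership_status values
```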

2.4 Ethnicity

2.4.1 Ethnicity

Producers should, as standard, make data available by ethnicity using the 18+1 ethnicity classification that is part of the GSS Harmonised Principles. This should be included in a variable named ethnicity.

If the full 18+1 classification is not possible, either because of the way data are collected or because of disclosure concerns, ethnicity can instead be reported using broader groupings. These are held in the variables ethnic_group, which combines the detailed categories into five groups, and ethnic_group_broad, the broadest categorisation, which distinguishes only between ‘White’ and ‘Other ethnic groups’. This categorisation is based on best practice for reporting on ethnicity as advised by the Race Disparity Unit (https://guide.ethnicity-facts-figures.service.gov.uk/how-we-write).

If the full 18+1 classification is not included in a dataset, a further variable, ethnic_minority, can also be included. This is similar to ethnic_group_broad, but distinguishes between ‘White British’ and ‘Ethnic minority groups’. It is most likely to be used where ethnicity categories have been combined because of disclosure concerns.

The appropriate categories are detailed below:

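ethnic_minority         ethnic_group_broad   ethnic_group                              ethnicity
White British           White                White                                     English/ Welsh/ Scottish/ Northern Irish/ British
Ethnic minority groups  White                White                                     Irish
Ethnic minority groups  White                White                                     Gypsy, Traveller or Irish Traveller *
Ethnic minority groups  White                White                                     Any other White background
No equivalent **        White                White                                     White (unspecified)

Ethnic minority groups  Other ethnic groups  Mixed/ Multiple ethnic groups             White and Black Caribbean
Ethnic minority groups  Other ethnic groups  Mixed/ Multiple ethnic groups             White and Black African
Ethnic minority groups  Other ethnic groups  Mixed/ Multiple ethnic groups             White and Black (unspecified)
Ethnic minority groups  Other ethnic groups  Mixed/ Multiple ethnic groups             White and Asian
Ethnic minority groups  Other ethnic groups  Mixed/ Multiple ethnic groups             Any other Mixed ethnic background
Ethnic minority groups  Other ethnic groups  Mixed/ Multiple ethnic groups             Mixed ethnic background (unspecified)

Ethnic minority groups  Other ethnic groups  Asian/ Asian British                      Indian
Ethnic minority groups  Other ethnic groups  Asian/ Asian British                      Pakistani
Ethnic minority groups  Other ethnic groups  Asian/ Asian British                      Bangladeshi
Ethnic minority groups  Other ethnic groups  Asian/ Asian British                      Chinese
Ethnic minority groups  Other ethnic groups  Asian/ Asian British                      Any other Asian background
Ethnic minority groups  Other ethnic groups  Asian/ Asian British                      Asian/ Asian British (unspecified)

Ethnic minority groups  Other ethnic groups  Black/ African/ Caribbean/ Black British  African
Ethnic minority groups  Other ethnic groups  Black/ African/ Caribbean/ Black British  Caribbean
Ethnic minority groups  Other ethnic groups  Black/ African/ Caribbean/ Black British  Any other Black background
Ethnic minority groups  Other ethnic groups  Black/ African/ Caribbean/ Black British  Black/ Black British (unspecified)

Ethnic minority groups  Other ethnic groups  Other ethnic group                        Arab
Ethnic minority groups  Other ethnic groups  Other ethnic group                        Any other ethnic group
Ethnic minority groups  Other ethnic groups  Other ethnic group                        Other (unspecified)

* Where numbers are too small or disclosive the ‘Gypsy, Traveller or Irish Traveller’ category should be merged with ‘Any other White background’.

** There is no ethnic_minority equivalent for White (unspecified). Do not use the ethnic_minority variable if the data include this classification.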
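The table above can also be expressed as a lookup, so that the broader variables are always derived from ethnicity rather than coded separately. In this sketch the function names derive_ethnicity_variables and apply_gypsy_traveller_merge are hypothetical, the category strings are transcribed from the table (without the ‘*’ marker), and None stands for the missing ethnic_minority value for ‘White (unspecified)’.

```python
from typing import Optional

# Detailed ethnicity values (including the MoJ '(unspecified)' codes) grouped by
# ethnic_group, transcribed from the table above.
ETHNICITY_BY_GROUP = {
    "White": [
        "English/ Welsh/ Scottish/ Northern Irish/ British",
        "Irish",
        "Gypsy, Traveller or Irish Traveller",
        "Any other White background",
        "White (unspecified)",
    ],
    "Mixed/ Multiple ethnic groups": [
        "White and Black Caribbean",
        "White and Black African",
        "White and Black (unspecified)",
        "White and Asian",
        "Any other Mixed ethnic background",
        "Mixed ethnic background (unspecified)",
    ],
    "Asian/ Asian British": [
        "Indian", "Pakistani", "Bangladeshi", "Chinese",
        "Any other Asian background", "Asian/ Asian British (unspecified)",
    ],
    "Black/ African/ Caribbean/ Black British": [
        "African", "Caribbean", "Any other Black background",
        "Black/ Black British (unspecified)",
    ],
    "Other ethnic group": ["Arab", "Any other ethnic group", "Other (unspecified)"],
}

# Reverse lookup: ethnicity -> ethnic_group
ETHNIC_GROUP = {
    ethnicity: group
    for group, values in ETHNICITY_BY_GROUP.items()
    for ethnicity in values
}


def derive_ethnicity_variables(ethnicity: str) -> dict:
    """Derive ethnic_group, ethnic_group_broad and ethnic_minority from ethnicity.

    Raises KeyError for values outside the classification.
    """
    group = ETHNIC_GROUP[ethnicity]
    broad = "White" if group == "White" else "Other ethnic groups"
    minority: Optional[str]
    if ethnicity == "English/ Welsh/ Scottish/ Northern Irish/ British":
        minority = "White British"
    elif ethnicity == "White (unspecified)":
        minority = None  # no ethnic_minority equivalent exists for this value
    else:
        minority = "Ethnic minority groups"
    return {"ethnic_group": group, "ethnic_group_broad": broad, "ethnic_minority": minority}


def apply_gypsy_traveller_merge(ethnicity: str) -> str:
    """Fold 'Gypsy, Traveller or Irish Traveller' into 'Any other White background'.

    Intended to be applied only where that category's numbers are too small
    or disclosive; the disclosure decision itself is not made here.
    """
    if ethnicity == "Gypsy, Traveller or Irish Traveller":
        return "Any other White background"
    return ethnicity
```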

2.4.2 Use of (unspecified) ethnicity codes

Each ethnic_group includes an ethnicity category of the form ‘xxx (unspecified)’ (the Mixed/ Multiple ethnic groups category also includes ‘White and Black (unspecified)’). These are not part of the GSS Harmonised Principles but have been added to MoJ standards to account for data taken from administrative systems that may not record ethnicity at the level of detail needed to use the Harmonised classifications.

These categories can be used when no further breakdown of a person’s broader ethnic group is available. They differ from the ‘Any other…’ categories because they include people who may or may not belong to one of the listed ethnicity categories, whereas the ‘Any other…’ categories are mutually exclusive with the other specified ethnicity categories in the same group (see example below).

Example

‘Any other Asian background’ includes those who are from an Asian/ Asian British background but who are NOT Indian, Pakistani, Bangladeshi or Chinese.

‘Asian/ Asian British (unspecified)’ includes those who are from an Asian/ Asian British background and who MAY BE Indian, Pakistani, Bangladeshi or Chinese, but there is not enough information to know whether they are from one of these backgrounds or another Asian background.

The Mixed category ‘White and Black (unspecified)’ should be used where it is known that a person is from a mixed White and Black ethnic background, but it is not possible to distinguish whether they are from a ‘White and Black African’ or a ‘White and Black Caribbean’ background.

2.4.3 Reporting Guidance

For information on writing about ethnicity, please see the Race Disparity Unit’s guide, ‘How we write about statistics and ethnicity’.

2.5 Nationality

While there is not currently a Harmonised Principle covering nationality, the National Statistics Country Classification (NSCC) provides a standardised list of country names and country codes for use in statistical outputs. Producers in MoJ are encouraged to use this list when reporting on the nationality of data subjects.

The full NSCC list can be found in the ‘NSCC Classification’ tab of the NSCC Classification and Coding Index.

2.5.1 Nationality Name

The variable nationality_name should be used in MoJ data to denote a nationality variable that aligns with the NSCC country list. The country names in this variable should match those in the ‘Category’ column of the NSCC Classification and Coding Index, except in the cases listed under 2.5.3 Exceptions.

2.5.2 Nationality Code

The NSCC list includes a two-letter, three-letter and three-digit code for each country.

MoJ guidance is to use the two-letter code as standard and to include it on datasets under the variable name nationality_nscc_a2.

2.5.3 Exceptions

The NSCC list is intended for data relating to countries, not nationalities, so in some cases an NSCC category does not correspond to a nationality. This is most notable for the United Kingdom, for which the NSCC provides primary codes for each constituent nation, but it also applies to Spain and Cyprus.

The NSCC does however include additional codes for:

  • United Kingdom Not Otherwise Specified

  • Spain Not Otherwise Specified

  • Cyprus Not Otherwise Specified

In these cases you should use these names and codes, removing the ‘not otherwise specified’ part of the name, as below:

nationality_name   nationality_nscc_a2
United Kingdom     XK
Spain              XE
Cyprus             XC
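A sketch of applying these exceptions, assuming the NSCC category names are written exactly as in the list above. The function nationality_variables is hypothetical and covers only the three ‘Not Otherwise Specified’ cases; codes for other countries should come from the NSCC Classification and Coding Index itself rather than be typed in by hand.

```python
# NSCC 'Not Otherwise Specified' categories and the two-letter codes used for them
NOS_CODES = {
    "United Kingdom Not Otherwise Specified": "XK",
    "Spain Not Otherwise Specified": "XE",
    "Cyprus Not Otherwise Specified": "XC",
}
NOS_SUFFIX = " Not Otherwise Specified"


def nationality_variables(nscc_category: str) -> tuple:
    """Return (nationality_name, nationality_nscc_a2) for one of the NOS exceptions.

    Raises KeyError for any other category.
    """
    code = NOS_CODES[nscc_category]
    name = nscc_category.removesuffix(NOS_SUFFIX)  # str.removesuffix needs Python 3.9+
    return name, code
```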

2.5.4 National Identity

National Identity is used to distinguish between those identifying as from any of the constituent nations of the United Kingdom, or as simply ‘British’. It should be contained in a variable named national_identity. This variable should only contain a valid value for people whose nationality group is in the UK or Ireland (that is, NSCC nationality codes XF, XG, XH, XI, XJ, XK or IE). Any other nationality code should have no value for national_identity.

Those with Irish nationality are included to allow comparison with Northern Irish data, which also record Irish national identity.

Valid values are:

national_identity
British
English
Scottish
Welsh
Northern Irish
Other
Unspecified
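The rule linking national_identity to the nationality code can be checked per record as sketched below. check_national_identity is a hypothetical name, and treating a missing national_identity as invalid for UK and Irish nationals is an assumption (the ‘Unspecified’ value is available for those cases).

```python
from typing import Optional

# NSCC two-letter nationality codes for which national_identity may hold a value
UK_AND_IRELAND_CODES = {"XF", "XG", "XH", "XI", "XJ", "XK", "IE"}
VALID_NATIONAL_IDENTITIES = {
    "British", "English", "Scottish", "Welsh", "Northern Irish", "Other", "Unspecified",
}


def check_national_identity(nationality_nscc_a2: str, national_identity: Optional[str]) -> bool:
    """Return True if national_identity is valid for the given nationality code."""
    if nationality_nscc_a2 in UK_AND_IRELAND_CODES:
        return national_identity in VALID_NATIONAL_IDENTITIES
    # every other nationality code must have no national_identity value
    return national_identity is None
```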

2.5.5 Reporting guidance

Published tables should name countries using the same spelling and naming conventions used in the NSCC list.

If using any country description with a ‘…Not Otherwise Specified’ suffix, the suffix can be removed from the country name in the table, provided that no overlapping category is also used. For example, you can include ‘United Kingdom’ in a table as long as that table does not also include any of ‘England’, ‘Scotland’, ‘Wales’, ‘Northern Ireland’ or ‘Great Britain’. If any of these sub-groups are included in the table, the full name ‘United Kingdom Not Otherwise Specified’ should be used.
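A sketch of this rule for the United Kingdom case. The function table_labels is hypothetical, and Spain and Cyprus are not handled because the guidance does not list their overlapping sub-groups.

```python
NOS_SUFFIX = " Not Otherwise Specified"
UK_NOS = "United Kingdom" + NOS_SUFFIX
# Categories that overlap with 'United Kingdom', as listed in the guidance
UK_SUBGROUPS = {"England", "Scotland", "Wales", "Northern Ireland", "Great Britain"}


def table_labels(categories: list) -> list:
    """Shorten 'United Kingdom Not Otherwise Specified' to 'United Kingdom'
    unless the same table also contains an overlapping UK sub-group."""
    keep_full_name = any(c in UK_SUBGROUPS for c in categories)
    return [
        "United Kingdom" if c == UK_NOS and not keep_full_name else c
        for c in categories
    ]
```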