From f1615f2545168849c8d0f5263ddeecbf62cab618 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Thu, 12 Sep 2024 09:32:03 +0100 Subject: [PATCH 01/18] First draft of API standards guidance, including section on tidy data. --- statistics-production/api-data-standards.qmd | 101 +++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 statistics-production/api-data-standards.qmd diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd new file mode 100644 index 0000000..42dfd67 --- /dev/null +++ b/statistics-production/api-data-standards.qmd @@ -0,0 +1,101 @@ +--- +title: "API Data Standards" +--- + +```{r include=FALSE} +``` + +

Guidance on how to structure data files specifically for the EES API

+ +--- + +## Introduction + +The API offers analysts, both internal to the DfE and external consumers and communicators of education statistics, +a way to programmatically access data on EES. However, in order to ensure a fit for purpose service, not all EES +data will be accessible via the API, and any that is will need to pass a higher bar for quality. In effect API data +**must** meet all the criteria laid out in our [Open data standards guidance](../statistics-production/ud.qmd). + +Whilst the EES data screener tests for a significant base level of data quality and consistency, there are some +additional criteria that are either too awkward to test for rigorously using the screener or are tested for but +returned as warnings. Data intended for the EES API must pass all the base level screener tests, plus a number +that only return warnings, plus manual inspection by the platform gatekeepers. These are primarily: + +- Strict tidy data structures - i.e. appropriate use of filters and indicators. +- Standardised filter col_names and items consistent with the harmonised standards. +- Standardised indicator col_names meeting the naming standards. +- Character limits for col_names and filter items. + +Examples of these that do and don't meet the API data standards are provided in the following sections. + +## Tidy data structure + +The key thing on tidy data structure is to avoid filter items being included within indicator col_names. Where you have collections of related terms appearing in indicator names (e.g. male, female, total), then these should be translated into a filter column, with the data being pivoted. + +### Examples of bad practice + +The following would not be accepted for publication via the API. 
+ +#### Example 1 - Attainment rates and scores + +::: {.table-responsive} + +| count_pupils_grade9to5 | count_pupils_grade9to4 | count_pupils_grade9to1 | percent_pupils_grade9to5 | percent_pupils_grade9to4 | percent_pupils_grade9to1 | progress8_score_male | progress8_score_female | progress8_score |attainment8_score_male | attainment8_score_female | attainment8_score | +|------------------------|------------------------|------------------------|--------------------------|--------------------------|-------------------------|----------------------|------------------------|-----------------|----------------------|------------------------|-----------------| +| 30 | 40 | 50 | 60 | 80 | 100 | 0.2 | 0.21 | 0.21 |0.09 | 0.08 | 0.10 | + +::: + +#### Example 2 - Pupil counts, percents and characteristics + +::: {.table-responsive} + +| count_schools | count_pupils_male | count_pupils_female | count_pupils_total | percent_pupils_male | percent_pupils_female | percent_pupils_total | +|---------------|-------------------|---------------------|--------------------|---------------------|-----------------------|----------------------| +| 2 | 120 | 130 | 250 | 48 | 52 | 100 | + +::: + +### Examples of good practice + +The following would be accepted for publication via the API. Note that in some cases, as in Example 1, splitting the data into separate data files can help with producing tidy data structures. 
+ +#### Example 1a - Attainment rates + +::: {.table-responsive} + +|attainment_metric | count_pupils | percent_pupils | +|--------------------|--------------|----------------| +| Grades 9-5 | 30 | 60 | +| Grades 9-4 | 40 | 80 | +| Grades 9-1 | 50 | 100 | + +::: + +#### Example 1b - Attainment scores + +| sex | attainment_metric | score_average | +|--------|-------------------|----------------| +| Female | Progress 8 | 0.21 | +| Male | Progress 8 | 0.20 | +| Total | Progress 8 | 0.21 | +| Female | Attainment 8 | 0.08 | +| Male | Attainment 8 | 0.09 | +| Total | Attainment 8 | 0.08 | + +#### Example 2 - Pupil counts, percents and characteristics + +::: {.table-responsive} + +| sex | count_schools | count_pupils | percent_pupils | +|--------------------|---------------|--------------|----------------| +| Male | 2 | 30 | 60 | +| Female | 2 | 40 | 80 | +| Total | 2 | 50 | 100 | + +::: + +### + +All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../satistics-production/ud.html#tidy-data-structure). 
+ From 9fc0db0da38bc8a3f22736e246446becbf5465ca Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Thu, 12 Sep 2024 09:55:20 +0100 Subject: [PATCH 02/18] Rearranged good and bad examples of tidy data structures --- _quarto.yml | 1 + statistics-production/api-data-standards.qmd | 48 ++++++++++++-------- 2 files changed, 31 insertions(+), 18 deletions(-) diff --git a/_quarto.yml b/_quarto.yml index 4aa26fe..023d601 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -89,6 +89,7 @@ website: - statistics-production/pub.qmd - RAP/rap-statistics.qmd - statistics-production/ud.qmd + - statistics-production/api-data-standards.qmd - statistics-production/ees.qmd - statistics-production/examples.qmd - statistics-production/embedded-charts.qmd diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 42dfd67..2b6eb9d 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -32,11 +32,11 @@ Examples of these that do and don't meet the API data standards are provided in The key thing on tidy data structure is to avoid filter items being included within indicator col_names. Where you have collections of related terms appearing in indicator names (e.g. male, female, total), then these should be translated into a filter column, with the data being pivoted. -### Examples of bad practice +### Example 1 - Attainment rates and scores -The following would not be accepted for publication via the API. +#### Example of bad practice -#### Example 1 - Attainment rates and scores +The following would not be accepted for publication via the API. ::: {.table-responsive} @@ -44,23 +44,13 @@ The following would not be accepted for publication via the API. 
|------------------------|------------------------|------------------------|--------------------------|--------------------------|-------------------------|----------------------|------------------------|-----------------|----------------------|------------------------|-----------------| | 30 | 40 | 50 | 60 | 80 | 100 | 0.2 | 0.21 | 0.21 |0.09 | 0.08 | 0.10 | -::: - -#### Example 2 - Pupil counts, percents and characteristics - -::: {.table-responsive} - -| count_schools | count_pupils_male | count_pupils_female | count_pupils_total | percent_pupils_male | percent_pupils_female | percent_pupils_total | -|---------------|-------------------|---------------------|--------------------|---------------------|-----------------------|----------------------| -| 2 | 120 | 130 | 250 | 48 | 52 | 100 | +: Attainment grade rates and scores in non-tidy format ::: -### Examples of good practice - -The following would be accepted for publication via the API. Note that in some cases, as in Example 1, splitting the data into separate data files can help with producing tidy data structures. +#### Example of good practice -#### Example 1a - Attainment rates +The following would be accepted for publication via the API. In this case, splitting the data into separate data files is required in order to create tidy data structures. ::: {.table-responsive} @@ -70,9 +60,11 @@ The following would be accepted for publication via the API. Note that in some c | Grades 9-4 | 40 | 80 | | Grades 9-1 | 50 | 100 | +: Attainment grade rates in tidy format + ::: -#### Example 1b - Attainment scores +::: {.table-responsive} | sex | attainment_metric | score_average | |--------|-------------------|----------------| @@ -83,7 +75,25 @@ The following would be accepted for publication via the API. 
Note that in some c | Male | Attainment 8 | 0.09 | | Total | Attainment 8 | 0.08 | -#### Example 2 - Pupil counts, percents and characteristics +: Attainment scores in tidy format + +::: + +### Example 2 - Pupil counts, percents and characteristics + +#### Example of bad practice + +::: {.table-responsive} + +| count_schools | count_pupils_male | count_pupils_female | count_pupils_total | percent_pupils_male | percent_pupils_female | percent_pupils_total | +|---------------|-------------------|---------------------|--------------------|---------------------|-----------------------|----------------------| +| 2 | 120 | 130 | 250 | 48 | 52 | 100 | + +: Pupil counts and percentages in non-tidy format + +::: + +#### Example of good practice ::: {.table-responsive} @@ -93,6 +103,8 @@ The following would be accepted for publication via the API. Note that in some c | Female | 2 | 40 | 80 | | Total | 2 | 50 | 100 | +: Pupil counts and percentages in tidy format + ::: ### From 2c7682c8f00061d75b6c0349ef7991d8a9f29615 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Thu, 12 Sep 2024 10:08:27 +0100 Subject: [PATCH 03/18] Added api section on filter standardisation --- statistics-production/api-data-standards.qmd | 21 ++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 2b6eb9d..2d53eea 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -111,3 +111,24 @@ The following would be accepted for publication via the API. In this case, split All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../satistics-production/ud.html#tidy-data-structure). 
+## Standardised filter col_names and items + +The explore education and statistics platforms team alongside the data harmonisation champions group and publication teams are developing a series of [standardised filter-sets](../statistics-production/ud.html#common-harmonised-variables) that teams are required to adhere to when creating data for the API. These are being built iteratively as more data is put forward for the API, so if the current standards don't cater to your data set, you can contribute to building the harmonised standards for others to follow. + +The standards can be used to create individual filter columns or combined filters (i.e. breakdown_topic / breakdown_topic). + +Areas for which harmonised standards are currently available are: + +- [establishment / school / provider characteristics](../statistics-production/ud.html#establishment-characteristics) +- [ethnicity](../statistics-production/ud.html#ethnicity) +- [sex and gender](../statistics-production/ud.html#sex-and-gender) +- [special educational needs](../statistics-production/ud.html#special-educational-needs) + +Areas which are currently under development are: + +- attainment metrics +- disadvantaged status +- free school meal status + +We encourage contributions to and feedback on all of the above and any other filter topic. 
+ From e33d96c88c77f878ac0d7afd3aee3d1d87dd7faf Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Thu, 12 Sep 2024 12:03:12 +0100 Subject: [PATCH 04/18] Added filter standardisation to API guidance --- statistics-production/api-data-standards.qmd | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 2d53eea..d0029e1 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -30,7 +30,9 @@ Examples of these that do and don't meet the API data standards are provided in ## Tidy data structure -The key thing on tidy data structure is to avoid filter items being included within indicator col_names. Where you have collections of related terms appearing in indicator names (e.g. male, female, total), then these should be translated into a filter column, with the data being pivoted. +The key thing on tidy data structure is to avoid filter items being included within indicator col_names. Where +you have collections of related terms appearing in indicator names (e.g. male, female, total), then these +should be translated into a filter column, with the data being pivoted. ### Example 1 - Attainment rates and scores @@ -132,3 +134,19 @@ Areas which are currently under development are: We encourage contributions to and feedback on all of the above and any other filter topic. 
+### Examples of common non-standard col_names + +::: {.table-responsive} + +| Non-standard | Potential standard equivalents | +|-----------------------------|----------------------------------| +| ethnicity | ethnicity_major, ethnicity_minor | +| sex_pupils | sex | +| school_type | establishment_type, establishment_type_group or education_phase | +| pupil_sen_status | sen_status | +| characteristic_primary_need | sen_primary_need | + +: Example non-standard col_names and their potential equivalents in the standardised framework. + +::: + From a61c142242781aa8f1b3d1e2aab4fb4915d25f07 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Thu, 12 Sep 2024 12:06:38 +0100 Subject: [PATCH 05/18] Added additional col_name examples to filter standardisation in the API guidance --- statistics-production/api-data-standards.qmd | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index d0029e1..c97d426 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -138,13 +138,15 @@ We encourage contributions to and feedback on all of the above and any other fil ::: {.table-responsive} -| Non-standard | Potential standard equivalents | -|-----------------------------|----------------------------------| -| ethnicity | ethnicity_major, ethnicity_minor | -| sex_pupils | sex | +| Non-standard | Potential standard equivalents | +|-----------------------------|------------------------------------| +| ethnicity | ethnicity_major, ethnicity_minor | +| characteristic_sex | sex | | school_type | establishment_type, establishment_type_group or education_phase | -| pupil_sen_status | sen_status | -| characteristic_primary_need | sen_primary_need | +| pupil_sen_status | sen_status | +| characteristic_primary_need | sen_primary_need | +| characteristic_topic | breakdown_topic, breakdown_topic_establishment | +| characteristic | 
breakdown, breakdown_establishment | : Example non-standard col_names and their potential equivalents in the standardised framework. From f21fa089c99e972f0c1e0bad366cedf88987868b Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Thu, 12 Sep 2024 12:25:47 +0100 Subject: [PATCH 06/18] Added indicator naming examples and character limits on different elements --- statistics-production/api-data-standards.qmd | 39 +++++++++++++++++++- statistics-production/ud.qmd | 2 +- 2 files changed, 39 insertions(+), 2 deletions(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index c97d426..5e2e5f2 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -134,7 +134,7 @@ Areas which are currently under development are: We encourage contributions to and feedback on all of the above and any other filter topic. -### Examples of common non-standard col_names +### Examples of common non-standard filter col_names ::: {.table-responsive} @@ -152,3 +152,40 @@ We encourage contributions to and feedback on all of the above and any other fil ::: +## Standardised indicator col_names + +Indicators should be named in line with the [indicator naming conventions set out in the open data standards](../statistics-production/ud.html#indicator-names). 
+ +### Examples of common non-standard indicator col_names + +::: {.table-responsive} + +| Non-standard | Potential standard equivalents | +|----------------------------------------|------------------------------------------| +| number_of_pupils | pupil_count | +| NumberOfLearners, NumLearners | pupil_count, learner_count | +| total_male, total_female | pupil_count (plus sex filter) | +| pt_SEN_support | pupil_percent (plus SEN status filter) | +| num_provider, num_providers | establishment_count | +| no_schools, num_schools, total_schools | establishment_count | +| num_inst, total_institutions, number_institutions, inst_count | establishment_count | + +: Example non-standard indicator col_names and their potential equivalents in the standardised framework. + +::: + +## Character limits for col_names and filter items + +Character limits for fields in data uploaded to the API are: + +::: {.table-responsive} + +| Element | Character limit | +|---------------------------------|-----------------| +|Filter / indicator column names | 50 characters | +|Filter / indicator column labels | 80 characters | +|Filter items / location names | 120 characters | + +: Character limits on column names, column labels and filter items. + +::: diff --git a/statistics-production/ud.qmd b/statistics-production/ud.qmd index 3a58fb7..0c6409d 100644 --- a/statistics-production/ud.qmd +++ b/statistics-production/ud.qmd @@ -1260,7 +1260,7 @@ knitr::include_graphics("../images/change_date_format.PNG") The indicators are the variables showing the measurements/statistics themselves, such as the number of pupils. These can be of different formats (e.g. text, numeric), although are numeric by default. The number of indicators will vary across publications and data files.
-

**Every variable in your dataset should have its own column, and each column should be a single data type**. E.g. do not create an indicator column called "pupils" that has both the number and percentage of pupils in it. Instead, create two separate columns for each measure.

+

**Every variable in your data set should have its own column, and each column should be a single data type**. E.g. do not create an indicator column called "pupils" that has both the number and percentage of pupils in it. Instead, create a separate column for each measure.

As an example, the number and percentage of pupil enrolments are the indicators in this dataset: From d1fe0369acd4647c5813b038e10fc6f69ab205cf Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Fri, 13 Sep 2024 10:36:14 +0100 Subject: [PATCH 07/18] Few updates based on first PR comments --- index.qmd | 3 ++ statistics-production/api-data-standards.qmd | 39 ++++++++++---------- 2 files changed, 23 insertions(+), 19 deletions(-) diff --git a/index.qmd b/index.qmd index bfd1d24..4b35d87 100644 --- a/index.qmd +++ b/index.qmd @@ -49,6 +49,9 @@ We hope it can prove a useful community driven resource for everyone from the mo [Open data standards](statistics-production/ud.html) - Guidance on how to structure data files +[Statistics API data standards](statistics-production/api-data-standards.html) +- Guidance on the standards to meet for API data sets + [Explore education statistics (EES)](statistics-production/ees.html) - Tips on using the explore education statistics service diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 5e2e5f2..e60e7b5 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -1,5 +1,5 @@ --- -title: "API Data Standards" +title: "Statistics API Data Standards" --- ```{r include=FALSE} @@ -28,6 +28,22 @@ that only return warnings, plus manual inspection by the platform gatekeepers. T Examples of these that do and don't meet the API data standards are provided in the following sections. +## Character limits for col_names and filter items + +Character limits for fields in data uploaded to the API are: + +::: {.table-responsive} + +| Element | Character limit | +|---------------------------------|-----------------| +|Filter / indicator column names | 50 characters | +|Filter / indicator column labels | 80 characters | +|Filter items / location names | 120 characters | + +: Character limits on column names, column labels and filter items. 
+ +::: + ## Tidy data structure The key thing on tidy data structure is to avoid filter items being included within indicator col_names. Where @@ -109,13 +125,13 @@ The following would be accepted for publication via the API. In this case, split ::: -### +### Summary -All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../satistics-production/ud.html#tidy-data-structure). +All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../statistics-production/ud.html#tidy-data-structure). ## Standardised filter col_names and items -The explore education and statistics platforms team alongside the data harmonisation champions group and publication teams are developing a series of [standardised filter-sets](../statistics-production/ud.html#common-harmonised-variables) that teams are required to adhere to when creating data for the API. These are being built iteratively as more data is put forward for the API, so if the current standards don't cater to your data set, you can contribute to building the harmonised standards for others to follow. +The explore education and statistics platforms team alongside the data harmonisation champions group and publication teams are developing a series of [standardised filter specifications](../statistics-production/ud.html#common-harmonised-variables) that teams are required to adhere to when creating data for the API. These are being built iteratively as more data is put forward for the API, so if the current standards don't cater to your data set, you can contribute to building the harmonised standards for others to follow. 
The standards can be used to create individual filter columns or combined filters (i.e. breakdown_topic / breakdown_topic). @@ -174,18 +190,3 @@ Indicators should be named in line with the [indicator naming conventions set ou ::: -## Character limits for col_names and filter items - -Character limits for fields in data uploaded to the API are: - -::: {.table-responsive} - -| Element | Character limit | -|---------------------------------|-----------------| -|Filter / indicator column names | 50 characters | -|Filter / indicator column labels | 80 characters | -|Filter items / location names | 120 characters | - -: Character limits on column names, column labels and filter items. - -::: From 3a6fad6c67f29575b2104d00976f4f8f267942d8 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Fri, 13 Sep 2024 14:35:32 +0100 Subject: [PATCH 08/18] Adjusting col_names in examples --- statistics-production/api-data-standards.qmd | 24 ++++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index e60e7b5..86dec8f 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -58,7 +58,7 @@ The following would not be accepted for publication via the API. 
::: {.table-responsive} -| count_pupils_grade9to5 | count_pupils_grade9to4 | count_pupils_grade9to1 | percent_pupils_grade9to5 | percent_pupils_grade9to4 | percent_pupils_grade9to1 | progress8_score_male | progress8_score_female | progress8_score |attainment8_score_male | attainment8_score_female | attainment8_score | +| pupil_count_grade9to5 | pupil_count_grade9to4 | pupil_count_grade9to1 | pupil_percent_grade9to5 | pupil_percent_grade9to4 | pupil_percent_grade9to1 | progress8_score_male | progress8_score_female | progress8_score |attainment8_score_male | attainment8_score_female | attainment8_score | |------------------------|------------------------|------------------------|--------------------------|--------------------------|-------------------------|----------------------|------------------------|-----------------|----------------------|------------------------|-----------------| | 30 | 40 | 50 | 60 | 80 | 100 | 0.2 | 0.21 | 0.21 |0.09 | 0.08 | 0.10 | @@ -72,7 +72,7 @@ The following would be accepted for publication via the API. In this case, split ::: {.table-responsive} -|attainment_metric | count_pupils | percent_pupils | +| grade_range | pupil_count | pupil_percent | |--------------------|--------------|----------------| | Grades 9-5 | 30 | 60 | | Grades 9-4 | 40 | 80 | @@ -84,14 +84,14 @@ The following would be accepted for publication via the API. 
In this case, split ::: {.table-responsive} -| sex | attainment_metric | score_average | -|--------|-------------------|----------------| -| Female | Progress 8 | 0.21 | -| Male | Progress 8 | 0.20 | -| Total | Progress 8 | 0.21 | -| Female | Attainment 8 | 0.08 | -| Male | Attainment 8 | 0.09 | -| Total | Attainment 8 | 0.08 | +| sex | accountability_measure | score_average | +|--------|------------------------|----------------| +| Female | Progress 8 | 0.21 | +| Male | Progress 8 | 0.20 | +| Total | Progress 8 | 0.21 | +| Female | Attainment 8 | 0.08 | +| Male | Attainment 8 | 0.09 | +| Total | Attainment 8 | 0.08 | : Attainment scores in tidy format @@ -103,7 +103,7 @@ The following would be accepted for publication via the API. In this case, split ::: {.table-responsive} -| count_schools | count_pupils_male | count_pupils_female | count_pupils_total | percent_pupils_male | percent_pupils_female | percent_pupils_total | +| school_count | pupil_count_male | pupil_count_female | pupil_count_total | pupil_percent_male | pupil_percent_female | pupil_percent_total | |---------------|-------------------|---------------------|--------------------|---------------------|-----------------------|----------------------| | 2 | 120 | 130 | 250 | 48 | 52 | 100 | @@ -115,7 +115,7 @@ The following would be accepted for publication via the API. 
In this case, split ::: {.table-responsive} -| sex | count_schools | count_pupils | percent_pupils | +| sex | school_count | pupil_count | pupil_percent | |--------------------|---------------|--------------|----------------| | Male | 2 | 30 | 60 | | Female | 2 | 40 | 80 | From 99387493232d61a5240e93fd7bf084a5da4d7f19 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Fri, 13 Sep 2024 14:48:16 +0100 Subject: [PATCH 09/18] Added example on hierarchical filtering --- statistics-production/api-data-standards.qmd | 52 +++++++++++++++++++- 1 file changed, 50 insertions(+), 2 deletions(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 86dec8f..769a7e9 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -50,7 +50,55 @@ The key thing on tidy data structure is to avoid filter items being included wit you have collections of related terms appearing in indicator names (e.g. male, female, total), then these should be translated into a filter column, with the data being pivoted. -### Example 1 - Attainment rates and scores +### Example 1 - Metrics with hierarchical filters + +#### Example of bad practice + +The following would not be accepted for publication via the API. + +::: {.table-responsive} + +| attendance |overall_absence | authorised_absence | unauthorised_absence | attendance_percent |overall_absence_percent | authorised_absence_percent | unauthorised_absence_percent | +|------------|----------------|--------------------|----------------------|------------|----------------|-------------------|----------------------| +| 180 | 20 | 12 | 8 | 90 | 10 | 6 | 4 | + +: Attendance statistics in non-tidy format + +::: + +#### Example of good practice + +The following would be accepted for publication via the API. In this case, creating a hierarchical filter combination allows a clear representation of the data. 
+ +::: {.table-responsive} + +| attendance_status | attendance_type | session_count | session_percent | +|--------------------|----------------------|----------------|-----------------| +| Attendance | Total | 180 | 90 | +| Absence | Total | 20 | 10 | +| Absence | Authorised absence | 12 | 6 | +| Absence | Unauthorised absence | 8 | 4 | + +: Attendance statistics in tidy format with hierarchical filters + +::: + +::: {.table-responsive} + +| sex | accountability_measure | score_average | +|--------|------------------------|----------------| +| Female | Progress 8 | 0.21 | +| Male | Progress 8 | 0.20 | +| Total | Progress 8 | 0.21 | +| Female | Attainment 8 | 0.08 | +| Male | Attainment 8 | 0.09 | +| Total | Attainment 8 | 0.08 | + +: Attainment scores in tidy format + +::: + +### Example 2 - Metrics with non-compatible filters #### Example of bad practice @@ -97,7 +145,7 @@ The following would be accepted for publication via the API. In this case, split ::: -### Example 2 - Pupil counts, percents and characteristics +### Example 3 - Pupil counts, percents and characteristics #### Example of bad practice From 0f0b042bc4850684a5f0c25f43d686343f30bf53 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Fri, 13 Sep 2024 14:50:13 +0100 Subject: [PATCH 10/18] Adjusting language around standardised filter sets... 
--- statistics-production/api-data-standards.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 769a7e9..1d44bda 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -179,7 +179,7 @@ All data uploaded to EES should be in a tidy data structure form, but this is mo ## Standardised filter col_names and items -The explore education and statistics platforms team alongside the data harmonisation champions group and publication teams are developing a series of [standardised filter specifications](../statistics-production/ud.html#common-harmonised-variables) that teams are required to adhere to when creating data for the API. These are being built iteratively as more data is put forward for the API, so if the current standards don't cater to your data set, you can contribute to building the harmonised standards for others to follow. +The explore education and statistics platforms team alongside the data harmonisation champions group and publication teams are developing a series of [standardised filters](../statistics-production/ud.html#common-harmonised-variables) that teams are required to use when creating data for the API. These are being built iteratively as more data is put forward for the API, so if the current standards don't cater to your data set, you can contribute to building the harmonised standards for others to follow. The standards can be used to create individual filter columns or combined filters (i.e. breakdown_topic / breakdown_topic). 
From 9a3cf40783246f8e2b951446f593400fc542a064 Mon Sep 17 00:00:00 2001 From: Rich Bielby Date: Fri, 13 Sep 2024 15:07:15 +0100 Subject: [PATCH 11/18] Shifting tidy data examples around a little more and renaming examples --- statistics-production/api-data-standards.qmd | 60 ++++++++++---------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd index 1d44bda..76b2741 100644 --- a/statistics-production/api-data-standards.qmd +++ b/statistics-production/api-data-standards.qmd @@ -50,7 +50,35 @@ The key thing on tidy data structure is to avoid filter items being included wit you have collections of related terms appearing in indicator names (e.g. male, female, total), then these should be translated into a filter column, with the data being pivoted. -### Example 1 - Metrics with hierarchical filters +### Example 1 - Three metrics with a single filter + +#### Example of bad practice + +::: {.table-responsive} + +| school_count | pupil_count_male | pupil_count_female | pupil_count_total | pupil_percent_male | pupil_percent_female | pupil_percent_total | +|---------------|-------------------|---------------------|--------------------|---------------------|-----------------------|----------------------| +| 2 | 120 | 130 | 250 | 48 | 52 | 100 | + +: Pupil counts and percentages in non-tidy format + +::: + +#### Example of good practice + +::: {.table-responsive} + +| sex | school_count | pupil_count | pupil_percent | +|--------------------|---------------|--------------|----------------| +| Male | 2 | 30 | 60 | +| Female | 2 | 40 | 80 | +| Total | 2 | 50 | 100 | + +: Pupil counts and percentages in tidy format + +::: + +### Example 2 - Metrics with hierarchical filters #### Example of bad practice @@ -98,7 +126,7 @@ The following would be accepted for publication via the API. 
In this case, creat

 :::

-### Example 2 - Metrics with non-compatible filters
+### Example 3 - Metrics with non-compatible filters

 #### Example of bad practice

@@ -145,34 +173,6 @@ The following would be accepted for publication via the API. In this case, split

 :::

-### Example 3 - Pupil counts, percents and characteristics
-
-#### Example of bad practice
-
-::: {.table-responsive}
-
-| school_count | pupil_count_male | pupil_count_female | pupil_count_total | pupil_percent_male | pupil_percent_female | pupil_percent_total |
-|---------------|-------------------|---------------------|--------------------|---------------------|-----------------------|----------------------|
-| 2 | 120 | 130 | 250 | 48 | 52 | 100 |
-
-: Pupil counts and percentages in non-tidy format
-
-:::
-
-#### Example of good practice
-
-::: {.table-responsive}
-
-| sex | school_count | pupil_count | pupil_percent |
-|--------------------|---------------|--------------|----------------|
-| Male | 2 | 30 | 60 |
-| Female | 2 | 40 | 80 |
-| Total | 2 | 50 | 100 |
-
-: Pupil counts and percentages in tidy format
-
-:::
-
 ### Summary

 All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../statistics-production/ud.html#tidy-data-structure).
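
Reviewer note on PATCH 11/18: the wide-to-tidy reshaping that "Example 1 - Three metrics with a single filter" describes can be sketched roughly as below. This is only an illustrative sketch, not part of the patch or of any EES tooling: the helper `pivot_sex_longer` and the `SEX_ITEMS` mapping are hypothetical names, and the code assumes every sex-specific indicator is named `<indicator>_<sex>`. The input values are copied from the non-tidy table in the patch, so the output differs from the illustrative numbers shown in the patch's tidy table.

```python
# Hypothetical mapping from indicator-name suffixes to filter items (an assumption,
# chosen to match the "sex" column in the patch's good-practice table).
SEX_ITEMS = {"male": "Male", "female": "Female", "total": "Total"}


def pivot_sex_longer(wide_row):
    """Move filter items out of indicator names and into a `sex` filter column."""
    shared = {}  # indicators without a sex suffix, repeated on every tidy row
    by_sex = {label: {} for label in SEX_ITEMS.values()}
    for name, value in wide_row.items():
        # Split on the last underscore: "pupil_count_male" -> ("pupil_count", "male")
        indicator, _, suffix = name.rpartition("_")
        if indicator and suffix in SEX_ITEMS:
            by_sex[SEX_ITEMS[suffix]][indicator] = value
        else:
            shared[name] = value
    return [{"sex": label, **shared, **values} for label, values in by_sex.items()]


# Values taken from the "bad practice" table in Example 1.
wide_row = {
    "school_count": 2,
    "pupil_count_male": 120,
    "pupil_count_female": 130,
    "pupil_count_total": 250,
    "pupil_percent_male": 48,
    "pupil_percent_female": 52,
    "pupil_percent_total": 100,
}

tidy_rows = pivot_sex_longer(wide_row)
for row in tidy_rows:
    print(row)
```

Each resulting row carries one `sex` filter item alongside the un-suffixed indicators `school_count`, `pupil_count` and `pupil_percent`, which is the column layout of the good-practice table.
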
From 188161c7ddbcddf05d6d4a13636c91dab7798fb2 Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 15:09:59 +0100
Subject: [PATCH 12/18] Removing rogue extra table

---
 statistics-production/api-data-standards.qmd | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 76b2741..3868a9f 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -111,21 +111,6 @@ The following would be accepted for publication via the API. In this case, creat

 :::

-::: {.table-responsive}
-
-| sex | accountability_measure | score_average |
-|--------|------------------------|----------------|
-| Female | Progress 8 | 0.21 |
-| Male | Progress 8 | 0.20 |
-| Total | Progress 8 | 0.21 |
-| Female | Attainment 8 | 0.08 |
-| Male | Attainment 8 | 0.09 |
-| Total | Attainment 8 | 0.08 |
-
-: Attainment scores in tidy format
-
-:::
-
 ### Example 3 - Metrics with non-compatible filters

 #### Example of bad practice

From 40ca12c2d111cd8178893c9b05801029e49b3e6e Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 16:32:02 +0100
Subject: [PATCH 13/18] Adding example of pivoted data leading to unecessary levels of not applicable and duplication of fields

---
 statistics-production/api-data-standards.qmd | 45 ++++++++++++++++++--
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 3868a9f..308710e 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -50,6 +50,11 @@ The key thing on tidy data structure is to avoid filter items being included wit
 you have collections of related terms appearing in indicator names (e.g. male, female, total), then these should be
 translated into a filter column, with the data being pivoted.

+All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../statistics-production/ud.html#tidy-data-structure).
+
+The following give examples of how different examples of data structures could be adapted.
+
+
 ### Example 1 - Three metrics with a single filter

 #### Example of bad practice
@@ -127,6 +132,42 @@ The following would not be accepted for publication via the API.

 :::

+In this case, the different metrics contain different types of values that are split by very different filters. Specifically pupil counts and pupil percents are split into grade thresholds, whereas the score based metrics are not. If we were to try and pivot this data as one file, it would lead to an unreasonably large number of cells with no valid entries (i.e. large numbers of z's). For example, pivoting might create something like the following table, which suffers from both a large number of not applicable columns and duplication of data unecessarily.
+
+::: {.table-responsive}
+
+| sex | grade_range |accountability_measure | pupil_count | pupil_percent | score_average |
+|--------|--------------------|-----------------------|--------------|----------------|---------------|
+| Total | Grades 9-5 | z | 30 | 60 | z |
+| Total | Grades 9-4 | z | 40 | 80 | z |
+| Total | Grades 9-1 | z | 50 | 100 | z |
+| Total | z | Attainment 8 | 50 | 100 | 0.21 |
+| Female | z | Attainment 8 | 50 | 100 | 0.21 |
+| Male | z | Attainment 8 | 50 | 100 | 0.20 |
+| Total | z | Progress 8 | 50 | 100 | 0.08 |
+| Female | z | Progress 8 | 50 | 100 | 0.08 |
+| Male | z | Progress 8 | 50 | 100 | 0.09 |
+
+: Attainment grade rates in tidy format
+
+:::
+
+::: {.table-responsive}
+
+| sex | accountability_measure | score_average |
+|--------|------------------------|----------------|
+| Female | Progress 8 | 0.21 |
+| Male | Progress 8 | 0.20 |
+| Total | Progress 8 | 0.21 |
+| Female | Attainment 8 | 0.08 |
+| Male | Attainment 8 | 0.09 |
+| Total | Attainment 8 | 0.08 |
+
+: Attainment scores in tidy format
+
+:::
+
+
 #### Example of good practice

 The following would be accepted for publication via the API. In this case, splitting the data into separate data files is required in order to create tidy data structures.
@@ -158,10 +199,6 @@ The following would be accepted for publication via the API. In this case, split

 :::

-### Summary
-
-All data uploaded to EES should be in a tidy data structure form, but this is more strictly regulated for data intended for use with the API. More information on building tidy data structures can be found in the [tidy data structure section](../statistics-production/ud.html#tidy-data-structure).
-
 ## Standardised filter col_names and items

 The explore education and statistics platforms team alongside the data harmonisation champions group and publication teams are developing a series of [standardised filters](../statistics-production/ud.html#common-harmonised-variables) that teams are required to use when creating data for the API. These are being built iteratively as more data is put forward for the API, so if the current standards don't cater to your data set, you can contribute to building the harmonised standards for others to follow.

From e723ac4c469fa725862df198b77ee27c462dea35 Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 16:37:53 +0100
Subject: [PATCH 14/18] Updated title on badly pivoted data

---
 statistics-production/api-data-standards.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 308710e..2fad2df 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -148,7 +148,7 @@ In this case, the different metrics contain different types of values that are s
 | Female | z | Progress 8 | 50 | 100 | 0.08 |
 | Male | z | Progress 8 | 50 | 100 | 0.09 |

-: Attainment grade rates in tidy format
+: Example of pivoted data showing excessive duplicated and not applicable fields.

 :::

From 2f3c066ba87a27dca95e7d47a985f0d91e10ca6c Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 17:19:43 +0100
Subject: [PATCH 15/18] Removing extra table and adding location code limit

---
 statistics-production/api-data-standards.qmd | 30 ++++++--------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 2fad2df..72f22a1 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -34,11 +34,12 @@ Character limits for fields in data uploaded to the API are:

 ::: {.table-responsive}

-| Element | Character limit |
-|---------------------------------|-----------------|
-|Filter / indicator column names | 50 characters |
-|Filter / indicator column labels | 80 characters |
-|Filter items / location names | 120 characters |
+| Element | Character limit |
+|----------------------------------|-----------------|
+| Location codes | 30 characters |
+| Filter / indicator column names | 50 characters |
+| Filter / indicator column labels | 80 characters |
+| Filter items / location names | 120 characters |

 : Character limits on column names, column labels and filter items.

@@ -132,7 +133,8 @@ The following would not be accepted for publication via the API.

 :::

-In this case, the different metrics contain different types of values that are split by very different filters. Specifically pupil counts and pupil percents are split into grade thresholds, whereas the score based metrics are not. If we were to try and pivot this data as one file, it would lead to an unreasonably large number of cells with no valid entries (i.e. large numbers of z's). For example, pivoting might create something like the following table, which suffers from both a large number of not applicable columns and duplication of data unecessarily.
+
+In this case, the different metrics contain different types of values that are split by very different filters. Specifically pupil counts and pupil percents are split into grade thresholds, whereas the score based metrics are not. If we were to try and pivot this data as one file, it would lead to an unreasonably large number of cells with no valid entries (i.e. large numbers of z's). For example, pivoting might create something like the following table, which suffers from both a large number of not applicable columns and duplication of data unnecessarily.

 ::: {.table-responsive}

@@ -152,22 +154,6 @@ In this case, the different metrics contain different types of values that are s

 :::

-::: {.table-responsive}
-
-| sex | accountability_measure | score_average |
-|--------|------------------------|----------------|
-| Female | Progress 8 | 0.21 |
-| Male | Progress 8 | 0.20 |
-| Total | Progress 8 | 0.21 |
-| Female | Attainment 8 | 0.08 |
-| Male | Attainment 8 | 0.09 |
-| Total | Attainment 8 | 0.08 |
-
-: Attainment scores in tidy format
-
-:::
-
-
 #### Example of good practice

 The following would be accepted for publication via the API. In this case, splitting the data into separate data files is required in order to create tidy data structures.

From a7d3fb6da24cc18e755158dc20553cc2c6e93e5d Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 17:21:56 +0100
Subject: [PATCH 16/18] Moving some text around due to paragraph line spacing not looking great below tables

---
 statistics-production/api-data-standards.qmd | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 72f22a1..1cc6a0b 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -121,6 +121,14 @@ The following would be accepted for publication via the API.
In this case, creat

 #### Example of bad practice

+In this case, the different metrics contain different types of values that are split
+by very different filters. Specifically pupil counts and pupil percents are split into
+grade thresholds, whereas the score based metrics are not. If we were to try and pivot
+this data as one file, it would lead to an unreasonably large number of cells with no
+valid entries (i.e. large numbers of z's). For example, pivoting might create something
+like the following table, which suffers from both a large number of not applicable
+columns and duplication of data unnecessarily.
+
 The following would not be accepted for publication via the API.

 ::: {.table-responsive}
@@ -133,9 +141,6 @@ The following would not be accepted for publication via the API.

 :::

-
-In this case, the different metrics contain different types of values that are split by very different filters. Specifically pupil counts and pupil percents are split into grade thresholds, whereas the score based metrics are not. If we were to try and pivot this data as one file, it would lead to an unreasonably large number of cells with no valid entries (i.e. large numbers of z's). For example, pivoting might create something like the following table, which suffers from both a large number of not applicable columns and duplication of data unnecessarily.
-
 ::: {.table-responsive}

 | sex | grade_range |accountability_measure | pupil_count | pupil_percent | score_average |

From 40d38d4bd46d41b4df97e9f54d8d659e1253e594 Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 17:24:26 +0100
Subject: [PATCH 17/18] Minor rewording

---
 statistics-production/api-data-standards.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 1cc6a0b..2b58e66 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -121,7 +121,7 @@ The following would be accepted for publication via the API. In this case, creat

 #### Example of bad practice

-In this case, the different metrics contain different types of values that are split
+In the example below, the different metrics contain different types of values that are split
 by very different filters. Specifically pupil counts and pupil percents are split into
 grade thresholds, whereas the score based metrics are not. If we were to try and pivot
 this data as one file, it would lead to an unreasonably large number of cells with no

From 69b29acbde7093c421df713242575bcb4e678b36 Mon Sep 17 00:00:00 2001
From: Rich Bielby
Date: Fri, 13 Sep 2024 17:27:58 +0100
Subject: [PATCH 18/18] More tweaking of text structure around examples

---
 statistics-production/api-data-standards.qmd | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/statistics-production/api-data-standards.qmd b/statistics-production/api-data-standards.qmd
index 2b58e66..3fd8ad7 100644
--- a/statistics-production/api-data-standards.qmd
+++ b/statistics-production/api-data-standards.qmd
@@ -123,11 +123,7 @@ The following would be accepted for publication via the API. In this case, creat

 In the example below, the different metrics contain different types of values that are split
 by very different filters.
Specifically pupil counts and pupil percents are split into
-grade thresholds, whereas the score based metrics are not. If we were to try and pivot
-this data as one file, it would lead to an unreasonably large number of cells with no
-valid entries (i.e. large numbers of z's). For example, pivoting might create something
-like the following table, which suffers from both a large number of not applicable
-columns and duplication of data unnecessarily.
+grade thresholds, whereas the score based metrics are not.

 The following would not be accepted for publication via the API.

@@ -137,6 +137,14 @@ The following would not be accepted for publication via the API.

 :::

+#### Example of pivoting leading to excessive duplications and not applicable characters
+
+If we were to try and pivot
+the above data as one file, it would lead to an unreasonably large number of cells with no
+valid entries (i.e. large numbers of z's). For example, pivoting might create something
+like the following table, which suffers from both a large number of not applicable
+columns and duplication of data unnecessarily.
+
 ::: {.table-responsive}

 | sex | grade_range |accountability_measure | pupil_count | pupil_percent | score_average |