Translate data sqids #34

Merged
merged 17 commits into from
Oct 21, 2024
Conversation


@rmbielby rmbielby commented Oct 3, 2024

Brief overview of changes

Adds translation of SQIDs into understandable labels and codes, based on a data set's metadata.

Why are these changes being made?

SQIDs don't make any sense to end users. They're going to want them translated, and we can save them the effort.

Detailed description of changes

I've added some parsing functions to handle the different types of objects. These are called by parse_api_dataset(). The basic structure is along these lines:

  • User runs query_dataset(), which then runs post_dataset() or get_dataset()
    • As they retrieve pages of data, post_dataset() and get_dataset() translate the IDs on the fly using parse_api_dataset()
      • parse_api_dataset() then calls the following sub-functions:
        • parse_sqids_locations()
        • parse_sqids_filters()
        • parse_sqids_indicators()
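
The call structure above can be sketched roughly as follows. This is a minimal base R illustration under stated assumptions, not the package's code: `meta_filters`, `parse_filter_column()`, `parse_page()` and every ID and label in it are hypothetical stand-ins for the metadata and the parse_sqids_*() helpers.

```r
# Hypothetical metadata: one row per filter item (column names are assumed).
meta_filters <- data.frame(
  col_id = c("aaa11", "aaa11", "bbb22"),
  col_name = c("sex", "sex", "phase"),
  item_id = c("x1", "x2", "y1"),
  item_label = c("Female", "Male", "Primary"),
  stringsAsFactors = FALSE
)

# Translate one sqid-named column into a readable, labelled column.
parse_filter_column <- function(page, column_sqid, meta) {
  lookup <- meta[meta$col_id == column_sqid, ]
  col_name <- unique(lookup$col_name)
  page[[col_name]] <- lookup$item_label[match(page[[column_sqid]], lookup$item_id)]
  page[[column_sqid]] <- NULL
  page
}

# Stand-in for the top-level parser: loop over every sqid column found in the page.
parse_page <- function(page, meta) {
  for (column_sqid in intersect(names(page), unique(meta$col_id))) {
    page <- parse_filter_column(page, column_sqid, meta)
  }
  page
}

# One made-up page of results, as it might come back with sqid column names.
page <- data.frame(
  aaa11 = c("x2", "x1"),
  bbb22 = c("y1", "y1"),
  value = c(10, 20),
  stringsAsFactors = FALSE
)
parsed <- parse_page(page, meta_filters)
```

The real functions also have to handle locations and indicators, which this sketch leaves out.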

Additional information for reviewers

As discussed earlier, this isn't great for performance on large (paged) data sets, because the translation runs for every page. It would ultimately be better to run it once at the end of the process, but that needs some not-insignificant rejigging. Happy to look at that, but it might be something for a separate issue after the initial bit of internal testing has started.
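
To make the trade-off concrete, here is a small base R sketch of the two orderings. The `lookup` vector, the `translate()` helper and the page contents are all made up for illustration and are not taken from the package.

```r
# Hypothetical ID-to-label lookup and two made-up pages of results.
lookup <- c(x1 = "Female", x2 = "Male")
pages <- list(
  data.frame(sex_id = c("x1", "x2"), value = c(1, 2), stringsAsFactors = FALSE),
  data.frame(sex_id = c("x2", "x2"), value = c(3, 4), stringsAsFactors = FALSE)
)

# Replace the ID column with its label.
translate <- function(df) {
  df$sex <- unname(lookup[df$sex_id])
  df$sex_id <- NULL
  df
}

# Current approach: translate every page, then combine.
per_page <- do.call(rbind, lapply(pages, translate))

# Alternative: combine all pages first, then translate once.
at_end <- translate(do.call(rbind, pages))
```

Both orderings give the same rows. The second calls the translation once instead of once per page, which is where any saving would come from.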

Issue ticket number/s and link

Resolves #29

@rmbielby rmbielby added the new feature New feature or request label Oct 3, 2024
@rmbielby rmbielby added this to the Ready for user testing milestone Oct 3, 2024
@rmbielby rmbielby self-assigned this Oct 3, 2024
…icators and filters potentially having the same sqid)
…g it over with a cup of tea. Think this works cleaner.
dplyr::filter(!!rlang::sym("col_id") == column_sqid) |>
dplyr::select("item_label", "item_id") |>
dplyr::rename(
!!rlang::sym(col_name) := "item_label",

Check warning (Code scanning / lintr): no visible global function definition for ':='
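
This warning typically appears because `:=` is used without being imported. A common fix is to import it from rlang, for example with the roxygen2 tag `#' @importFrom rlang :=`. As an alternative, the rename can be written without `:=` at all. The sketch below does that in base R with a made-up `item_lookup` table and `col_name` value; it is an illustration, not the change made in this PR.

```r
# Hypothetical lookup table with the same two columns selected in the snippet above.
item_lookup <- data.frame(
  item_label = c("Female", "Male"),
  item_id = c("x1", "x2"),
  stringsAsFactors = FALSE
)

# Target column name, held in a variable as in the original code.
col_name <- "sex"

# Rename "item_label" to the value of col_name without using `:=`.
names(item_lookup)[names(item_lookup) == "item_label"] <- col_name
```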
@rmbielby rmbielby changed the title Translate data squids Translate data sqids Oct 3, 2024
@rmbielby rmbielby marked this pull request as ready for review October 18, 2024 15:02

@cjrace cjrace left a comment

A few inline comments in the code, plus some bigger ones that you were probably expecting. I'm happy for these to go into separate issues to pick up later if they're not quick to add in this PR:

  1. We should add parsing for time, including a lookup data set mapping codes to identifiers
  2. Locations parsing needs more refinement. I'd suggest a lookup data set mapping geographic level to code, including the prefix for each level's locations columns, plus special handling for new_la_code / old_la_code and other less standard locations (RSC regions come to mind)
  3. Should we give an option for whether or not to parse the output from query_dataset()? Parsing should be the default, though it might be nice to be able to see the raw version too. Showing both versions as examples would also be a good way to demonstrate to users how the IDs work
  4. Performance feels like it's going very backwards here and will definitely want a rewrite at a later point, though I agree it is best to just get something out for testing and then optimise once we're more certain this is the approach we're running with

query_dataset(example_id(), indicators = c("E40qF", "QuVwb", "9Aw4v"))

Microbenchmarking before:
     min       lq      mean   median        uq      max neval
 72.3144  75.9789  82.20526   77.926  82.88525  251.242   100

After:
      min       lq    mean   median       uq      max neval
 184.6963 193.3727 203.332 199.0265 209.6245 343.5782   100

  5. Very minor but mildly painful inconsistency in the query_dataset() docs that I spotted by accident...
    image
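
Point 3 in the list above (an option to skip parsing) could look something like the sketch below. The `parse` argument, the `query_sketch()` function and the data are hypothetical, used only to show the raw and parsed versions side by side. They are not the package's actual interface.

```r
# Hypothetical ID-to-label lookup and a made-up raw result.
lookup <- c(x1 = "Female", x2 = "Male")
raw_page <- data.frame(
  sex_id = c("x2", "x1"),
  value = c(5, 6),
  stringsAsFactors = FALSE
)

# Stand-in for a query function with an assumed `parse` flag, defaulting to TRUE.
query_sketch <- function(raw, lookup, parse = TRUE) {
  if (!parse) {
    return(raw) # hand back the IDs untouched
  }
  raw$sex <- unname(lookup[raw$sex_id])
  raw$sex_id <- NULL
  raw
}

parsed <- query_sketch(raw_page, lookup)
unparsed <- query_sketch(raw_page, lookup, parse = FALSE)
```

Printing `unparsed` next to `parsed` would show users how each ID maps to its label.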


codecov bot commented Oct 21, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@rmbielby rmbielby merged commit 6effbec into main Oct 21, 2024
9 checks passed
Labels
new feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create parse element ids to labels functionality
2 participants