
Creating base get_meta() function to retrieve data set meta data #19

Merged (26 commits), Sep 10, 2024


@rmbielby (Contributor) commented on Sep 5, 2024

Brief overview of changes

Analysts will need a function to retrieve the basic metadata for a given data set, and this PR adds one in the form of get_meta(). To support it, I've also made a first pass at an error-handling script that translates HTTP status codes into error messages.

Why are these changes being made?

We need a function to retrieve the metadata held for data sets on the EES API. The metadata holds the column and indicator information for a given data set, including the filter item and indicator codes required to query that data set via the API.

Detailed description of changes

I've added the following functions:

  • get_meta_response()
  • http_request_error()

get_meta_response() takes a dataset_id, dataset_version and api_version and returns the metadata associated with that data set. This can be returned either as the basic query result provided by the API (parse = FALSE) or as an initial R-friendly structured list containing the results (parse = TRUE).

http_request_error() translates HTTP status codes (e.g. 200, 404, 504) into a broad-brush error message. This could be expanded in future to be more fine-grained and informative, but I've kept it fairly top-level for now (i.e. it only distinguishes between 2XX, 4XX and 5XX codes).
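
As a rough illustration of the 2XX / 4XX / 5XX split described above, here is a minimal base-R sketch. The function name http_status_message() and the message wording are hypothetical, not the package's actual http_request_error() implementation, and the sketch handles one status code at a time.

```r
# Hypothetical broad-brush translator for a single HTTP status code.
# Name and messages are illustrative assumptions, not the package's code.
http_status_message <- function(status) {
  # Integer division gives the status class: 2 for 2XX, 4 for 4XX, 5 for 5XX
  status_class <- status %/% 100
  if (status_class == 2) {
    "Successful API request."
  } else if (status_class == 4) {
    paste0("HTTP ", status, ": client-side error; check the data set ID, version and query.")
  } else if (status_class == 5) {
    paste0("HTTP ", status, ": server-side error; the API may be temporarily unavailable.")
  } else {
    paste0("HTTP ", status, ": unrecognised status code.")
  }
}

http_status_message(404)
```

A caller could then pass the message to stop() whenever the status class is anything other than 2.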

Following review comments, I've also created the functions for the additional parsing I'd been saving for later PRs:

  • get_meta()
  • parse_meta_filter_columns()
  • parse_meta_filter_item_ids()
  • parse_meta_indicator_columns()
  • parse_meta_location_ids()

parse_meta_filter_columns(), parse_meta_filter_item_ids(), parse_meta_indicator_columns() and parse_meta_location_ids() each tidy one component of the structured list returned by get_meta_response() into its own data frame. Finally, get_meta() is the function I intend most end users to actually use: it is just a wrapper that runs get_meta_response() and then applies the four parse functions to the result, creating a single structured list of data frames.
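
A minimal sketch of the wrapper pattern described above: fetch the metadata once, hand that same response to several parse functions, and collect the results in a named list of data frames. Everything here is hypothetical. fake_meta_response() stands in for get_meta_response(parse = TRUE) without calling the API, and the response fields (indicators, geographicLevels) and record layout are assumptions rather than the real EES API schema.

```r
# Counter to confirm the API-facing function is only called once
calls <- 0

# Hypothetical stand-in for get_meta_response(parse = TRUE):
# returns a fixed nested list instead of querying the API
fake_meta_response <- function() {
  calls <<- calls + 1
  list(
    indicators = list(
      list(id = "ind1", label = "Number of pupils"),
      list(id = "ind2", label = "Percentage of pupils")
    ),
    geographicLevels = list(
      list(code = "NAT", label = "National"),
      list(code = "REG", label = "Regional")
    )
  )
}

# Turn a list of flat records (each a named list) into one data frame
records_to_df <- function(records) {
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

# Wrapper pattern: one response, several parsers, one named list of data frames
get_meta_sketch <- function(fetch_response, parsers) {
  response <- fetch_response()
  lapply(parsers, function(parse_fn) parse_fn(response))
}

meta <- get_meta_sketch(
  fake_meta_response,
  list(
    indicators = function(r) records_to_df(r$indicators),
    geographic_levels = function(r) records_to_df(r$geographicLevels)
  )
)
```

Keeping the parsers separate from the fetch also means each one can be tested against saved input without hitting the API.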

Issue ticket number/s and link

#1

And now #9, #10, #11, #12 and #13 as well.

@rmbielby requested a review from cjrace on September 5, 2024 12:07
@rmbielby self-assigned this on Sep 5, 2024
@cjrace (Contributor) left a comment

Could we change the format of the parsed outputs a little so they're less nested? I'm happy with the general list format, though it might be nice to do some more clean-up so there's only ever one layer of nesting and all data frames in the list are easier to use (might make our own lives ees-ier for other functions too).

  1. $locations currently gives level.code and level.label as well as the locations. Since we already have $geographicLevels, could we drop level.code / level.label and just have a single data frame with code, name and id columns for locations? That would print more nicely in the console and be easier to reuse as a lookup.


  2. Could we split $filters into a filter_columns table and a filter_options table? Again, flattening this out a bit will make it easier to reuse as a lookup and print in a friendlier way. For filter options, I'd imagine one flat table with all options and a column for which filter each option applies to. If you do that, you could leave it called filters and not need a separate columns table, since it would be one big data frame with columns like filter_column_id, filter_column_label, filter_option_id and filter_option_label.
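
To make the two flattening suggestions above concrete, here is a hedged base-R sketch. The nested input lists, their field names (level, options, id, code, name, label) and the example IDs are assumptions for illustration, not the actual structure returned by get_meta_response(); the output column names follow the ones proposed in the comment.

```r
# Hypothetical nested inputs; field names are assumptions, not the EES schema
nested_locations <- list(
  list(level = list(code = "NAT", label = "National"),
       options = list(list(id = "loc1", code = "E92000001", name = "England"))),
  list(level = list(code = "REG", label = "Regional"),
       options = list(list(id = "loc2", code = "E12000001", name = "North East"),
                      list(id = "loc3", code = "E12000002", name = "North West")))
)
nested_filters <- list(
  list(id = "f1", label = "School type",
       options = list(list(id = "o1", label = "Primary"),
                      list(id = "o2", label = "Secondary"))),
  list(id = "f2", label = "Sex",
       options = list(list(id = "o3", label = "Boys"),
                      list(id = "o4", label = "Girls")))
)

# Pull one character field out of every record in a list
pluck_chr <- function(records, field) {
  vapply(records, function(r) r[[field]], character(1))
}

# One flat locations table (code, name, id); the level info is dropped
flatten_locations <- function(levels) {
  do.call(rbind, lapply(levels, function(lvl) {
    data.frame(code = pluck_chr(lvl$options, "code"),
               name = pluck_chr(lvl$options, "name"),
               id = pluck_chr(lvl$options, "id"),
               stringsAsFactors = FALSE)
  }))
}

# One flat filters table: one row per option, tagged with its filter column
flatten_filters <- function(filters) {
  do.call(rbind, lapply(filters, function(f) {
    data.frame(filter_column_id = f$id,
               filter_column_label = f$label,
               filter_option_id = pluck_chr(f$options, "id"),
               filter_option_label = pluck_chr(f$options, "label"),
               stringsAsFactors = FALSE)
  }))
}

locations <- flatten_locations(nested_locations)
filters <- flatten_filters(nested_filters)
```

Once flat, a lookup is a plain subset: filters[filters$filter_column_label == "Sex", "filter_option_id"] returns the option IDs for that filter.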


@cjrace (Contributor) left a comment

Nice solution and code generally: a neat way to structure the functions so that the main user query doesn't repeat 5 calls just to get the metadata out. A few comments on specific lines of code, plus:

  1. Should we separate out the functions we expect users to use in the _pkgdown.yml file for the reference list? Currently it's one flat list, and I expect most users will only need a couple of the functions to start with.

  2. Should we have before / after test data saved in the test folder to check the parse_meta... functions against?

@rmbielby (Contributor, Author) commented

  1. Should we separate out the functions we expect users to use in the _pkgdown.yml file for the reference list? Currently it's one flat list, and I expect most users will only need a couple of the functions to start with.

I've tried a structure that makes rough sense to me and that could be extended sensibly as we add more functionality.

@rmbielby (Contributor, Author) commented

  2. Should we have before / after test data saved in the test folder to check the parse_meta... functions against?

I guess so...

I've added metadata test data to a testdata/ folder and written a test for each of the parsing functions. For the data format, I've picked RDS because:

  • I want to be able to write lists of data frames and not just individual data frames
  • The RDS save and read functions (saveRDS() and readRDS()) are included in base R, so no extra packages are needed
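
A small sketch of why RDS suits this: a whole list of data frames survives a saveRDS() / readRDS() round trip unchanged, using only base R. The object contents and the temporary file path are illustrative assumptions; the real test files live in testdata/ under names not shown here.

```r
# Illustrative "after" data for a parse function: a list of data frames
expected_meta <- list(
  indicators = data.frame(id = c("ind1", "ind2"),
                          label = c("Number of pupils", "Percentage of pupils"),
                          stringsAsFactors = FALSE),
  filters = data.frame(filter_column_id = "f1",
                       filter_option_id = c("o1", "o2"),
                       stringsAsFactors = FALSE)
)

# Write the whole list to a single .rds file, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(expected_meta, path)
reloaded_meta <- readRDS(path)

# The list structure, column types and values all round-trip intact
identical(reloaded_meta, expected_meta)
```

In a testthat file the comparison would typically be expect_equal(parse_fn(input), readRDS(test_path("testdata", "<file>.rds"))), where parse_fn, input and the file name are placeholders.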

@rmbielby merged commit c3ef4b6 into main on Sep 10, 2024
9 checks passed
@rmbielby deleted the get-meta branch on September 18, 2024 09:06