
HAPI-133 food security #55

Merged
merged 14 commits into from
Oct 27, 2023

Conversation

@turnerm (Member) commented Oct 23, 2023

Decided to start by adding CH data which is p-coded. Leaving as draft for now, but a couple of questions @b-j-mills:

  1. How can I make it so the scraper only scrapes the countries in our list, instead of all in the file?
  2. The Nigeria data is also in this file and p-coded, but is missing a padding zero with respect to the gazetteer. For example, the admin 2 region of Ushongo is NG0722 in the CH, and NG007022 in the gazetteer. Is there any way to transform this?
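For illustration, a minimal sketch of the transform in question, assuming the gazetteer simply zero-pads each two-digit segment of the p-code's numeric tail to three digits (`pad_pcode` is a hypothetical helper written for this example, not part of any library):

```python
def pad_pcode(pcode: str, segment_length: int = 3) -> str:
    """Zero-pad each two-digit segment of a p-code to segment_length digits.

    E.g. "NG0722" (CH style) -> "NG007022" (gazetteer style).
    """
    prefix, digits = pcode[:2], pcode[2:]
    # Split the numeric tail into two-digit segments and pad each one
    segments = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return prefix + "".join(s.zfill(segment_length) for s in segments)
```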

@turnerm turnerm requested a review from b-j-mills October 23, 2023 21:54
@turnerm turnerm marked this pull request as draft October 23, 2023 21:54
github-actions bot commented Oct 23, 2023

Test Results

3 tests  ±0   3 ✔️ ±0   2m 25s ⏱️ +44s
1 suites ±0   0 💤 ±0 
1 files   ±0   0 ±0 

Results for commit 4c03ede. ± Comparison against base commit 6092b05.

♻️ This comment has been updated with latest results.

@b-j-mills (Collaborator) commented:

  1. You could add a filter in the YAML to restrict the scrape to specific countries, using either prefilter, filter, or external_filter. I like to look through the hdx-scraper-covid-viz YAML for scraper setup examples because it covers so many different cases. You could try using:
    prefilter: "adm0_pcod3 in [country_list]"
    filter_cols: "adm0_pcod3"

  2. There are a couple functions that might help in hdx-python-country: convert_admin1_pcode_length or fuzzy_pcode. Otherwise we could try writing some helpful functions in utilities, maybe using get_pcode_length as a helper.
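As a purely illustrative sketch of suggestion 1, such a filter might sit in a scraper's YAML entry as follows; only prefilter and filter_cols come from the suggestion above, while the scraper name and country codes are invented:

```yaml
# Hypothetical scraper config fragment in the hdx-python-scraper style.
# The country list here is hard-coded for illustration only.
food_security:
  prefilter: "adm0_pcod3 in ['AFG', 'BFA', 'NGA']"
  filter_cols: "adm0_pcod3"
```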

@turnerm (Member, Author) commented Oct 24, 2023

@b-j-mills thanks so much for the great advice! Some more questions below:

  1. I implemented the filter as you suggested and it works perfectly. The only thing is: do you know if there's a way to avoid hard-coding the countries in the filter itself, and instead use e.g. the HAPI_countries dictionary?
  2. Ah that's great, the length converter seems perfect. But do you know if there is a way to apply it at the config file level? Or would we have to either (1) write a custom scraper that deals with the pcodes manually, or (2) make a change in the scraper? If it's (1) or (2) then I may skip Nigeria and tackle another country.
  3. One more question: do you know if it's possible to also get admin0 and admin1 rows? I.e., admin0 data would be rows where admin1 and admin2 are not filled in. I've tried to use the covid scraper as an example, which actually uses the exact same dataset; however, if I copy the national scraper here I still end up getting all admin levels.

@mcarans (Contributor) commented Oct 24, 2023

@turnerm I think 1 could be accommodated by allowing AdminLevel in hdx-python-country to accept a list of countries, so that it doesn't get all countries from the dataset on HDX. Then only p-codes within the desired list would be accepted.

For 2, it would be possible to add imports to hdx-python-scraper for whatever functions you need so that they are accessible at the config file level.

For 3, do you mean making a single configurable scraper that first tries admin2, then if not filled in tries admin1, then if that's not filled in, gets admin0? If so, then that is not currently supported and would require making a custom scraper.
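To make question 3 concrete, a minimal sketch of the fallback logic such a custom scraper would need (the row keys are illustrative; this is not existing hdx-python-scraper code):

```python
def finest_admin_level(row: dict) -> str:
    """Return the finest admin level filled in for a row:
    admin2 if present, else admin1, else admin0."""
    if row.get("admin2"):
        return "admin2"
    if row.get("admin1"):
        return "admin1"
    return "admin0"
```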

@b-j-mills (Collaborator) commented:

  1. Does the pcode have to match our list of pcodes in order to be scraped? @mcarans do you know off the top of your head? If it doesn't, the pcode handling could be done in the populating step. I'm not sure which step it makes more sense in.

  2. Is the filtering you were using pushed to the branch yet? I can check it out and make some changes if you send it to me.

@mcarans (Contributor) commented Oct 25, 2023

@b-j-mills The p-code read from the dataset has to match a p-code within the AdminLevel object. (With hapi-pipelines the AdminLevel object is currently configured to read all countries from the dataset on HDX.)

Base automatically changed from HAPI-234/dataset-provider-info to main October 25, 2023 07:39
@b-j-mills (Collaborator) commented:

@turnerm I took a look at the data and filters and I think this one may require a custom scraper. We're scraping all of the historical data as well, right? Not just the latest?
To get all of the data at admin0, 1, or even 2 we have to sum each variable by year, date, and chtype. If it's not a custom scraper we would have to explicitly write out each subset like this, which is possible but very tedious.
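As a sketch of that aggregation: the column names follow the discussion above, but the tiny DataFrame and its values are invented for illustration (the date dimension is omitted for brevity):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "adm0_pcod3": ["NGA", "NGA", "NGA"],
        "adm1_pcod": ["NG007", "NG007", "NG008"],
        "year": [2023, 2023, 2023],
        "chtype": ["current", "current", "current"],
        "population": [100, 200, 50],
    }
)

# Admin1-level totals: sum the variable within each admin1/year/chtype group
admin1 = df.groupby(
    ["adm0_pcod3", "adm1_pcod", "year", "chtype"], as_index=False
)["population"].sum()

# Admin0-level totals: drop the admin1 key and sum again
admin0 = df.groupby(
    ["adm0_pcod3", "year", "chtype"], as_index=False
)["population"].sum()
```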

@turnerm turnerm marked this pull request as ready for review October 25, 2023 21:12
@turnerm (Member, Author) commented Oct 25, 2023

@b-j-mills @mcarans thanks so much for all of your feedback. Going back to the questions:

  1. I tried removing the filter and, even with admin_exact = True, it pulled in p-codes from countries not in the HAPI_countries list. I've left the filter in for now; Mike, should I make a ticket to investigate this?
  2. I've created HAPI-263 to address the pcode cleaning issue
  3. @b-j-mills oops, you are completely right; for some reason I thought I had spotted different disaggregation levels in the data, and that's just not the case. Anyway, Mike pointed out that if this is ever needed, a filter can be used to select rows where admin2 is null.

So with all that it's ready to review!

@turnerm turnerm requested a review from mcarans October 25, 2023 21:21
country["input"] = country["input"] + default["input"]
country["output"] = country["output"] + default["output"]
country["output_hxl"] = country["output_hxl"] + default["output_hxl"]
for list_name in ["input", "list", "output", "output_hxl"]:
Contributor:
I tend to use tuples rather than lists because the former are immutable.
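Applying that suggestion to the quoted line above might look like this (a sketch, not the actual diff):

```python
# Use an immutable tuple for the fixed set of key names instead of a list
LIST_NAMES = ("input", "list", "output", "output_hxl")

for list_name in LIST_NAMES:
    # ... merge the country and default values for each key, as above ...
    pass
```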

Member Author:
Great point, changed!

@mcarans (Contributor) commented Oct 26, 2023

@turnerm Regarding 1, please make a ticket. I'm not sure if the solution is related to the countries being read into AdminLevel, but anyway I've made a PR: OCHA-DAP/hdx-python-country#28

@turnerm (Member, Author) commented Oct 26, 2023

The tests are working on my machine but not on GHA, due to the hapi-schema branch requirements. Once HAPI-132 is closed and merged, the dependency can be changed and hopefully everything will pass here so that this can be merged.

github-actions bot commented:

Pull Request Test Coverage Report for Build 6667812961

  • 120 of 123 (97.56%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+1.0%) to 91.888%

Changes missing coverage:
  src/hapi/pipelines/database/admins.py: 11 of 12 changed/added lines covered (91.67%)
  src/hapi/pipelines/database/food_security.py: 56 of 58 changed/added lines covered (96.55%)

Totals:
  Change from base Build 6651498037: +1.0%
  Covered Lines: 657
  Relevant Lines: 715

💛 - Coveralls


@turnerm turnerm merged commit 76c0d12 into main Oct 27, 2023
@turnerm turnerm deleted the HAPI-133-add-food-security branch October 27, 2023 13:27