Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nssp patching code #2000

Open
wants to merge 36 commits into
base: main
Choose a base branch
from
Open

nssp patching code #2000

wants to merge 36 commits into from

Conversation

minhkhul
Copy link
Contributor

@minhkhul minhkhul commented Jul 23, 2024

Description

Add patching feature to nssp.

One detail that set this implementation apart from most of our other indicators is that nssp data comes in our db as weekly form instead of daily.

Basically, all weekly data coming in to our db has the issue_date column always marked as the first date of the epiweek that the reporting date falls into, rather than the reporting date itself. For example, I query nssp data from our api that was just put into db on July 5th 2024 and got something like this:

geo_value                  signal source geo_type time_type time_value  \
0      54079  pct_ed_visits_combined   nssp   county      week 2022-09-25
**issue**  lag  missing_value  missing_stderr  missing_sample_size  value  \
0  **2024-06-30**   93              0               1                    1   3.27

Note the issue is 2024-06-30 despite this data coming in on July 5th. It's because 2024-06-30 is the start date of epiweek 202427, so any data coming in that week will be marked under that issue date.

Furthermore, the reason why we use this directory format is because of this batch_issue_format implementation in acquisition, which demands folder format to be issue_yyyymmdd including for patches of weekly data.

Fixes

nssp/tests/test_patch.py Outdated Show resolved Hide resolved
@minhkhul minhkhul requested a review from aysim319 July 25, 2024 19:35
nssp/delphi_nssp/patch.py Outdated Show resolved Hide resolved
nssp/delphi_nssp/run.py Outdated Show resolved Hide resolved
nssp/tests/test_patch.py Outdated Show resolved Hide resolved
from .run import run_module


def good_patch_config(params, logger):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can bake this in the read_params() in delphi_utils.... @nmdefries thoughts?
pydantic doesn't a full pledged support for json schema validation, but seeing how the read_params has no validation what so ever, we could look into it sometime in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I wrote this method and half way through was like hmm this should probably be generalized since it may be applicable to other indicators too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeahh I'm like oh I also would like that for mine 👀 and also saw that the current read params isn't doing any validations at all

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, could make sense to add this or a similar fn to delphi_utils. Probably we'd want it and other params validation to be separate fns from read_params, since we don't always want params validation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it the #2002 and I think for right now it's better to remove this and deal with this in another issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minhkhul it's up to you to keep or remove the good_patch_config function. While we do intend to move it to utils at some later point, we don't know when that will be. Pros of leaving it here are that it is performing the check, and we can use it as a starting point for the future fn. Cons are that we have to remember it's here and use it, and then also replace it with the utils version once that's available.

nssp/tests/test_patch.py Outdated Show resolved Hide resolved
nssp/tests/test_patch.py Outdated Show resolved Hide resolved
nssp/delphi_nssp/run.py Show resolved Hide resolved
nssp/delphi_nssp/pull.py Show resolved Hide resolved
@minhkhul
Copy link
Contributor Author

@nmdefries
I realized how confusing I wrote the readme to be. So I just made some major changes to that, then added code to auto-download the source backup files here. With this, patcher will only need to adjust params then run env/bin/python -m delphi_nssp.patch to create patch data.

None of any other extra steps.

The code chunk in the readme was not generating issue-specific dataset all at once, but just for one issue. I use that code and daily scheduling to made a daily copy of what the source API returns and store that copy as in a csv on bigchunk-dev-02.delphi.cmu.edu. So we have a history since march for this indicator. This means on the server there are these daily csv where each csv is source data for one issue:

[user@bigchunk-dev-02 nssp]$ pwd
/common/source_backup/nssp
[user@bigchunk-dev-02 nssp]$ ls
2024-04-17.csv  2024-05-21.csv  2024-06-25.csv  2024-08-03.csv
2024-04-18.csv  2024-05-22.csv  2024-06-26.csv  2024-08-04.csv
2024-04-19.csv  2024-05-23.csv  2024-06-27.csv  2024-08-05.csv
...

I originally put the code chunk in readme to show how if there'a a patching need, one can set up their own daily/weekly backup starting from the start of an outage to the end. But it's really not needed now that I made the relevant csv files on bigchunk-dev-02.delphi.cmu.edu all readable to anyone with access to that server.

Then I added the auto-download source backup csv files.

So now we avoid having patching be a multi-step process.

nssp/delphi_nssp/patch.py Outdated Show resolved Hide resolved
@minhkhul minhkhul requested a review from aysim319 August 30, 2024 21:05
nssp/delphi_nssp/patch.py Outdated Show resolved Hide resolved
nssp/delphi_nssp/patch.py Outdated Show resolved Hide resolved
…e_dir not exist/empty + adjust tests accordingly
@minhkhul
Copy link
Contributor Author

minhkhul commented Sep 4, 2024

Note: About custom_run flag, non-patch custom run doesn't really exist in nssp like it does in google-symptoms due to the differences in how the two sources handle revision data and how we grab those data. We'll keep the flag in nssp still to maintain consistency and to disambiguate params.json.

nssp/delphi_nssp/pull.py Outdated Show resolved Hide resolved
nssp/delphi_nssp/pull.py Outdated Show resolved Hide resolved
nssp/delphi_nssp/pull.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NSSP patching script and NSSP patch april 2024 to july 2024
4 participants