Fix NoroSTAT skipping epiweeks on year boundaries #691

Closed
brookslogan opened this issue Aug 22, 2021 · 1 comment
@brookslogan
Contributor

brookslogan commented Aug 22, 2021

Investigation on #382 revealed another issue with NoroSTAT acquisition: the calculation of epiweeks. See this snapshot for a problematic example: 26-Dec-2020 maps to 202052 and 2-Jan-2021 maps to 202101, skipping week 202053 (which exists)! [FIXME: revisiting this: these dates are 7 days apart, so they should have consecutive week numbers, and the input doesn't look problematic; using get_ew I get the correct values 202052 and 202053. The mentioned dates also don't apply to the linked snapshot. The underlying problem here needs some additional investigation.]
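For reference, the two dates can be checked against the MMWR epiweek rule with a small standalone sketch. This is not the project's get_ew; `mmwr_epiweek` is a hypothetical helper that assumes the standard MMWR definition (weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the calendar year).

```python
from datetime import date, timedelta


def mmwr_epiweek(d: date) -> int:
    """Return the MMWR epiweek of `d` as an integer YYYYWW (hypothetical helper)."""
    # MMWR weeks start on Sunday; Python's weekday() has Monday=0 ... Sunday=6.
    days_since_sunday = (d.weekday() + 1) % 7
    # The Wednesday of a Sunday-Saturday week decides which year holds
    # at least four of its days, and therefore which year the week belongs to.
    wednesday = d - timedelta(days=days_since_sunday) + timedelta(days=3)
    # The week number is the count of Wednesdays so far in that year.
    week = (wednesday.timetuple().tm_yday - 1) // 7 + 1
    return wednesday.year * 100 + week


print(mmwr_epiweek(date(2020, 12, 26)))  # 202052
print(mmwr_epiweek(date(2021, 1, 2)))    # 202053
print(mmwr_epiweek(date(2021, 1, 9)))    # 202101
```

Under this definition the two dates land in consecutive epiweeks, which matches the FIXME note above.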

Ambiguity

[Some of this discussion still applies; I'm not sure if 1-Aug is meant, or if that's an approximate date and they mean to start at some fixed epiweek of each season. I'm not sure whether or not this is testable without just asking the upstream data provider.]

It's a bit unclear how to resolve this. 1-Aug is used as the start of the season in many/all snapshots and for multiple seasons within snapshots, so it's not referring to a particular weekday of an epiweek or other type of week. Maybe it's just an imprecise way to say epiweek 31 of a season? However, the last 53-epiweek season was 2014/2015, but this snapshot has it with only 52 entries, just like the adjacent years. So assuming 1-Aug is epiweek 31 would result in skipping 201530 --- this is still far preferable to skipping a week in December, though! (Weeks 30 & 31 are on the season boundary, generally with much less activity, while Dec & Jan are much more active.) (Note: due to differences in weekdays, epiweek 201453 was not skipped as 202053 is / would be.)
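The week counts behind this guess can be illustrated with a short sketch. It assumes, as guessed above, that a season runs from epiweek 31 of one year through epiweek 30 of the next. The helpers `mmwr_epiweek`, `weeks_in_mmwr_year`, and `season_epiweeks` are hypothetical names and are not part of the NoroSTAT acquisition code.

```python
from datetime import date, timedelta


def mmwr_epiweek(d: date) -> int:
    """MMWR epiweek of `d` as YYYYWW (Sunday-Saturday weeks, Wednesday decides the year)."""
    days_since_sunday = (d.weekday() + 1) % 7
    wednesday = d - timedelta(days=days_since_sunday) + timedelta(days=3)
    return wednesday.year * 100 + (wednesday.timetuple().tm_yday - 1) // 7 + 1


def weeks_in_mmwr_year(year: int) -> int:
    # 28 Dec always falls in the final MMWR week of its own year.
    return mmwr_epiweek(date(year, 12, 28)) % 100


def season_epiweeks(start_year: int, first_week: int = 31) -> list:
    """Epiweeks of a season, ASSUMING it spans week 31 through week 30 of the next year."""
    last_week = weeks_in_mmwr_year(start_year)
    this_year = [start_year * 100 + w for w in range(first_week, last_week + 1)]
    next_year = [(start_year + 1) * 100 + w for w in range(1, first_week)]
    return this_year + next_year


season_2014 = season_epiweeks(2014)
print(len(season_2014))            # 53
print(season_2014[52:])            # [201530]
print(len(season_epiweeks(2015)))  # 52
```

With only 52 entries per season in the snapshot, filling consecutively from 201431 leaves 201530 as the one uncovered week of 2014/2015 under this assumption.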

The way to get this precisely right would be to find documentation online or ask the page maintainers which epiweeks / other weeks these entries correspond to. In the absence of such clarification, the above guess is preferable to the current approach.

Fix approach

First, season_db_to_epiweek --- its code and maybe its interface --- should be updated. Then the norostat_point* tables (not norostat*raw* ones) should be backed up and dropped or cleared. Then the norostat data updater should be run again --- it should repopulate the norostat point table history from the raw table history.

Impact

Currently, norostat is auth-gated in the API; this doesn't impact public API users, but does/would impact the norovirus nowcasting system (although this system may have very limited to no current active use). #382 may have coincidentally prevented this issue from actually occurring yet. If #382 and this are indeed different issues, this should be fixed before or with #382.

@brookslogan
Contributor Author

I can't reproduce this now. Since the epiweek calculation code for this endpoint and in the utils repo has not changed, I suspect that this bug does not exist, and that it only seemed to appear because I misread some of the seasons/years involved. The specific example I cited seems to involve such mistakes, at least.

brookslogan closed this as not planned Feb 3, 2023