
Update NoroSTAT Data [delphi.epidata.acquisition.norostat.norostat_update] failing in Automation #382

Closed
korlaxxalrok opened this issue Jan 19, 2021 · 4 comments

@korlaxxalrok
Contributor

korlaxxalrok commented Jan 19, 2021

Update NoroSTAT Data has been consistently failing in Automation:

This looks like it needs a database fix.

Starting step [id=24113|step_id=52|name=<span class="tag_research"> </span>Update NoroSTAT Data]
   Running cmd...
Traceback (most recent call last):
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 487, in cmd_query
    self._cmysql.query(query,
_mysql_connector.MySQLInterfaceError: Duplicate entry '2020-11-17-2020-11-19 17:38:23.356390-5-201931' for key 'PRIMARY'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/automation/driver/delphi/epidata/acquisition/norostat/norostat_update.py", line 56, in <module>
    main()
  File "/home/automation/driver/delphi/epidata/acquisition/norostat/norostat_update.py", line 53, in main
    norostat_sql.update_point()
  File "/home/automation/driver/delphi/epidata/acquisition/norostat/norostat_sql.py", line 429, in update_point
    cursor.executemany('''
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 352, in executemany
    return self.execute(stmt)
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/mysql/connector/cursor_cext.py", line 264, in execute
    result = self._cnx.cmd_query(stmt, raw=self._raw,
  File "/home/automation/.pyenv/versions/3.8.2/lib/python3.8/site-packages/mysql/connector/connection_cext.py", line 491, in cmd_query
    raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '2020-11-17-2020-11-19 17:38:23.356390-5-201931' for key 'PRIMARY'
   Fail [cmd=python3 -m delphi.epidata.acquisition.norostat.norostat_update]
@sgratzl added the chore and acquisition (changes acquisition logic) labels Jun 17, 2021
@brookslogan
Contributor

brookslogan commented Aug 22, 2021

Hopefully this message means the history of the raw table is still being recorded successfully, and there is just some trouble updating the history of the time series it encodes. My first guess is a problem converting the day-month row labels plus season column labels into epiweeks, affecting either (a) just the 2020-11-19 17:38:23.356390 issue and nearby issues, or (b) all issues since the site shifted from showing 2018-2019 & 2019-2020 to showing 2019-20 & 2020-21.
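For reference, the first half of that conversion (season column label plus day-month row label to a calendar date) could look roughly like the sketch below. It is a hypothetical illustration, not the code in norostat_sql.py: it assumes a season runs from August 1 through July 31 and that row labels look like `26-Dec`, and the function names are made up.

```python
from datetime import date

# Assumption (not stated in the issue): a season labelled "2020-21" or
# "2019-2020" runs from August 1 of its first year through July 31 of the next.
SEASON_START_MONTH = 8

# Explicit month abbreviations, so parsing does not depend on the locale.
MONTH_ABBREVIATIONS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def season_start_year(season: str) -> int:
    """Return the first calendar year of a season label such as '2020-21'."""
    return int(season.split("-")[0])


def label_to_date(season: str, day_month: str) -> date:
    """Map a hypothetical day-month row label like '26-Dec' in `season` to a date."""
    day_text, month_text = day_month.split("-")
    month = MONTH_ABBREVIATIONS.index(month_text) + 1
    first_year = season_start_year(season)
    # August-December rows belong to the season's first year,
    # January-July rows to its second year.
    year = first_year if month >= SEASON_START_MONTH else first_year + 1
    return date(year, month, int(day_text))


print(label_to_date("2020-21", "26-Dec"))  # 2020-12-26
print(label_to_date("2020-21", "2-Jan"))   # 2021-01-02
```

An explicit month list is used instead of `strptime("%b")` so the result does not depend on the machine's locale and no year-less date has to be parsed.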

Guess 1a (looks partially wrong): maybe the current method of assigning epiweeks does not align adjacent seasons or years correctly, producing duplicate epiweek entries in adjacent seasons or years.

  • Why it's wrong: the duplicate epiweek value is 2020-11-17, not on a season or year boundary.
  • Why it's only partially wrong: 26-Dec-2020 maps to 202052 and 2-Jan-2021 maps to 202101, skipping week 202053 (which exists)! [This appears to be false; trying again, I see these map to 202052 and 202053, which seem correct.]
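To double-check date-to-epiweek mappings like these, here is a minimal standalone sketch of the CDC MMWR week rule: weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the new year. The function names are hypothetical, and this is not the conversion code the repository actually uses.

```python
from datetime import date, timedelta


def mmwr_week1_start(year: int) -> date:
    """Sunday that starts MMWR week 1: the first Sun-Sat week with >= 4 days in `year`."""
    jan1 = date(year, 1, 1)
    # date.weekday() is Monday=0 ... Sunday=6; convert to days since the preceding Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    sunday = jan1 - timedelta(days=days_since_sunday)
    # If Jan 1 is a Thursday, Friday or Saturday, that week has fewer than
    # 4 days in `year`, so week 1 starts on the following Sunday.
    if days_since_sunday >= 4:
        sunday += timedelta(days=7)
    return sunday


def date_to_epiweek(d: date) -> int:
    """Return the MMWR epiweek containing `d` as a YYYYWW integer."""
    year = d.year
    # Dates near New Year can belong to the previous year's last week
    # or to the next year's week 1.
    if d < mmwr_week1_start(year):
        year -= 1
    elif d >= mmwr_week1_start(year + 1):
        year += 1
    week = (d - mmwr_week1_start(year)).days // 7 + 1
    return year * 100 + week


print(date_to_epiweek(date(2020, 12, 26)))  # 202052
print(date_to_epiweek(date(2021, 1, 2)))    # 202053
```

Under this rule, 2020 has 53 weeks because 1-Jan-2021 falls on a Friday, so the week of 2-Jan-2021 is 202053 and 202101 starts on 3-Jan-2021.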

Guess 1b (need the raw table as of 2020-11-19 17:38:23.356390 to check fully): the site temporarily had some problematic/duplicate row and/or column labels while it transitioned from showing 2018-2019 & 2019-2020 to 2019-20 & 2020-21 data. The nearest archive.org snapshots here and here don't seem problematic, though.

@brookslogan
Contributor

brookslogan commented Aug 22, 2021

This issue should be addressed with or after #691 to prevent skipping epiweek 202053. [#691 seems to have been based on a misreading of the years involved.]

@brookslogan
Contributor

Quickly revisiting this issue:

  • This seems like a low-impact issue, as it only affects an auth-gated norovirus nowcasting system that likely has very limited use.
  • The best next debugging step seems to be checking the raw table as of 2020-11-19 17:38:23.356390 for obvious duplication or problematic epiweek calculations. The fix is likely either (a) correcting our time calculations and cleaning up any problematic database entries, or (b) if there's an obvious bug in the input data, adding processing code to clean it up in the raw table -> point estimate table step.
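For the raw-table check, one hypothetical way to spot the kind of duplication that triggers the IntegrityError is to group the rows about to be inserted by their key columns and report any key shared by more than one row, before calling `executemany`. The row layout, the `location`/`epiweek` field names and the sample rows are assumptions for illustration only, not the actual norostat_sql.py schema or data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_key_collisions(rows: Iterable[dict],
                        key_fields: Tuple[str, ...]) -> Dict[tuple, List[dict]]:
    """Group rows by `key_fields` and return only the keys shared by 2+ rows."""
    groups: Dict[tuple, List[dict]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[field] for field in key_fields)].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical rows: the first two would violate a (location, epiweek) key.
rows = [
    {"location": 1, "epiweek": 202001, "value": 3},
    {"location": 1, "epiweek": 202001, "value": 4},
    {"location": 1, "epiweek": 202002, "value": 5},
]
for key, group in find_key_collisions(rows, ("location", "epiweek")).items():
    print(key, [row["value"] for row in group])  # (1, 202001) [3, 4]
```

Keeping the source labels (season, day-month) on each row would make it easy to see which raw cells collided.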

@melange396
Collaborator

The NoroSTAT acquisition no longer exists; see #1207.

@melange396 closed this as not planned Aug 18, 2023