`insert_or_update()` (which is really both `insert_or_update_batch()` and `insert_or_update_bulk()` right now; they should be resolved into one in #989) can have problems when given a set of rows that includes more than one `issue`. Multiple `issue` values are not problematic in themselves, but the desired result may not be achieved when there are multiple rows for the same source/signal/time/geo that have different issues.
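To make the failure mode concrete, here is a hedged illustration; the table and column names below are assumptions for the sketch, not taken from the actual schema. Both rows share a source/signal/geo/time tuple but carry different issues, and both arrive in the load table with the default `is_latest_issue = 1`:

```sql
-- Hypothetical load-table batch; names are illustrative, not the real schema.
INSERT INTO epimetric_load
  (source, `signal`, geo_type, geo_value, time_type, time_value, issue, value)
VALUES
  -- same key tuple, two different issues, both defaulting to is_latest_issue=1
  ('src', 'sig', 'county', '42003', 'day', 20210101, 20210102, 1.0),
  ('src', 'sig', 'county', '42003', 'day', 20210101, 20210105, 1.5);
```

Nothing in the single-issue logic described below is guaranteed to demote the older issue (20210102) before the batch is promoted to the latest table.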
The problem stems from the fact that the SQL in `fix_is_latest_issue_sql` naively assumes only one `issue` will be present for each `(source, signal, geo_type, geo_value, time_type, time_value)`. Note that it presumes most inserted rows will be a latest issue, which is why the `is_latest_issue` column defaults to 1 in `insert_into_loader_sql`. Note also that "overwrites" happen when `load.issue > latest.issue`, but ALSO when they are equal, which lets us update/patch previously written values that may have been incorrect (the SQL that moves the load table data to the latest and history tables then has an ON DUPLICATE clause that preserves the unique key `epimetric_id` for the same unique composite key tuple).
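As a rough sketch of the promotion step described above (table and column names are assumed; this is not the actual SQL in the repo), the ON DUPLICATE behavior looks something like:

```sql
-- Hedged sketch of moving load rows into the latest table; assumed names.
-- Rows already demoted by fix_is_latest_issue_sql (is_latest_issue = 0) are
-- skipped; for an existing composite key tuple, the ON DUPLICATE KEY UPDATE
-- clause refreshes the payload but leaves epimetric_id untouched, preserving
-- the original unique id.
INSERT INTO epimetric_latest
  (epimetric_id, source, `signal`, geo_type, geo_value,
   time_type, time_value, issue, value)
SELECT epimetric_id, source, `signal`, geo_type, geo_value,
       time_type, time_value, issue, value
FROM epimetric_load
WHERE is_latest_issue = 1
ON DUPLICATE KEY UPDATE
  issue = VALUES(issue),  -- load.issue >= latest.issue by this point
  value = VALUES(value);  -- epimetric_id intentionally not updated
```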
In practice, the problematic behavior does not occur because the only place `insert_or_update()` is called (in `csv_to_database.py:upload_archive()`), it is handed rows returned by `CsvImporter.load_csv()`, which only uses a single `issue` value at a time. However, we might write something else in the future that does not account for this expected invariant, so we should take steps to resolve it now.
Non-compliant calls to `insert_or_update()` can still happen in test code via calls to `_insert_rows()`. This GitHub search provides a list of them.
Possible ways to address this (both options are sketched after the list):

- add a constraint to the load table to enforce that no multiple-issue collisions happen (for example, by removing `issue` from its UNIQUE INDEX)
- fix the `fix_is_latest_issue_sql` calculation to work with multiple `issue`s, setting `is_latest_issue=0` for non-maximal `issue` rows

Doing unit/integration testing for these conditions should be ~easy, and some work on this appears to have already been done in #925.
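Hedged sketches of the two options above, again with assumed table, column, and index names rather than the real schema. For the constraint approach, dropping `issue` from the load table's UNIQUE INDEX makes a second issue for the same key tuple fail fast with a duplicate-key error:

```sql
-- Option 1 sketch: assumed index/table names.
ALTER TABLE epimetric_load
  DROP INDEX load_unique,  -- hypothetical current index including `issue`
  ADD UNIQUE INDEX load_unique
    (source, `signal`, geo_type, geo_value, time_type, time_value);
```

For the recalculation approach, a window function (MySQL 8+) can demote every non-maximal issue within the load table before the comparison against the latest table, assuming the load table has a unique row id to join on:

```sql
-- Option 2 sketch: rank issues within each key tuple and demote all but the max.
UPDATE epimetric_load AS l
JOIN (
  SELECT epimetric_id,
         RANK() OVER (
           PARTITION BY source, `signal`, geo_type, geo_value,
                        time_type, time_value
           ORDER BY issue DESC
         ) AS issue_rank
  FROM epimetric_load
) AS ranked
  ON l.epimetric_id = ranked.epimetric_id
SET l.is_latest_issue = 0
WHERE ranked.issue_rank > 1;
```

Note the trade-off: option 1 rejects multi-issue batches outright, while option 2 accepts them and resolves `is_latest_issue` within the batch.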