Big query script #72

LadyChristina · 2024-06-03T17:12:17Z

All Submissions:

Have you followed the guidelines in our Contributing documentation?
Have you verified that there aren't any other open Pull Requests for the same update/change?
Does the Pull Request pass all tests?

Description

Updated the BigQuery data collection script so that it takes a single date as input (to_date) and collects data up to that date, starting from the last time each ledger was updated. The last update dates are saved in a JSON file called last_update.json, which is updated automatically after each successful data collection.
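A minimal sketch of the "update after each successful collection" step, assuming last_update.json maps ledger names to date strings. The function name `record_last_update` and the file layout are hypothetical and not taken from the PR's code.

```python
import json
from pathlib import Path


def record_last_update(path: Path, ledger: str, date_string: str) -> dict:
    """Store date_string as the last update for ledger and return the new mapping."""
    # Start from the existing mapping if the file is already there.
    last_update = json.loads(path.read_text()) if path.exists() else {}
    last_update[ledger] = date_string
    # Meant to be called only after the collection for this ledger succeeded,
    # so that a failed run does not advance the stored date.
    path.write_text(json.dumps(last_update, indent=4))
    return last_update
```

Calling this once per ledger, only after its data has been written, keeps the stored dates accurate even if a later ledger fails.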

dimkarakostas

This assumes that the same granularity is always used. If, say, someone first uses monthly granularity, collects the files, and then switches to daily, the script will start from the last (monthly) update point and will not collect the (daily) data before it. That's fine for now, but I'd suggest opening an enhancement issue for this in the future.

dimkarakostas · 2024-06-04T10:39:17Z

data_collection_scripts/big_query_balance_data.py

+    :param granularity: The granularity of the data collection. Can be 'day', 'week', 'month', or 'year'.
+    :return: A dictionary with ledgers as keys and the corresponding start dates as values.
+    """
+    with open(hlp.ROOT_DIR / "data_collection_scripts/last_update.json") as f:


If the file doesn't exist, this will throw an error. If that's intended, OK; otherwise I suggest creating a separate function that checks whether the file exists and, if not, initializes it somehow (e.g. by reading the files in the input folder).
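One way the suggestion could look, as a sketch only: `load_last_update` and its `default` parameter are hypothetical names, and how the initial contents are derived (for example from the files in the input folder) is left to the caller.

```python
import json
from pathlib import Path
from typing import Optional


def load_last_update(path: Path, default: Optional[dict] = None) -> dict:
    """Return the stored mapping, creating the file from default if it is missing."""
    if not path.exists():
        # Hypothetical initialization: start from the supplied default
        # (an empty mapping unless the caller provides one).
        initial = dict(default or {})
        path.write_text(json.dumps(initial, indent=4))
        return initial
    with open(path) as f:
        return json.load(f)
```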

dimkarakostas · 2024-06-04T10:44:08Z

data_collection_scripts/big_query_balance_data.py

+        ledger_snapshot_dates = {ledger: [hlp.get_date_string_from_date(to_date)] for ledger in ledgers}
+    else:
+        ledger_from_dates = get_from_dates(granularity=granularity)
+        ledger_snapshot_dates = {ledger: hlp.get_dates_between(ledger_from_dates[ledger], to_date, granularity) for ledger in ledgers}


This will also throw an error if the ledger is not in the file (if that's intended, fine).
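If the error is not intended, a lookup with a fallback avoids it. This is a sketch under assumptions: `get_start_date` and the `fallback` value are hypothetical, and the stored dates are assumed to be strings.

```python
def get_start_date(ledger_from_dates: dict, ledger: str, fallback: str) -> str:
    """Return the stored start date for ledger, or fallback if it is absent."""
    # dict.get avoids the KeyError that ledger_from_dates[ledger] raises
    # when the ledger is not in the file.
    return ledger_from_dates.get(ledger, fallback)
```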

LadyChristina · 2024-06-06T14:05:47Z

This assumes that the same granularity is always used. If, say, someone first uses monthly granularity, collects the files, and then switches to daily, the script will start from the last (monthly) update point and will not collect the (daily) data before it. That's fine for now, but I'd suggest opening an enhancement issue for this in the future.

Fixed. Now last_update.json is defined per granularity.
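The actual layout of last_update.json is not shown in this thread. As an illustration only, assuming a nested mapping of granularity to ledger to date, a per-granularity lookup could be sketched as follows (`get_from_date` and the sample data are hypothetical):

```python
from typing import Optional

# Assumed layout: one mapping of ledger -> last update date per granularity.
last_update = {
    "month": {"bitcoin": "2024-05-01"},
    "day": {},
}


def get_from_date(last_update: dict, granularity: str, ledger: str) -> Optional[str]:
    """Return the last update for a (granularity, ledger) pair, or None if absent."""
    return last_update.get(granularity, {}).get(ledger)
```

With this layout, a monthly collection does not affect where a later daily collection starts.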

LadyChristina added 3 commits June 3, 2024 17:50

Update balance data collection script

0dffe38

Update last_update.json after data collection

bafc8b6

Ensure last update dates correct if errors

b5ff32e

LadyChristina requested a review from dimkarakostas as a code owner June 3, 2024 17:12

dimkarakostas reviewed Jun 4, 2024

Add granularity to last_update file

cb69e06

LadyChristina requested a review from dimkarakostas June 6, 2024 14:07

dimkarakostas approved these changes Jun 7, 2024

dimkarakostas merged commit 85041c2 into main Jun 7, 2024
1 check passed

dimkarakostas deleted the big_query_script branch June 7, 2024 08:04
