-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Big query script #72
Big query script #72
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that the same granularity is always used. If say someone uses first monthly, collects the files, and then sets daily, then the script will start from the last (monthly) updated point and will not collect the (daily) data before that. That's fine for now, but I'd suggest to open an enhancement issue for the future for this.
:param granularity: The granularity of the data collection. Can be 'day', 'week', 'month', or 'year'. | ||
:return: A dictionary with ledgers as keys and the corresponding start dates as values. | ||
""" | ||
with open(hlp.ROOT_DIR / "data_collection_scripts/last_update.json") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the file doesn't exist, this will throw an error. If that's intended OK, otherwise I suggest you create a separate function which checks if the files exists and, if not, initializes it somehow (eg. reading the files in the input folder).
ledger_snapshot_dates = {ledger: [hlp.get_date_string_from_date(to_date)] for ledger in ledgers} | ||
else: | ||
ledger_from_dates = get_from_dates(granularity=granularity) | ||
ledger_snapshot_dates = {ledger: hlp.get_dates_between(ledger_from_dates[ledger], to_date, granularity) for ledger in ledgers} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also throw an error if the ledger is not in the file (if that's intended, fine).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Fixed. Now last_update.json includes is defined per granularity. |
All Submissions:
Description
Updated BigQuery data collection script so that it only takes one date as input (
to_date
) and performs the data collection until that date, starting from the last time each ledger was updated (which is saved in a json file calledlast_update.json
and is updated automatically after each succesful data collection).