-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ETL-505] Add Garmin transforms to S3 to JSON job #65
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor - I appreciate the split out functions! I left some comments/questions.
and isinstance(json_obj["CustomFields"][field_name], str) | ||
): | ||
if len(json_obj["CustomFields"][field_name]) > 0: | ||
# This JSON string was written in a couple different ways |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is nasty. Is this worth letting care evolution know about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen the [{\\\"id\\\": ...
format in the production data (we would log a warning if we ran into a JSONDecodeError
), so whatever issues they had with the pilot data seem to have been resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just have to make sure we are monitoring the logs for these warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 awesome!
In addition to the Garmin transforms, I refactored the S3 to JSON job to be more functional. What used to just be
write_file_to_json_dataset
has been split up into three different functions:
transform_json
- which takes care of any inserts/updates to the JSON datatransform_object_to_array_of_objects
- which is a helper function fortransform_json
get_output_filename
- which takes care of filename / S3 object name formatting.As a result of refactoring in the job, the tests have been refactored as well. Some tests were deleted and replaced with a more specific test.