[ETL-505] Add Garmin transforms to S3 to JSON job #65

Merged 1 commit on Jul 27, 2023

philerooski
Contributor

In addition to the Garmin transforms, I refactored the S3 to JSON job to be more functional. What used to just be

  • write_file_to_json_dataset

has been split up into three different functions:

  • transform_json - which takes care of any inserts/updates to the JSON data
  • transform_object_to_array_of_objects - which is a helper function for transform_json
  • get_output_filename - which takes care of filename / S3 object name formatting.
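
For readers who have not opened the diff, here is a minimal sketch of what a split like this can look like. The function names come from the list above, but every signature, field name (`cohort`, `Scores`), and the `.ndjson` suffix are assumptions made for illustration, not the merged code.

```python
import os
from typing import Any


def transform_object_to_array_of_objects(
    json_obj: dict[str, Any], key_name: str, value_name: str
) -> list[dict[str, Any]]:
    """Turn {"a": 1, "b": 2} into [{key_name: "a", value_name: 1}, ...]."""
    return [{key_name: k, value_name: v} for k, v in json_obj.items()]


def transform_json(json_obj: dict[str, Any], cohort: str) -> dict[str, Any]:
    """Return a copy of the record with inserts/updates applied (no I/O)."""
    transformed = dict(json_obj)  # shallow copy so the input is left untouched
    transformed["cohort"] = cohort  # hypothetical inserted field
    scores = transformed.get("Scores")  # hypothetical field name
    if isinstance(scores, dict):
        transformed["Scores"] = transform_object_to_array_of_objects(
            scores, key_name="name", value_name="value"
        )
    return transformed


def get_output_filename(data_type: str, source_name: str) -> str:
    """Build the output file / S3 object name from its parts (assumed layout)."""
    stem = os.path.splitext(os.path.basename(source_name))[0]
    return f"{data_type}_{stem}.ndjson"
```

Because none of these functions touch S3 or the filesystem, each one can be unit tested with plain dictionaries and strings.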

As a result of refactoring the job, the tests have been refactored as well. Some tests were deleted and replaced with more specific tests.

@philerooski requested a review from a team as a code owner on July 25, 2023 01:38
Member

@thomasyu888 left a comment

Nice refactor - I appreciate the split out functions! I left some comments/questions.

src/glue/jobs/s3_to_json.py
        and isinstance(json_obj["CustomFields"][field_name], str)
    ):
        if len(json_obj["CustomFields"][field_name]) > 0:
            # This JSON string was written in a couple different ways
Member

Wow, this is nasty. Is it worth letting CareEvolution know about this?

Contributor Author

I haven't seen the [{\\\"id\\\": ... format in the production data (we would log a warning if we ran into a JSONDecodeError), so whatever issues they had with the pilot data seem to have been resolved.

Member

We just have to make sure we are monitoring the logs for these warnings.
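
To make the thread above concrete, here is a rough sketch of the pattern being discussed: try to parse the string as JSON, fall back to an unescaped variant for the `[{\"id\": ...` style, and log a warning if nothing parses. The helper name `parse_json_string_field`, the fallback strategy, and the log message are assumptions for illustration and may differ from what the job actually does.

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def parse_json_string_field(raw: str, field_name: str) -> Optional[Any]:
    """Parse a JSON string that may have been written with escaped quotes.

    Returns the parsed value, or None if the string is empty or unparseable.
    """
    if len(raw) == 0:
        return None
    # Try the string as-is first, then with backslash-escaped quotes unescaped.
    candidates = (raw, raw.replace('\\"', '"'))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    # Nothing parsed: this is the warning that would need to be monitored.
    logger.warning("Could not parse field %s as JSON: %r", field_name, raw[:80])
    return None
```

The unescaping fallback only runs after the plain parse has failed, so well-formed strings that legitimately contain escaped quotes are not altered.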

@philerooski temporarily deployed to develop 16 times with GitHub Actions (now inactive): 8 times on July 25, 2023 between 20:25 and 20:38, and 8 times on July 26, 2023 between 21:37 and 21:58
Member

@thomasyu888 left a comment

💯 awesome!

@philerooski merged commit f831597 into main on Jul 27, 2023
14 of 16 checks passed
@philerooski deleted the etl-505 branch on July 27, 2023 17:30