Fix broken sync json parsing and harmonize file reading #373

Merged: 4 commits into delta-io:main from broken-sync-json, Oct 7, 2024

scovich
Collaborator

@scovich scovich commented Oct 4, 2024

It turned out the sync reader was not being exercised by the basic read tests. Enabling it exposed a broken JSON parsing algorithm that had already been fixed in the default reader.

Factor out the json parsing to a shared function that both engines can use.

While we're at it, factor out sync reader logic that both parquet and json readers can use.

Update the basic read unit tests to use both readers.
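
A minimal sketch of the pattern described above, assuming nothing about the kernel's real types: one shared parsing function, two reader stand-ins that both delegate to it, and a check that runs the same expectation against every reader. All names here (`parse_records`, `RecordReader`, `DefaultReader`, `SyncReader`, `all_readers`, `check_all_readers`) are hypothetical, and the "parsing" is a plain line split rather than real JSON decoding into Arrow data.

```rust
// Illustrative only: every name below is hypothetical and not taken from the PR.

/// Shared parsing step. Splits newline-delimited input into one record per
/// non-blank line. A real implementation would decode JSON into Arrow data.
fn parse_records(input: &str) -> Vec<String> {
    input
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Minimal reader interface that both stand-in engines implement.
trait RecordReader {
    fn name(&self) -> &'static str;
    fn read(&self, input: &str) -> Vec<String>;
}

struct DefaultReader;
struct SyncReader;

impl RecordReader for DefaultReader {
    fn name(&self) -> &'static str {
        "default"
    }
    // Delegates to the shared helper instead of carrying its own copy.
    fn read(&self, input: &str) -> Vec<String> {
        parse_records(input)
    }
}

impl RecordReader for SyncReader {
    fn name(&self) -> &'static str {
        "sync"
    }
    // Same helper, so a fix in one place reaches both readers.
    fn read(&self, input: &str) -> Vec<String> {
        parse_records(input)
    }
}

/// Every reader that the basic read checks should cover.
fn all_readers() -> Vec<Box<dyn RecordReader>> {
    vec![Box::new(DefaultReader), Box::new(SyncReader)]
}

/// Runs the same expectation against each reader and panics on a mismatch.
fn check_all_readers(input: &str, expected: &[&str]) {
    for reader in all_readers() {
        let records = reader.read(input);
        assert_eq!(records, expected, "reader `{}` disagreed", reader.name());
    }
}
```

Calling `check_all_readers` from `main` or a `#[test]` function exercises both readers with one set of expectations, so a reader that drifts from the shared behavior fails the check instead of going unnoticed.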

Fixes #372

Relevant upstream feature request: apache/arrow-rs#6522

codecov bot commented Oct 4, 2024

Codecov Report

Attention: Patch coverage is 86.00000% with 21 lines in your changes missing coverage. Please review.

Project coverage is 76.83%. Comparing base (092ee67) to head (f095b71).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/engine/arrow_utils.rs | 86.58% | 1 Missing and 10 partials ⚠️ |
| kernel/src/engine/sync/mod.rs | 84.61% | 0 Missing and 6 partials ⚠️ |
| kernel/src/engine/sync/json.rs | 83.33% | 0 Missing and 2 partials ⚠️ |
| kernel/src/engine/sync/parquet.rs | 81.81% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #373      +/-   ##
==========================================
+ Coverage   76.55%   76.83%   +0.27%     
==========================================
  Files          45       47       +2     
  Lines        9375     9418      +43     
  Branches     9375     9418      +43     
==========================================
+ Hits         7177     7236      +59     
+ Misses       1799     1789      -10     
+ Partials      399      393       -6     


.iter()
.map(|col| table_schema.field(col).cloned().unwrap())
.collect();
Arc::new(Schema::new(selected_fields))
Collaborator Author

This is just an indentation change. Review with whitespace changes hidden if you're seeing this comment.

Collaborator

@hntd187 hntd187 left a comment

LGTM

@zachschuermann zachschuermann self-requested a review October 7, 2024 22:02
Collaborator

@nicklan nicklan left a comment

lgtm, just had one small suggestion. thanks for the fixes!

kernel/src/engine/arrow_utils.rs (outdated)
@scovich scovich merged commit 7f535a2 into delta-io:main Oct 7, 2024
12 checks passed
Collaborator

@zachschuermann zachschuermann left a comment

ahh, nick beat me to it. lgtm haha

@scovich scovich deleted the broken-sync-json branch November 8, 2024 21:00
Successfully merging this pull request may close these issues.

read.rs tests fail when using sync engine
4 participants