Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: execute_uncommitted for merge insert #3233

Merged
merged 2 commits into from
Jan 13, 2025

Conversation

wjones127
Copy link
Contributor

Allows separating write and commit step of merge-insert.

@github-actions github-actions bot added enhancement New feature or request python labels Dec 11, 2024
@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2024

Codecov Report

Attention: Patch coverage is 75.75758% with 8 lines in your changes missing coverage. Please review.

Project coverage is 78.44%. Comparing base (397edeb) to head (425c8a6).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/dataset/write/merge_insert.rs 75.75% 7 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3233      +/-   ##
==========================================
- Coverage   78.63%   78.44%   -0.20%     
==========================================
  Files         250      250              
  Lines       89836    90137     +301     
  Branches    89836    90137     +301     
==========================================
+ Hits        70646    70704      +58     
- Misses      16281    16514     +233     
- Partials     2909     2919      +10     
Flag Coverage Δ
unittests 78.44% <75.75%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wjones127 wjones127 force-pushed the merge-insert-uncommitted branch 2 times, most recently from fd4508d to 1173939 Compare December 20, 2024 20:13
@wjones127 wjones127 marked this pull request as ready for review December 20, 2024 20:13
expose in python

refactor: make transaction marshalling easier

cleanup

fix tests

fix path backward compatibility

fix repr

get changes back
@wjones127 wjones127 force-pushed the merge-insert-uncommitted branch from 1173939 to f5146af Compare January 8, 2025 23:28
Copy link
Contributor

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it took a while but I think I finally understand what this execute_uncommitted stuff is about :). Thanks for working on this. I think it will be useful for everything to have an execute_uncommitted variant, especially if we start working on balanced storage more and we would need to keep track of multiple sets of fragments. Much easier to just keep track of a collection of transactions.

Comment on lines 2774 to 2775
Operation that updates rows in the dataset.
Attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Operation that updates rows in the dataset.
Attributes
Operation that updates rows in the dataset.
Attributes

Not sure if this is required or not but it feels nicer to have a line break between summary and header

@dataclass
class Update(BaseOperation):
"""
Operation that updates rows in the dataset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make a comment that this operation should not insert new rows? Or would that not be a bad thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's used by merge-insert for upsert. So it's allowed to insert new rows.

Comment on lines +218 to +230
let py = new_data.py();
let new_data = convert_reader(new_data)?;

let job = self
.builder
.try_build()
.map_err(|err| PyValueError::new_err(err.to_string()))?;

let (transaction, stats) = RT
.spawn(Some(py), job.execute_uncommitted(new_data))?
.map_err(|err| PyIOError::new_err(err.to_string()))?;

let stats = Self::build_stats(&stats, py)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: this could be encapsulated in a helper method to cut down on repetition as it is shared by the committed variant. This might also help avoid issues in the future where we change one but not the other and don't notice.

@wjones127 wjones127 merged commit 6e76529 into lancedb:main Jan 13, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants