feat: execute_uncommitted for merge insert #3233
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##             main    #3233      +/-   ##
==========================================
- Coverage   78.63%   78.44%   -0.20%
==========================================
  Files         250      250
  Lines       89836    90137     +301
  Branches    89836    90137     +301
==========================================
+ Hits        70646    70704      +58
- Misses      16281    16514     +233
- Partials     2909     2919      +10
Flags with carried forward coverage won't be shown.
Branch force-pushed from fd4508d to 1173939.
- expose in python
- refactor: make transaction marshalling easier
- cleanup
- fix tests
- fix path
- backward compatibility
- fix repr
- get changes back
Branch force-pushed from 1173939 to f5146af.
OK, it took a while, but I think I finally understand what this execute_uncommitted stuff is about :). Thanks for working on this. I think it will be useful for every operation to have an execute_uncommitted variant, especially if we start working more on balanced storage and need to keep track of multiple sets of fragments. It's much easier to just keep track of a collection of transactions.
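To make the "collection of transactions" idea concrete, here is a toy model in plain Python. It is a hypothetical sketch, not Lance's API: `ToyTransaction`, `ToyDataset`, `write_uncommitted`, and `commit` are invented names. Each uncommitted write returns a small transaction object and leaves the visible version alone; the caller keeps a list of them and publishes them in one step.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ToyTransaction:
    """Hypothetical: fragments that were written but not yet committed."""
    new_fragments: Tuple[str, ...]


@dataclass
class ToyDataset:
    version: int = 1
    fragments: List[str] = field(default_factory=list)

    def write_uncommitted(self, *fragment_names: str) -> ToyTransaction:
        # Data files would be written here; the visible version is untouched.
        return ToyTransaction(new_fragments=tuple(fragment_names))

    def commit(self, transactions: List[ToyTransaction]) -> int:
        # Publish one new version that applies every pending transaction in order.
        for txn in transactions:
            self.fragments.extend(txn.new_fragments)
        self.version += 1
        return self.version


ds = ToyDataset()
# Two independent writes; the caller only holds their transactions.
pending = [ds.write_uncommitted("a-0", "a-1"), ds.write_uncommitted("b-0")]
before = (ds.version, list(ds.fragments))  # (1, []): nothing visible yet
new_version = ds.commit(pending)
```

The point of the toy is the bookkeeping: the caller tracks one list of transaction objects instead of several parallel lists of fragments.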
python/python/lance/dataset.py (outdated)
    Operation that updates rows in the dataset.
    Attributes
Suggested change:
    - Operation that updates rows in the dataset.
    - Attributes
    + Operation that updates rows in the dataset.
    +
    + Attributes
Not sure if this is required, but it feels nicer to have a blank line between the summary and the section header.
    @dataclass
    class Update(BaseOperation):
        """
        Operation that updates rows in the dataset.
Should we note in the docstring that this operation should not insert new rows? Or is inserting rows not a problem here?
Actually, merge-insert uses it for upserts, so it is allowed to insert new rows.
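A small toy model shows why an update-style operation produced by an upsert can add rows. This is a hypothetical sketch: `ToyUpdate`, `upsert`, and `apply_update` are invented names and do not mirror the fields of Lance's `Update` class. Matched keys drop their old row and get a rewritten one; unmatched keys only get a new row, so the row count can grow.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class ToyUpdate:
    """Hypothetical stand-in for an update-style operation (not Lance's class)."""
    removed_keys: Set[int] = field(default_factory=set)     # old row versions to drop
    new_rows: Dict[int, str] = field(default_factory=dict)  # rewritten + brand-new rows


def upsert(existing: Dict[int, str], source: Iterable[Tuple[int, str]]) -> ToyUpdate:
    op = ToyUpdate()
    for key, value in source:
        if key in existing:
            op.removed_keys.add(key)  # matched: the old version of the row goes away
        op.new_rows[key] = value      # matched or not, the row is written as new data
    return op


def apply_update(existing: Dict[int, str], op: ToyUpdate) -> Dict[int, str]:
    result = {k: v for k, v in existing.items() if k not in op.removed_keys}
    result.update(op.new_rows)
    return result


table = {1: "a", 2: "b"}
op = upsert(table, [(2, "B"), (3, "c")])
after = apply_update(table, op)  # {1: "a", 2: "B", 3: "c"}: key 3 was inserted
```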
    let py = new_data.py();
    let new_data = convert_reader(new_data)?;

    let job = self
        .builder
        .try_build()
        .map_err(|err| PyValueError::new_err(err.to_string()))?;

    let (transaction, stats) = RT
        .spawn(Some(py), job.execute_uncommitted(new_data))?
        .map_err(|err| PyIOError::new_err(err.to_string()))?;

    let stats = Self::build_stats(&stats, py)?;
Minor nit: this could be moved into a helper method to cut down on repetition, since the committed variant shares the same code. That would also keep us from changing one variant in the future and not noticing that the other was left behind.
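The binding under review is Rust, but the shape of the suggested refactor can be sketched in Python. Everything here is hypothetical (`ToyJob`, `ToyBuilder`, `_run`, and the stats key are invented). In this toy, both public methods go through one private helper that owns validation, execution, and stats building, and the committed variant adds only a commit step. Routing `execute` through the uncommitted path is a simplification for the sketch, not a claim about how Lance implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyJob:
    """Hypothetical stand-in for a built merge-insert job."""
    key: str

    def execute_uncommitted(self, rows: List[dict]) -> Tuple[List[dict], int]:
        # Pretend to write data files; return a "transaction" and a row count.
        return list(rows), len(rows)


@dataclass
class ToyBuilder:
    key: str = ""
    committed: List[List[dict]] = field(default_factory=list)

    def _run(self, rows: List[dict]) -> Tuple[List[dict], Dict[str, int]]:
        # Shared helper: validation, execution, and stats live in one place,
        # so the two public variants cannot drift apart.
        if not self.key:
            raise ValueError("merge-insert requires a join key")
        transaction, num_rows = ToyJob(self.key).execute_uncommitted(rows)
        return transaction, {"num_rows_written": num_rows}

    def execute_uncommitted(self, rows: List[dict]) -> Tuple[List[dict], Dict[str, int]]:
        return self._run(rows)

    def execute(self, rows: List[dict]) -> Dict[str, int]:
        transaction, stats = self._run(rows)
        self.committed.append(transaction)  # toy "commit"
        return stats
```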
Allows separating the write and commit steps of merge-insert.
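As a rough illustration of the two-step flow, here is a toy table in plain Python. It is a sketch under stated assumptions: `ToyTable`, `merge_insert_uncommitted`, `commit`, and `CommitConflict` are hypothetical names, and the conflict rule (reject any transaction whose read version is stale) is a deliberate simplification. Step 1 writes data and returns a transaction that records the version it was based on; step 2 publishes it later, and only if nothing else was committed in between.

```python
from dataclasses import dataclass


class CommitConflict(Exception):
    """Raised when the table moved on since the write started."""


@dataclass(frozen=True)
class ToyTransaction:
    read_version: int  # version the write was based on
    rows_written: int


@dataclass
class ToyTable:
    version: int = 1
    num_rows: int = 0

    def merge_insert_uncommitted(self, rows: int) -> ToyTransaction:
        # Step 1 (write): data files are produced, nothing is published.
        return ToyTransaction(read_version=self.version, rows_written=rows)

    def commit(self, txn: ToyTransaction) -> int:
        # Step 2 (commit): publish only if no one committed in between.
        if txn.read_version != self.version:
            raise CommitConflict(
                f"based on v{txn.read_version}, table is at v{self.version}"
            )
        self.num_rows += txn.rows_written
        self.version += 1
        return self.version


table = ToyTable()
t1 = table.merge_insert_uncommitted(10)
t2 = table.merge_insert_uncommitted(5)  # also based on v1
table.commit(t1)                        # table is now at v2
try:
    table.commit(t2)
    conflicted = False
except CommitConflict:
    conflicted = True  # a real system might retry or rebase; this toy just rejects
```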