Support REPLACE INTO for INSERT statements #12516
Thank you for this contribution @fmeringdal -- this code looks good to me. I am worried about the lack of tests, however.
Basically, if there isn't test coverage we could potentially break this feature without realizing it in some future refactor.
Maybe we could add something in https://github.com/apache/datafusion/tree/main/datafusion/core/tests/user_defined that shows that a mock user-defined TableProvider will have insert_into called with the correct value of replace_into when the relevant SQL is planned 🤔
It is unfortunate we don't seem to already have such a test.
datafusion/catalog/src/table.rs
@@ -273,6 +273,7 @@ pub trait TableProvider: Sync + Send {
         _state: &dyn Session,
         _input: Arc<dyn ExecutionPlan>,
         _overwrite: bool,
+        _replace_into: bool,
Can you please document this new parameter and explain what it means (bonus points for documenting what _overwrite means too)?
> That shows that a mock up user defined TableProvider will have insert_into called with the correct value of replace_into when the relevant SQL is planned

Test added in datafusion/core/tests/user_defined/insert_operation.rs.

> Can you please document this new parameter and explain what it means (bonus points for documenting what _overwrite means too)

Documentation added to the variants of the InsertOp enum.
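The kind of test requested in the review can be sketched without DataFusion at all. The following is a minimal, self-contained illustration and not the test in `insert_operation.rs`: `MiniTableProvider`, `RecordingProvider`, and `plan_insert` are hypothetical stand-ins, and the toy "planner" only inspects the statement prefix. It shows the idea of a mock provider that records which insert operation it was called with.

```rust
use std::sync::Mutex;

/// Stand-in for the insert operation enum discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InsertOp {
    Append,
    Overwrite,
    Replace,
}

/// Hypothetical, heavily simplified stand-in for `TableProvider`.
trait MiniTableProvider {
    fn insert_into(&self, insert_op: InsertOp);
}

/// Mock provider that only records the operations it receives.
#[derive(Default)]
struct RecordingProvider {
    seen: Mutex<Vec<InsertOp>>,
}

impl MiniTableProvider for RecordingProvider {
    fn insert_into(&self, insert_op: InsertOp) {
        self.seen.lock().unwrap().push(insert_op);
    }
}

/// Toy planner (an assumption for illustration): picks the operation
/// from the statement prefix and forwards it to the provider.
fn plan_insert(sql: &str, provider: &dyn MiniTableProvider) -> Result<(), String> {
    let upper = sql.trim_start().to_ascii_uppercase();
    let op = if upper.starts_with("INSERT OVERWRITE") {
        InsertOp::Overwrite
    } else if upper.starts_with("REPLACE INTO") {
        InsertOp::Replace
    } else if upper.starts_with("INSERT INTO") {
        InsertOp::Append
    } else {
        return Err(format!("not an insert statement: {sql}"));
    };
    provider.insert_into(op);
    Ok(())
}
```

A test then plans one statement per variant and asserts on the recorded values, so a future refactor that drops or mixes up the operation fails loudly.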
Thanks for the quick review @alamb 🙏 I will get some tests added. I wanted to run an idea by you before I implement the change to restructure the WriteOp enum from:

    pub enum WriteOp {
        InsertOverwrite,
        InsertReplace,
        InsertInto,
        Delete,
        Update,
        Ctas,
    }

To:

    pub enum WriteOp {
        Insert(InsertOp),
        Delete,
        Update,
        Ctas,
    }

    pub enum InsertOp {
        Append,    // Represents a regular INSERT INTO operation
        Overwrite, // Represents an INSERT OVERWRITE operation
        Replace,   // Represents an INSERT OR REPLACE operation
    }

The purpose of such a change would be so that
I agree it would be a good change 👌 Thank you!
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look.
bb5f9f7 to 2291342
b000964 to 6c95327
Ready for another round of review! Changes applied since the first iteration:
This commit introduces an `InsertOp` enum to replace the boolean `overwrite` flag, giving clearer and more flexible control over how data is inserted. It updates the following APIs and configs accordingly: `TableProvider::insert_into`, `FileSinkConfig` and `DataFrameWriteOptions`.
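As a rough illustration of why an enum reads better than a boolean here, the sketch below uses the variant names from the proposal earlier in this thread. The helpers `from_overwrite_flag` and `describe` are hypothetical names invented for this example and are not part of DataFusion; the SQL strings follow the comments on the proposed variants.

```rust
/// Stand-in for the `InsertOp` enum proposed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InsertOp {
    Append,
    Overwrite,
    Replace,
}

/// Hypothetical migration helper: a boolean can only express two of the
/// three operations, which is why a third flag was needed before.
pub fn from_overwrite_flag(overwrite: bool) -> InsertOp {
    if overwrite {
        InsertOp::Overwrite
    } else {
        InsertOp::Append
    }
}

/// Hypothetical helper: an exhaustive match forces every call site to
/// handle a newly added variant, unlike a pair of booleans.
pub fn describe(op: InsertOp) -> &'static str {
    match op {
        InsertOp::Append => "INSERT INTO",
        InsertOp::Overwrite => "INSERT OVERWRITE",
        InsertOp::Replace => "INSERT OR REPLACE",
    }
}
```

With two booleans (`overwrite`, `replace_into`) the combination `true, true` is representable but meaningless; the enum rules that state out by construction.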
6c95327 to 10297cc
Thank you @fmeringdal for this PR and for making it easy to review the PR (nicely documented, nicely written code, and nicely tested) 🏆
I had this PR checked out anyways so I took the liberty of merging up from main to resolve the conflicts, as well as added a license notice to the file and fixed a clippy issue.
use datafusion_physical_plan::{DisplayAs, ExecutionMode, ExecutionPlan, PlanProperties};

#[tokio::test]
async fn insert_operation_is_passed_correctly_to_table_provider() {
❤️
…ce-into-for-insert
Merged up to get CI fix / hopefully pass
🚀
Which issue does this PR close?
Closes #12515.
Are these changes tested?
Yes, in `datafusion/core/tests/user_defined/insert_operation.rs`.
Are there any user-facing changes?
Yes, the `overwrite` boolean flags have been replaced by the `InsertOp` enum in the following APIs and configs: `TableProvider::insert_into`, `FileSinkConfig`, `DataFrameWriteOptions`.