Replace python based integration test with sqllogictest #4462
Comments
I think this would be a nice project for someone to get to know DataFusion better -- I think it would involve learning about sqllogictest-rs and porting the existing code from there.
Hi there! I am new to DataFusion and would like to learn more about it. I did some investigation regarding sqllogictest, sqllogictest-rs and DataFusion, and I now see some ways to implement a solution.
My main question is: what level of compatibility between Postgres and DataFusion would you like to check?
Regarding implementation, I can imagine several ways:
1. The very basic way without changes to For example,
Pros:
Cons:
2. Another option that uses existing features is:
Pros:
Cons:
3. Use query labels to compare the results of labeled queries, but run multiple database engines at the same time. The queries would look like
Pros:
Cons:
4. Create an The
If
Pros:
Cons:
5. Introduce a way to run a statement/query on multiple databases at the same time. This can be either achieved by using labels
or the way that is similar to what was proposed in the description of this
Pros:
Cons:
To summarize my understanding of
In order to demonstrate what Postgres compatibility could look like using I would be glad to hear some feedback regarding my PR and thoughts about which way to proceed.
Hi @melgenek
That is great! Welcome!
I think for this ticket we should strive for "the same level as is currently verified using the python integration tests" as much as possible.
Yes, this is a classic "floating point rounding error" type of situation, and it is why directly comparing floating point values is typically not a great idea. In terms of how to implement this, would something like this work (initially):
I also left some feedback on #4834 (review) -- does that make sense?
Thank you for the feedback.
I think this is a good suggestion. I'll update the Postgres/DataFusion clients to have a normalization step for results.
In order to proceed further, I'd like to clarify how you'd like to define Regular Having explicit results is a totally viable approach, especially having So do we agree that the way to go is to force all
Yes, that would be my preferred solution -- it will ensure that DataFusion is producing consistent answers with Postgres and that the answers can also be verified by looking at the
Just so it is clear, I do not expect every single .slt file now (or ever) to produce the same answers as postgres (as datafusion supports different syntax and likely also has existing discrepancies, etc). I think we should start simple with a few explicitly marked files used to do "with postgres verification" and we can expand the scope of which files are tested against slt files over time.
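To illustrate the result normalization step discussed above, here is a minimal Rust sketch. It assumes float cells are rendered with a fixed number of decimals before the outputs of the two engines are compared. The names `normalize_float` and `normalize_row` are hypothetical and do not come from the PR, and this plain fixed-decimal formatting is only one possible approach, not necessarily the one that was merged.

```rust
/// Hypothetical helper (not from the PR): render a float cell with a fixed
/// number of decimals so tiny rounding differences between engines vanish.
fn normalize_float(value: f64, decimals: usize) -> String {
    if value.is_nan() {
        return "NaN".to_string();
    }
    if value.is_infinite() {
        let text = if value > 0.0 { "Infinity" } else { "-Infinity" };
        return text.to_string();
    }
    let formatted = format!("{:.*}", decimals, value);
    // Values that round to zero from below would print as "-0.00"; drop the sign.
    let is_negative_zero = formatted.starts_with('-')
        && formatted[1..].chars().all(|c| c == '0' || c == '.');
    if is_negative_zero {
        formatted[1..].to_string()
    } else {
        formatted
    }
}

/// Normalize every float in a result row.
fn normalize_row(row: &[f64], decimals: usize) -> Vec<String> {
    row.iter().map(|v| normalize_float(*v, decimals)).collect()
}

fn main() {
    // Made-up values standing in for the output of two different engines.
    let engine_a = [0.1 + 0.2, 1.0 / 3.0];
    let engine_b = [0.3, 0.333333333333];

    // The raw floats differ, but the normalized rows compare equal.
    assert_ne!(engine_a[0], engine_b[0]);
    assert_eq!(normalize_row(&engine_a, 6), normalize_row(&engine_b, 6));
    println!("{:?}", normalize_row(&engine_a, 6));
}
```

Comparing normalized strings rather than raw floats keeps the expected results in the test files stable, at the cost of hiding differences smaller than the chosen precision.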
* Move Datafusion engine to a package
* Run postgres and datafusion tests and compare outputs
* clippy and fmt
* Remove unused line
* Floating types rounding
* Make basic types compatible
* Format with BigDecimal to keep arbitrary precision.
* Fill postgres queries with results
* Run Postgres separately from DataFusion. Merge slt files
* fmt
* clippy
* Postgres step in github actions
* Run sqllogictest with Postgres on plain ubuntu
* Update sqllogictest readme
* Revert unnecessary change
* Add license headers
* Don't log PG_COMPAT value
* clippy and fmt
* remove whitespaces in readme
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have a great integration test from @jimexist 🦾 https://github.com/apache/arrow-datafusion/tree/master/integration-tests which runs a limited number of queries against data in both postgres and datafusion and compares the results.
The major downside is that it does not get updated with new coverage very often. I believe this is due to two factors:
cargo test
Describe the solution you'd like
I would like to port all the existing coverage in integration-test to the sqllogictest framework added in #4395 and remove the python based version.
In order to run the same tests against postgres, we could reuse some of the code from sqllogictest-bin to implement a postgres driver: https://github.com/risinglightdb/sqllogictest-rs/blob/main/sqllogictest-bin/src/engines/postgres.rs
I think we would likely have to add some sort of annotation to the tests like
And then not run tests annotated like that by default
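The idea of marking some tests for Postgres verification and skipping them by default could be sketched roughly as follows. Everything here is an assumption made for illustration: the sketch uses a file-name prefix (`pg_compat_`) as a stand-in for the annotation, and treats an environment variable named `PG_COMPAT` as the opt-in switch. The exact annotation and switch are not specified in this description.

```rust
use std::env;
use std::path::Path;

/// Hypothetical marker: files whose name starts with this prefix are meant
/// to be verified against Postgres. The real annotation may look different.
const PG_PREFIX: &str = "pg_compat_";

/// True if the test file is marked for Postgres verification.
fn is_postgres_file(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with(PG_PREFIX))
}

/// Pick the files to run: unmarked files by default, and only the marked
/// files when Postgres verification is explicitly requested.
fn select_files<'a>(files: &[&'a Path], postgres_requested: bool) -> Vec<&'a Path> {
    files
        .iter()
        .copied()
        .filter(|path| is_postgres_file(path) == postgres_requested)
        .collect()
}

fn main() {
    // Assumption: the mere presence of PG_COMPAT opts in to the Postgres run.
    let postgres_requested = env::var("PG_COMPAT").is_ok();

    // Made-up file names for the example.
    let files = [
        Path::new("tests/aggregate.slt"),
        Path::new("tests/pg_compat_simple.slt"),
    ];
    for path in select_files(&files, postgres_requested) {
        println!("would run {}", path.display());
    }
}
```

With this shape a plain `cargo test` run never needs a Postgres instance, and a CI job can opt in by setting the variable.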
Describe alternatives you've considered
Keep the existing
Additional context
Add any other context or screenshots about the feature request here.