compare result based on semantics #93
Comments
Why not format the timestamp types into strings? |
I'd suggest implementing a custom result type and a custom comparator in sqllogictest. It's impossible to cover all data types in this framework. |
I'm not sure what you mean. A timestamp can be formatted into different strings that all have the same semantics.
Sounds like a good way to solve this. I can try it. |
You can format it to a fixed format (e.g., represented by hours, or represented by days, not sure how we can do this) before comparing. |
So does that also mean the '*.slt' files should use the fixed format? |
Yes. All should use days / hours, probably, if we only want to work on strings instead of interpreting the actual values. |
If we can unify the '*.slt' files, this seems to be the simplest way to solve this. I will:
|
To summarize, when a type has more than one representation for the same value, an engine can convert it to a canonical form. The limitation is that the test cases must also be written in canonical form, which will also make some engines incompatible, e.g., The behavior can be tweaked by using different engines. For example, if some users do want to test the results as-is, they can just use a non-canonical engine. |
After introducing the custom comparator, we don't need the canonical form. We can directly make postgres-extended a non-canonical engine. |
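A minimal sketch of the canonical-form idea summarized above, under stated assumptions: `canonical_interval` is a hypothetical helper (not part of sqllogictest), it only understands `N day(s)`, `N hour(s)`, and `HH:MM:SS` text, and it assumes 1 day = 24 hours. An engine would apply something like this to its output before returning the string, and the `*.slt` files would have to be written in the same form.

```rust
/// Hypothetical helper, not a sqllogictest API.
/// Normalizes a small subset of interval text to total "HH:MM:SS".
/// Assumes 1 day = 24 hours; months and years are not handled.
fn canonical_interval(text: &str) -> Option<String> {
    let mut total_secs: u64 = 0;
    let mut saw_any = false;
    let mut tokens = text.split_whitespace();

    while let Some(tok) = tokens.next() {
        if tok.contains(':') {
            // Clock part such as "720:00:00".
            let parts: Vec<&str> = tok.split(':').collect();
            if parts.len() != 3 {
                return None;
            }
            let h: u64 = parts[0].parse().ok()?;
            let m: u64 = parts[1].parse().ok()?;
            let s: u64 = parts[2].parse().ok()?;
            if m >= 60 || s >= 60 {
                return None;
            }
            total_secs += h * 3600 + m * 60 + s;
        } else {
            // Quantity followed by a unit, such as "30 days".
            let n: u64 = tok.parse().ok()?;
            match tokens.next()? {
                "day" | "days" => total_secs += n * 86_400,
                "hour" | "hours" => total_secs += n * 3600,
                _ => return None,
            }
        }
        saw_any = true;
    }

    if !saw_any {
        return None;
    }
    Some(format!(
        "{:02}:{:02}:{:02}",
        total_secs / 3600,
        (total_secs % 3600) / 60,
        total_secs % 60
    ))
}
```

With this sketch, both `"30 days"` and `" 720:00:00"` map to the same canonical string, so a plain string comparison is enough. Input the helper does not understand returns `None`, and the caller would then fall back to the raw string.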
After introducing the custom comparator, pg-extended is still kind of canonical. Assume A1 has canonical form A.
|
Why not change all |
I tried to create a CustomResult, but I found that this means we can't use unstable_sort() in the runner. The reason is that we can't compare trait objects.
|
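One possible way around the trait-object sorting problem mentioned above, sketched under assumptions: the `CustomResult` trait here is hypothetical and only mirrors the name used in the comment, as are the `Hours` and `Days` types. A `Box<dyn Trait>` has no `Ord` implementation, so the standard `sort_unstable` is unavailable, but `sort_unstable_by_key` can sort on a derived key that does implement `Ord`.

```rust
/// Hypothetical trait for engine-specific result values; not a sqllogictest API.
trait CustomResult {
    /// A string that is identical for semantically equal values.
    fn canonical(&self) -> String;
}

/// Illustrative value types with two representations of the same quantity.
struct Hours(u64);
struct Days(u64);

impl CustomResult for Hours {
    fn canonical(&self) -> String {
        format!("{:02}:00:00", self.0)
    }
}

impl CustomResult for Days {
    fn canonical(&self) -> String {
        // Assumes 1 day = 24 hours.
        format!("{:02}:00:00", self.0 * 24)
    }
}

/// Sorts trait objects by their canonical string (lexicographic order),
/// since `dyn CustomResult` itself cannot implement `Ord`.
fn sort_results(rows: &mut [Box<dyn CustomResult>]) {
    rows.sort_unstable_by_key(|row| row.canonical());
}
```

The order is lexicographic on the key string, which is enough to make row order deterministic for comparison. It is not a numeric ordering of the underlying values.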
For sqllogictest, I think the binary format isn't appropriate for us. So I tried modifying rust-postgres to request text-format results in the extended query protocol and used it in the extended engine locally. It can run all the e2e tests in risingwave. I tried to push this modification to rust-postgres, but there has been no response so far. So I wonder whether we can maintain a downstream version of rust-postgres first. |
Hi there! For now, we went the "canonical form" way. The string representation is what is compared, and both Datafusion and Postgres have to produce the same string representations.
The drawback of this approach is that I had to rewrite the The current cross-engine cross-type logic is basically
Regarding @ZENOTME's idea of custom comparators: the problem with them is that each engine implementation would need to know how to parse an opaque "expected result" string in order to compare it. It's quite a fragile approach because an engine cannot predict what users write in the outputs. Additionally, the errors are now quite clear, because there are The comparator logic also arguably seems more complicated to implement and maintain:
Defining "canonical" type representations seems like a big task. But, in general, the "canonical value" approach seems to be somewhat easier both to understand and implement for multiple engines, and multiple representations per type. |
Background
I'm trying to introduce the INTERVAL type in the Postgres-extended engine and found a problem caused by the way we test results.
In sqllogictest::AsyncDB, the run interface is required to return a string, which is compared with the expected result.
async fn run(&mut self, sql: &str) -> Result<String, Self::Error>
But there are some cases in which the strings differ but the semantics are the same, such as interval '30 days' and interval ' 720:00:00': these are different strings with the same meaning.
For some types (such as interval), there are different string formats that express the same thing. So do we need to add a way to compare the 'semantics' rather than the 'string'? I tried to think of a way to fix it, but I don't know whether it's worthwhile or necessary, because this case only exists for 'Interval', 'timestamptz' and other time-related types. So we could also declare that the format must be equal to the result from psql.
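To make the 'semantics' versus 'string' distinction concrete, here is a rough sketch of what a semantic comparison could look like. Everything in it is an assumption for illustration: `interval_seconds` and `results_match` are hypothetical names, only `N day(s)` and `HH:MM:SS` text is understood, and 1 day is treated as 24 hours.

```rust
/// Hypothetical parser: total seconds for "N day(s)" or "HH:MM:SS" text.
/// Assumes 1 day = 24 hours; anything else returns None.
fn interval_seconds(text: &str) -> Option<u64> {
    let text = text.trim();

    // Clock form, e.g. "720:00:00".
    if let Some((h, rest)) = text.split_once(':') {
        let (m, s) = rest.split_once(':')?;
        let h: u64 = h.trim().parse().ok()?;
        let m: u64 = m.parse().ok()?;
        let s: u64 = s.parse().ok()?;
        return Some(h * 3600 + m * 60 + s);
    }

    // Day form, e.g. "30 days".
    let (n, unit) = text.split_once(' ')?;
    let n: u64 = n.parse().ok()?;
    match unit.trim() {
        "day" | "days" => Some(n * 86_400),
        _ => None,
    }
}

/// Compares by meaning when both sides parse as intervals,
/// and falls back to exact string equality otherwise.
fn results_match(actual: &str, expected: &str) -> bool {
    match (interval_seconds(actual), interval_seconds(expected)) {
        (Some(a), Some(b)) => a == b,
        _ => actual == expected,
    }
}
```

Here `results_match("30 days", " 720:00:00")` is true even though the strings differ. The fallback branch keeps the existing string comparison for every value the parser does not recognize.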
Maybe we can fix it by...
allowing the expected result to be written in multiple formats, such as: