Potential regression in Schema / nullability calculations after upgrade to 42.0.0 #12560
Comments
I'm running into this behavior after #11989: I'm seeing schema mismatches where the only difference is that a field's metadata disappears at some point, so the schemas are identical except for that field's metadata. E.g.:
&physical_input_schema = Schema {
fields: [
Field {
name: "alias1",
data_type: Utf8,
nullable: true,
dict_id: 0,
dict_is_ordered: false,
metadata: {},
},
],
metadata: {},
}
&physical_input_schema_from_logical = Schema {
fields: [
Field {
name: "alias1",
data_type: Utf8,
nullable: true,
dict_id: 0,
dict_is_ordered: false,
metadata: {
"some_key": "some_value"
},
},
],
metadata: {},
}
I've yet to figure out exactly where the metadata is being dropped, and I don't have a reproducer yet either. I suggested comparing only the fields' non-metadata attributes here, but @jayzhan211 pointed out that this is more of a workaround than an actual fix: it's still a problem if the metadata is disappearing.
The issue I'm running into seems somewhat different from the one that others (like @phillipleblanc) are hitting, where some fields disappear from the schema completely (see here). I don't think these are exactly the same issue, since they manifest differently, but they may share a root cause or solution, so I think it's fair to keep them all under this issue unless it turns out they need to be split. I'll work on getting a fix or a reproducer today. |
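For anyone trying to narrow this down, here is a minimal sketch of the "compare everything except field metadata" workaround discussed above, plus a helper that lists which fields differ only in metadata. The `Field` and `Schema` structs are hypothetical, simplified stand-ins for the arrow types (they omit `dict_id` and `dict_is_ordered`), and every function name here is made up for illustration; none of this is DataFusion's or arrow's actual API.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for arrow's `Field` and `Schema`.
// The real types carry more state (e.g. dict_id, dict_is_ordered).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

/// Illustrative helper: builds a nullable Utf8 field with the given metadata.
fn utf8_field(name: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        data_type: "Utf8".to_string(),
        nullable: true,
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

/// True when two fields agree on name, type, and nullability.
/// Field metadata is deliberately not compared.
fn field_eq_ignoring_metadata(a: &Field, b: &Field) -> bool {
    a.name == b.name && a.data_type == b.data_type && a.nullable == b.nullable
}

/// True when two schemas have the same fields in the same order,
/// ignoring field-level metadata. Schema-level metadata is still compared.
fn schema_eq_ignoring_field_metadata(a: &Schema, b: &Schema) -> bool {
    a.metadata == b.metadata
        && a.fields.len() == b.fields.len()
        && a
            .fields
            .iter()
            .zip(&b.fields)
            .all(|(x, y)| field_eq_ignoring_metadata(x, y))
}

/// Names of fields (taken from `a`) that match positionally in `b`
/// on everything except metadata.
fn fields_with_metadata_mismatch<'a>(a: &'a Schema, b: &Schema) -> Vec<&'a str> {
    a.fields
        .iter()
        .zip(&b.fields)
        .filter(|(x, y)| field_eq_ignoring_metadata(x, y) && x.metadata != y.metadata)
        .map(|(x, _)| x.name.as_str())
        .collect()
}
```

With the two schemas from the dump above, strict equality fails, the relaxed comparison passes, and the mismatch helper reports `alias1`. As noted, this only hides the symptom; the second helper is the more useful one for finding where the metadata gets dropped.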
Just to add to the visibility: we are also observing the same behavior after updating to
We are getting this error when reading some Parquet files using DataFusion. I have verified with other tools (parquet cli, DuckDB) |
Is there a small Parquet file that triggers the same error? If we can reproduce the error, it will be easier to find the root cause. |
We have been digging further into this error, and it seems it is not related to the DataFusion upgrade after all. I apologize for the confusion. As a result, there's no file I can provide :( |
We have also been digging into what we saw in InfluxDB 3.0, and I filed #12687 to track it separately. Let's close this omnibus issue and file individual issues for specific problems as we find them. |
Describe the bug
@phillipleblanc and @itsjunetime have both hit issues related to nullability and other metadata in schemas after the DataFusion 42 upgrade.
In addition, @ion-elgreco has hit something similar while upgrading delta-rs (see delta-io/delta-rs#2886).
I am filing this ticket to give this more visibility.
To Reproduce
Not sure (maybe someone could create a self-contained reproducer of the problem)
Expected behavior
No response
Additional context
This might have been introduced here: #11989
There is a discussion happening here: #11989 (comment)