Allow DynamicFileCatalog support to query partitioned file #12671
Comments
take
There is a test case asserting that querying a partitioned table fails: datafusion/datafusion/sqllogictest/test_files/dynamic_file.slt, lines 28 to 30 in 810e908
However, I found that it's not a partitioned-table-related issue. The root cause is the input files, {
"index_columns":[
{
"kind":"range",
"name":null,
"start":0,
"stop":2,
"step":1
}
],
"column_indexes":[
{
"name":null,
"field_name":null,
"pandas_type":"unicode",
"numpy_type":"object",
"metadata":{
"encoding":"UTF-8"
}
}
],
"columns":[
{
"name":"f0",
"field_name":"f0",
"pandas_type":"int64",
"numpy_type":"int64",
"metadata":null
},
{
"name":"f1",
"field_name":"f1",
"pandas_type":"unicode",
"numpy_type":"object",
"metadata":null
},
{
"name":"f2",
"field_name":"f2",
"pandas_type":"bool",
"numpy_type":"object",
"metadata":null
}
],
"creator":{
"library":"pyarrow",
"version":"15.0.0"
},
"pandas_version":"2.2.1"
} and {
"index_columns":[
{
"kind":"range",
"name":null,
"start":0,
"stop":2,
"step":1
}
],
"column_indexes":[
{
"name":null,
"field_name":null,
"pandas_type":"unicode",
"numpy_type":"object",
"metadata":{
"encoding":"UTF-8"
}
}
],
"columns":[
{
"name":"f0",
"field_name":"f0",
"pandas_type":"int64",
"numpy_type":"int64",
"metadata":null
},
{
"name":"f1",
"field_name":"f1",
"pandas_type":"unicode",
"numpy_type":"object",
"metadata":null
},
{
"name":"f2",
"field_name":"f2",
"pandas_type":"bool",
"numpy_type":"bool",
"metadata":null
}
],
"creator":{
"library":"pyarrow",
"version":"15.0.0"
},
"pandas_version":"2.2.1"
} Their
We can create an external table for them and query them without problems: datafusion/datafusion/sqllogictest/test_files/arrow_files.slt, lines 53 to 61 in a0a635a
I guess it may be an issue of |
Is your feature request related to a problem or challenge?
#11035 added support for querying files through their URL. If the target dataset is partitioned, DynamicFileCatalog can't recognize the partition columns.
Given a file structure like:
If we tried to query it through the dynamic file catalog
The result is
The partition column c_date won't be used.
Describe the solution you'd like
When inferring the ListingTableConfig, we can register the table partition columns automatically. I think we can invoke ListingOptions::infer_partitions to infer the required partition columns at runtime.
Describe alternatives you've considered
No response
Additional context
No response
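To make the proposal under "Describe the solution you'd like" more concrete, here is a minimal, standard-library-only sketch of what inferring Hive-style partition columns from file paths could look like. It is not DataFusion's implementation of infer_partitions; the function name infer_partition_columns, the error type, and the example paths are all hypothetical and only illustrate the idea of reading key=value directory names below the table root.

```rust
/// Hypothetical helper (not a DataFusion API): collect the names of
/// Hive-style partition columns (`key=value` directories) found between
/// `table_root` and each file name.
fn infer_partition_columns(table_root: &str, files: &[&str]) -> Result<Vec<String>, String> {
    let root = table_root.trim_end_matches('/');
    let mut inferred: Option<Vec<String>> = None;

    for file in files {
        // Only look at the part of the path below the table root.
        let relative = file
            .strip_prefix(root)
            .filter(|rest| rest.starts_with('/'))
            .ok_or_else(|| format!("{file} is not under {root}"))?
            .trim_start_matches('/');

        // Every segment except the last one (the file name) is a directory.
        let mut segments: Vec<&str> = relative.split('/').collect();
        segments.pop();

        // Keep the key of each `key=value` directory, in path order.
        let columns: Vec<String> = segments
            .iter()
            .filter_map(|seg| seg.split_once('=').map(|(key, _)| key.to_string()))
            .collect();

        // All files must agree on the same partition columns.
        if let Some(previous) = &inferred {
            if *previous != columns {
                return Err(format!(
                    "inconsistent partition columns: {previous:?} vs {columns:?}"
                ));
            }
        } else {
            inferred = Some(columns);
        }
    }

    // No files at all means no partition columns.
    Ok(inferred.unwrap_or_default())
}
```

For example, calling it with the root "data/table" and the hypothetical files "data/table/c_date=2018-11-13/a.parquet" and "data/table/c_date=2018-12-13/b.parquet" returns a single column name, c_date, which is the column the dynamic file catalog would then need to register as a table partition column.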