[BUG] dask-sql creates all tables as lower case. #481
Comments
Though it wasn't my decision, I would imagine this is to be in line with Postgres (the SQL engine which we primarily test against)? Spinning up a DB here, I see that we aren't able to create two distinct tables named
Perhaps the better change here is to reconsider the default value of True we've set for
import dask.config
with dask.config.set({"sql.identifier.case_sensitive": False}):
    c.sql("SELECT * FROM UPPER_CASE")
But maybe we should consider making that the default behavior (cc @ayushdg in case you have thoughts around this)
Thanks, making dask case insensitive solves the issue from a query standpoint. FWIW, I think the Calcite default dialect is
I generally agree with this sentiment if this is the expected behavior for Postgres, though I think more consideration needs to be made on if / how we should shift our case handling to this style, as this is a large change from our current handling (which essentially ignores quotes when parsing table names and uses
Also I will note that we are currently exploring an overhaul of our current SQL parsing machinery from Calcite to DataFusion (check out https://github.com/dask-contrib/dask-sql/tree/datafusion-sql-planner and some of the recently opened
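For reference, the Postgres-style handling discussed above folds unquoted identifiers to lower case and preserves the case of double-quoted ones. A minimal sketch of that rule follows; `fold_identifier` is a hypothetical helper written for illustration and is not part of dask-sql or Calcite.

```python
def fold_identifier(identifier: str) -> str:
    """Fold a SQL identifier roughly the way PostgreSQL does.

    Hypothetical helper for illustration only: unquoted identifiers are
    lower-cased, double-quoted identifiers keep their case, and a doubled
    quote inside a quoted identifier stands for a single quote character.
    """
    is_quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if is_quoted:
        # Strip the surrounding quotes and unescape "" to ".
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


print(fold_identifier("UPPER_CASE"))    # upper_case
print(fold_identifier('"UPPER_CASE"'))  # UPPER_CASE
```

Under this rule, an unquoted UPPER_CASE and lower-case upper_case refer to the same table, while the quoted form stays distinct.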
Thanks for this insight @charlesbluca. I did investigate using
Thanks for sharing the work on DaskFusion; this does look interesting. However, I will be sticking to the Calcite parser in my stack, so hopefully this will remain consistent in terms of the supported parser syntax if you switch over to this solution. Not having a Java dependency and using the
To follow up, does this mean that in a newer environment the initial example posted still has issues even when
Are you able to elaborate a bit further on this? Is the expectation here for
The dask create_table method adds tables to the list of tables under lower-case names, which means that any queries that are executed must use the lower-case name.
What happened:
All tables are registered as lower case, so I am unable to include any queries with upper case table names - which requires me to convert all queries to lower().
What you expected to happen:
I would expect that the case would be preserved when adding to schemas, or at least, as in the case of #177, there would be both an original and a lower case version.
Minimal Complete Verifiable Example:
See the following code snippet, which creates a dask dataframe and then creates two tables, one named lower_case and another named UPPER_CASE.
The second call to print the tables lists both as lower case:
And the second select fails to find the table with upper case
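As a rough, library-free illustration of the behaviour described above, the toy registry below stores every table name lower-cased and then looks names up either case-sensitively or case-insensitively. `ToyCatalog` and its methods are hypothetical stand-ins, not dask-sql code, and the `case_sensitive` flag only loosely mirrors the `sql.identifier.case_sensitive` setting mentioned in the comments.

```python
class ToyCatalog:
    """Toy stand-in for a table registry (hypothetical; not dask-sql code)."""

    def __init__(self, case_sensitive: bool = True):
        self.case_sensitive = case_sensitive
        self.tables = {}

    def create_table(self, name, data):
        # Mirrors the reported behaviour: names are stored lower-cased.
        self.tables[name.lower()] = data

    def lookup(self, name):
        # A case-sensitive lookup uses the name exactly as the query wrote it;
        # a case-insensitive lookup lower-cases it first (an assumption here).
        key = name if self.case_sensitive else name.lower()
        if key not in self.tables:
            raise KeyError(f"table {name!r} not found")
        return self.tables[key]


catalog = ToyCatalog(case_sensitive=True)
catalog.create_table("lower_case", [1, 2, 3])
catalog.create_table("UPPER_CASE", [4, 5, 6])
print(sorted(catalog.tables))  # ['lower_case', 'upper_case']

try:
    catalog.lookup("UPPER_CASE")
except KeyError as err:
    print("lookup failed:", err)

relaxed = ToyCatalog(case_sensitive=False)
relaxed.create_table("UPPER_CASE", [4, 5, 6])
print(relaxed.lookup("UPPER_CASE"))  # [4, 5, 6]
```

In this model the failure comes from the mismatch between lower-casing at registration time and exact matching at query time; relaxing the lookup or preserving case at registration would each remove it.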
Anything else we need to know?:
Perhaps there is a reason why everything was made lower() - but I can't seem to find it in the docs.
Environment:
2022.4.1
3.9
mac-osx arm64
conda