Delete the rewrite_table_scan from the ExecutionPlan and use it within an AnalyzerRule #61

hozan23 · 2024-09-19T14:23:20Z

Hello,

We are thinking of moving the recently added rewrite_table_scan function into an AnalyzerRule instead of running it in the ExecutionPlan. This change makes sense since the AnalyzerRule is responsible for the semantic changes when transforming the LogicalPlans.

What do you think? @phillipleblanc

cc: @backkem

The text was updated successfully, but these errors were encountered:

phillipleblanc · 2024-09-19T14:59:29Z

Makes sense to me. One caveat we've run into is that you can't blindly rewrite the plan on the DataFusion side, since DataFusion still needs the table names it knows about to work properly. The LogicalPlan generated by rewrite_table_scan is only useful as input to the Unparser

backkem · 2024-09-19T15:21:13Z

Yea, good point.

We have an implementation where we basically have two separate schemas: the external or "model" schema represented by a ModelSchemaProvider and ModelTableProviders and a internal/storage schema represented by a StorageSchemaProvider and StorageTableProviders. The former "model" schema is used during query parsing. The ModelTableProviders have a method to fetch the corresponding StorageTableProvider. It's the rewrite analyser that rewrites the entire LogicalPlan from using ModelTableProviders to StorageTableProviders. The storage schema is used during execution.

I'd love to hear your insight on that approach. I think it's more general. For example, it would work when using plain table providers or when trying to federate using Substrait plans. On the other hand, it could introduce too many new abstraction into the mix.

phillipleblanc · 2024-09-20T01:36:28Z

In general I'm a fan of simpler systems when possible. I'm not really convinced that adding these abstractions buys us all that much. i.e. for plain table providers, its relatively straightforward to keep state on the internal name and when you call .scan() on it, you don't even have to care what the name datafusion has for you (we do this for all of our non-federated custom providers). For Substrait plan federation, I would just have its implementation call rewrite_table_scan directly - similar to how the SQL federation is doing it.

I do see how it would be nice though if that was already handled at a different layer so that the SQL/Substrait federation providers can just call the unparser directly on the plan its given.

backkem · 2024-09-20T07:06:25Z

Yea, I guess our use-case for this is somewhat different from federation. It's more related to creating a separate "model" or "semantic layer" on/over top of the storage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delete the rewrite_table_scan from the ExecutionPlan and use it within an AnalyzerRule #61

Delete the rewrite_table_scan from the ExecutionPlan and use it within an AnalyzerRule #61

hozan23 commented Sep 19, 2024

phillipleblanc commented Sep 19, 2024

backkem commented Sep 19, 2024

phillipleblanc commented Sep 20, 2024

backkem commented Sep 20, 2024

Delete the rewrite_table_scan from the ExecutionPlan and use it within an AnalyzerRule #61

Delete the rewrite_table_scan from the ExecutionPlan and use it within an AnalyzerRule #61

Comments

hozan23 commented Sep 19, 2024

phillipleblanc commented Sep 19, 2024

backkem commented Sep 19, 2024

phillipleblanc commented Sep 20, 2024

backkem commented Sep 20, 2024