Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: different sql dialects #38

Open
big-c-note opened this issue Jun 25, 2024 · 15 comments
Open

Feature request: different sql dialects #38

big-c-note opened this issue Jun 25, 2024 · 15 comments

Comments

@big-c-note
Copy link
Contributor

big-c-note commented Jun 25, 2024

Hey, thanks for the codebase.

I'm making use of it for a project where I need to convert dbml into databricks sql create table statements.

Do you want pydbml to produce arbitrary sql dialects?

If so I could make a contribution. Loosely thinking I could turn .sql into a dictionary and store different sql dialects that can be accessed via key like "ansi" or "databricks".

From what I can tell Databricks is severely lacking erd tooling. Plus being able to generate arbitrary sql dialect create table statements makes model management super portable via dbml and pydbml

Anyway, let me know and I can push this upstream

@big-c-note
Copy link
Contributor Author

Actually better to pass a sql_dialect argument when instantiating PyDBML. It can default to ansi or whatever you currently have

@Vanderhoof
Copy link
Owner

Hi!
Thanks for the suggestion. To be honest, I started this project a long time ago, and since then realized that many things could've been implemented better. .sql is one of them. My problem with it being a part of each model class is that it doesn't really belong there, SQL generation should be a separate workflow, this will be better for maintanability and will unlock the custom dialects support.

Since you mentioned it, maybe it's time to rewrite this part of the project. I'd suggest to approach it this way:

We remove .sql from every class, and create a Renderer class for each dialect, that will accept any DBML model-object (Table, Index, etc) and return an SQL string.

Something like this:

class PostgresqlRenderer:
    def render(model: SQLObject) -> str:
        ...
        
class DatabricksRenderer:
    def render(model: SQLObject) -> str:
        ...

My only concern is that I probably won't be able to maintain any extra dialects apart from the default one due to very little free time I have right now.

But since a detached renderer approach is so flexible, the custom renderers don't have to be part of the PyDBML package. You can release the BricksRenderer as a separate library, and anyone who needs it will just need to install it.

What do you think?

@big-c-note
Copy link
Contributor Author

Yes that is a better approach. Very flexible. And yes I agree you don't want to bite off more than you can chew. It's best to keep pydbml super lightweight and I was actually starting to think along these lines after I mentioned it.

Let me know how I can help. But happy to create a separate package for Databricks if you make these changes

@Vanderhoof
Copy link
Owner

Alright, let's do that! I will create the base renderer and update the rest of the code to use it. Then I will add a section on the main description page with the renderers catalogue, where yours will be the first one : }

I'll hope to start this week

@big-c-note
Copy link
Contributor Author

Nice, nice. I'm hoping to have lots of free time on it next week. Cheers

@Vanderhoof
Copy link
Owner

@big-c-note it took a bit more time and effort than I anticipated 😅

Screenshot 2024-07-17 at 09 59 26

I'll be addressing the other raised github issues in the upcoming few days, and then will make a release. You are welcome to look at the new renderer classes in the PR in the mean time

@big-c-note
Copy link
Contributor Author

This is awesome! I have gotten busy with another project, but I'll work on this as soon as I get a breather. Thank you so much

@Vanderhoof
Copy link
Owner

New renderer classes are released in 1.1.0 🥳

@big-c-note
Copy link
Contributor Author

You rock!!

@Vanderhoof
Copy link
Owner

Thanks 😄
While the docs on the renderers are in the making, here's a quick intro:

The idea is that you subclass a BaseRenderer class like this:

from pydbml.renderer.base import BaseRenderer

class MySQLRenderer(BaseRenderer):
    model_renderers = {}

(see default renderer as an example)

And then you define the renderer functions for each model. The function should have one argument - the model you need to render, and it should return a rendered sql string for that model. Use the renderer_for decorator of your class (it's inherited from the base) on the renderer function. This will add this function to the model_renderers mapping and that's how your renderer class will determine which function to use for which type of model:

@MySQLRenderer.renderer_for(Column)
def render_column(model: Column) -> str:
    # do the rendering and
    # return the string

(see default renderer for the column model).

And that's basically it. Finally, to use your renderer class instead of the default one, you just supply it as an argument to the database:

mydb = Database(sql_renderer=MySQLRenderer)

@big-c-note
Copy link
Contributor Author

big-c-note commented Aug 1, 2024

Awesome. I'll keep an eye out for docs as well. Huge thanks to you again.

This is really going to be cool for our project. It'll take me a minute to circle back, but this should play a big role in our model management software.

Cheers

@big-c-note
Copy link
Contributor Author

big-c-note commented Aug 6, 2024

@Vanderhoof
Would you accept something like #44 ? To pass sql_renderer through the parser? My intention would be to use parsed: Database = PyDBML(path_to_dbml, sql_renderer=DatabricksRenderer) and be able to receive the sql via parsed.sql attribute.

I am still working on my changes locally to ensure they work for my use case, but wanted to run this by you to see if you would support this usecase. Happy to make any changes to tests or however you want to organize the code

@Vanderhoof
Copy link
Owner

@big-c-note sorry, my github notifications are messed up. I missed your comment, although did approve your PR. I totally agree, this was the intended use, I just forgot to add it to the parser!

@ar-to
Copy link

ar-to commented Sep 19, 2024

Hello,

First of all, thanks for creating/maintaining this package!

I currently work with postgresql and running into weird syntax issues when parsing it via this package but not with the main cli from dbml. Seems like the main one auto-detects the dialect and maybe I'm missing something but I must of assumed this one was doing the same. I get around my need by using the main library which is in nodejs but I wrote some python tests to help me validate through my CT pipeline. I upgrade to the most recent (1.1.1) and got even more weird errors so kept my current one at 1.0.11 for now but may have to stop using it and rely on the cli if I can't resolve as the tests would become stale. Anyways, if you all can point me to the right attribute as I did not see a renderer for postgresql and it seems from above we have to make one ourselves during implementation. Feels like the kind of feature this package should have but I understand lack of time to maintain so just wondering what you all suggest.

Thanks again

@Vanderhoof
Copy link
Owner

Vanderhoof commented Sep 20, 2024

@ar-to Hi!
Just to confirm, you are experiencing syntax errors when trying to execute in PostgreSQL the code generated by PyDBML? Because PyDBML does not support SQL -> DBML conversion, only the other way around

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants