Replies: 2 comments 2 replies
-
I'll note that in work on #6996, while sqlglot definitely has its own learning curve, I think that there are fewer layers to peel back when fixing things, vs. the sometimes many-layered onion that sqlalchemy can be. |
Beta Was this translation helpful? Give feedback.
-
+1Regarding the dialect problem. The SQLAlchemy 2.0 move has been difficult for the BigQuery team to address. I suspect sqlglot's approach and smaller public interface would make breaking changes easier to deal with when they happen. I don't have much experience with sqlglot myself but have heard good things from BQ customers. I'll try to find some time to evaluate it myself. Some considerations: What do we do with snapshot tests? I think they still provide utility with a move to sqlglot, but we might want to make sure we're pinning a certain version of sqlglot when generating and comparing snapshots. How easy is it to extend if there are operations we want to use that aren't supported in sqlglot yet? What is the scope of sqlglot? For example, are DDL and UDF generation/registration included in its scope? If not, is it feasible to plug in our implementations. |
Beta Was this translation helpful? Give feedback.
-
A few people have asked about this topic, so I'd like to start a discussion and
state some of my thinking about moving our SQL backends to
sqlglot and away from SQLAlchemy.
Hopefully this goes without saying, but nothing here should be interpreted as
negative toward SQLAlchemy. SQLAlchemy is a wonderful project, and provides
huge value. It has served us well for years.
I think some the challenges that make SQLAlchemy difficult for us to continue
to use are very specific to ibis, that is, the problem of backend
cat-herding.
Why move away from SQLAlchemy?
Dialect sprawl
I think the primary reason to move away from SQLAlchemy is one of its greatest
strengths: pluggable dialects.
Library developers can build a SQLAlchemy dialect that natively integrates with
most of the rest of SQLAlchemy as an independent project that need not be tied
to sqlalchemy itself.
One of the challenges with this architecture with ibis is that because we have
so many backends that use SQLAlchemy we're often tied to the timelines of
whatever person or company happens to be maintaining (or not!) the dialect for
a given backend.
This is fine if you have 1 or 2 backends, but at 9 sqlalchemy backends we feel
the burden of having to deal with upstream projects' priorities.
In addition to dialects there are separate database driver packages as well,
which adds to this sprawl.
Again, I think pluggability is one of SQLAlchemy's strengths. Combining it with
ibis is what makes it challenging to continue to use.
Other reasons
There are other reasons too, but I believe they are less important than
addressing the dialect sprawl issue.
age of SQLAlchemy) of development remains to be seen :)
semantically equivalent
Why sqlglot?
sqlglot is a relatively new project and I think it has a lot of promise.
It's a SQL parser and transpiler with support for many of the dialects of SQL that
we care about.
Why should we move to sqlglot?
Dialects are centralized in one place
sqlglot dialects are centralized in one place and are much easier to discover,
understand and contribute to.
Amazingly, the team working on it never seems to have any open issues and it's
not because there are no bugs! Code is reviewed quickly and bugs are fixed
extremely fast. If something isn't going to be implemented you won't be left hanging.
If something doesn't work, there's usually a workaround or extension point.
The fact that dialects are in one place addresses my main concern with
continuing to use SQLAlchemy.
Transformations
The translation functionality in sqlglot has recently unlocked some
long-awaited features for us, like supporting unnest for the BigQuery
backend in way that is
consistent with other
backends.
These transformations ensure that backend behaviors agree with each other, even
when doing so results in extremely complex SQL.
The output of this code is:
This is not a transformation that I would like to encode in ibis if we can
avoid it.
Performance
sqlglot allows us to more easily separate expression compilation from execution.
We can choose to combine sqlglot with ADBC, or duckdb's native driver, or
whatever other thing comes along because we're not tied to a specific driver.
Potential downsides
We've already moved the clickhouse backend to sqlglot, and are making
improvements to it.
One of the key features of SQL that ClickHouse doesn't
support is correlated
subqueries.
SQLAlchemy handles some of the details of correlation for us, so we need to
port another backend that supports them to see what handling them looks like.
The way joins are represented in sqlglot is very different from sqlalchemy. We
likely to address join chaining in one way or another before moving a bunch of
our other backends to sqlglot, because sqlglot doesn't have any special
handling for this problem.
Goals for the porting process
Non-goals
Let's discuss!
Beta Was this translation helpful? Give feedback.
All reactions