perf(backends): speed up most memtable existence checks #10067

Merged
merged 11 commits into from
Sep 11, 2024

Conversation

cpcloud
Member

@cpcloud cpcloud commented Sep 9, 2024

Adds some slightly faster checks for memtable existence. At some point we may want to expand this to non-memtables, but for now these are only implemented for memtables.
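For illustration only, a cheap existence check of this kind can be sketched against SQLite's catalog using the standard-library `sqlite3` module. This is not the code from this PR: the standalone function `in_memory_table_exists`, the table name `memtable_demo`, and the assumption that a memtable is stored as a connection-local temporary table are all hypothetical choices made for the sketch.

```python
import sqlite3


def in_memory_table_exists(con: sqlite3.Connection, name: str) -> bool:
    """Return True if a temporary table called `name` exists on this connection.

    Hypothetical helper: assumes memtables live in SQLite's temp schema.
    """
    # sqlite_temp_master lists the objects in the connection-local temp schema.
    # A bound parameter keeps the table name out of the SQL text itself.
    row = con.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM sqlite_temp_master WHERE type = 'table' AND name = ?"
        ")",
        (name,),
    ).fetchone()
    return bool(row[0])


con = sqlite3.connect(":memory:")
con.execute("CREATE TEMP TABLE memtable_demo (x INTEGER)")
```

The point of the sketch is that a single catalog lookup answers the question without listing every table or building a table expression first.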

@cpcloud cpcloud added this to the 9.5 milestone Sep 9, 2024
@cpcloud cpcloud added postgres The PostgreSQL backend sqlite The SQLite backend pyspark The Apache PySpark backend snowflake The Snowflake backend risingwave The RisingWave backend labels Sep 9, 2024
@cpcloud cpcloud added trino The Trino backend performance Issues related to ibis's performance labels Sep 9, 2024
Member

@jcrist jcrist left a comment

Just a couple of questions. Once #9695 lands, we could also change the default implementation to:

try:
    self.table(name)
    return True
except TableNotFound:
    return False

ibis/backends/sqlite/__init__.py (outdated, resolved)
ibis/backends/trino/__init__.py (outdated, resolved)
@cpcloud
Member Author

cpcloud commented Sep 10, 2024

Going to leave off postgres for now. Looking into it.

@cpcloud cpcloud force-pushed the faster-memtable-existence-checks branch 2 times, most recently from faf78a6 to 89d4253 Compare September 11, 2024 11:46
@cpcloud cpcloud added clickhouse The ClickHouse backend mysql The MySQL backend mssql The Microsoft SQL Server backend oracle The Oracle backend exasol Issues related to the exasol backend labels Sep 11, 2024
@cpcloud
Member Author

cpcloud commented Sep 11, 2024

Ok, I managed to audit all the backends except for Druid and Flink. Those can be tackled in a follow-up.

@cpcloud cpcloud force-pushed the faster-memtable-existence-checks branch 2 times, most recently from 4a0537b to 4f2ab29 Compare September 11, 2024 12:45
@@ -411,11 +411,18 @@ def _register_udfs(self, expr: ir.Expr) -> None:
        self._session.udf.register(f"unwrap_json_{typ.__name__}", unwrap_json(typ))
        self._session.udf.register("unwrap_json_float", unwrap_json_float)

    def _in_memory_table_exists(self, name: str) -> bool:
        sql = f"SHOW TABLES IN {self.current_database} LIKE '{name}'"
Member Author

@cpcloud cpcloud Sep 11, 2024

For some reason, self._session.catalog.tableExists doesn't give the same answer as the code I have here, even when passed a fully quoted default.$memtable. (The second dbName parameter is deprecated, and the deprecation warning suggests passing the fully qualified name instead.)

@cpcloud cpcloud force-pushed the faster-memtable-existence-checks branch from 4f2ab29 to 0188256 Compare September 11, 2024 12:56
@cpcloud cpcloud changed the title perf(backends): speed up some memtable existence checks perf(backends): speed up most memtable existence checks Sep 11, 2024
@cpcloud
Member Author

cpcloud commented Sep 11, 2024

Clouds are passing:

…/ibis on  faster-memtable-existence-checks is 📦 v9.4.0 via 🐍 v3.10.14 via ❄️   impure (ibis-3.10.14-env)
❯ pytest -m 'snowflake or bigquery' -n 8 --dist loadgroup --snapshot-update -q
bringing up nodes...
x.....................................s....x.....x..s.........................x..........x........................x........xx....x..........................x..xx..................xx.............xx.........x [  5%]
.....................x....x...x.....x..............x........................x..............x...x..........................x.....xx.xx.........................x...........x.......x.....x................x.... [ 10%]
...........xx.....x.x........xx...x....x..x...x....x.x.........xx.xx.xxxxxxxxxx.x.x..x..x...xx..xxxx...x.x.xx..xxx..xx.x.....xxx.x.....xxx.xx.xxx.x..xxxx.x...x....x.x.xxx.xx.xx.....xx.x...xx.xx..xxx..xxx... [ 15%]
x.x.x.x.x..xx.x......................x.xx.........x.xx.x.s.ssssssssssssssssssss..xx.......x........................................x..................x..........x.x..............................x........... [ 20%]
x.................xx...................x..x....x......x......x...x.x.....x...........x.........x..s..x........x......x.....................x.................x......x............x.......x.x.x.xx....xxx.xs..x [ 25%]
..xx...x................................x........s...x.......ssssssssssssssssssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssssssss.sssssssssssssssssss........x.............s..............x...... [ 30%]
.....x.....................................x........x........................x............................................................x..x................x..........x..xx..xx........x...x............... [ 35%]
....................................................................x...........x..xx...x....x.x...........................................x................................x.......................x......... [ 40%]
.x....................x.....................x..x.....................................xxx...x........................................x...x...x.....x.x............x.x.........x.......xx.............x......... [ 45%]
....x......s........x............x............x...x.x...x.................x.............x.................x.......x......x...x.....x.x.........x.xxx....................x.x.........x.......................x. [ 50%]
.x.x...x..........x............x........................X......x........x.xxx.........x........x....x...x.......x.x.x.x...x......xxx.X........x..x.x..x.x.....xx.x....xxX..X....x........X....x...xxxxxx.xx.xx [ 56%]
x.xxxx.xx.xxxx.xxx.xxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx..x.x.xx.....................x............x..x...x......................x.x... [ 61%]
........x......x.x.x.....x...x..............................x...x.x..................x......x...............xx..x...................xx.x.........x.x.s..s...x.......x.........xxxXx.xs...s........x.x......... [ 66%]
..x...............x....x.x.xxx...xx.xx.x..x...x...x...x.x..........................s....ss.......x................................x..............x....x..xx.........x.......x.....x............x.xx.x.....x.xx [ 71%]
...x..xx...x....xxx..x.....x............x..xx.x.....x.x.........xx.x..x..............x.........x...............x........................................x..................................................... [ 76%]
.......x.....x...........................................xx....x.............xx................................................x..x..........x....................x.x..............x.......x..x.....xx.x..xxxx [ 81%]
xxxxx.....xx.xx.x.xx....x.xxxxx..x..x..xx...............x....x....xx.....x.............................x..................................................................................x................... [ 86%]
....x...................................................................................x..xx.....................................x........................................................................... [ 91%]
.............................................................................................................................................s........x..x.................................................... [ 96%]
....................................................................s...s.......x........................................s.....                                                                                [100%]
3330 passed, 138 skipped, 567 xfailed, 6 xpassed in 922.55s (0:15:22)

@cpcloud cpcloud force-pushed the faster-memtable-existence-checks branch from 0188256 to bcbf310 Compare September 11, 2024 13:15
@cpcloud cpcloud enabled auto-merge (squash) September 11, 2024 13:27
@cpcloud cpcloud merged commit a205ab7 into ibis-project:main Sep 11, 2024
81 checks passed
@cpcloud cpcloud deleted the faster-memtable-existence-checks branch September 11, 2024 13:40
ncclementi pushed a commit to ncclementi/ibis that referenced this pull request Sep 24, 2024
Labels
clickhouse The ClickHouse backend exasol Issues related to the exasol backend mssql The Microsoft SQL Server backend mysql The MySQL backend oracle The Oracle backend performance Issues related to ibis's performance postgres The PostgreSQL backend pyspark The Apache PySpark backend risingwave The RisingWave backend snowflake The Snowflake backend sqlite The SQLite backend trino The Trino backend
2 participants