Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: several Table.sample improvements #10207

Merged
merged 4 commits into from
Sep 24, 2024
Merged

Conversation

jcrist
Copy link
Member

@jcrist jcrist commented Sep 24, 2024

This PR:

  • Implements sample for polars
  • Moves the errors for risingwave/druid to be compile time rather than execution time
  • Centralizes the implementation of compiling tablesamples, fixing a TODO in the compiler
  • Compiles Table.sample to the backend's native TABLESAMPLE clause when possible for bigquery, postgres, mssql, oracle, and impala. This is much more efficient when using method="block" as it lets the backend avoid reading in whole blocks of data, dramatically reducing IO. This was the main goal of this PR.
  • Adds a SQL snapshot test to ensure the native implementation is being used when possible, and expands our execution tests to ensure execution works for all cases.

@github-actions github-actions bot added tests Issues or PRs related to tests polars The polars backend labels Sep 24, 2024
@jcrist jcrist added impala The Apache Impala backend postgres The PostgreSQL backend bigquery The BigQuery backend mssql The Microsoft SQL Server backend oracle The Oracle backend labels Sep 24, 2024
@lostmygithubaccount
Copy link
Member

lostmygithubaccount commented Sep 24, 2024

how for Polars? my understanding is lazyframes still don't support .sample per pola-rs/polars#3933 (apologies if I'm missing this)

for posterity see #10207 (comment)

ibis/backends/sql/rewrites.py Outdated Show resolved Hide resolved
ibis/backends/sql/rewrites.py Outdated Show resolved Hide resolved
ibis/backends/tests/test_sql.py Outdated Show resolved Hide resolved
ibis/backends/tests/test_generic.py Outdated Show resolved Hide resolved
…e` to native `TABLESAMPLE` syntax when possible
@jcrist jcrist merged commit 321a3b5 into ibis-project:main Sep 24, 2024
77 checks passed
@jcrist jcrist deleted the improve-sample branch September 24, 2024 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery The BigQuery backend impala The Apache Impala backend mssql The Microsoft SQL Server backend oracle The Oracle backend polars The polars backend postgres The PostgreSQL backend tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants