
bug: BigQuery backend table.to_pyarrow_batches is not honoring the chunk_size parameter #10257

Open

ruiyang2015 opened this issue Sep 30, 2024 · 1 comment
Labels
bug Incorrect behavior inside of ibis

Comments

@ruiyang2015

What happened?

For the following code against the BigQuery backend:

import ibis

c = ibis.bigquery.connect(...)  # connection details elided in the report
t = c.table('some table')
# changing chunk_size has no effect on the size of the yielded batches
for y in t.to_pyarrow_batches(chunk_size=1_000_000):
    print(y.num_rows)

In our case, each returned batch is always about 4k rows rather than the larger size we requested.
The same code against the DuckDB and Snowflake backends returns batches of the expected size.
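
Until the parameter is honored, one client-side workaround is to coalesce the small batches with pyarrow. The rebatch helper below is only a sketch, not part of ibis, and it assumes the server-determined batch sizes cannot be controlled directly:

import pyarrow as pa

def rebatch(batches, chunk_size):
    """Coalesce an iterator of RecordBatches into batches of exactly
    chunk_size rows; the final batch may be smaller."""
    pending = []
    pending_rows = 0
    for batch in batches:
        pending.append(batch)
        pending_rows += batch.num_rows
        while pending_rows >= chunk_size:
            # combine_chunks merges the buffered batches into a single
            # chunk so we can slice off exactly chunk_size rows
            table = pa.Table.from_batches(pending).combine_chunks()
            yield table.slice(0, chunk_size).to_batches()[0]
            rest = table.slice(chunk_size)
            pending = rest.to_batches()
            pending_rows = rest.num_rows
    if pending_rows:
        yield pa.Table.from_batches(pending).combine_chunks().to_batches()[0]

# yields 1,000,000-row batches regardless of the server's page size
# (except possibly the last)
for y in rebatch(t.to_pyarrow_batches(), 1_000_000):
    print(y.num_rows)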

What version of ibis are you using?

9.0.0

What backend(s) are you using, if any?

BigQuery

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ruiyang2015 added the bug (Incorrect behavior inside of ibis) label on Sep 30, 2024
@cpcloud (Member) commented Sep 30, 2024

It's possible that we're not encoding the chunk size in the right way. I recall there being some complexity around how paged results relate to chunks.

Maybe @tswast knows: is it possible to get back an exact chunk size (modulo the last chunk, which will be <= the requested chunk size)?
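
For context, here is a minimal sketch of how paging looks when calling google-cloud-bigquery directly (the query is hypothetical). QueryJob.result() accepts a page_size, but when rows are fetched through the BigQuery Storage Read API the server chooses the message sizes; the assumption here is that page_size only bounds REST result pages:

from google.cloud import bigquery

client = bigquery.Client()
job = client.query("SELECT * FROM `project.dataset.some_table`")  # hypothetical query

# page_size bounds each REST results page; without a BigQuery Storage
# client, to_arrow_iterable yields roughly one RecordBatch per page
# (assumption: the REST fallback maps pages to batches one-to-one)
rows = job.result(page_size=50_000)
for batch in rows.to_arrow_iterable():
    print(batch.num_rows)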

Projects
Status: backlog