
What is the best way to download large tables? #5801

Answered by gforsyth
evlaw-ea asked this question in Q&A

Hey @evlaw-ea -- glad to hear that the ETL code porting is working well!!

We are thinking about ways to move data around, but it's a tricky problem (actually a bunch of tricky problems).

1. Stream batches of data from Snowflake such that I can process them individually? Would to_pyarrow_batches be of any help here? (That is, does it download all the data into memory before turning it into a RecordBatchReader? If so, then pass.)

I think this will get around your out-of-memory issues, but it will not be performant. It's going to pull down N tuples at a time, where N is the batch size, and then you can operate on those batches in sequence.
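For concreteness, here is a minimal sketch of that streaming pattern. It uses the DuckDB backend so it runs locally without credentials, but the same `to_pyarrow_batches` call works against a Snowflake connection; the table and column names are made up for illustration.

```python
import ibis

con = ibis.duckdb.connect()  # stand-in for ibis.snowflake.connect(...)
t = con.create_table("big_table", ibis.memtable({"id": list(range(100_000))}))

# to_pyarrow_batches returns a pyarrow.RecordBatchReader; rows come down
# chunk_size at a time rather than the whole table at once.
reader = t.to_pyarrow_batches(chunk_size=10_000)

total = 0
for batch in reader:          # each batch is a pyarrow.RecordBatch
    total += batch.num_rows   # stand-in for real per-batch processing

print(total)  # 100000
```

Because each `RecordBatch` can be dropped once it's processed, peak memory stays around one batch rather than the full table.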

2. Downcast the data type downloaded from Snowflake…
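As a hedged sketch of the downcasting idea (the column names and types below are assumptions, not from the thread): casts written on the Ibis expression compile into the generated SQL, so the narrower types are applied server-side before the data is transferred.

```python
import ibis

con = ibis.duckdb.connect()  # stand-in for a Snowflake connection
t = con.create_table(
    "wide_table",
    ibis.memtable({"id": [1, 2, 3], "amount": [1.5, 2.5, 3.5]}),
)

# Hypothetical columns: shrink int64 -> int32 and float64 -> float32
# before pulling the data down.
t_small = t.mutate(
    id=t.id.cast("int32"),
    amount=t.amount.cast("float32"),
)
print(t_small.schema())  # id: int32, amount: float32

# Downcasting combines with streaming: smaller types mean smaller batches.
reader = t_small.to_pyarrow_batches(chunk_size=10_000)
```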
