Does ozo need to fetch all data from postgres before it starts to e.g. push it back to a vector? #295

Open · psmaron opened this issue Apr 20, 2021 · 2 comments
psmaron commented Apr 20, 2021

I'm wondering how ozo exchanges data with PostgreSQL. I know the two communicate over sockets, and that if the transferred data is big enough, it's divided and sent in chunks, e.g. of 16 KB each, so that when we fetch 2 MB of data, 2048/16 = 128 packets will be sent.
Assuming I want to push the fetched rows into a vector, my question is: will ozo wait for all packets and only then start pushing into the vector, or will it push into the vector on the fly, releasing the memory of already-consumed packets? The difference between these two approaches is that in the first case we'd need (for some small time window) about 2×2 MB of memory on the application side (2 MB for the vector and 2 MB for all the data transferred from postgres), while the latter requires only roughly 2 MB (2 MB for the vector, plus some memory for the not-yet-pushed packets).
I'm guessing ozo needs to have all the data from postgres transferred before it starts pushing it into the vector, am I right?

psmaron changed the title from "Does ozo" to "Does ozo need to fetch all data from postgres before it starts to e.g. push it back to a vector?" on Apr 20, 2021
thed636 (Contributor) commented Apr 21, 2021

Hi!

The library doesn't provide a stream interface, so the result has to be received completely before the operation's continuation is called. This is a limit not only of the library but of the underlying libpq as well. The best approach is to combine the application logic with a proper database query that fetches data in limited batches of rows (see the sketch below). Streaming is not the best choice because it holds a transaction open for the duration of the operation, and long transactions aren't good for database performance. We tried to utilize single-row mode for a kind of streaming, but it was slow and led to long transactions with the associated database performance issues.
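For illustration, here's a minimal sketch of that batched approach. The `items(id, payload)` table, connection string, and batch size are placeholders, not anything from ozo itself; the request API follows the example in ozo's README. It uses keyset pagination (`WHERE id > last_id ORDER BY id LIMIT n`), so at most one batch is buffered per round trip:

```cpp
#include <ozo/request.hpp>
#include <ozo/connection_info.hpp>
#include <ozo/shortcuts.hpp>
#include <boost/asio/io_context.hpp>

#include <cstdint>
#include <iostream>
#include <tuple>
#include <vector>

int main() {
    boost::asio::io_context io;
    // Placeholder connection string; a real application would likely use a
    // connection pool rather than a fresh connection per batch.
    ozo::connection_info conn_info("host=... dbname=...");

    using namespace ozo::literals;

    std::vector<std::tuple<std::int64_t, std::string>> all_rows;
    std::int64_t last_id = 0;         // keyset cursor: largest id seen so far
    const std::int64_t batch = 10000; // rows per round trip
    bool more = true;

    while (more) {
        ozo::rows_of<std::int64_t, std::string> rows; // one batch at a time
        const auto query = "SELECT id, payload FROM items WHERE id > "_SQL
                + last_id + " ORDER BY id LIMIT "_SQL + batch;

        ozo::request(conn_info[io], query, ozo::into(rows),
                [&](ozo::error_code ec, auto conn) {
            if (ec) {
                std::cerr << ec.message() << " | "
                          << ozo::error_message(conn) << '\n';
                more = false;
                return;
            }
            for (auto& row : rows) {
                last_id = std::get<0>(row);         // advance the keyset cursor
                all_rows.push_back(std::move(row)); // only this batch is buffered
            }
            more = !rows.empty(); // an empty batch means we're done
        });

        io.run();     // drive the async request to completion
        io.restart(); // reset so the next iteration can run() again
    }
}
```

Each query runs in its own short transaction, so nothing stays open between batches, and keyset pagination (rather than `LIMIT`/`OFFSET`) keeps every batch query cheap on an indexed `id` column.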

Please also see #230 for a short discussion of CURSOR and COPY.

Hope that helps.

psmaron (Author) commented Apr 22, 2021

All is clear, thank you!
