
Missing implementation of maxBuffer in Postgres #68

Open
dennismphil opened this issue Apr 12, 2018 · 9 comments


dennismphil commented Apr 12, 2018

@willfarrell @ZJONSSON

The documentation states

# etl.postgres.script(pool, schema, table [,options])

Collects data and builds up a postgres statement to insert/update data until the buffer is more than maxBuffer (customizable in options). When the maxBuffer is reached, a full sql statement is pushed downstream. When the input stream has ended, any remaining sql statement buffer will be flushed as well.

However, the Postgres code is missing the maxBuffer implementation.

@ZJONSSON
Owner

Thanks @dennismphil, it would make sense to add this. We've had good experience with the mysql implementation, as maxBuffer lets us calibrate the ETL load better than a record count does.

Do you want to take a stab at this in a PR?

@dennismphil
Author

Yes, I will take a stab at this. Please assign it to me.

@dennismphil
Author

@ZJONSSON Taking a look at the current code, I'm wondering if we should bring in a dependency like Squel.js for more readable code. Thoughts?

@ZJONSSON
Owner

I generally like to avoid adding dependencies unless there is a good reason. Do you have an example of how the code above would look with Squel.js?

@willfarrell
Contributor

I agree that we should limit adding dependencies. If one cannot be avoided, I would recommend knex over squel, mostly because it supports more database engines and has a larger community.

@dennismphil
Author

We could use Knex. I'll rewrite the existing code using knex in one file, and let's see if you like the structure better. @ZJONSSON / @willfarrell can veto it if it does not add more clarity.

@dennismphil
Author

I am not liking knex after working with it for a while, given all the workarounds needed to use it as a query builder. I will drop the library and use plain string concatenation instead, and will see how to organize it better.


rbreejen commented Sep 19, 2018

Hi, I was reading about this topic and would be interested to see this implemented for Postgres. How are things going?

As far as I understand it, can you confirm that the change would look like the following?

from:
INSERT INTO tmp(col_a,col_b) VALUES('a1','b1')
INSERT INTO tmp(col_a,col_b) VALUES('a2','b2')
INSERT INTO tmp(col_a,col_b) VALUES('a3','b3')

to (as long as maxBuffer is not exceeded):
INSERT INTO tmp(col_a,col_b) VALUES('a1','b1'),('a2','b2'),('a3','b3')

Pros: faster insert performance, since many rows go into a single statement.

@ZJONSSON
Owner

Yes, that's basically what maxBuffer is for, i.e. to capture as many values as possible up to a certain maximum and then send off the query.
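The batching described above can be sketched in plain JavaScript. This is a hypothetical illustration, not the actual etl implementation: the `batchInserts` name, the string-length buffer measure, and the naive quoting (no escaping, unsafe for real input) are all assumptions made for this example.

```javascript
// Hypothetical sketch of maxBuffer-style batching: accumulate VALUES tuples
// until adding another would push the SQL string past maxBuffer characters,
// then emit the statement and start a new one. Remaining rows are flushed
// at the end, mirroring the stream-end flush the docs describe.
function batchInserts(rows, table, columns, maxBuffer) {
  const prefix = `INSERT INTO ${table}(${columns.join(',')}) VALUES `;
  const statements = [];
  let current = '';
  for (const row of rows) {
    // Naive quoting for illustration only; real code must escape values
    // or use parameterized queries.
    const tuple = '(' + row.map(v => `'${v}'`).join(',') + ')';
    if (current && (prefix + current + ',' + tuple).length > maxBuffer) {
      statements.push(prefix + current); // buffer full: emit and restart
      current = tuple;
    } else {
      current = current ? current + ',' + tuple : tuple;
    }
  }
  if (current) statements.push(prefix + current); // flush the remainder
  return statements;
}
```

With a large maxBuffer the three rows from the example collapse into one statement; with a small maxBuffer each row is emitted separately.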
