
Missing implementation of maxBuffer in Postgres #68

Open
dennismphil opened this issue Apr 12, 2018 · 9 comments


dennismphil commented Apr 12, 2018

@willfarrell @ZJONSSON

The documentation states

# etl.postgres.script(pool, schema, table [,options])

Collects data and builds up a postgres statement to insert/update data until the buffer is more than maxBuffer (customizable in options). When the maxBuffer is reached, a full sql statement is pushed downstream. When the input stream has ended, any remaining sql statement buffer will be flushed as well.

However, the Postgres code is missing the maxBuffer implementation.

@ZJONSSON
Owner

Thanks @dennismphil, it would make sense to add this. We've had good experience with the mysql implementation, as maxBuffer lets us calibrate the ETL load better than a record count does.

Do you want to take a stab at this in a PR?

@dennismphil
Author

Yes, I will take a stab at this. Please assign it to me.

@dennismphil
Author

@ZJONSSON Taking a look at the current code, I'm wondering if we should bring in a dependency like Squel.js for more readable code. Thoughts?

@ZJONSSON
Owner

I generally like to avoid adding dependencies unless there is a good reason. Do you have an example of how the code above would look with Squel.js?

@willfarrell
Contributor

I agree that we should limit adding dependencies. If one cannot be avoided, I would recommend knex over squel, mostly because it supports more database engines and has a larger community.

@dennismphil
Author

We could use Knex. I'll rewrite the existing code using knex in one file, and let's see if you like the structure better. @ZJONSSON / @willfarrell can veto it if it does not add more clarity.

@dennismphil
Author

I am not liking knex after working with it for a while, given all the workarounds needed to use it as a query builder. I will drop the library and use plain string concatenation instead, and will see how to organize it better.


rbreejen commented Sep 19, 2018

Hi, I was reading about this topic and would be interested to see this implemented for Postgres. How are things going?

As far as I understand it, can you confirm that the change would look like the following?

from:
INSERT INTO tmp(col_a,col_b) VALUES('a1','b1')
INSERT INTO tmp(col_a,col_b) VALUES('a2','b2')
INSERT INTO tmp(col_a,col_b) VALUES('a3','b3')

to (as long as maxBuffer is not exceeded):
INSERT INTO tmp(col_a,col_b) VALUES('a1','b1'),('a2','b2'),('a3','b3')

Pros: faster insert performance, since many rows go into a single statement.

@ZJONSSON
Owner

Yes, that's basically what maxBuffer is for, i.e. to capture as many values as possible up to a certain maximum and then send off the query.
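The batching described above can be sketched in plain JavaScript. This is a hypothetical illustration, not the actual etl implementation: the `batchInserts` name, the string-length buffer measure, and the naive quoting (no escaping, unsafe for real input) are all assumptions made for this example.

```javascript
// Hypothetical sketch of maxBuffer-style batching: accumulate VALUES tuples
// until adding another would push the SQL string past maxBuffer characters,
// then emit the statement and start a new one. Remaining rows are flushed
// at the end, mirroring the stream-end flush the docs describe.
function batchInserts(rows, table, columns, maxBuffer) {
  const prefix = `INSERT INTO ${table}(${columns.join(',')}) VALUES `;
  const statements = [];
  let current = '';
  for (const row of rows) {
    // Naive quoting for illustration only; real code must escape values
    // or use parameterized queries.
    const tuple = '(' + row.map(v => `'${v}'`).join(',') + ')';
    if (current && (prefix + current + ',' + tuple).length > maxBuffer) {
      statements.push(prefix + current); // buffer full: emit and restart
      current = tuple;
    } else {
      current = current ? current + ',' + tuple : tuple;
    }
  }
  if (current) statements.push(prefix + current); // flush the remainder
  return statements;
}
```

With a large maxBuffer the three rows from the example collapse into one statement; with a small maxBuffer each row is emitted separately.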
