add buffering for runner output capture
Updating the step execution for every line of output can be extremely costly for jobs producing millions of lines. While this is an edge case (jobs should really not use stdout/stderr to emit massive amounts of data), the consequences are severe:

1. Job execution time is dramatically impacted, since each line causes a query to the database.

2. Each update produces a new row version in the database (since PostgreSQL uses MVCC), which results in an enormous number of dead tuples. As an example, a job producing 5 megabytes of output can easily result in 20+ GB of dead tuples in PostgreSQL. While the vacuum process ultimately frees that space, it is simply not fast enough when updates are performed hundreds or thousands of times per second.

Buffering aggressively for at least one second solves the problem at the expense of memory usage. An even better solution would be to flush after X seconds or Y bytes, whichever comes first, but this requires a more sophisticated buffered reader.
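The time-and-size flushing policy described above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual implementation; the class and parameter names (`BufferedSink`, `flush_cb`, `interval`, `max_bytes`) are hypothetical, and the runner's language and internals may differ.

```python
import io
import time


class BufferedSink:
    """Accumulate captured output and flush it to a callback at most
    once per `interval` seconds, or earlier once the buffer exceeds
    `max_bytes`. Hypothetical sketch of the buffering idea described
    in the commit message."""

    def __init__(self, flush_cb, interval=1.0, max_bytes=64 * 1024):
        self.flush_cb = flush_cb        # e.g. a function that issues one DB update
        self.interval = interval        # minimum seconds between flushes
        self.max_bytes = max_bytes      # size threshold that forces an early flush
        self._buf = io.StringIO()
        self._last_flush = time.monotonic()

    def write(self, chunk):
        self._buf.write(chunk)
        elapsed = time.monotonic() - self._last_flush
        # Flush on whichever limit is hit first: elapsed time or buffer size.
        if elapsed >= self.interval or self._buf.tell() >= self.max_bytes:
            self.flush()

    def flush(self):
        data = self._buf.getvalue()
        if data:
            self.flush_cb(data)
        self._buf = io.StringIO()
        self._last_flush = time.monotonic()
```

With this shape, a job emitting thousands of lines per second coalesces into roughly one database update per second (or per `max_bytes` of output), instead of one update per line; the caller must invoke `flush()` once more when the step finishes so the tail of the output is not lost.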