
feat: Transaction Support for Indexer runFunction Invocations #432

Closed
wants to merge 9 commits from the transactionSupport branch

Conversation

@darunrs (Collaborator) commented Nov 28, 2023

Currently, partial failure of database actions is permitted. This can cause non-deterministic behavior depending on the state of the user's database. To address this, we want to wrap all database calls in a transaction and commit the changes only if no errors occurred.

Users may want to handle errors themselves in some cases, so we only roll back the transaction if an uncaught error surfaces to the runner, i.e., one that was not caught by user code.

In order to support transactions, every query made during a single invocation of user code must go through the same client, and that client should be released after the commit. Because of this, we need to do away with pools, as we no longer expect any indexer to use more than one client at any point in time.
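
As a rough sketch of that flow (not the actual runner code; `runUserCode` is a placeholder for executing the indexer's code, and this assumes node-postgres):

```typescript
import { Client } from 'pg';

// Sketch only: every query from one invocation of user code goes through a
// single client, changes are committed only if the user code finishes without
// an uncaught error, and the connection is released afterwards.
async function runInTransaction (client: Client, runUserCode: () => Promise<void>): Promise<void> {
  await client.connect();
  try {
    await client.query('BEGIN');
    await runUserCode(); // all context.db calls are routed through `client`
    await client.query('COMMIT');
  } catch (error) {
    // An uncaught error surfaced to the runner: discard every write in this invocation.
    await client.query('ROLLBACK');
    throw error;
  } finally {
    await client.end(); // release the connection once the invocation is done
  }
}
```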

One problem is that the database can only support a limited number of active clients at a time, so frequent, busy transactions from competing indexers will eventually push the number of active connections high enough to induce throttling. We will therefore want to introduce a solution like PgBouncer soon. Running `SELECT sum(numbackends) FROM pg_stat_database;` in psql locally, I observed as many as 84 active database connections, which is close to my local DB's max of 97.

I tried my best to reproduce this behavior by backing up many blocks for all indexers, as well as running a 90k+ block historical backfill for an indexer that makes simple context.db invocations. I did observe a bit of throttling but did not observe any "too many clients" errors. This is likely because indexers no longer each have their own pool, but instead a single client they make connect calls with.

darunrs commented Nov 28, 2023

The latest commit has the test code I used to validate switching from a pool to a client. Running `yarn test dml` will run the DmlHandler tests, including the extra test I added at the top. You'll see it overloads the DB; at least it does on my machine. If you move the `DmlHandler.create` call outside the for loop, the problem goes away.
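
The failure mode that test exercises is roughly the following (sketched here against raw pg clients rather than the actual DmlHandler API):

```typescript
import { Client } from 'pg';

// Sketch of the failure mode: opening a new connection on every iteration
// (analogous to calling DmlHandler.create inside the loop) without releasing
// them eventually exceeds Postgres' connection limit, and connect() rejects
// with a "too many clients" error.
async function overloadConnections (connectionString: string): Promise<void> {
  const clients: Client[] = [];
  try {
    for (let i = 0; i < 100; i++) {
      const client = new Client({ connectionString });
      await client.connect();
      clients.push(client);
    }
  } finally {
    // Hoisting client creation out of the loop (one shared connection) avoids this.
    await Promise.all(clients.map(async (client) => await client.end()));
  }
}
```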

There were some problems with this approach:

  1. I felt there was no point in creating a pool instance for each indexer on launch if all it's going to do is manage a single client anyway.
  2. Every single indexer would get a pool. I couldn't find any documentation on a maximum pool count, but each pool consumes resources either way, and that overhead felt wasteful. If there is a maximum, we now have another boundary condition to worry about. That said, we could instead create a client upon Indexer creation, since a client is more lightweight, and simply call connect and release on it. I can experiment with that to confirm the approach has no problems.
  3. Since every indexer has a pool, the original case of 100 successive invocations of DmlHandler.create (not even simultaneous ones) would come to pass once we have enough active indexers. The issue is that instead of possibly waiting a while to connect, we get an unrecoverable error. That does give me an idea, though: concretely verify that my code does not produce that kind of error when enough connect invocations are made. Will test that.

So, given that, I decided to have each indexer create a client on creation but only connect with it and release it when needed; a minimal sketch of that lifecycle is below.
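
The class and method names here are illustrative only, not the actual runner code. Note that a node-postgres Client cannot reconnect after end(), so this sketch holds the connection config and opens a fresh Client on each connect:

```typescript
import { Client, type ClientConfig } from 'pg';

// Illustrative sketch: the indexer holds the connection config, lazily opens a
// short-lived client when an invocation needs the database, and releases it
// once the invocation finishes.
class IndexerDbConnection {
  private client: Client | null = null;

  constructor (private readonly config: ClientConfig) {}

  async connect (): Promise<Client> {
    this.client = new Client(this.config);
    await this.client.connect();
    return this.client;
  }

  async release (): Promise<void> {
    if (this.client !== null) {
      await this.client.end();
      this.client = null;
    }
  }
}
```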

darunrs commented Nov 28, 2023

We had a team discussion over the changes. Relevant points:

  1. This changes largely invisible, implicit behavior of all indexers. The scope is wide and the detectability is low, so a change like this deserves a lot of scrutiny before release.
  2. PgBouncer (or something like it) is more or less guaranteed as the next step. There's a strong case to simply wait and release this kind of change in its best possible form, which means implementing a PgBouncer solution.
  3. Integration tests are sorely lacking here, which makes it hard to build confidence that changes don't negatively impact things in ways a unit test may not catch.
  4. There's a trade-off between velocity and completeness in feature development: waiting for a fully featured changeset including PgBouncer versus releasing this now and shipping a PgBouncer update later.

@darunrs linked an issue Dec 1, 2023 that may be closed by this pull request
darunrs commented Mar 18, 2024

Will re-implement later.

@darunrs closed this Mar 18, 2024
@darunrs deleted the transactionSupport branch March 18, 2024 17:57
Successfully merging this pull request may close these issues.

Support Transactions for Indexer Code Execution