
ruby: Implement a buffer pool #214

Open · wants to merge 2 commits into main
Conversation

byroot
Contributor

@byroot byroot commented Dec 14, 2024

The trilogy client eagerly allocates a 32kiB buffer and grows it as needed. It's never freed nor shrunk until the connection is closed. Since the MySQL `max_allowed_packet` defaults to 16MiB, long-lived connections will progressively grow to that size.

For basic usage it's not a big deal, but some applications may have dozens if not hundreds of connections that are mostly idle.

A common case is multi-tenant applications with horizontal sharding: at any given time you only query one database, but you have open connections to many.

This situation can lead to a lot of memory being retained by trilogy connections and never released, looking very much like a memory leak.

This can be reproduced with a simple script:

```ruby
require 'trilogy'
connection_pool = []

50.times do
  t = Trilogy.new(database: "test")
  t.query("select '#{"a" * 16_000_000}' as a")
  connection_pool << t
end

puts "#{`ps -o rss= -p #{$$}`} kiB"
```

```
$ ruby /tmp/trilogy-leak.rb
927120 kiB
```

If we instead take over buffer lifetime management, we can pool the buffers and limit their total number to however many connections are actually in use concurrently.

The same reproduction script with the current branch:

```
$ ruby -Ilib:ext /tmp/trilogy-leak.rb
108144 kiB
```
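To make the idea concrete, here is a minimal sketch of a fixed-size buffer pool with checkout/checkin semantics. This is illustrative only, not the PR's actual implementation; the names, the `POOL_MAX` cap, and the plain `malloc` usage are assumptions for the sketch.

```c
/* Minimal buffer-pool sketch (illustrative, not the PR's actual code):
 * a fixed-size stack of reusable buffers. */
#include <stdint.h>
#include <stdlib.h>

#define POOL_MAX 8 /* cap pooled buffers (8 * 16MiB = 128MiB worst case) */

typedef struct {
    uint8_t *buff;
    size_t cap;
} pool_entry;

typedef struct {
    pool_entry entries[POOL_MAX];
    size_t count;
} buffer_pool;

/* Checkout: reuse a pooled buffer if one is available, else allocate. */
static uint8_t *pool_checkout(buffer_pool *pool, size_t *cap) {
    if (pool->count > 0) {
        pool_entry e = pool->entries[--pool->count];
        *cap = e.cap;
        return e.buff;
    }
    *cap = 32 * 1024; /* trilogy's initial 32kiB buffer size */
    return malloc(*cap);
}

/* Checkin: return the buffer to the pool, or free it if the pool is full. */
static void pool_checkin(buffer_pool *pool, uint8_t *buff, size_t cap) {
    if (pool->count < POOL_MAX) {
        pool->entries[pool->count].buff = buff;
        pool->entries[pool->count].cap = cap;
        pool->count++;
    } else {
        free(buff);
    }
}
```

Because idle connections check their buffer back in, memory usage is bounded by the number of concurrently active connections rather than the total number of open connections.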

NB: The implementation isn't quite ideal, as the buffers are allocated by the Ruby extension with `ruby_xmalloc`, but grown by trilogy itself with raw `realloc` when needed. It works, but ideally the trilogy C library would expose some hook so we can use the Ruby allocator all the way. I can try to work on that if it's really a blocker.

Also cc @shanempope: this is very likely to explain the memory growth observed last month after reforking stops, because:

  • Active Record closes all DB clients on fork, so as long as the app reforks, this "leak" is kept in check.
  • When we hit OOM, we do the heap dump from a forked child, so here again Active Record closes all clients and frees the buffers, making the source of the growth disappear.

All this to say you probably want to experiment with this branch or something like it.

@adrianna-chang-shopify @jhawthorn @matthewd

@byroot
Contributor Author

byroot commented Dec 14, 2024

The macOS build seems to fail consistently, but it doesn't produce any usable info and I can't reproduce it locally :/

Collaborator

@adrianna-chang-shopify adrianna-chang-shopify left a comment


Implementation makes sense to me. Can you tell me more about this:

NB: The implementation isn't quite ideal, as the buffers are allocated by the Ruby extension with ruby_xmalloc, but grown by trilogy itself when needed with raw realloc

Is this still the case? I see us using RB_ALLOC_N when initializing the pool, and then RB_REALLOC_N when resizing. Doesn't the former use ruby_xmalloc and the latter use ruby_xrealloc? What am I missing? 🤔

```c
}

if (!pool->capa) {
    pool->entries = RB_ALLOC_N(buffer_pool_entry, 16);
```
Collaborator


If I'm understanding correctly, we set the default buffer_pool_max_size to be 8, but then allocate 16 entries here when initializing a new pool. Wouldn't it make more sense to allocate 8 entries to start with (or to allocate the configured max size?) Why 16 specifically?

Contributor Author


Yes, it's a bit of a discrepancy. Those should probably match.

Initially I had some code that would dynamically size the pool based on the number of live Trilogy objects, but that caused issues with the GC (since the pool is retained by an object, when both the pool and a client are GCed, the pool may be freed before the clients). Hence I fell back to just a global setting like this.

I started with 16, but realized that 16 * 16MiB is 256 MiB, so figured 8 was already plenty.

@byroot
Contributor Author

byroot commented Dec 16, 2024

Is this still the case? I see us using RB_ALLOC_N when initializing the pool, and then RB_REALLOC_N when resizing.

That's for the pool itself. The buffers inside the pool are reallocated from buffer.c in the C library:

```c
uint8_t *new_buff = realloc(buffer->buff, new_cap);
```

So there is a mix of regular `malloc`/`free`/`realloc` and the Ruby versions of those, which works most of the time but is technically incorrect, and would crash if Ruby were built with certain debug flags.
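One way a library-level hook could look is a struct of allocator function pointers that the extension installs once. This is a hypothetical sketch: `trilogy_set_allocators`, `trilogy_allocators_t`, and `trilogy_buffer_expand` below are not part of the actual trilogy API.

```c
/* Hypothetical allocator-hook sketch for the trilogy C library.
 * None of these names exist in trilogy today; this only illustrates
 * how buffer growth could route through a caller-supplied allocator. */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *(*malloc_fn)(size_t);
    void *(*realloc_fn)(void *, size_t);
    void (*free_fn)(void *);
} trilogy_allocators_t;

/* Defaults to the C allocator so standalone library users are unaffected. */
static trilogy_allocators_t allocators = { malloc, realloc, free };

void trilogy_set_allocators(trilogy_allocators_t a) { allocators = a; }

/* buffer.c would grow through the hook instead of calling raw realloc. */
int trilogy_buffer_expand(uint8_t **buff, size_t new_cap) {
    void *p = allocators.realloc_fn(*buff, new_cap);
    if (!p) return -1;
    *buff = p;
    return 0;
}
```

The Ruby extension could then install `ruby_xmalloc`/`ruby_xrealloc`/`ruby_xfree`, keeping allocation and growth on the same allocator for the buffer's entire lifetime.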

@jhawthorn
Member

jhawthorn commented Dec 18, 2024

Mixing xmalloc and malloc has actually caused us significant issues by triggering extra GC due to the accounting "leak". We do test in our CI (occasionally) with -DCALC_EXACT_MALLOC_SIZE to ensure that they aren't mismatched.
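The accounting hazard can be illustrated conceptually: a size-tracking allocator stores the allocation size in a header before each block, so freeing or reallocating that pointer with the plain C allocator operates on the wrong address. This is a simplified sketch in the spirit of Ruby's `-DCALC_EXACT_MALLOC_SIZE` build, not Ruby's actual code; all names here are illustrative.

```c
/* Simplified size-header accounting sketch (illustrative, not Ruby's code). */
#include <stdlib.h>

static size_t total_allocated = 0;

void *xmalloc_sketch(size_t size) {
    /* Prepend a header recording the size for accounting. */
    size_t *p = malloc(size + sizeof(size_t));
    if (!p) return NULL;
    *p = size;
    total_allocated += size;
    return p + 1; /* caller sees the memory just past the header */
}

void xfree_sketch(void *ptr) {
    size_t *p = (size_t *)ptr - 1; /* step back to the header */
    total_allocated -= *p;
    free(p);
}
```

Calling plain `free()` on a pointer returned by `xmalloc_sketch` would pass the wrong address to the allocator, and a plain `realloc()` would leave the header's recorded size stale, so the accounted total drifts, which is the kind of mismatch a `-DCALC_EXACT_MALLOC_SIZE` build surfaces.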

@byroot
Contributor Author

byroot commented Dec 19, 2024

Mixing the xmalloc and malloc has actually caused us significant issues in terms of causing extra GC

Makes sense.

I suppose there are two ways to fix that:

  • Refactor even more heavily
  • Just alloc and free those buffers with regular malloc from contrib/ruby

Do you have a preference?

@byroot
Contributor Author

byroot commented Dec 19, 2024

Alright, I did the latter and that fixed the macOS build, so yeah, that indeed wasn't a good idea.

@igrigorik

FYI, a much simpler strategy that we tried and proved to perform well:
igrigorik@9120d5d

@byroot
Contributor Author

byroot commented Dec 20, 2024

It may be much simpler but is also much less effective.
