ruby: Implement a buffer pool #214
base: main
Conversation
The macOS build seems to fail consistently, but it doesn't contain any usable info and I can't reproduce it locally :/
Implementation makes sense to me. Can you tell me more about this:

> NB: The implementation isn't quite ideal, as the buffers are allocated by the Ruby extension with `ruby_xmalloc`, but grown by trilogy itself when needed with raw `realloc`

Is this still the case? I see us using `RB_ALLOC_N` when initializing the pool, and then `RB_REALLOC_N` when resizing. Doesn't the former use `ruby_xmalloc` and the latter use `ruby_xrealloc`? What am I missing? 🤔
```c
if (!pool->capa) {
    pool->entries = RB_ALLOC_N(buffer_pool_entry, 16);
```
If I'm understanding correctly, we set the default `buffer_pool_max_size` to 8, but then allocate 16 entries here when initializing a new pool. Wouldn't it make more sense to allocate 8 entries to start with (or to allocate the configured max size)? Why 16 specifically?
Yes, it's a bit of a discrepancy. Those should probably match.

Initially I had some code that would dynamically size the pool based on the number of live `Trilogy` objects, but that caused issues with the GC (since the pool is retained by an object, when both the pool and the client are GCed, the pool may be freed before the clients). Hence why I fell back to just a global setting like this.

I started with 16, but realized that 16 × 16 MiB is 256 MiB, so figured 8 was already plenty.
That's for the pool itself. The buffers inside the pool are reallocated from line 39 in c080567, so there is a mix of regular `realloc` and `ruby_xmalloc`.
Mixing `xmalloc` and `malloc` has actually caused us significant issues by triggering extra GC due to the accounting "leak". We do test in our CI (occasionally) with
Makes sense. I suppose there are two ways to fix that:

Do you have a preference?
The trilogy client eagerly allocates a 32 kiB buffer and grows it as needed. It's never freed nor shrunk until the connection is closed. Since by default the MySQL `max_allowed_packet` is 16 MiB, long-living connections will progressively grow to that size.

For basic usage it's not a big deal, but some applications may have dozens if not hundreds of connections that are mostly idle. A common case is multi-tenant applications with horizontal sharding: you only ever query one database but have open connections to many databases. This situation might lead to a lot of memory retained by trilogy connections and never really released, looking very much like a memory leak.

This can be reproduced with a simple script:

```ruby
require 'trilogy'

connection_pool = []
50.times do
  t = Trilogy.new(database: "test")
  t.query("select '#{"a" * 16_000_000}' as a")
  connection_pool << t
end

puts "#{`ps -o rss= -p #{$$}`} kiB"
```

```
$ ruby /tmp/trilogy-leak.rb
927120 kiB
```

If we instead take over the buffer lifetime management, we can implement some pooling for the buffers and limit the total number of buffers to as many connections as are actually in use concurrently.

The same reproduction script with the current branch:

```
$ ruby -Ilib:ext /tmp/trilogy-leak.rb
108144 kiB
```
Alright, I did the latter and that fixed the macOS build, so yeah, that indeed wasn't a good idea.
FYI, a much simpler strategy that we tried and proved to perform well:

It may be much simpler, but it is also much less effective.
NB: The implementation isn't quite ideal, as the buffers are allocated by the Ruby extension with `ruby_xmalloc`, but grown by trilogy itself when needed with raw `realloc`. It works, but ideally there should be some hook in the `trilogy` C library so we can use the Ruby allocator all the way. I can try to work on that if it's really a blocker.

Also cc @shanempope, this is very likely to explain the memory growth observed last month after reforking stops, because: `Trilogy` […] here again, Active Record closes all clients and frees the buffers, making the source of the growth disappear.

All this to say you probably want to experiment with this branch or something like it.
@adrianna-chang-shopify @jhawthorn @matthewd