
device: Allow buffer memory growth to be limited at run time #69

Open
gitlankford wants to merge 3 commits into master

Conversation

gitlankford

The infinite memory growth allowed by the default PreallocatedBuffersPerPool setting causes processes to be oom-killed on low-memory devices. This occurs even when a soft limit is set with GOMEMLIMIT. Specifically, running Tailscale on a Linux device (OpenWrt, MIPS, 128 MB RAM) will exhaust all memory and be oom-killed when put under heavy load. Allowing this value to be overridden, as is done in the iOS build, will allow tuning to cap memory growth and prevent oom-kills.

see tailscale issue thread for further info:
tailscale/tailscale#7272

Signed-off-by: Seth Lankford <[email protected]>
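
For context, here is a minimal sketch of the idea being proposed, assuming the PR exposes PreallocatedBuffersPerPool as a settable package-level value in the device package rather than a build-tag constant. The uint32 type and the WG_BUFFER_POOL_SIZE environment variable are illustrative assumptions for this sketch, not necessarily what the three commits implement.

// Illustrative sketch only: assumes device.PreallocatedBuffersPerPool becomes
// a settable variable; the environment variable name is invented here.
package main

import (
	"log"
	"os"
	"strconv"

	"golang.zx2c4.com/wireguard/device"
)

// configurePoolCap applies an optional cap before the device (and therefore
// its buffer pools) is constructed with device.NewDevice.
func configurePoolCap() {
	v := os.Getenv("WG_BUFFER_POOL_SIZE")
	if v == "" {
		return // keep the default: unbounded pool growth
	}
	n, err := strconv.ParseUint(v, 10, 32)
	if err != nil || n == 0 {
		log.Fatalf("invalid WG_BUFFER_POOL_SIZE: %q", v)
	}
	device.PreallocatedBuffersPerPool = uint32(n)
}

func main() {
	configurePoolCap()
	// ... create the TUN device and call device.NewDevice as usual ...
}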
@gitlankford
Author

Thank you for your patience.

I'm sure there are other ways to solve this issue (specific low-memory builds, etc.).
This method follows the prior art of the iOS build.

@zx2c4
Member

zx2c4 commented Mar 3, 2023

Not sure it's a good idea to add random knobs like this for third parties to twiddle and trip over. I wonder if there's some heuristic that could be used instead, which would always work and dynamically scale accordingly? ratelimiter.c in the kernel does this, for example, with something pretty kludgy but it does work:

int wg_ratelimiter_init(void)
{
        mutex_lock(&init_lock);
        if (++init_refcnt != 1)
                goto out;

        entry_cache = KMEM_CACHE(ratelimiter_entry, 0);
        if (!entry_cache)
                goto err;

        /* xt_hashlimit.c uses a slightly different algorithm for ratelimiting,
         * but what it shares in common is that it uses a massive hashtable. So,
         * we borrow their wisdom about good table sizes on different systems
         * dependent on RAM. This calculation here comes from there.
         */
        table_size = (totalram_pages() > (1U << 30) / PAGE_SIZE) ? 8192 :
                max_t(unsigned long, 16, roundup_pow_of_two(
                        (totalram_pages() << PAGE_SHIFT) /
                        (1U << 14) / sizeof(struct hlist_head)));
        max_entries = table_size * 8;

        table_v4 = kvcalloc(table_size, sizeof(*table_v4), GFP_KERNEL);
        if (unlikely(!table_v4))
                goto err_kmemcache;

#if IS_ENABLED(CONFIG_IPV6)
        table_v6 = kvcalloc(table_size, sizeof(*table_v6), GFP_KERNEL);
        if (unlikely(!table_v6)) {
                kvfree(table_v4);
                goto err_kmemcache;
        }
#endif 
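
A rough sketch of what that kind of RAM-derived heuristic could look like on the Go side, for discussion only: read total system memory (Linux sysinfo here) and derive a per-pool cap from it. The helper name, the divisor, and the clamping bounds below are placeholders rather than tuned values, and nothing like this exists in the PR as written.

// Hypothetical helper, not part of this PR: scale the per-pool buffer cap
// with total RAM, in the spirit of the ratelimiter.c table sizing above.
// Each pooled buffer is roughly MaxSegmentSize (~64 KiB on the default
// build), so the divisor chosen here directly bounds worst-case pool memory.
package device

import "golang.org/x/sys/unix"

func preallocatedBuffersFromRAM() uint32 {
	var info unix.Sysinfo_t
	if err := unix.Sysinfo(&info); err != nil {
		return 0 // fall back to the current behaviour: unbounded growth
	}
	totalBytes := uint64(info.Totalram) * uint64(info.Unit)

	// Placeholder heuristic: one buffer per 256 KiB of RAM, clamped so that
	// small routers stay bounded without starving larger hosts.
	buffers := totalBytes / (256 * 1024)
	if buffers < 64 {
		buffers = 64
	}
	if buffers > 4096 {
		buffers = 4096
	}
	return uint32(buffers)
}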

@gitlankford
Author

Thanks @zx2c4 - I am happy to experiment with that sort of dynamic scaling, but I will be less confident about getting it up to your standards given my currently limited Go experience. Perhaps that could be a future workstream?

In my experiments with different values, I found that when PreallocatedBuffersPerPool was too high, the GC would be working overtime until oom-kill, and when it was "just right" the GC would only run in the runtime's forced GC pass every 2 minutes (an indirect measurement of the memory situation).

The other thing to note is that the router I'm running this on is dedicated to running this VPN, so the memory usage is otherwise extremely stable. That is probably the most common setup for devices of this size (also: no swap).
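
For anyone who wants to reproduce that kind of indirect measurement: running the daemon with GODEBUG=gctrace=1 prints one line per GC cycle, and a small in-process poller over runtime.MemStats gives the same signal. The sketch below is a generic illustration rather than anything used in this PR; if the only cycles you see are the runtime's periodic forced collections (roughly every two minutes), heap pressure is low, while a rapidly climbing count points at pool growth.

// Generic observation helper (not from this PR): log each completed GC cycle
// with current heap usage so runaway pool growth shows up as a rising GC rate.
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	var lastNumGC uint32
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		if m.NumGC != lastNumGC {
			log.Printf("GC cycles=%d heap-in-use=%d MiB sys=%d MiB",
				m.NumGC, m.HeapInuse>>20, m.Sys>>20)
			lastNumGC = m.NumGC
		}
		time.Sleep(10 * time.Second)
	}
}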

@qdm12

qdm12 commented May 10, 2024

Any news on this? It looks like someone had the issue as well over here: qdm12/gluetun#2036

@shaheerem

Any ETA on when this will be merged, or on when the other solution mentioned by @zx2c4 will be implemented?
This increased our memory consumption for 10.5k peers up to 18 GB, whereas after setting the buffer size to 4096 (as is also done in the Android version) it went down to 2 GB.
