Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket leak #434

Open
jyounggo opened this issue Oct 12, 2022 · 43 comments
Open

Socket leak #434

jyounggo opened this issue Oct 12, 2022 · 43 comments
Labels

Comments

@jyounggo
Copy link

jyounggo commented Oct 12, 2022

Bug description

We are running z2jh (https://z2jh.jupyter.org/en/stable/) and found there is a socket leak in proxy pod.
The number of socket is constantly increasing (over 60k) and the kernel generates an error after a week ,kernel: TCP: out of memory -- consider tuning tcp_mem.

I have checked the number of sockets using lsof.

/srv/configurable-http-proxy $ lsof
1	/usr/local/bin/node	socket:[48829679]
1	/usr/local/bin/node	socket:[48829681]
1	/usr/local/bin/node	socket:[48825415]
1	/usr/local/bin/node	socket:[48825417]
1	/usr/local/bin/node	socket:[48829792]
1	/usr/local/bin/node	socket:[48829790]
1	/usr/local/bin/node	socket:[48829783]
1	/usr/local/bin/node	socket:[48829785]
/srv/configurable-http-proxy $ lsof | wc -l
64708
/srv/configurable-http-proxy $ lsof | wc -l
64719

Your personal set up

The config.yaml related to proxy

proxy:
  secretToken: xxxx

  service:
    loadBalancerIP: x.x.x.x
  https:
    enabled: true
    hosts:
     - "exmaple.com"
    letsencrypt:
      contactEmail: "[email protected]"
  chp: # proxy pod, running jupyterhub/configurable-http-proxy
    livenessProbe:
      enabled: true
      initialDelaySeconds: 60
      periodSeconds: 20
      failureThreshold: 10 # retry 10 times before declaring failure
      timeoutSeconds: 3
      successThreshold: 1
    resources:
      requests:
        cpu: 1000m # 0m - 1000m
        memory: 5000Mi # Recommended is 100Mi - 600Mi -- we seem to run at 4.3GB a lot
  traefik: # autohttps pod (optional, running traefik/traefik)
    resources:
      requests:
        cpu: 1000m # 0m - 1000m
        memory: 512Mi # 100Mi - 1.1Gi
  secretSync: # autohttps pod (optional, sidecar container running small Python script)
    resources:
      requests:
        cpu: 10m
        memory: 64Mi
@jyounggo jyounggo added the bug label Oct 12, 2022
@welcome
Copy link

welcome bot commented Oct 12, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@manics
Copy link
Member

manics commented Oct 12, 2022

This sounds similar to #388

Would you mind adding your comment on that issue and closing this one to avoid duplicates? Thanks!

@yuvipanda
Copy link
Contributor

@manics that looks like a memory leak while this is a socket leak. Should be different no?

@yuvipanda
Copy link
Contributor

We just saw a node with a lot of proxy pods spew a lot of out of memory -- consider tuning tcp_mem while a lot of the proxies stopped working. Could be related?

@consideRatio

This comment was marked as resolved.

@minrk
Copy link
Member

minrk commented Feb 6, 2023

ws release is unlikely to be relevant, since it's only used in tests.

@consideRatio
Copy link
Member

I think the node version could be important debugging info because we are relying on node library's http(s) server here:

// proxy requests separately
var proxyCallback = logErrors(this.handleProxyWeb);
if (this.options.ssl) {
this.proxyServer = https.createServer({ ...this.options.ssl, ...httpOptions }, proxyCallback);
} else {
this.proxyServer = http.createServer(httpOptions, proxyCallback);
}
// proxy websockets
this.proxyServer.on("upgrade", bound(this, this.handleProxyWs));
this.proxy.on("proxyRes", function (proxyRes, req, res) {
that.metrics.requestsProxyCount.labels(proxyRes.statusCode).inc();
});

CHP node Date
4.6.1 v20.10.0 Nov 27 2023
4.6.0 v18.18.2 Sep 19 2023
4.5.6 v18.17.1 Aug 10 2023
4.5.5 v18.15.0 Apr 3 2023
4.5.4 v18.12.1 Dec 5 2022
4.5.3 v16.17.0 Sep 9 2022
4.5.2 v16.17.0 Aug 19 2022
4.5.1 v16.13.2 Feb 3 2022
4.5.0 v14.17.3 Jul 19 2021

@consideRatio
Copy link
Member

I think this remains a problem in 4.6.1 using node 20 in on GKE with linux kernel 5.15 based on info from @shaneknapp.

@consideRatio
Copy link
Member

CHP has multiple node HTTP servers working in parallell, one for its own REST API, one for proxying, one for metrics.

It would be good to conclude if the growing tcp memory / sockets etc are associated with a specific instance of these.

@consideRatio
Copy link
Member

Looked at one CHP process and saw for example...

# cd /proc/<pid of chp process>
cat net/sockstat
sockets: used 556
TCP: inuse 174 orphan 2 tw 36 alloc 1572 mem 40536
UDP: inuse 0 mem 6
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

This wasn't expected to have anything close to 500 open connections or similar, so I think its very safe to say that this reproduces. This is from latest chp running with node 20 on linux kernel 5.15 nodes.

@consideRatio
Copy link
Member

I'm not sure when I expect a socket to be closed. When it times out based on a "timeout" option? I think the timeout option may be infinite.

Is the issue that there is simply nothing that makes us destroy sockets once created, because we default to an infinite timeout?

@consideRatio
Copy link
Member

@minrk and others, is it safe to change a default for a timeout value here to something quite extreme, like 24 hours? I don't want us to disrupt users that are semi active and runs into issues at the 24th hour - but they wouldn't as long as they are semi-active right?

We have two timeout args matching the node-http-proxy options at https://github.com/http-party/node-http-proxy?tab=readme-ov-file#options, timeout and proxyTimeout - should we try setting them to 24 hours both, or just one?

@consideRatio
Copy link
Member

consideRatio commented Feb 2, 2024

There is also a related issue reported in node-http-proxy - http-party/node-http-proxy#1510.

The node-http-proxy was foked and had that issue fixed with a one line commit at Jimbly/http-proxy-node16@56283e3, to use a .destroy function instead of a .end function. I figure the difference may be that .end allows for re-use of a process file descriptor perhaps?

Looking closer at that fork, there is also another memory leak fixed in another commit according to the commit message: Jimbly/http-proxy-node16@ba0c414 . This is detailed in a PR as well: http-party/node-http-proxy#1559

Those two memory fix commits are the only actual fixes in the fork, where the rest is just docs etc.

Maybe we should do a build of chp based on the forked node-http-proxy project and push a tag that users can opt into? Like jupyterhub/configurable-http-proxy:4.6.1-fork image?

@consideRatio
Copy link
Member

I pushed jupyterhub/configurable-http-proxy:4.6.1-fork and quay.io/jupyterhub/configurable-http-proxy:4.6.1-fork (amd64 only, not aarch64 also) where I just updated package.json to reference node-http-proxy16 instead and made it be imported in a .js file (commits seen here: https://github.com/consideRatio/configurable-http-proxy/commits/main/).

If someone wants to try if this fork help, just reference 4.6.1-fork instead as a image tag.

@manics
Copy link
Member

manics commented Feb 3, 2024

Nice research! I don't think we should have a separate jupyterhub/configurable-http-proxy:4.6.1-fork image, it adds confusion.

If we think the fork is production ready then we should either switch CHP to use it if it's fully trusted, or vendor it if it's not (as previously discussed in #413 (comment))

@shaneknapp
Copy link

shaneknapp commented Feb 5, 2024

i'll be testing out jupyterhub/configurable-http-proxy:4.6.1-fork in some of our smaller hubs this week and will report back!

edit: here are my changes in case anyone wants to see them!

berkeley-dsep-infra/datahub#5501

@minrk
Copy link
Member

minrk commented Feb 5, 2024

If it works, 👍 to just switching the dependency and publishing a new release, without any -fork or other unusual tags.

I think the sustainable longer-term solution is to vendor http2-proxy, which I started here but haven't had time to finish.

It would be great to be able to have some actual tests to exercise these things, since it's been hard to control or verify.

@shaneknapp
Copy link

ok, i identified the most problematic hub... the chp pod been getting OOMKilled and stack tracing at least every 2-3 days. i just deployed the test fork of the chp pod to it and will keep an eye on things over the rest of the week.

in other news, this fix seems (read: seems) to use less tcp memory than before. it's tough to say for certain, but at the very least w/my latest deployment on the problematic hub i'll have something that mildly on fire to watch, vs the others i've deployed on much less trafficked hubs.

🤞

@shaneknapp
Copy link

shaneknapp commented Feb 7, 2024

ok, i identified the most problematic hub... the chp pod been getting OOMKilled and stack tracing at least every 2-3 days. i just deployed the test fork of the chp pod to it and will keep an eye on things over the rest of the week.

in other news, this fix seems (read: seems) to use less tcp memory than before. it's tough to say for certain, but at the very least w/my latest deployment on the problematic hub i'll have something that mildly on fire to watch, vs the others i've deployed on much less trafficked hubs.

🤞

womp womp. that pod has restarted three times in the past hour after deploying the -fork chp:

[Wed Feb  7 20:59:45 2024] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
[Wed Feb  7 20:59:45 2024] Call Trace:
[Wed Feb  7 20:59:45 2024]  <TASK>
[Wed Feb  7 20:59:45 2024]  dump_stack_lvl+0x4c/0x67
[Wed Feb  7 20:59:45 2024]  dump_header+0x53/0x240
[Wed Feb  7 20:59:45 2024]  oom_kill_process+0x10e/0x1d0
[Wed Feb  7 20:59:45 2024]  out_of_memory+0x496/0x5a0
[Wed Feb  7 20:59:45 2024]  ? mem_cgroup_iter+0x213/0x280
[Wed Feb  7 20:59:45 2024]  try_charge_memcg+0x7dd/0x9f0
[Wed Feb  7 20:59:45 2024]  charge_memcg+0x42/0x1a0
[Wed Feb  7 20:59:45 2024]  __mem_cgroup_charge+0x2d/0x80
[Wed Feb  7 20:59:45 2024]  handle_mm_fault+0x1088/0x15d0
[Wed Feb  7 20:59:45 2024]  do_user_addr_fault+0x279/0x4f0
[Wed Feb  7 20:59:45 2024]  exc_page_fault+0x78/0xf0
[Wed Feb  7 20:59:45 2024]  asm_exc_page_fault+0x22/0x30
[Wed Feb  7 20:59:45 2024] RIP: 0033:0x7e707bb07081
[Wed Feb  7 20:59:45 2024] Code: 48 ff c1 eb f3 c3 48 89 f8 48 83 fa 08 72 14 f7 c7 07 00 00 00 74 0c a4 48 ff ca f7 c7 07 00 00 00 75 f4 48 89 d1 48 c1 e9 03 <f3> 48 a5 83 e2 07 74 05 a4 ff ca 75 fb c3 48 89 f8 48 29 f0 48 39
[Wed Feb  7 20:59:45 2024] RSP: 002b:00007e707af865a8 EFLAGS: 00010202
[Wed Feb  7 20:59:45 2024] RAX: 000000fbf7993c48 RBX: 0000000000000044 RCX: 0000000000000050
[Wed Feb  7 20:59:45 2024] RDX: 0000000000000638 RSI: 000027b131f9cc38 RDI: 000000fbf7994000
[Wed Feb  7 20:59:45 2024] RBP: 00007e707af86640 R08: 000000fbf7993c48 R09: 000000fbf7993c40
[Wed Feb  7 20:59:45 2024] R10: 000000fbf7993c41 R11: 000027b131f9c879 R12: 00001fe7f33411c9
[Wed Feb  7 20:59:45 2024] R13: 000000fbf798dd50 R14: 00007e70760e2ed0 R15: 0000000000000640
[Wed Feb  7 20:59:45 2024]  </TASK>
[Wed Feb  7 20:59:45 2024] memory: usage 1048484kB, limit 1048576kB, failcnt 209
[Wed Feb  7 20:59:45 2024] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Wed Feb  7 20:59:45 2024] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice:
[Wed Feb  7 20:59:45 2024] anon 982282240
                           file 0
                           kernel_stack 245760
                           pagetables 28696576
                           percpu 46240
                           sock 442368
                           shmem 0
                           file_mapped 0
                           file_dirty 0
                           file_writeback 0
                           swapcached 0
                           anon_thp 0
                           file_thp 0
                           shmem_thp 0
                           inactive_anon 982249472
                           active_anon 8192
                           inactive_file 0
                           active_file 0
                           unevictable 0
                           slab_reclaimable 16552048
                           slab_unreclaimable 44657080
                           slab 61209128
                           workingset_refault_anon 0
                           workingset_refault_file 0
                           workingset_activate_anon 0
                           workingset_activate_file 0
                           workingset_restore_anon 0
                           workingset_restore_file 0
                           workingset_nodereclaim 0
                           pgfault 4469590
                           pgmajfault 0
                           pgrefill 0
                           pgscan 0
                           pgsteal 0
                           pgactivate 1
                           pgdeactivate 0
                           pglazyfree 0
                           pglazyfreed 0
                           thp_fault_alloc 0
                           thp_collapse_alloc 0
[Wed Feb  7 20:59:45 2024] Tasks state (memory values in pages):
[Wed Feb  7 20:59:45 2024] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Wed Feb  7 20:59:45 2024] [3783484] 65535 3783484      257        1    28672        0          -998 pause
[Wed Feb  7 20:59:45 2024] [3825947] 65534 3825947   410957   250184 29212672        0           999 node
[Wed Feb  7 20:59:45 2024] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-5f8f41d80b0d3993bdb2aafc8f480cd15cc623f5cb512212d187f99bbe550a13.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6efce583_ef72_49b2_a3a4_af239a6c15f1.slice/cri-containerd-5f8f41d80b0d3993bdb2aafc8f480cd15cc623f5cb512212d187f99bbe550a13.scope,task=node,pid=3825947,uid=65534
[Wed Feb  7 20:59:45 2024] Memory cgroup out of memory: Killed process 3825947 (node) total-vm:1643828kB, anon-rss:958732kB, file-rss:42004kB, shmem-rss:0kB, UID:65534 pgtables:28528kB oom_score_adj:999

i'm also seeing a lot of these martian source messages in dmesg with this hub's IP:

[Wed Feb  7 21:17:15 2024] IPv4: martian source 10.20.3.65 from 10.20.3.8, on dev eth0
[Wed Feb  7 21:17:15 2024] ll header: 00000000: 42 01 0a 00 00 1d 42 01 0a 00 00 01 08 00

10.20.3.8 is an nginx-ingress-controller.

and sometimes after it's killed (but not every time) we get the following in dmesg:

[Wed Feb  7 21:36:18 2024] TCP: request_sock_TCP: Possible SYN flooding on port 8000. Sending cookies.  Check SNMP counters.

are we DOSing ourselves somehow?

@shaneknapp
Copy link

i'm also seeing a lot of these martian source messages in dmesg with this hub's IP:

[Wed Feb  7 21:17:15 2024] IPv4: martian source 10.20.3.65 from 10.20.3.8, on dev eth0
[Wed Feb  7 21:17:15 2024] ll header: 00000000: 42 01 0a 00 00 1d 42 01 0a 00 00 01 08 00

10.20.3.8 is an nginx-ingress-controller.

are we DOSing ourselves somehow?

i killed that pod and the martian source messages have stopped... lets see if this helps things.

so. confusing.

@shaneknapp
Copy link

i'm also seeing a lot of these martian source messages in dmesg with this hub's IP:

[Wed Feb  7 21:17:15 2024] IPv4: martian source 10.20.3.65 from 10.20.3.8, on dev eth0
[Wed Feb  7 21:17:15 2024] ll header: 00000000: 42 01 0a 00 00 1d 42 01 0a 00 00 01 08 00

10.20.3.8 is an nginx-ingress-controller.
are we DOSing ourselves somehow?

i killed that pod and the martian source messages have stopped... lets see if this helps things.

so. confusing.

nope. that pod's chp is still getting OOMKilled. time for a break. :)

@consideRatio
Copy link
Member

What is its normal memory use and what its k8s request/limit in memory?

@shaneknapp
Copy link

shaneknapp commented Feb 7, 2024

chp pod is set to have 1Gi ram as the limit and request is 64Mi. it's definitely hitting the limit pretty regularly. edit: we expect normal usage to be ~500Mi.

there's also a corresponding cpu spike when it OOMs:
image

there are ~280 logged in to the hub currently.

@shaneknapp
Copy link

shaneknapp commented Feb 7, 2024

here's what overall tcp mem usage for the core node (as reported from /proc/net/sockstat) looks like:
image

you can see the overall memory used drop significantly each time i deploy the 4.6.1-fork chp to an existing hub.

all hub and proxy pods for any given hub are all on this node. it's a GCP n2-highmem-8.

@shaneknapp
Copy link

shaneknapp commented Feb 7, 2024

i've reverted 4.6.1-fork on that node. it was causing many users to get 503s, which happen during the chp restarting. a quick reload bring their notebooks back but it was seriously disruptive.

that node is still regularly hitting the max memory, but just not as quickly(?) as with 4.6.1-fork. i think.

we don't see this behavior on other chp deployments

i looked at chp pod restarts across our hubs. we're seeing this intermittently across our biggest deployments (both 4.6.1 and 4.6.1-fork... which all run on the same core node).

the smaller usage hub's chps usually run between ~200-300MiB (~200 users max at any time, no massive spikes in logins etc).

the larger hubs run ~400+ MiB but depending on user count, the chp pods eventually run out of memory and are OOMKilled.

today was spent looking in to our largest class, but the other two had definitely been experiencing the same issue to slightly lesser impact. now that i know what to look for, i'll keep an eye on these individual pods' memory usage and know better what to look for in the logs. the biggest takeaway so far is that tcp_mem is king, and if it wasn't for that we'd have had at least one big outage over the past few weeks. throwing ram at the problem isn't always the right solution, but at least we've been able to winnow out some of the impacts of heavy hub usage (and hopefully get some bugs fixed!). :)

@shaneknapp
Copy link

shaneknapp commented Feb 8, 2024

i also suspect the martian source and SYN_FLOOD errors are excellent breadcrumbs leading to bad behavior somewhere(s).

disclaimer: i have a few tabs open and suspect that the former might be something isn't cleaning up when something in the proxy stack goes away. the latter might just be a symptom of this, but i can neither confirm nor deny.

https://en.wikipedia.org/wiki/Martian_packet

so martians are involved somehow, causing network flooding and a DOS?

@shaneknapp
Copy link

shaneknapp commented Feb 8, 2024

current state of the most hub (~175+ users)...
image

it's been an afternoon, that's for sure. ;)

@shaneknapp
Copy link

@consideRatio i've rolled back the 11 hubs that had 4.6.1-fork deployed. it seems like when we had about 200 or more users logged in, the proxy pod's memory would spike and eventually get OOMKilled and restart the configurable-http-proxy process on the pod.

after rolling back to 4.6.1 i can see that the proxy pod's memory usage has much smoothed out compared to before.
image

le sigh. :(

@yuvipanda
Copy link
Contributor

yuvipanda commented Feb 8, 2024

fwiw, here's a similar memory graph from one of the hubs I help run now, which sees about similar usage to some of the big berkeley hubs. This is over the last 30 days.

image

This is just regular 4.6.1

@shaneknapp
Copy link

fwiw, here's a similar memory graph from one of the hubs I help run now, which sees about similar usage to some of the big berkeley hubs. This is over the last 30 days.
This is just regular 4.6.1

i looked at our historical data, and the big hub in question yesterday typically requested ~750MiB at peak-usage times over the past 30 days, but 4.6.1-fork spiked to over 1GiB almost immediately:
image

@consideRatio
Copy link
Member

@shaneknapp is this fork vs non-fork only, or fork+timeout flags vs non-fork?

@shaneknapp
Copy link

shaneknapp commented Feb 9, 2024 via email

@shaneknapp
Copy link

btw i rolled the fork back out to a couple of smaller hubs, minus the timeout settings. everything seems cromulent, but the only way to really test this is to have a lot (200+) of people logging in within a short period of time.

@shaneknapp
Copy link

quick update here: this fix really does look promising. orphaned sockets seem to drop significantly, and memory usage doesn't explode wildly and cause users to receive 500s.

berkeley-dsep-infra/datahub#5501 (comment)

@minrk
Copy link
Member

minrk commented Feb 22, 2024

huzzah! Thank you so much for testing and reporting, @shaneknapp. @consideRatio do you want to switch the dependency to the fork and make a real release of CHP?

@shaneknapp
Copy link

shaneknapp commented Feb 22, 2024

@minrk -- while i'm confident it helps, i'm also confident that it doesn't fix the problem outright.

while we're not getting the spiky and constant OOMKills w/this test fork, there is still a pretty significant memory leak somewhere:
Screenshot 2024-02-22 at 9 08 19 AM

i checked our javascript heap as well (this is in Mi):

/srv/configurable-http-proxy $ node -e 'console.log(v8.getHeapStatistics().heap_size_limit/(1024*1024))'
792

as you can see we're bouncing against that pretty quickly (we had a 3-day weekened this week so the figures are a little smaller than usual):
Screenshot 2024-02-22 at 9 11 01 AM

so there is still a significant memory leak.

maybe we're exposed another bug during the testing... we're also seeing many 503s for ECONNREFUSED on our two biggest hubs. these pop up after che chp has been at the heap limit for a couple of hours and it looks like people's proxy from the core node (w/the hub and chp pods) is disappearing. this is SUPER disruptive and is impacting coursework.

06:47:19.515 [ConfigProxy] �[31merror�[39m: 503 GET /user/<username>/api/kernels/e55827ba-035e-4bc2-94a3-8abc6e763896/channels connect ECONNREFUSED 10.20.0.226:8888

another quick update:

the 503 errors we're getting are appearing on high traffic hubs running both the vanilla and fork versions of the chp. they're appearing in multiples of 30 (30 or 60). i think this behavior might be related to, but not caused by the chp.

@shaneknapp
Copy link

i just deployed the fork to prod for all of our hubs -- the fork seems to be holding up quite well on the high-traffic hubs (>1k users/day, high compute loads), so now we're rolling it out for the rest.

if this continues to squelch the memory spikes/OOMKills for another week i'd feel comfortable giving my thumbs-up to roll the fork in to a new release!

@shaneknapp
Copy link

shaneknapp commented Mar 7, 2024

@consideRatio @minrk

alright, it's been a week and i feel very comfortable in saying that we should definitely roll these changes in to a release branch asap.

it doesn't fix the problem outright (our highest-traffic/load hubs still have one or two chp OOMKills per day w/250+ concurrent users), but it's a significant improvement over vanilla 4.6.1!

i firmly believe that we should still investigate further, and even after deploying 4.6.1-fork, we had another OOMKill/chp outage on march 5th that impacted ~300 users. yesterday, i sat down w/GCP and core node kernel logs, plus grafana/prom data, and put together a minute-by-minute timeline of how things went down. since i'll be on a plane for a few hours today, i'm hoping to get this transcribed from paper to an update on this github issue to help w/debugging.

TL;DR: chp is running out of heap space (792M) under load and going in to a ~45m death spiral before getting OOMKilled.

@shaneknapp
Copy link

quick ping here... is the 4.6.1-fork release going to be merged in to main any time soon?

@minrk
Copy link
Member

minrk commented May 23, 2024

@shaneknapp thanks for the ping. I opened #539, then we can make a release with it.

@manics
Copy link
Member

manics commented Jun 7, 2024

it doesn't fix the problem outright (our highest-traffic/load hubs still have one or two chp OOMKills per day w/250+ concurrent users), but it's a significant improvement over vanilla 4.6.1!

@shaneknapp http-proxy-node16 fixes a change made in NodeJs 15.5+: #539 (review)

Since it improves but doesn't fully fix the issue it might be worth also testing an older CHP image (or building your own) based on NodeJS <=15.4

@manics
Copy link
Member

manics commented Aug 13, 2024

4.6.2 was released 2 months ago with a fix for the leaking sockets. Can we close this issue?

@shaneknapp
Copy link

#388 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants