Django Channels Memory Leak on every message or connection #2094
Comments
Does the same thing happen with other protocol servers, such as hypercorn and Daphne? |
I've tested The interesting thing is, while Here are the commands I used for each:
|
And can you see from any of the tools, memray perhaps, which objects are consuming the memory? (I'd expect a |
@cacosandon Also, can you try with the PubSub layer, and see if the results are different there? Thanks. |
Sure! I'll try to find time today to prepare a report on |
So I tried multiple combinations. All HTML reports from But below there are screenshots from them. First, tried with Redis Channels (not PubSub) to get memory leaks. With
So, the leaks report include memory that was never released back, but I don't know how to interpret it correctly. Seems like
Here is the screenshot of the
Then tried with
The interesting part is that
Then, I tried with garbage collect for
And finally tried with
Just in case, I also removed all Hope all these reports help understanding the constant memory increase. Right now I am trying to move my application to |
I've made it to make For some reason the websocket messages that were bytes-only were sent as Added a PR for that: #2097
  async def websocket_receive(self, message):
      """
      Called when a WebSocket frame is received. Decodes it and passes it
      to receive().
      """
-     if "text" in message:
+     if "text" in message and message["text"] is not None:
          await self.receive(text_data=message["text"])
      else:
          await self.receive(bytes_data=message["bytes"])
Testing now in staging 🤞 |
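To illustrate why the extra `is not None` check in the diff above matters: the ASGI spec lets a `websocket.receive` event carry a `text` key whose value is `None` when the frame is binary, so testing for the key alone can route a binary frame to the text branch. The sketch below is a stand-alone illustration, not the Channels implementation; `dispatch_frame` is a hypothetical helper.

```python
def dispatch_frame(message):
    """Hypothetical stand-in for the consumer's frame routing (not Channels code)."""
    # "text" may be present but None for binary frames, so a plain
    # `"text" in message` check is not enough.
    if message.get("text") is not None:
        return ("text", message["text"])
    return ("bytes", message.get("bytes"))


# A binary frame as some servers may deliver it: both keys present, text is None.
binary_event = {"type": "websocket.receive", "text": None, "bytes": b"\x00\x01"}
text_event = {"type": "websocket.receive", "text": "hello"}

binary_result = dispatch_frame(binary_event)
text_result = dispatch_frame(text_event)
```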
Hi @cacosandon Looking at the uploaded report, for e.g. The |
Hey @carltongibson, thank you for taking a look. Yep, but if you zoom in On the other hand, |
@carltongibson, do you have any clue about what's happening? Or what else can I try? I'm willing to try anything! |
@cacosandon Given that you report it happening with the pub sub layer and different servers, not really. You need to identify where the leak is happening. Then it's possible to say something. |
@carltongibson all my samples are from using Some things I've noticed:
I don't know how nobody else is having this problem. Maybe they just don't send large messages 🤔 |
Hi @cacosandon — are you able to identify where the leak is happening? Alas, I haven't had time to dig into this further for you. Without that it's difficult to say too much. If you can identify a more concrete issue, there's a good chance we can resolve it. |
@carltongibson no :( that's actually the thing that I'm struggling on: finding the memory leak 😓 I really tried every tool to detect it, but nothing noticeable or strange in the reports.. |
I wouldn't assume that. 😉 I've been silently watching and hoping you find more than I did when I looked. We had some success changing servers from Here are some other things I've watched: |
@mitgr81 what tools do you use to monitor and restart? For now I would love to implement that. Will take a look on those resources! |
We're rocking a bespoke monitor for docker containers. It's pretty simple; essentially we label each container with a valid restart time and a memory limit (among other rules); and the "container keeper" looks for them. |
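A rule like the one described above (a memory limit plus an allowed restart time per container) could be sketched roughly as follows. This is only an assumption about how such a check might look; `should_restart` and its parameters are hypothetical and not taken from the actual "container keeper".

```python
from datetime import time


def should_restart(rss_bytes, limit_bytes, now, window_start, window_end):
    """Hypothetical rule: restart only when over the memory limit AND inside the allowed window."""
    over_limit = rss_bytes > limit_bytes
    in_window = window_start <= now <= window_end
    return over_limit and in_window


# Example: a 1 GiB limit and a restart window between 02:00 and 04:00.
ONE_GIB = 1024 ** 3
decision_night = should_restart(2 * ONE_GIB, ONE_GIB, time(3, 0), time(2, 0), time(4, 0))
decision_day = should_restart(2 * ONE_GIB, ONE_GIB, time(12, 0), time(2, 0), time(4, 0))
```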
@cacosandon - Just curious if you've had any more luck than I have on this. |
Hey @mitgr81! No updates yet. We're running garbage collection on every message and every new connection. This has helped a bit, but the memory still slowly increases and hits the max in about a week. We usually deploy and automatically restart the machines 2-3 times a week, which temporarily fixes the issue. I hope the Channels team can look into this to see if it's a general problem with memory leaks. cc @carltongibson |
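The "garbage collect on every message" mitigation mentioned above could look roughly like this. It is a minimal sketch, not code from the project: `collect_after` and `handle_message` are hypothetical names, and `gc.collect()` only reclaims unreachable reference cycles, so it cannot fix a leak caused by objects that are still referenced.

```python
import asyncio
import functools
import gc


def collect_after(handler):
    """Hypothetical decorator: force a full GC pass after each awaited handler call."""
    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return await handler(*args, **kwargs)
        finally:
            # Reclaims reference cycles only; memory may still not return to the OS.
            gc.collect()
    return wrapper


@collect_after
async def handle_message(payload):
    # Placeholder for the body of a consumer's receive() method.
    return len(payload)


result = asyncio.run(handle_message(b"abc"))
```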
We're running into the same issue. Daphne process used up over 50gigs of RAM on our server before it crashed. |
Hi @cacosandon, Are you using uvicorn or daphne in production? or hypercorn? |
Hey! uvicorn for now. |
@cacosandon: What are the variables you haven't changed? It sounds like you've swapped everything out (including your application's business logic) and the problem still exists which is troubling. Have you tried simplifying the code down until the problem doesn't exist? AIUI from this thread the channel-layer concept seems to be the cause, but have you tried to stub out the channel-layer code in various ways to see where the problem originates? (if it's not the channel-layer, then the same principle applies: just keep axing code until you've got the simplest program possible that still repros the problem) (investing in a test harness that artificially generates problematic conditions might aid in discovering the problem by speeding up the testing cycle, if you haven't already done so) |
Hey @bigfootjon! Running this repository: https://github.com/cacosandon/django-channels-memory-leak, you'll notice the memory leaks. If you remove the sending of large messages, then the problem disappears unless you open/close connections fast enough to make the memory go up again. It's literally the basic setup of Django Channels, so I don't know what else I should remove. I think the next step is going deeper into Django Channels source code and start modifying things there. I don't have a lot of time now to do this, so we have mitigated it by monitoring and restarting our servers (for now). |
If you find some time to investigate I think removing code from channels is the right approach. If the memory charts aren’t doing it then the opposite (finding ways NOT to allocate memory) is the only path forward |
I tried to investigate this and in my tests I found an interesting hint: changing the "serializer" changes the memory footprint by a lot. Since both

import json
import random

from channels_redis.core import RedisChannelLayer as _RedisChannelLayer


class RedisChannelLayer(_RedisChannelLayer):
    ### Serialization ###

    def serialize(self, message):
        """
        Serializes message to a byte string.
        """
        message = json.dumps(message).encode('utf-8')
        if self.crypter:
            message = self.crypter.encrypt(message)
        # As we use a sorted set to expire messages we need to guarantee uniqueness, with 12 bytes.
        random_prefix = random.getrandbits(8 * 12).to_bytes(12, "big")
        return random_prefix + message

    def deserialize(self, message):
        """
        Deserializes from a byte string.
        """
        # Removes the random prefix
        message = message[12:]
        if self.crypter:
            message = self.crypter.decrypt(message, self.expiry + 10)
        return json.loads(message.decode('utf-8'))

As you can see, the memory in the JSON test returns to a "normal" level (there is still some memory which was not released, but much less than with msgpack). I tested this on Python 3.10 inside an alpine-docker container. It also seems that there is a problem with msgpack and Python 3.12: msgpack/msgpack-python#612 I would like to have more time to perform further tests and learn how to better use the memory profiler; for the moment I hope that this may help someone else to find a solution. |
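The prefix-and-JSON round trip from the subclass above can be exercised without Redis. The sketch below copies only that logic into plain functions (no encryption; the function names and `PREFIX_LEN` are hypothetical) and also shows a limitation worth keeping in mind: `json.dumps` cannot encode raw `bytes`, so binary payloads would need an extra encoding step such as base64.

```python
import json
import random

PREFIX_LEN = 12  # matches the 12-byte random prefix used in the subclass above


def serialize(message):
    """JSON-encode the message and prepend a random prefix for uniqueness."""
    body = json.dumps(message).encode("utf-8")
    prefix = random.getrandbits(8 * PREFIX_LEN).to_bytes(PREFIX_LEN, "big")
    return prefix + body


def deserialize(data):
    """Strip the random prefix and decode the JSON body."""
    return json.loads(data[PREFIX_LEN:].decode("utf-8"))


event = {"type": "chat.message", "content": "hello"}
wire = serialize(event)
restored = deserialize(wire)

# Raw bytes are not JSON-serializable, so this raises TypeError.
try:
    serialize({"type": "chat.binary", "message": b"\x00"})
    bytes_supported = True
except TypeError:
    bytes_supported = False
```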
Cross linking #1948 which is a long-standing known memory leak in channels. |
Thanks @acu192. You're absolutely right. There's an unresolved chain of thought, and a likely fix sat there (for life reasons on my part I suppose.) @cacosandon if you could test the linked PRs and feedback, that would help greatly. |
Hey! Yes, would love to help. Tried and raised some errors, surely because it's outdated and needs a rebase. Will ping him in the PR! |
@cacosandon do note there's two related PRs. One for channels and one for channels-redis. You'll need to apply them both. |
@carltongibson Already rebased both
and these are the results for Uvicorn with PubSub: It seems the problem persists. I believe @sevdog's investigation around the serializer is likely the root cause, given its generic nature (whether using PubSub or not, and regardless of uvicorn, daphne, or hypercorn, even with a minimal example). I can test with other settings later. Let me know! |
@cacosandon OK, thanks for trying it. As to root cause, I still need to get a minimal reproduce nailed down here, but yes maybe... We're getting closer to it I suppose 😅 |
@cacosandon, if you have some spare cycles could you try switching to the json serializer that @sevdog whipped up? django/channels_redis#398 |
hey! this is the test I'm running:

async def receive(self, text_data):
    content = await self.decode_json(text_data)
    # Send message to room group
    await self.channel_layer.group_send(
        self.room_group_name, {"type": "chat.message", "content": content}
    )
    for i in range(500):
        # Create a struct of variable Mb from 1 to 5
        struct = bytearray(1024 * 1024 * random.randint(1, 5))
        message = bytes(struct)
        await self.channel_layer.group_send(
            self.room_group_name, {"type": "chat.binary", "message": message}
        )
        print(f"Sent message {i + 1} of 500")
        del struct
        gc.collect()

here is the repo: https://github.com/cacosandon/django-channels-memory-leak/blob/main/chat/consumers.py
here are the results: with original
|
We have detected a memory leak in the websocket application. An issue exists on the Django Channels GitHub repo about this topic. We also detected that this memory leak can have side effects on the ASGI application: the application can become slower, leading to timeouts. To remove this side effect we decided to create a deployment dedicated to the websocket application, and nginx is responsible for using the right backend based on the request path. django/channels#2094
"Hello! Do you mean to use 'pubsub redis' and set 'serializer_format': 'json' to fix the issue?" |
Hi, is this happening in a rare edge case only or does that mean that every project that uses django channels will end up with a memory leak? I wanted to use channels for a new project but now I am unsure if that issue will apply to my project as well. Thx. |
@ceelian we're still waiting for someone to fully diagnose it. The trouble with these things is they're very setup-specific. It seems to be an issue for folks pushing significant amounts of data through the channel layer. If you're using the channel layer for simple messaging it shouldn't be an issue. (If you have large data packages, you can always put them in a blob storage for collection, saving the channel layer just for notifications.) Plenty of folks are using Channels successfully. |
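The "blob storage for the payload, channel layer for the notification" idea above can be sketched as follows. This is an illustration under assumptions: `BLOB_STORE` is an in-memory dict standing in for real blob storage, and `stash_payload`, `fetch_payload`, and the `chat.blob_ready` event type are hypothetical names.

```python
import uuid

# In-memory stand-in for blob storage (object store, database table, file system...).
BLOB_STORE = {}


def stash_payload(payload: bytes) -> dict:
    """Store the large payload out of band and return a small event that references it."""
    key = uuid.uuid4().hex
    BLOB_STORE[key] = payload
    # Only this small dict would travel through the channel layer.
    return {"type": "chat.blob_ready", "blob_key": key, "size": len(payload)}


def fetch_payload(event: dict) -> bytes:
    """Receiver side: resolve the reference and drop the stored copy."""
    # pop() assumes a single receiver; a group of receivers would need
    # expiry or reference counting instead.
    return BLOB_STORE.pop(event["blob_key"])


big = bytes(1024 * 1024)  # 1 MiB payload
event = stash_payload(big)
received = fetch_payload(event)
```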
@carltongibson thank you very much for the quick response and the explanation. |
Even if you only send small messages you will face this memory problem; it just takes longer to hit the limits. |
@lesc6 that's concerning, how do you/should we manage this in production systems? |
I'm having a memory leak in Django Channels using uvicorn.
Every "memory crash" is a restart/deploy 👇
This happens not just within my project, but also with the basic chat example from the tutorial.
Here is the repository with that minimal example and memory profiling: https://github.com/cacosandon/django-channels-memory-leak
This happens locally, on the server, with/without DEBUG, just by reconnecting or sending messages (in the example I've added large messages so you can notice the memory leak).
The memory is never released, even if the user disconnects afterwards.
I've confirmed it with memory-profiler and memray (both commands were added in the README so you can reproduce).
Dependencies:
I think I have really tried everything: deleting objects, manual garbage collection, etc. Nothing prevents the memory from increasing and never being released. Any insights? 🙏