Fix regex OOM issue on large pages #264

Open: wants to merge 1 commit into base: master


DanielOaks (Contributor)

This fixes an issue where the regexes near the start of getXMLPage kill the process when they run out of memory.

As described in the code comment, we run the .split before the regexes: if a regex runs out of memory the whole process dies, whereas if .split runs out of memory it just raises a MemoryError that we can catch and handle.

This lets us download larger pages without dumpgenerator.py just dying unexpectedly.
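A minimal sketch of the ordering described above, assuming the page is held in memory as a single string. `count_edits`, `EDIT_RE` and `PageTooLargeError` are hypothetical names chosen for illustration; this is not the code in the PR. The `.split` acts as a memory probe: if it cannot allocate, Python raises MemoryError before any regex work begins. As the author's later edit notes, on Linux the OOM killer may still terminate the process before a MemoryError is ever raised, in which case this guard does not help.

```python
import re

# Hypothetical pattern for illustration only; not taken from dumpgenerator.py.
EDIT_RE = re.compile(r"<revision>.*?</revision>", re.DOTALL)


class PageTooLargeError(Exception):
    """Hypothetical, recoverable error for pages too large to process."""


def count_edits(xml):
    # Do a .split() first: its substrings need roughly as much memory as the
    # page itself, and if that allocation fails Python raises MemoryError,
    # which we can catch and turn into a recoverable error.
    try:
        chunks = xml.split("</revision>")
    except MemoryError:
        raise PageTooLargeError("page is too large to process in memory")
    # The split result is only a memory check; free it before the regex runs.
    del chunks
    # Only now run the regex over the whole page.
    return len(EDIT_RE.findall(xml))
```

A caller that downloads many pages could catch `PageTooLargeError`, log the title, and move on to the next page instead of losing the whole dump run.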

Note: I think this may be affecting us in another spot too. I'll check whether the same kind of fix resolves the other issue I'm running into.

Edit: Now that I'm back home, I've been looking into this, and it may actually have something to do with the Linux OOM killer. Time to do more research and hopefully find out how to get a catchable exception rather than having the process killed!
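One commonly used way to get an exception instead of a kill on Linux is to cap the process's virtual address space with `setrlimit(RLIMIT_AS)`. Once the cap is reached, further allocations fail inside the process and Python reports them as MemoryError, which usually happens before the system as a whole runs out of memory and the OOM killer steps in. This was not part of the PR; `cap_address_space` and any cap value are hypothetical, and the `resource` module is Unix-only (macOS does not enforce this limit the same way).

```python
import resource


def cap_address_space(max_bytes):
    """Lower the soft RLIMIT_AS so oversized allocations raise MemoryError.

    Hypothetical helper. Returns the soft limit actually applied.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process may not set its soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    # Keep the hard limit unchanged so the cap can be raised again later.
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return max_bytes
```

For example, calling `cap_address_space(4 * 1024 ** 3)` early in dumpgenerator.py would, on Linux, make any allocation past roughly 4 GiB of address space raise MemoryError, which a guard like the `.split` above could then catch. The limit counts virtual memory rather than resident memory, so the cap needs to leave generous headroom.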

nemobis (Member) commented Oct 25, 2015

The change is sane, but we might want to make that replacement faster. In particular, I think we should iterate over the lines and replace them one by one.
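A minimal sketch of the per-line replacement suggested above. `replace_per_line` is a hypothetical helper, not code from dumpgenerator.py, and whether it is actually faster than one whole-page `re.sub` would need measuring. Each `re.sub` call only sees a single line, so the regex engine works on small inputs, although the rebuilt page still needs memory proportional to its size.

```python
import re


def replace_per_line(text, pattern, repl):
    """Apply a regex substitution to each line separately.

    Only correct when no match needs to span a newline.
    """
    regex = re.compile(pattern)
    # Split on "\n" and rejoin with "\n" so the original line breaks
    # (including a trailing one) are reproduced exactly.
    return "\n".join(regex.sub(repl, line) for line in text.split("\n"))
```

For example, `replace_per_line(xml, r"<title>(.*?)</title>", r"[\1]")` rewrites every single-line title element in place. A pattern that relies on `re.DOTALL` to match across lines would silently miss matches under this scheme, so each regex in getXMLPage would have to be checked before switching it over.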
