This fixes an issue where the regexes near the start of `getXMLPage` kill the process if they run out of memory.

As described in the comment, we do the `.split` before we do the regexes because if the regexes run out of memory the whole process tanks, whereas if `.split` runs out of memory it just throws a `MemoryError` that we can catch and deal with.

This lets us download larger pages without `dumpgenerator.py` just dying unexpectedly.

Note: I think this may be affecting us in another spot as well. I'll check whether the same sort of fix resolves the other issue I'm running into.
Edit: Now that I'm back home, I've been looking into it, and it may actually have something to do with the Linux OOM killer. Time to do more research and hopefully find out how to get it to raise an exception rather than kill us!
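
For reviewers who want the ordering spelled out, here is a minimal sketch of the split-before-regex idea from the description. It is not the actual `getXMLPage` code: the `extract_title` helper, the `<revision>` marker, and the title regex are all hypothetical, chosen only to illustrate cutting the input down with `.split` inside a `try` block and running the regex on the small remaining piece.

```python
import re

# Hypothetical pattern for illustration; the real regexes in getXMLPage may differ.
TITLE_RE = re.compile(r"<title>([^<]*)</title>")


def extract_title(xml, marker="<revision>"):
    """Return the page title found before `marker`, or None.

    None is returned if the split raises MemoryError or no title is present.
    """
    try:
        # Split first and keep only the text before the first marker.
        # Per the PR description, running out of memory here raises a
        # MemoryError we can catch, instead of taking down the process.
        head = xml.split(marker, 1)[0]
    except MemoryError:
        return None

    # The regex now only scans the short header, not the full page history.
    match = TITLE_RE.search(head)
    return match.group(1) if match else None
```

A caller could treat a `None` result as "skip this page and log it" rather than letting the whole dump abort. Whether that catches every out-of-memory case is an open question, given the OOM killer note above.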