out of memory in shuffle stage #95
Comments
I've always wondered how a shuffle stage can handle all entities in memory.
Thanks @ZiglioNZ
I see, but the mapper runs on many instances and the shuffler on just one, right?
Yes, strangely the shuffle is running on only a single instance.
Oh, that wasn't clear. Anyway, the point is the same: a single instance has to hold a lot of data in memory. I'm not sure how to solve that, but it's a problem I'm going to face: I need to serialize a large number of entities in time order, and I'm not sure whether the mapper could partition the query in a way that preserves that order, so a shuffler is probably needed. Hopefully I'll make some progress this week.
@ZiglioNZ What I don't understand is why this task isn't distributed across multiple instances. A shuffler has the same number of shards as the number of output files from the mapper. Since that number is 5000 in my case, I can see 5000 shards for the shuffler on the mapreduce status page. But still, only one instance is used to run the shuffle-hash phase. That is not clear to me.
See my reply on StackOverflow.
Thanks, luckily I use Java then ;-)
I have ~50M entities stored in the datastore. Each item is one of 7 types.
Next, I have a simple MapReduce job that counts the number of items of each type. The Mapper emits (type, 1). The reducer simply adds the number of 1s received for each type.
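For reference, the counting logic described above reduces to the following sketch in plain Java (this is an illustration of the map/reduce logic only, not the appengine-mapreduce API; the in-memory grouping stands in for the shuffle):

```java
import java.util.*;
import java.util.stream.*;

public class TypeCount {
    // Mapper: for each entity, emit the pair (type, 1).
    static Map.Entry<String, Integer> map(String entityType) {
        return Map.entry(entityType, 1);
    }

    // Reducer: sum the 1s received for each type.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        return emitted.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,                            // sorted, for stable output
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // Stand-in for datastore entities; only their type matters here.
        List<String> entities = List.of("A", "B", "A", "C", "A", "B");
        List<Map.Entry<String, Integer>> emitted =
                entities.stream().map(TypeCount::map).collect(Collectors.toList());
        System.out.println(reduce(emitted)); // prints {A=3, B=2, C=1}
    }
}
```

In the real job the grouping happens in the shuffle stage, which is why every (type, 1) pair for a given type must flow through it.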
When I run this job with 5000 shards, the map stage runs fine. It uses a total of 20 instances, which is the maximum possible given my task-queue configuration.
However, the shuffle-hash stage uses only one instance and fails with an out-of-memory error. I am not able to understand why only one instance is being used for hashing, or how I can fix this out-of-memory error in /mapreduce/kickoffjob_callback/15714734************.
I have tried writing a combiner but I never saw a combiner stage on the mapreduce status page or in the logs.
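The idea behind the combiner is to collapse each map shard's (type, 1) pairs into at most one partial count per type before the shuffle, so only a handful of small records per shard cross the wire instead of millions. A plain-Java illustration of that pre-aggregation step (again, not the appengine-mapreduce combiner API; the shard contents are made up):

```java
import java.util.*;

public class CombinerSketch {
    // Combiner step: fold one map shard's raw (type, 1) emits into a
    // single (type, partialCount) record per type.
    static Map<String, Long> combine(List<String> shardEmits) {
        Map<String, Long> partial = new TreeMap<>();
        for (String type : shardEmits) {
            partial.merge(type, 1L, Long::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        // Six raw emits from one shard become three combined records,
        // bounding shuffle input at (number of types) records per shard.
        List<String> shard = List.of("A", "A", "B", "A", "C", "B");
        System.out.println(combine(shard)); // prints {A=3, B=2, C=1}
    }
}
```

With 7 types and 5000 shards, a working combiner would cap the shuffle input at 35000 tiny records, which is why its apparent absence matters for the out-of-memory failure.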