Skip to content
This repository has been archived by the owner on Feb 13, 2021. It is now read-only.

GC overhead limit when mining wikipedia and extracting anchor text #18

Open
shubhamagarwal92 opened this issue May 29, 2018 · 2 comments

Comments

@shubhamagarwal92
Copy link

Hi

I am following the steps provided here to train my model.

I have pre-processed the datapack. But when I am trying to "Build Data Structures and extract anchor text", I am having this GC overhead issue.

screen shot 2018-05-29 at 09 14 53

I have even increased the MAPRED and HADOOP memory to 15G and even provided opts for
Dmapreduce.reduce.java.opts and Dmapreduce.reduce.memory.mb

My system has 8 cores 32 GB, using java 8. This is the snippet of command that I am following.

hadoop \
jar target/FEL-0.1.0-fat.jar \
com.yahoo.semsearch.fastlinking.io.ExtractWikipediaAnchorText \
-Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dmapreduce.reduce.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dmapred.job.map.memory.mb=15144 \
-Dmapreduce.map.memory.mb=15144 \
-Dmapreduce.reduce.memory.mb=15144 \
-Dmapred.child.java.opts="-Xmx15g" \
-Dmapreduce.map.java.opts='-Xmx15g -XX:NewRatio=8 -XX:+UseSerialGC' \
-Dmapreduce.reduce.java.opts="-Xmx15g -XX:NewRatio=8 -XX:+UseSerialGC" \
-input wiki/${WIKI_MARKET}/${WIKI_DATE}/pages-articles.block \
-emap wiki/${WIKI_MARKET}/${WIKI_DATE}/entities.map \
-amap wiki/${WIKI_MARKET}/${WIKI_DATE}/anchors.map \
-cfmap wiki/${WIKI_MARKET}/${WIKI_DATE}/alias-entity-counts.map \
-redir wiki/${WIKI_MARKET}/${WIKI_DATE}/redirects

Could you please suggest why this might be happening?

Pardon me as I am novice to hadoop and java

@shubhamagarwal92
Copy link
Author

@aasish Could you please comment as to how should I resolve this?

@shubhamagarwal92
Copy link
Author

FYI, I solved the issue with this shell script. README needs to be updated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant