GC overhead limit when mining wikipedia and extracting anchor text #18

shubhamagarwal92 · 2018-05-29T08:28:45Z

Hi

I am following the steps provided here to train my model.

I have pre-processed the datapack, but when I run the "Build Data Structures and extract anchor text" step, I hit a GC overhead limit error.

I have increased the MAPRED and HADOOP memory to 15G and also passed options for
-Dmapreduce.reduce.java.opts and -Dmapreduce.reduce.memory.mb

My system has 8 cores and 32 GB of RAM, and I am using Java 8. This is the command I am running:

hadoop \
jar target/FEL-0.1.0-fat.jar \
com.yahoo.semsearch.fastlinking.io.ExtractWikipediaAnchorText \
-Dmapreduce.map.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dmapreduce.reduce.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dyarn.app.mapreduce.am.env="JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64" \
-Dmapred.job.map.memory.mb=15144 \
-Dmapreduce.map.memory.mb=15144 \
-Dmapreduce.reduce.memory.mb=15144 \
-Dmapred.child.java.opts="-Xmx15g" \
-Dmapreduce.map.java.opts='-Xmx15g -XX:NewRatio=8 -XX:+UseSerialGC' \
-Dmapreduce.reduce.java.opts="-Xmx15g -XX:NewRatio=8 -XX:+UseSerialGC" \
-input wiki/${WIKI_MARKET}/${WIKI_DATE}/pages-articles.block \
-emap wiki/${WIKI_MARKET}/${WIKI_DATE}/entities.map \
-amap wiki/${WIKI_MARKET}/${WIKI_DATE}/anchors.map \
-cfmap wiki/${WIKI_MARKET}/${WIKI_DATE}/alias-entity-counts.map \
-redir wiki/${WIKI_MARKET}/${WIKI_DATE}/redirects

Could you please suggest why this might be happening?

Pardon me, as I am a novice with Hadoop and Java.
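
One detail in the command above that may be worth checking: -Xmx15g requests a 15360 MB heap, while mapreduce.map.memory.mb and mapreduce.reduce.memory.mb cap each container at 15144 MB, so the heap alone is larger than the container. Whether this is what triggers the GC overhead error here is not established in this thread. The sketch below only shows one way to derive a heap size that fits inside the container. The helper name heap_mb_for_container and the 80% ratio are assumptions for illustration, not part of FEL or a setting confirmed by this issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of FEL or Hadoop): derive a JVM heap size
# in MB that leaves headroom inside a YARN container of the given size.
heap_mb_for_container() {
    container_mb=$1
    heap_percent=${2:-80}   # assumed rule of thumb; adjust as needed
    echo $(( container_mb * heap_percent / 100 ))
}

CONTAINER_MB=15144
HEAP_MB=$(heap_mb_for_container "$CONTAINER_MB")

# Print memory flags that keep the heap below the container limit.
echo "-Dmapreduce.map.memory.mb=${CONTAINER_MB}"
echo "-Dmapreduce.map.java.opts=-Xmx${HEAP_MB}m"
echo "-Dmapreduce.reduce.memory.mb=${CONTAINER_MB}"
echo "-Dmapreduce.reduce.java.opts=-Xmx${HEAP_MB}m"
```

The printed flags could stand in for the corresponding memory-related -D options in the hadoop command. The remaining JVM flags (-XX:NewRatio, -XX:+UseSerialGC) are left out here and would need to be appended to the java.opts values if still wanted.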


shubhamagarwal92 · 2018-06-01T10:52:18Z

@aasish Could you please comment on how I should resolve this?

shubhamagarwal92 · 2018-07-18T13:50:00Z

FYI, I solved the issue with this shell script. The README needs to be updated.
