Null pointer exception when executing the tool with the Bavarian wiki #7

Open
renepickhardt opened this issue Oct 14, 2015 · 4 comments

@renepickhardt

Hey Marcus, I tried

git clone
mvn compile
mvn package

So far so good (though I had a little trouble figuring out that the easiest way to pick up the external dependencies is to switch to the target directory and run from there).

Then

cd target
wget https://dumps.wikimedia.org/barwiki/20151002/barwiki-20151002-pages-articles-multistream-index.txt.bz2
wget https://dumps.wikimedia.org/barwiki/20151002/barwiki-20151002-pages-articles-multistream.xml.bz2

When I now run
`java -jar wikiforia-1.2.1.jar -pages barwiki-20151002-pages-articles-multistream.xml.bz2 -output res.xml`

I receive the following output:

[2015-10-14 15:14:55.728 | main | INFO  | se.lth.cs.nlp.wikiforia.App] Wikiforia v1.2.1 by Marcus Klang
Exception in thread "main" java.lang.NullPointerException
    at se.lth.cs.nlp.mediawiki.parser.MultistreamBzip2XmlDumpParser.toString(MultistreamBzip2XmlDumpParser.java:480)
    at se.lth.cs.nlp.wikiforia.Pipeline.run(Pipeline.java:73)
    at se.lth.cs.nlp.wikiforia.App.convert(App.java:239)
    at se.lth.cs.nlp.wikiforia.App.main(App.java:413)

Looking at

return String.format("Multistreamed Bzip2 XML Dump parser { \n * Threads: %s, \n * Batch size: %s, \n * Index: %s, \n * Pages: %s, \n * Basepath: %s \n}",

I suspect that some class fields are not initialized, but I didn't debug any further.

ls shows me that the file res.xml was created, so I assume that argument passing works and that some other field in the class is not set correctly.
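
One detail that may help narrow this down: `String.format` with `%s` renders a null argument as `null` rather than throwing, so the exception more likely comes from a method call on a null field while the argument list is being built. A minimal sketch, assuming a hypothetical `basepath` value of type `File` (the actual field names and types in `MultistreamBzip2XmlDumpParser` are not shown in this thread):

```java
import java.io.File;

public class NullFormatDemo {
    // Passing a null reference straight to %s is safe: it is rendered as "null".
    static String describe(File basepath) {
        return String.format("Basepath: %s", basepath);
    }

    // Calling a method on the null reference while building the arguments throws.
    static String describeAbsolute(File basepath) {
        return String.format("Basepath: %s", basepath.getAbsolutePath());
    }

    public static void main(String[] args) {
        System.out.println(describe(null));
        try {
            System.out.println(describeAbsolute(null));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while evaluating the arguments");
        }
    }
}
```

If the real `toString` dereferences one of its fields in this way, a single unset field would be enough to produce the stack trace above.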

Did I do something wrong? Or does the tool just not work with the Bavarian Wikipedia? While comparing git hashes, I found this in the git log:

commit 04e80b46ecc1bb487419fb9f831258be78413f07
Author: Marcus Klang <[email protected]>
Date:   Tue Mar 24 11:08:08 2015 +0100

    * Added French, German and Spanish configurations

which made me wonder whether my dump could be the reason. Thanks for your help!
I am not particularly interested in the Bavarian Wikipedia; I just wanted to test the tool with a small dataset (:

Best, Rene

@renepickhardt

Yeah, I have tried the German edition of Wikipedia. It doesn't work either.

@thesoulshell

I just tried the Simple English Wikipedia and got the same result.

@tobymao

tobymao commented Mar 18, 2016

@thesoulshell You need to use the absolute path, e.g. /home/user/enwiki.xml.bz2
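
A possible explanation for why the absolute path helps, offered as an assumption since the parser source is not quoted in this thread: if the tool derives its base path from the parent directory of the `-pages` file, a bare file name has no parent, and `File.getParentFile()` returns null. The sketch below only demonstrates that standard `java.io.File` behavior; the class and method names are made up for illustration.

```java
import java.io.File;

public class ParentPathDemo {
    // Parent of the path exactly as given; null when the path is a bare file name.
    static File parentOf(String path) {
        return new File(path).getParentFile();
    }

    // Resolve against the working directory first, so a parent directory exists.
    static File absoluteParentOf(String path) {
        return new File(path).getAbsoluteFile().getParentFile();
    }

    public static void main(String[] args) {
        String dump = "barwiki-20151002-pages-articles-multistream.xml.bz2";
        System.out.println("relative parent: " + parentOf(dump));
        System.out.println("absolute parent: " + absoluteParentOf(dump));
    }
}
```

Under that assumption, passing the full path on the command line (or having the tool call `getAbsoluteFile()` before taking the parent) would avoid the null value.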
