This small Python project converts ZIM files, as found in the Kiwix Wikipedia library, to a SQLite database that is read by the WikiReader plugin for KOReader that I am building.
The default is a single-threaded conversion, but you can specify --num-cores 4 to use more cores and thus speed up conversion.
I created this plugin for KOReader during some time off: koreader/koreader#9534. It has not been merged yet and probably never will be, but I am using it myself, and it is fairly stable and works for me.
If you have no programming experience or simply want a preconverted database file, one can be found here. This database contains 114303 popular articles from English Wikipedia as of September 2022.
Conversion is pretty fast: the above database is converted from ZIM to my SQLite-based format in about 1 to 3 minutes on my laptop, depending on the number of cores used.
You can download a dump of Wikipedia's most popular articles from their servers, or use a mirror like this one. I recommend using a dump starting with wikipedia_en_top_nopic.
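If wget is not available on your system, the download can also be done from Python. This is a minimal sketch using only the standard library; the function name download_zim is illustrative and not part of this project.

```python
import shutil
import urllib.request


def download_zim(url, dest_path):
    """Stream the file at `url` to `dest_path` and return `dest_path`.

    The response is copied in chunks, so even a large ZIM file is never
    held in memory all at once.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

For example, download_zim("http://ftp.acc.umu.se/mirror/wikimedia.org/other/kiwix/zim/wikipedia/wikipedia_en_top_nopic_2022-09.zim", "wikipedia.zim") would fetch the same dump as the wget command.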
On the command line you could for example do this:
wget -O wikipedia.zim http://ftp.acc.umu.se/mirror/wikimedia.org/other/kiwix/zim/wikipedia/wikipedia_en_top_nopic_2022-09.zim
# First install the 2 dependencies with pip:
pip install -r requirements.txt
# Then run the command line interface like this:
python3 --zim-file ./wikipedia.zim --output-db ./zim_articles.db
Then simply transfer this .db file to a storage medium KOReader can access, and set it as the database in the plugin menu.
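Before copying the file over, it can be worth checking that the conversion produced a non-empty database. The sketch below uses only Python's built-in sqlite3 module and makes no assumption about the table names the converter creates: it lists whatever tables exist, with their row counts. The function name list_tables is illustrative and not part of this project.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return a list of (table_name, row_count) for every table in the file."""
    # Open read-only so that a typo in the path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself; quote them as identifiers.
        return [
            (name, conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0])
            for name in names
        ]
    finally:
        conn.close()
```

Calling list_tables("./zim_articles.db") after a conversion should show at least one table with a row count above zero; an empty list suggests the conversion did not complete.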
You can manually install the 2 dependencies and run the Python file with the appropriate arguments. If you prefer, you can also build and run the Docker image instead. For example, when the ZIM file is called wikipedia.zim in the current directory:
docker build --tag zim-converter .
docker run --rm -it -v "$(pwd)":/project zim-converter --zim-file /project/wikipedia.zim --output-db /project/zim_articles.db