kiwix - off-line wikipedia

Boruch Baum boruch_baum at gmx.com
Thu Mar 7 23:22:19 IST 2019


I would normally post the core of this message somewhere
kiwix-specific, but I find their project infrastructure and organization
so poor that I don't see any web-searchable resource for it.

The kiwix[1] idea is great, and seems to be at least semi-officially
supported by the wikimedia foundation. You can download a huge selection
of wikimedia databases[2], including the many Hebrew ones, and use their
local cross-platform GUI program or their localhost web-server to view.
Works great...

... except for the database that everyone really wants, the complete
English file 'wikipedia_en_all_novid_2018-10.zim'[3]. That is probably
very frustrating for all the people who spent hours or days downloading
the 79 gigabyte file.

Here are the issues, and the fix.

Once the file is downloaded and checked against the md5sum link[4] on the
download page, the kiwix program will indicate that the file is not a
valid zim file. This seems to be because it was prepared with an
incompatible zim version, but the only real difference is a single byte
at the very start of the file, alongside its "magic number". Change that
byte, and the hours or days you spent downloading the file have been
redeemed. Use your favorite hex editor to change the value of byte
number five from '06' to '05'. In Linux, to observe the issue, compare
the file to a recognized zim file using 'xxd -l 5 [filename]'. And, when
you perform the edit, you probably want to modify the file in place
rather than save it under a new name, since saving a copy would change
the duration of the operation from instantaneous to however long it
takes your system to copy 79 gigabytes. What worked for me was the
Debian ncurses program 'hexeditor'.
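
For those who would rather script it than open a hex editor, here is a
minimal shell sketch of the same three steps: check the digest, look at
the first five bytes, and overwrite byte number five in place. The
function names are hypothetical and nothing here comes from kiwix
itself. It assumes md5sum, od and dd are available, and that you pass
the expected digest as a bare hex string copied from the md5 link[4].

```shell
#!/bin/sh
# Sketch only: these helper names are made up for illustration.

# Succeed if FILE's MD5 digest equals the expected hex digest.
# usage: md5_matches FILE EXPECTED_HEX
md5_matches() {
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Print the first five bytes of FILE as hex, much like 'xxd -l 5 FILE'.
# usage: first_five_bytes FILE
first_five_bytes() {
    od -An -tx1 -N5 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Change byte number five (offset 4 when counting from zero) from 06 to
# 05, in place. Refuses to write if the byte is not currently 06.
# conv=notrunc stops dd from truncating the file, so only one byte is
# written and the 79 gigabytes are never copied.
# usage: patch_fifth_byte FILE
patch_fifth_byte() {
    current=$(od -An -tx1 -j4 -N1 "$1" | tr -d ' \n')
    if [ "$current" != "06" ]; then
        echo "byte five is '$current', not '06'; leaving $1 untouched" >&2
        return 1
    fi
    printf '\005' | dd of="$1" bs=1 seek=4 count=1 conv=notrunc 2>/dev/null
}
```

Run first_five_bytes on both the problem file and a zim file that kiwix
accepts, compare the output, and only then run patch_fifth_byte on the
problem file. The guard on the current value means running it twice, or
on the wrong file, should do no harm.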

Now you can use the kiwix GUI program to open the file and add it to
your library.

A second and very general issue is the quality of the project's
documentation. For the purposes of this post, what's relevant is that in
order to use the kiwix localhost web-server for a database, one seems to
need to first open that zim file using the kiwix GUI program, which adds
the database to the kiwix 'library'. Then you can start / restart the
web-server and view / search the data.
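
The same two steps (add the zim file to a library, then start the
server) can possibly be done without the GUI, using the command-line
tools from the kiwix-tools package. The sketch below assumes that
kiwix-manage and kiwix-serve are installed and on your PATH. The
function name, the default library file name 'library.xml' and the
default port 8080 are arbitrary choices for illustration, and whether
this works with the kiwix version discussed here is untested.

```shell
#!/bin/sh
# Sketch only: serve_zim is a hypothetical wrapper, not a kiwix command.
# Assumes kiwix-manage and kiwix-serve (kiwix-tools) are on PATH.

# Add a zim file to a library file, then serve that library on localhost.
# usage: serve_zim ZIM_FILE [LIBRARY_FILE] [PORT]
serve_zim() {
    zim=$1
    library=${2:-library.xml}   # arbitrary default name
    port=${3:-8080}             # arbitrary default port

    # Step 1: register the zim file in the library file.
    kiwix-manage "$library" add "$zim" || return 1

    # Step 2: start the web server on that library (runs until stopped).
    kiwix-serve --library --port="$port" "$library"
}
```

Something like 'serve_zim wikipedia_en_all_novid_2018-10.zim' would then
be followed by pointing a browser at the chosen port on localhost.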

A third issue is an error message from the kiwix web server that seems
to be misleading or just plain wrong. It will complain that it cannot
open the search index for the 79 GB file. However, it does seem to
perform "search ahead" lookups for that database just fine. Go figure.

All in all, this has become a project that I love to hate. A great idea,
with just enough awful in the code and in the support to remind me what
it was like to be a computer user in the 1980s.

references:
[1] https://kiwix.org/
[2] https://wiki.kiwix.org/wiki/Content_in_all_languages
[3] http://download.kiwix.org/zim/wikipedia_en_all_novid.zim.torrent
    Really, for a file this large, use the BitTorrent option.
[4] http://download.kiwix.org/zim/wikipedia_en_all_novid.zim.md5

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0



More information about the Linux-il mailing list