rgaudin edited this page Oct 26, 2021 · 2 revisions

openZIM CMS

We distribute thousands of ZIM files, and dozens of updated or new ones each day. All are created by the Zimfarm and published more or less automatically on https://download.kiwix.org. The catalog is automatically updated once a day: the static version at https://download.kiwix.org/library/library_zim.xml and the dynamic/API/service version at https://library.kiwix.org (OPDS), which is based on library_zim.xml.

This works fine, but the automatic nature of the process limits us in several ways, such as toggling publication of a ZIM, ensuring QA, fine-tuning metadata, or integrating with partners.

Goal

The Kiwix CMS is the publishing counterpart of the Zimfarm. The Zimfarm allows non-technical people to manage ZIM creation; the CMS is a Web tool that allows non-technical people to handle the publishing of ZIM files.

Publishing is all about managing the central catalog/library: how it is presented and where it is made available. It is neither about creation nor distribution.

Concepts

The core concept of the CMS is the Title, an entry in a Library. A Title describes a piece of content with a Name, Title, Description, Location and a few other metadata.

The ZIM Catalog lists Books, which are representations of a Title as ZIM files. Titles can be split across different flavors: with pictures, with videos, introduction only, etc. Each Book can also have multiple dated revisions.

Each ZIM file (Book) should be attached to a Title; this process of matching a Book to a Title is called “reconciliation”. Reconciliation would normally happen automatically via the ZIM Name metadata. Should it fail (there are many reasons why it might), it will be done manually. Each ZIM file (Book/revision) is identified by its UUID.
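The reconciliation step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Title` and `Book` classes, the `reconcile` function and the manual queue are hypothetical names, not part of any existing CMS code. The only rule taken from the text is that matching is attempted on the ZIM Name metadata and falls back to manual handling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Title:
    # Hypothetical model: a Title keyed by its Name.
    name: str
    book_uuids: List[UUID] = field(default_factory=list)


@dataclass
class Book:
    # Hypothetical model: one ZIM file, identified by its UUID.
    uuid: UUID
    zim_name: Optional[str]  # value of the ZIM "Name" metadata, if present


def reconcile(
    book: Book, titles_by_name: Dict[str, Title], manual_queue: List[Book]
) -> Optional[Title]:
    """Attach a Book to the Title whose Name matches the ZIM Name metadata.

    Books that cannot be matched are appended to manual_queue so that a
    publisher can reconcile them by hand.
    """
    title = titles_by_name.get(book.zim_name) if book.zim_name else None
    if title is None:
        manual_queue.append(book)
        return None
    title.book_uuids.append(book.uuid)
    return title


if __name__ == "__main__":
    titles = {"wikipedia_en_all": Title(name="wikipedia_en_all")}
    queue: List[Book] = []
    reconcile(Book(uuid=uuid4(), zim_name="wikipedia_en_all"), titles, queue)
    reconcile(Book(uuid=uuid4(), zim_name=None), titles, queue)
    print(len(titles["wikipedia_en_all"].book_uuids), len(queue))
```

Keeping the unmatched Books in an explicit queue, rather than raising an error, mirrors the idea that a failed automatic match is an expected situation to be resolved manually.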

Beside the core Title management, the CMS has two other types of important concepts:

  • Ingesters are responsible for bringing Books (and optional associated metadata) into the CMS. The first one will be the Zimfarm ingester, which works off Zimfarm-generated ZIM files. Another one would allow manually adding a ZIM file.
  • Digesters are responsible for exporting views of the CMS database. The first one will generate the library.xml file for the central Catalog, containing the latest revisions of published Books. Other ones could generate an RSS feed, or an XLS export of a subset of the data.
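As an illustration of what a Digester might do, the sketch below keeps only the latest published revision of each Book and serializes the result to XML. The `Revision` class, the function names and the XML attribute names are assumptions made for this example; they are not the actual library.xml schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass
class Revision:
    # Hypothetical model of one dated revision of a Book.
    uuid: str
    name: str
    flavor: str
    released: date
    published: bool


def latest_published(revisions: Iterable[Revision]) -> List[Revision]:
    """Return the most recent published revision per (name, flavor)."""
    latest: Dict[Tuple[str, str], Revision] = {}
    for rev in revisions:
        if not rev.published:
            continue
        key = (rev.name, rev.flavor)
        if key not in latest or rev.released > latest[key].released:
            latest[key] = rev
    return sorted(latest.values(), key=lambda r: (r.name, r.flavor))


def digest_to_xml(revisions: Iterable[Revision]) -> str:
    """Serialize the selected revisions; attribute names are illustrative."""
    root = ET.Element("library")
    for rev in latest_published(revisions):
        ET.SubElement(
            root,
            "book",
            id=rev.uuid,
            name=rev.name,
            flavor=rev.flavor,
            date=rev.released.isoformat(),
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    revs = [
        Revision("a1", "wikipedia_en_all", "maxi", date(2021, 9, 1), True),
        Revision("a2", "wikipedia_en_all", "maxi", date(2021, 10, 1), True),
        Revision("a3", "wikipedia_en_all", "nopic", date(2021, 10, 1), False),
    ]
    print(digest_to_xml(revs))
```

Separating the selection (`latest_published`) from the serialization would let other Digesters, such as a feed or a spreadsheet export, reuse the same selection logic.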

Additional modules

  • A notification handler
  • A “garbage collector” in charge of removing old revisions and detecting inconsistencies, such as revisions of the same Title stored in two different directories, and other oddities.
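One of the checks mentioned for the garbage collector, finding revisions of the same Title in more than one directory, might look like the following sketch. The function name and the input shape (pairs of Title name and file path) are assumptions made for this example.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Set, Tuple


def find_split_titles(files: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each Title name to its directories when it lives in more than one.

    files is an iterable of (title_name, zim_file_path) pairs.
    """
    directories: Dict[str, Set[str]] = defaultdict(set)
    for title_name, path in files:
        directories[title_name].add(str(PurePosixPath(path).parent))
    return {
        name: sorted(dirs) for name, dirs in directories.items() if len(dirs) > 1
    }


if __name__ == "__main__":
    listing = [
        ("wikipedia_en_all", "wikipedia/wikipedia_en_all_2021-09.zim"),
        ("wikipedia_en_all", "other/wikipedia_en_all_2021-10.zim"),
        ("gutenberg_fr_all", "gutenberg/gutenberg_fr_all_2021-05.zim"),
    ]
    print(find_split_titles(listing))
```

The result only reports the inconsistency; deciding which directory is correct is left to a publisher or to a separate rule.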

Features

  • Minimal user management and authentication, with only two roles: anonymous and publisher.
  • First-time ingestion of book collection from library.xml
  • Books CRUD
  • Metadata updates: Popularity, category, etc.
  • Manually publish/un-publish a book
  • Manually publish/un-publish a flavor (per book)
  • Manually publish/un-publish a ZIM file
  • Move storage location (only for local files)
  • Publish a ZIM automatically for files stored in HTTP, LOCAL, S3 (call to API from Zimfarm)
  • Define Book QA constraints (see zimcheck json output)
  • Check Book QA against constraints if required
  • Export catalog in XML (library_zim.xml)
  • Book Overview with basic search/filter
  • Logs for publish/un-publish actions with basic search/filter
  • Corresponding atom feed

Technical Stack

  • Python 3.8+
  • MariaDB server
  • Flask API/backend
  • Properly unit tested
  • OpenAPI spec
  • Vue.js frontend
  • Common code formatting and QA tools (codefactor, codecov)
  • Dockerized
  • CI with GitHub Actions
  • CD with GitHub Actions and ghcr