What to test with zimcheck and how ? #340
Comments
Then by @holta: Very thoughtful response from @rgaudin, and a big thank you to all. I strongly support organic and free-form metadata standards (what's needed are strong norms and strong guidelines, not bureaucratic rules) that allow grassroots initiatives to collaborate and innovate efficiently. In fact, even semi-structured data sometimes has an extremely valuable place along the way, empowering regional and specialized communities to build their own ZIM files with the metadata that their region/profession/culture truly needs. For this reason I very strongly support allowing "free-form metadata fields" that not only permit but encourage grassroots (not centralized) community innovation to truly flourish. Later on, as strong community norms are independently nurtured, demonstrated, and proven year by year, the world should honor those great grassroots practices as they become more official metadata standards. Central authorities (Kiwix) should of course provide basic guardrails and guidelines, but that's sufficient. Thank you to everyone, including @veloman-yunkan, @kelson42 and @mgautierfr, working very hard on this critical question and helping it to evolve quickly in coming years, every step of the way.
This is an interesting question. I mainly see two kinds of testing: The first one groups all tests that are technically mandatory. The exact definition is subject to discussion, but at first glance, I would say that a failing test in this category would make libzim raise an exception at some point. I can think of:
In the second group, I would put all other tests that may be good to have (for better quality) but are not mandatory:
I would say that failures in the first group are errors, while failures in the second group are warnings.
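To make the error/warning split concrete, here is a minimal Python sketch of how findings could be classified and how only errors would fail a run. This is not zimcheck's actual implementation; `Severity`, `Finding`, `exit_code` and `summarize` are hypothetical names, and which checks belong to which group is deliberately left open.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # first group: technically mandatory checks
    WARNING = "warning"  # second group: good-to-have quality checks


@dataclass
class Finding:
    check: str           # hypothetical check name, for illustration only
    severity: Severity
    message: str


def exit_code(findings):
    """Return 1 if any finding is an error, else 0 (warnings never fail)."""
    return 1 if any(f.severity is Severity.ERROR for f in findings) else 0


def summarize(findings):
    """Count findings per severity level."""
    counts = {s.value: 0 for s in Severity}
    for f in findings:
        counts[f.severity.value] += 1
    return counts
```

With this shape, moving a check from one group to the other is a one-line change of its severity, which keeps the "exact definition is subject to discussion" part cheap to revisit.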
This issue was created so that the interesting discussion about what to test, which started in #339, does not grow too big and cannibalize the PR review.
Initially from @rgaudin:
I have mixed feelings about this. On one hand, this mainly highlights the shortcomings of such an approach but on the other hand, simple checks are better than none.
Couple of comments (already identified):
Once again we fall short on setting clear goals for our tools. zimcheck's description is “zimcheck checks the quality of a ZIM file.” Does that mean that whenever zimcheck doesn't report an issue, the ZIM is guaranteed to be valid?
I join @kelson42 in thinking we want basic checks for now that we could extend in the future.
And that's it. The rest can be discussed and extended in separate tickets, raised by actual needs.
Although it serves a different purpose, scraperlib now enforces correct metadata with more elaborate checks (actual language code, proper PNG with correct sizes, etc.), although this is not being used yet, so most of what we produce should be valid in this regard.
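As an illustration of what such metadata checks can look like, here is a small Python sketch. It is not scraperlib's actual code: the function names are hypothetical, the 48-pixel default is an assumed example size, and the language check is purely syntactic (it assumes a three-letter code and does not verify that the code really exists).

```python
import re
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) read from the IHDR chunk, or None if not a PNG."""
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def is_square_png(data, size=48):
    """True if data is a PNG of exactly size x size pixels (size is assumed)."""
    return png_size(data) == (size, size)


def looks_like_language_code(value):
    """Syntax-only check: three lowercase ASCII letters (assumed format)."""
    return re.fullmatch(r"[a-z]{3}", value) is not None
```

A real check would go further, for example decoding the whole image and looking the code up in an actual language list; the sketch only shows the kind of cheap validation that can run at creation time.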