-
-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support new v1 directory listings #709
Support new v1 directory listings #709
Conversation
@mossroy Just to let you know I've got as far as I can with this PR, until we have a test ZIM with a valid (non-empty) v1 listing. (see #708 (comment)). Bear in mind that any test of this PR using one of the test ZIMs in http://tmp.kiwix.org/nons_zims/ will currently act as if it's using a v1 listing, but in fact it is using a v0 listing, as confirmed by the diagnostic console.log entry that is written shortly after loading a new ZIM. The random button will occasionally show non-article assets with such a ZIM. |
ab2f52d
to
5bd90b1
Compare
I've rebased this branch on master, have removed the rigged code, and have introduced a test for an empty title listing (which will be rejected gracefully), as suggested by mgautier. The code will fall back to the next best listing (which is usually the full |
Quickly tested and working with the NEW I think this is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on Firefox OS, with several old ZIM files and wikipedia_fr_basketball_maxi_2019-08_nons.zim
I did not see any issue, except the main page that can not be scrolled on my device (but I doubt it is a regression of this PR)
Congratulations!
I did not have a deep look in the code, but did not see any problem
OK, thanks. I need to test on more of the supplied ZIMs. The Basektball landing page has an issue common to many themed Wkipedia ZIMs in jQuery mode whereby only some of the tiles are loaded. We have had an open issue #552 for this (since August 2019). As you say this is unrelated to this PR. It is to do with the dependency on the 'masonry' JavaScript library to produce these tiled views. It's possible to work around it by preventing some of the CSS from loading, or else we could maybe inject the required scripts into jQuery mode, but that would be a separate support option for Wikipedia-based archives. |
Forgot to add that, as soon as we have a "no-namespace" reference ZIM file (like an updated version of Ray Charles archive, that would also cover zstd compression?), we should add unit tests on it (while keeping at least a few unit tests on the previous format, to cover compatibility) |
Agreed. |
I have found one issue with reading the v1 title index of the new Ray Charles no-namespace ZIM. Opening this ZIM and searching for "r" only displays two title entries with this PR: "Ray Charles" and "Ray Charles discography". However, the landing page shows there should be 15 articles beginning with "r" (see screen shot below). I need to check whether this is a problem with this PR or a problem with the coding of this sample ZIM. @mgautierfr Would you happen to know whether this ZIM only contains a sample subset of the articles in the ZIM? |
So I've found the issue that is preventing search from finding some of the title listings. The issue is that the title field of the Directory Entries contains html markup in some cases, e.g. @mgautierfr I assume this is a format error in the sample Ray Charles ZIM, and that you are not intending to implement rich text formatting for directory titles? Result of searching for |
@Jaifroid Looks like a bug in MWoffliner, similar to openzim/mwoffliner#605. Could you please open a ticket in openzim/mwoffliner and give a recent file where this problem occurs? |
@kelson42 To be clear, this PR is for reading the NEW sample no-namespace ZIM archives with v1 title listings that @mgautierfr has uploaded to http://tmp.kiwix.org/nons_zims/ for us to test this PR on. I believe these have not been produced with mwoffliner, but with a conversion script. However, if you think they have been produced my mwoffliner I'd be happy to open a ticket. I am only seeing this with the sample Ray Charles ZIM provided (I haven't tested for this particular issue in others). |
@Jaifroid I understand that these zim files have been recreated with zim tool "zimrecreate". But at the origin, there is a ZIM file created with MWoffliner and zimrecreate has for sure not "invented" the DOM node in the title. I believe you probably have the same bug already with the original zim file and this would not be a regression then. This should be verified. |
@kelson42 I've just tested, and this problem is not in the latest Ray Charles (March 2021). I think the sample Ray Charles no-namespace ZIM has been made with an old Ray Charles that may well have had the old openzim/mwoffliner#605 issue (or similar). Since it's not in the current Ray Charles, then I think we can assume it's fixed. |
@mgautierfr Would be nice if you could just confirm this (and redo the test ZIM file based on a recent version). |
As the ray_charles name implies, it is based on a 2018-09 zim file, which is pretty old. |
@mgautierfr Thank you! That was the missing piece in the puzzle and confirms that this PR is working correctly. There are no titles in the article pointer list beginning with an html tag, and a search for 'r' yields the expected articles. For information, a readout of the ZIM heade after reading the v1 title index is in the screenshot below. |
In addition to previous tests, I've now tested on FFOS simulator and IE11 with the newest no-namespace Ray Charles ZIM. Everything seems to be working as expected, so I'll go ahead and merge now. @mossroy I don't consider the latest Ray Charles to be a "reference version", because it was created with ZIMRecreate rather than with mwoffliner. Therefore I don't think we're ready yet to add tests specific to a new Ray Charles ZIM. As it may represent some work, it would be better to add tests on the first valid no-namespace Ray Charles produced by MWOffliner with updated libzim code. |
All right, I agree |
This is a draft of support for the new directory listings, corresponding to issue #708.
In line with a suggestion from @mgautierfr , I've implemented it in an extensible way: a developer can add further (future) listings in the
ZIMArchive
function ofzimArchive.js
by adding more known listings to the array of Listing objects. Obviously the only two listed there are currently as in code block below. This is very much just-in-case, as there are no plans to extend the listings.Just note that, for testing with the ZIMs we have available, the code is currently rigged to fill out the
articlePtrPos
andarticleCount
metadata fromX/listing/titleOrdered/v0
because we don't have a ZIM (at least not the ones I've tested) with a non-emptyX/listing/titleOrdered/v1
. The rigged code will be changed (a one-letter change) once we have a valid ZIM for testing. It is clearly commented where I've rigged it inzimArchive.js
.I've only done preliminary testing on two
nons
ZIMs (Ray Charles and Basketball), and on one legacy ZIM (Ray Charles, plus it is passing push tests with a split Ray Charles).