In browser KWIC indexer #230

Open
DocOtak opened this issue Oct 23, 2024 · 6 comments
Labels
GitHub CF use of GitHub

Comments

@DocOtak
Member

DocOtak commented Oct 23, 2024

I'm making this issue under the GitHub label since it is not a standard name request.

I've opened PR #228 that adds an in browser KWIC indexer. This is an attempt to solve two problems we currently have:

  • The KWIC indexer is broken or not runnable: the most recent standard name table didn't have a KWIC index generated for it.
  • The generated KWIC index HTML is quite large (8+MB), and we are approaching some limits GitHub has for repository and deployed website sizes.

Additional features made possible:

  • Can generate a KWIC index for every version of the standard name table
  • The ability to download the generated HTML for offline use (or hosting someplace else)
  • No need to update the indexer with each new version of the standard name table, only publishing the XML.
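
For readers unfamiliar with the idea, here is a minimal sketch of what a keyword-in-context (KWIC) index over standard names could look like. It is not the code from PR #228. The function name `buildKwicIndex`, the entry shape, and the assumption that names are simply split on underscores are all illustrative choices made for this example.

```javascript
// Hypothetical sketch: build a KWIC index from CF-style standard names.
// Assumption: every name is a string of words joined by underscores.
function buildKwicIndex(standardNames) {
  const index = new Map(); // keyword -> array of { before, keyword, after, name }
  for (const name of standardNames) {
    const words = name.split("_");
    words.forEach((word, i) => {
      const entry = {
        before: words.slice(0, i).join("_"), // context to the left of the keyword
        keyword: word,
        after: words.slice(i + 1).join("_"), // context to the right of the keyword
        name,
      };
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(entry);
    });
  }
  // Return a new Map with keywords in alphabetical order.
  const sorted = [...index.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return new Map(sorted);
}

const names = ["air_temperature", "sea_water_temperature", "sea_ice_thickness"];
const kwic = buildKwicIndex(names);
for (const [keyword, entries] of kwic) {
  for (const e of entries) {
    // Right-align the left context so the keywords line up in a column.
    console.log(`${e.before.padStart(12)} [${keyword}] ${e.after}`);
  }
}
```

Because the index is derived entirely from the list of names, the same function can be run against any version of the table once its names have been read from the published XML.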

Attn @JonathanGregory

@DocOtak DocOtak added the GitHub CF use of GitHub label Oct 23, 2024
@JonathanGregory
Contributor

Thanks for doing this useful work and opening the issue to describe it. Since at this stage it's a test of a new facility, I think it's fine to merge it and see if it works as expected! Please go ahead.

@DocOtak
Member Author

DocOtak commented Oct 28, 2024

Now that I have merged #228, the KWIC index is available at: https://cfconventions.org/vocabularies/kwic-index.html

Because this is hosted on the CF website, versions of the standard name table do not need the full URI and can be shortened from my original discussion:

I find the version on the CF website to be quite performant.

@JonathanGregory Is there a plan for how this vocabularies repo might be structured? I could put together a straw man demo that combines the KWIC indexer and the stylesheet proposed in #231 (including the file downloader). I'm tempted to do this just to see what kind of file space we'd be looking at.

Even if we end up hosting static HTML, I think these JS/XSLT implementations and static download links might be very useful for the workflow. Since it just needs a browser to run, it could be how all the static files are generated.
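
As a rough illustration of generating a static file in the browser and offering it for download, here is a minimal sketch. It is not the downloader from #228 or #231. The helper names `escapeHtml`, `renderStaticPage`, and `downloadHtml`, the page layout, and the file name are assumptions made for this example; only the standard `Blob`, `URL.createObjectURL`, and anchor `download` browser APIs are relied on.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render a list of names as a standalone HTML page (hypothetical layout).
function renderStaticPage(title, names) {
  const items = names.map((n) => `<li>${escapeHtml(n)}</li>`).join("\n");
  return [
    "<!DOCTYPE html>",
    `<html><head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>`,
    `<body><ul>\n${items}\n</ul></body></html>`,
    "",
  ].join("\n");
}

// Browser-only: hand the generated page to the user as a file download.
function downloadHtml(filename, html) {
  const blob = new Blob([html], { type: "text/html" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename; // suggested file name for the saved page
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url);
}

const page = renderStaticPage("KWIC index (demo)", ["air_temperature", "sea_ice_thickness"]);
// In a browser, a button handler could then call:
//   downloadHtml("kwic-index.html", page);
```

Keeping the rendering step a pure function of the input data means the same output can be saved from any browser without extra tooling.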

@larsbarring

Just echoing what Jonathan writes -- this is impressive. When I load the KWIC page (using the link you give), it takes about 7-8 seconds, which I think is very acceptable given the content. I have just made a comment regarding the XSLT file over in vocabularies#231.

@DocOtak
Member Author

DocOtak commented Oct 28, 2024

@larsbarring I was looking into the rendering times when building it. The static KWIC indexes on the CF website take just as long to render, but can do so in a streaming way. So the top is displaying to you as the lower parts of the page are still being rendered.

I liked the idea from @JonathanGregory of having the "current" KWIC index and CF standard name table be static, and all the previous versions be just the XML, with the XSLT (name table) and JavaScript (KWIC index) to display them. My napkin calculations have this reducing the total amount of data in vocabularies to around 200MB (from ~800MB), but I want to actually try it. I think this would allow about 100 more standard name table versions before the size limits become a problem again.
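
The headroom estimate can be written down as a small calculation. This is only a sketch of the arithmetic, not the actual napkin calculation. The 200MB baseline comes from the comment above, but the 1000MB limit and the 8MB growth per version are assumptions picked for illustration; the thread does not state either figure for this purpose.

```javascript
// Estimate how many more table versions fit before a size limit is reached.
// All sizes are in MB.
function versionsUntilLimit(limitMB, baselineMB, perVersionMB) {
  if (perVersionMB <= 0) {
    throw new RangeError("perVersionMB must be positive");
  }
  const headroomMB = limitMB - baselineMB;
  return Math.max(0, Math.floor(headroomMB / perVersionMB));
}

const assumedLimitMB = 1000;    // assumption: size limit of roughly 1 GB
const baselineMB = 200;         // from the estimate quoted in the thread
const assumedPerVersionMB = 8;  // assumption: placeholder growth per new version

const estimate = versionsUntilLimit(assumedLimitMB, baselineMB, assumedPerVersionMB);
console.log(`Roughly ${estimate} more versions under these assumptions`);
```

Changing any of the three inputs shifts the result directly, so the figure is only as good as the measured per-version size.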

@larsbarring

Yes, I agree 👍 with @JonathanGregory and you regarding having the current version static and the other ones generated.

@JonathanGregory
Contributor

@DocOtak, many thanks for your continuing work on this. I suggest that it would be helpful if you and the vocabularies team (Alison @japamment, Fran @feggleton and Ellie @efisher008) could discuss the appropriate setup of the vocabularies repo. Others might have views as well, such as Lars @larsbarring, arising from his previous work on the standard name table.

3 participants