Migration and Structure of Repository and XML Files #243

DocOtak · 2024-12-11T02:09:22Z

As presented at the @cf-convention/info-mgmt meeting, I have a proposed structure for the vocabularies repository. This proposal includes changes to what is in the repository itself and how the files are presented on the CF website, particularly in the case of non-current versions of the documents. The restructure aims for a consistent layout, less disk usage, and a simpler update process.

Proposed Structure

Choices I made:

Publish from /docs rather than / so that non-website assets and documentation can exist in the repository. /docs was chosen because it is a default option on GitHub Pages.
Only retain built HTML for the most current versions of the tables, to save space.
Reference XML stylesheets in all the XML files so that HTML versions continue to exist.
Update some of the XML stylesheets to use the <details> and <summary> HTML tags so the triangle images and JavaScript are no longer needed to view the long definitions (the standard name table search still needs JavaScript).

Structure

Everything that gets published on the website exists under a non-root directory called /docs.
Within it are several directories for the versioned XML tables:

area-type-table
cf-standard-names
standardized-region-list

Each of these contains two subdirectories:

current
version

The cf-standard-names directory also has a kwic directory that holds the KWIC indexer.

The current directory has the prebuilt HTML and a copy of the most recent XML document and, in the case of cf-standard-names, also a prebuilt kwic_index.html file. The version directory contains one or more numbered directories, each of which contains only that version's XML file.

There are several "support" directories:

docs
stylesheets
schema

The docs directory has some ancillary files that I felt should be included here rather than in the cf-website repo (guidelines and contributors). The stylesheets directory contains the stylesheets that are applied to the XML documents to turn them into HTML. Its directory structure is flat and the stylesheets are versioned by filename. The schema directory contains the validation schemas for the versioned XML tables.
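To make the layout concrete, here is a minimal sketch that lists the directories described above. The helper name `proposedDirectories` is hypothetical, the KWIC directory is spelled `kwic`, and placing the three support directories directly under /docs is an assumption based on the description rather than something checked against the repository.

```javascript
// Sketch of the proposed layout. Names of tables and support directories
// come from the description above; the helper itself is illustrative only.
const TABLES = ["area-type-table", "cf-standard-names", "standardized-region-list"];
const SUPPORT = ["docs", "stylesheets", "schema"];

function proposedDirectories(versionsByTable) {
  const dirs = [];
  for (const table of TABLES) {
    // Every table has a "current" directory with the prebuilt HTML and newest XML.
    dirs.push(`/docs/${table}/current`);
    // One numbered directory per published version, holding only that version's XML.
    for (const v of versionsByTable[table] ?? []) {
      dirs.push(`/docs/${table}/version/${v}`);
    }
    // Only the standard name table has a KWIC indexer.
    if (table === "cf-standard-names") {
      dirs.push(`/docs/${table}/kwic`);
    }
  }
  // Assumption: the support directories sit directly under /docs.
  for (const s of SUPPORT) {
    dirs.push(`/docs/${s}`);
  }
  return dirs;
}
```

For example, `proposedDirectories({ "cf-standard-names": [87, 88] })` would include /docs/cf-standard-names/version/88 alongside the three current directories.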

Stylesheet Changes

Since we would not be hosting built HTML for every version of the tables, the stylesheets have been modified to include a link that downloads the current page (using JavaScript), so that anyone who wants an HTML version can still get one. The stylesheets for area-type-table and standardized-region-list have been modified to use <details> and <summary> HTML tags to show/hide the descriptions rather than custom JavaScript with triangle images. This change makes the resulting HTML fully self-contained.
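As a rough illustration of the two ideas in this section, the sketch below shows the kind of markup a <details>/<summary> entry amounts to, and one common way a "download this page" link can be implemented in the browser. It is not the code in the actual stylesheets (those are XML stylesheets, not JavaScript); the function names are hypothetical, and `downloadCurrentPage` assumes it runs in a browser where `document` refers to the rendered page.

```javascript
// Escape text so it can be placed safely inside HTML element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a collapsible entry: the browser draws the open/closed marker itself,
// so no triangle images or show/hide scripts are needed.
function detailsBlock(term, description) {
  return `<details><summary>${escapeHtml(term)}</summary>${escapeHtml(description)}</details>`;
}

// Browser-only sketch: serialize the rendered page and offer it as a download.
function downloadCurrentPage(filename) {
  const html = "<!DOCTYPE html>\n" + document.documentElement.outerHTML;
  const blob = new Blob([html], { type: "text/html" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  link.click();
  // Release the temporary object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```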

KWIC Index

As an experiment, the Key Word In Context indexer was rewritten to run client side in JavaScript. This is included under /docs/cf-standard-names/kwic/index.html. By default it will load the current standard name table and generate the index. However, the URI of the standard name table it loads is configurable via the query string parameter tableURI. This value can be relative, so something like index.html?tableURI=../version/1/cf-standard-name-table.xml would generate an index for version 1 of the standard name table.
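A minimal sketch of how a relative tableURI can be resolved against the indexer page's own URL, using the standard `URL` and `URLSearchParams` APIs. The function name `resolveTableURI` and the default location `../current/cf-standard-name-table.xml` are assumptions for illustration; the actual indexer may do this differently.

```javascript
// Assumption: with no tableURI given, the indexer loads the table in ../current/.
const DEFAULT_TABLE_URI = "../current/cf-standard-name-table.xml";

// Return the absolute URL of the table the indexer should load for a given page URL.
function resolveTableURI(pageHref, fallback = DEFAULT_TABLE_URI) {
  const page = new URL(pageHref);
  // Use the tableURI query parameter when present, otherwise the default.
  const requested = page.searchParams.get("tableURI") || fallback;
  // Relative values are resolved against the page URL, like any relative link.
  return new URL(requested, page).href;
}
```

In the browser this would typically be called as `resolveTableURI(window.location.href)` before fetching the XML.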

Update Process

I wanted to simplify the update process so that a web browser is the primary tool for generating most of the derived files that need to be updated. I envision this being a two-step process:

Publish a new version in the version directory
Update the current directory contents

Example Update Process for CF Standard Names

As of writing this issue, the current cf standard names version is 87.

Update the version

A new cf-standard-name-table.xml version 88 is generated by the cf editor; check that it includes the stylesheet link at the top of the file.
Make a new directory named 88 in the /docs/cf-standard-names/version/ directory, so the full path is /docs/cf-standard-names/version/88
Place the cf-standard-name-table.xml into this 88 directory, so the full path is /docs/cf-standard-names/version/88/cf-standard-name-table.xml
Commit these changes and wait for the site to build (this should only take a few seconds or minutes)
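The two checks in these steps can be sketched as small helpers: one confirms the XML carries an xml-stylesheet processing instruction with an href, the other builds the destination path for a given version. Both function names are hypothetical, and the stylesheet filename used in the usage note is made up for illustration.

```javascript
// True if the XML text contains an xml-stylesheet processing instruction with an href.
function hasStylesheetLink(xmlText) {
  return /<\?xml-stylesheet\b[^>]*\bhref\s*=\s*["'][^"']+["'][^>]*\?>/.test(xmlText);
}

// Path in the repository where a given version of the standard name table belongs.
function versionTablePath(version) {
  if (!Number.isInteger(version) || version < 1) {
    throw new RangeError(`version must be a positive integer, got ${version}`);
  }
  return `/docs/cf-standard-names/version/${version}/cf-standard-name-table.xml`;
}
```

For instance, `hasStylesheetLink('<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="example.xsl"?><standard_name_table/>')` returns `true`, and `versionTablePath(88)` gives the full path from the second and third steps.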

Update the current directory

Replace the cf-standard-name-table.xml in /docs/cf-standard-names/current with a copy of the most current one (88 in this example)
Navigate to the new version: https://cfconventions.org/vocabularies/cf-standard-names/version/88/cf-standard-name-table.xml
Download the resulting HTML by clicking the download button on that page.
Rename it index.html and replace the index.html in /docs/cf-standard-names/current
Navigate to the KWIC index and point it at the most current version by changing the query parameter: https://cfconventions.org/vocabularies/cf-standard-names/kwic/index.html?tableURI=../version/88/cf-standard-name-table.xml
Download the resulting HTML by clicking the download button on that page.
Rename this file to kwic_index.html and replace the file of the same name in /docs/cf-standard-names/current
Commit the changes and wait for the site to build
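The steps above can be summarized as a small plan generator: given a version number, it lists which file to copy, which pages to open in the browser, and where each downloaded file ends up. This is only a sketch of the manual procedure; the function name `currentUpdatePlan` and the shape of the returned objects are invented for illustration, and the site base URL assumes the https://cfconventions.org/vocabularies/ location with the KWIC directory spelled `kwic`.

```javascript
// Assumed public base URL and repository directory for the standard name table.
const SITE = "https://cfconventions.org/vocabularies/cf-standard-names";
const REPO = "/docs/cf-standard-names";

// List the actions needed to refresh the "current" directory for a given version.
function currentUpdatePlan(version) {
  const table = `version/${version}/cf-standard-name-table.xml`;
  return [
    // Step 1: copy the newest XML into current/.
    { action: "copy", from: `${REPO}/${table}`, to: `${REPO}/current/cf-standard-name-table.xml` },
    // Steps 2-4: open the styled table, download it, save it as index.html.
    { action: "download", visit: `${SITE}/${table}`, saveAs: `${REPO}/current/index.html` },
    // Steps 5-7: open the KWIC indexer pointed at this version, download, save.
    {
      action: "download",
      visit: `${SITE}/kwic/index.html?tableURI=../${table}`,
      saveAs: `${REPO}/current/kwic_index.html`,
    },
  ];
}
```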

If everything looks OK, links would need to be updated in the CF Conventions website repository. The process would be almost exactly the same for any of the other tables/vocabularies, though they would not have a KWIC index.

Next Steps

PR Migrate Vocabularies and Reorganize Repository Structure #242: once merged into the vocabularies repository, this would function independently under https://cfconventions.org/vocabularies/ while we work on the final structure and procedures. It would not disrupt the current practices in the CF Conventions website repo. We would need to change the GitHub Pages config slightly for the vocabularies repo.
- If a very different structure is desired, we should hold off on merging this PR since it would live in the git history forever.
@japamment wanted to do an update walkthrough with me once v88 of the standard name table was actually pressed.
The GRIB and AMIP columns of the standard name table are being removed soon, which will require some changes to the XML and stylesheets.
Technical documentation for how this site works and instructions for updating should be written and included in the repository.

JonathanGregory · 2024-12-17T22:48:39Z

Dear Barna @DocOtak

Thanks for doing all this very useful and ingenious work. As I understand it, the main advantage is the saving of space, both for the GitHub repo and the Pages website. The latter is more urgent, and we have been reaching the limit. Space will be saved because we will not keep static copies of the HTML table and KWIC index for the previous versions, since JavaScript will make them as required from the XML. Keeping only the XML also avoids the small danger of the table and KWIC becoming inconsistent with the XML.

A couple of small comments occur to me:

/docs/docs looks potentially confusing to me. I'd suggest /docs/etc or /docs/misc.
In the other repos, guideline files are in /, along with README. That makes them more obvious, I think. I appreciate that we might want to show some guidelines on the Pages site as well, but perhaps we could arrange a way to mirror a file to /docs if necessary?
Is there a need for the /version subdirectory? We could have /88, /87, etc. directly under cf-standard-names, on the same level as /current. That seems logical to me.

Best wishes

Jonathan

DocOtak · 2024-12-17T23:39:17Z

@JonathanGregory Thank you for this feedback!

Some clarification: JavaScript is only essential for the KWIC index, the search function on the standard name tables, and, purely due to historic usage, opening the standard name definitions. Everything else is a standard application of XML stylesheets, which produce the HTML in the browser. Removing the need for JavaScript to display the definitions in the standard name table could be done pretty easily, I think, but I didn't consider it at this point since my motivation was to remove the need for the image assets (the triangles that indicate open/closed).

For your bullet points, here are some of my reasonings:

/docs/docs came from https://github.com/cf-convention/cf-convention.github.io/tree/main/Data/cf-standard-names/docs and I didn't change the name. The first /docs was forced somewhat by GitHub for Pages to work; the second docs is simply because I moved that directory from the other repo. Happy to have this be named something different, or even to evaluate whether it needs to be retained at all.
Is this referring to the contents of /docs/docs? If so, it is like this due to the above point, where I just moved that directory as is. We can and should include more things in /.
My primary motivation for this is to make URIs that read like a sentence: https://cfconventions.org/vocabularies/cf-standard-names/version/86/ "reads" better to me than https://cfconventions.org/vocabularies/cf-standard-names/86/ since I can "read" that the number is probably a "version" number. If any change were to be made, I would suggest that the /current directory actually goes inside the /version one so that it also "reads" like a sentence: https://cfconventions.org/vocabularies/cf-standard-names/version/current/ This structure is how macOS frameworks are set up so I took some inspiration from them. I also somewhat view this structure with the /version directory to be "self documenting". My desire for a "readable" URL is also why /version is not pluralized even when there are many versions in that directory.

JonathanGregory · 2024-12-18T12:23:05Z

@DocOtak

Oh yes, I forgot about the XML being automatically interpreted as HTML with reference to the stylesheet. In this way, the browser is saving us disk space.

Maybe we don't need /docs/docs? I think this concerns just two files: guidelines.html and standard-name-contributors.html - right? These are certainly files about the vocabularies, but perhaps we could keep them in the website repo (as they are now)? There are two other files in cf-standard-names/docs currently: standardized-region-names.html, which will go in its own directory structure, and CF_vocabs_JSON_links.html, which hasn't been updated for seven years and isn't linked on the Vocabularies page, so perhaps it's redundant.

If we do need a miscellaneous directory to be shown on Pages, I suggest /docs/etc. I think the repetition in /docs/docs could be suspected of being a mistake, and would thus cause confusion.

Yes, I understand that /version/86 is more self-explanatory than /86. At the moment, however, 86 and current are on the same level, beneath cf-convention.github.io/Data/cf-standard-names. This seems rather logical to me. Also, I admit to a personal preference for flatter directory structures! If we put current under version as well, I think version would be the only subdirectory of cf-standard-names; that strikes me as an unnecessary complexity. Maybe cf-standard-names/v86 or cf-standard-names/version86 would be OK, if it's not clear what just 86 means? I wonder what others think.

Best wishes

Jonathan
