Migration and Structure of Repository and XML Files #243

Open
DocOtak opened this issue Dec 11, 2024 · 3 comments
Labels
GitHub CF use of GitHub

Comments

@DocOtak
Member

DocOtak commented Dec 11, 2024

As presented at the @cf-convention/info-mgmt meeting, I have a proposed structure for the vocabularies repository. This proposal changes both what is in the repository itself and how the files are presented on the CF website, particularly for non-current versions of the documents. The restructure aims for a consistent layout, lower disk usage, and a simpler update process.

Proposed Structure

Choices I made:

  • Publish from /docs rather than / so that non-website assets and documentation can live in the repository. /docs was chosen because it is a default option on GitHub Pages.
  • Retain built HTML only for the most current versions of the tables, to save space.
  • Reference XML stylesheets in all the XML files so that HTML versions continue to exist.
  • Update some of the XML stylesheets to use the <details> and <summary> HTML tags, so the triangle images and JavaScript are no longer needed to view the long definitions (the standard name table search still needs JavaScript).
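
To illustrate the stylesheet reference in the third choice: a browser is pointed at an XSLT stylesheet through an `xml-stylesheet` processing instruction placed before the root element of the XML file. The following is a minimal sketch that checks for such a reference. The sample table content, the element names, and the stylesheet path are assumptions made for illustration, not taken from the repository.

```python
import re
from xml.dom import minidom


def stylesheet_href(xml_text):
    """Return the href of the first xml-stylesheet processing instruction, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            match = re.search(r'href="([^"]*)"', node.data)
            return match.group(1) if match else None
    return None


# Hypothetical file content: the stylesheet path and element names are assumptions.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="../../../stylesheets/example.xsl"?>
<standard_name_table>
  <entry id="air_temperature"/>
</standard_name_table>
"""

print(stylesheet_href(SAMPLE))
```

A check like this could run before a new table is committed, since a file without the reference would be shown by the browser as raw XML rather than as a styled table.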

Structure

Everything that gets published on the website lives under a non-root directory called /docs.
Within it are several directories for the versioned XML tables:

  • area-type-table
  • cf-standard-names
  • standardized-region-list

Each of these contains two subdirectories:

  • current
  • version

The cf-standard-names directory also has a kwic directory that holds the KWIC indexer.

The current directory has the prebuilt HTML and a copy of the most recent XML document and, in the case of cf-standard-names, also a prebuilt kwic_index.html file. The version directory contains one or more numbered directories, each of which contains only that version's XML file.

There are several "support" directories:

  • docs
  • stylesheets
  • schema

The docs directory has some ancillary files that I felt should be included here rather than in the cf-website repo (guidelines and contributors). The stylesheets directory contains the stylesheets that are applied to the XML documents to turn them into HTML. Its directory structure is flat and the stylesheets are versioned by filename. The schema directory contains the validation schemas for the versioned XML tables.
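
As a rough sketch of the layout described above, the paths can be expressed as two small helpers. The table directory names and cf-standard-name-table.xml come from this proposal; the XML file names used for the two smaller tables are assumptions for illustration only.

```python
from pathlib import PurePosixPath

# File names for area-type-table and standardized-region-list are assumed.
TABLE_FILES = {
    "cf-standard-names": "cf-standard-name-table.xml",
    "area-type-table": "area-type-table.xml",
    "standardized-region-list": "standardized-region-list.xml",
}


def versioned_xml(table, version):
    """Path of the XML file for one numbered version of a table."""
    return PurePosixPath("/docs") / table / "version" / str(version) / TABLE_FILES[table]


def current_files(table):
    """Files expected in a table's current directory."""
    base = PurePosixPath("/docs") / table / "current"
    files = [base / TABLE_FILES[table], base / "index.html"]
    if table == "cf-standard-names":
        # Only the standard name table has a prebuilt KWIC index.
        files.append(base / "kwic_index.html")
    return files


print(versioned_xml("cf-standard-names", 88))
for path in current_files("cf-standard-names"):
    print(path)
```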

Stylesheet Changes

Since we would not be hosting built HTML for every version of the tables, the stylesheets have been modified to include a link that downloads the current page (using JavaScript), for anyone who wants a copy of the HTML version. The stylesheets for area-type-table and standardized-region-list have been modified to use the <details> and <summary> HTML tags to show and hide the descriptions, rather than custom JavaScript with triangle images. This change makes the resulting HTML fully self-contained.
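
For readers unfamiliar with these tags, the sketch below shows the kind of markup such a stylesheet might emit for one entry. It is written in Python purely to show the output shape; the real transformation is done by the XSLT stylesheets, and the entry name and description here are made up.

```python
from html import escape


def details_block(name, description):
    """Render one entry as a collapsible <details> element (no JavaScript needed)."""
    return (
        "<details>"
        f"<summary>{escape(name)}</summary>"
        f"<p>{escape(description)}</p>"
        "</details>"
    )


print(details_block("example_area_type", "A made-up description with <markup> in it."))
```

The browser draws the disclosure triangle and handles the open/closed state itself, which is why no image assets or scripts are needed.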

KWIC Index

As an experiment, the Key Word In Context (KWIC) indexer was rewritten to run client side in JavaScript. It is included under /docs/cf-standard-names/kwic/index.html. By default it loads the current standard name table and generates the index. However, the URI of the standard name table it loads is configurable via the query string parameter tableURI. The value can be relative, so something like index.html?tableURI=../version/1/cf-standard-name-table.xml would generate an index for version 1 of the standard name table.
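
The two ideas in this section can be sketched as follows: resolving a relative tableURI against the indexer page, and grouping standard names by the words they contain. This is an illustration in Python, not the JavaScript indexer itself. The example.org host, the default table location, and the index format are assumptions.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urljoin, urlsplit

# Assumed default: the table in the sibling current directory.
DEFAULT_TABLE = "../current/cf-standard-name-table.xml"


def resolve_table_uri(page_url):
    """Resolve the tableURI query parameter relative to the indexer page."""
    query = parse_qs(urlsplit(page_url).query)
    table_uri = query.get("tableURI", [DEFAULT_TABLE])[0]
    return urljoin(page_url, table_uri)


def kwic_index(names):
    """Map each keyword to the sorted standard names that contain it."""
    index = defaultdict(list)
    for name in names:
        # Standard names are words joined by underscores.
        for word in set(name.split("_")):
            index[word].append(name)
    return {word: sorted(found) for word, found in sorted(index.items())}


page = ("https://example.org/cf-standard-names/kwic/index.html"
        "?tableURI=../version/1/cf-standard-name-table.xml")
print(resolve_table_uri(page))
print(kwic_index(["air_temperature", "sea_water_temperature"]))
```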

Update Process

I wanted to simplify the update process to primarily use a web browser as the tool that generates most of the derived files that need to be updated. I envision this being a two step process:

  1. Publish a new version in the version directory
  2. Update the current directory contents

Example Update Process for CF Standard Names

As of writing this issue, the current cf standard names version is 87.

Update the version

  1. The CF editor generates a new cf-standard-name-table.xml for version 88, making sure that it includes the stylesheet link at the top of the file.
  2. Make a new directory named 88 in the /docs/cf-standard-names/version/ directory. The full path would be /docs/cf-standard-names/version/88
  3. Place the cf-standard-name-table.xml into this 88 directory. The full path would be /docs/cf-standard-names/version/88/cf-standard-name-table.xml
  4. Commit these changes and wait for the site to build (should only be a few seconds/minutes)
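
Steps 2 and 3 above could be scripted along the following lines. This is only a sketch: the function name is hypothetical, the commit in step 4 is left to the maintainer, and the demonstration runs in a temporary directory rather than a real checkout.

```python
import shutil
import tempfile
from pathlib import Path

TABLE_FILE = "cf-standard-name-table.xml"


def publish_version(docs_root, new_table, version):
    """Copy a new table into <docs_root>/cf-standard-names/version/<version>/."""
    target_dir = Path(docs_root) / "cf-standard-names" / "version" / str(version)
    # exist_ok=False refuses to overwrite an already published version.
    target_dir.mkdir(parents=True, exist_ok=False)
    return Path(shutil.copy2(new_table, target_dir / TABLE_FILE))


# Demonstration in a throwaway directory, so no real repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "new-table.xml"
    source.write_text("<standard_name_table/>")
    published = publish_version(Path(tmp) / "docs", source, 88)
    published_rel = published.relative_to(tmp).as_posix()

print(published_rel)
```

Failing when the version directory already exists is a deliberate choice here, on the assumption that published versions should never change once committed.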

Update the current directory

  1. Replace the cf-standard-name-table.xml in /docs/cf-standard-names/current with a copy of the most current one (88 in this example)
  2. Navigate to the new version: https://cfconventions.org/vocabularies/cf-standard-names/version/88/cf-standard-name-table.xml
  3. Download the resulting HTML by clicking the download button on that page.
  4. Rename it index.html and replace the index.html in /docs/cf-standard-names/current
  5. Navigate to the KWIC index and point it at the most current version by changing the query parameter: https://cfconventions.org/vocabularies/cf-standard-names/kwic/index.html?tableURI=../version/88/cf-standard-name-table.xml
  6. Download the resulting HTML by clicking the download button on that page.
  7. Rename this file to kwic_index.html and replace the file of the same name in /docs/cf-standard-names/current
  8. Commit the changes and wait for the site to build
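
The parts of this list that do not need a browser can be sketched as a small helper: step 1 is a file copy, and the URLs for steps 2 and 5 follow a fixed pattern. The function names are hypothetical, the base URL is taken from the Next Steps section, and the name of the indexer directory is passed as a parameter because it is an assumption.

```python
import shutil
import tempfile
from pathlib import Path

SITE = "https://cfconventions.org/vocabularies"
TABLE_FILE = "cf-standard-name-table.xml"


def update_current_xml(docs_root, version):
    """Step 1: overwrite the XML in current/ with the published version."""
    table_dir = Path(docs_root) / "cf-standard-names"
    current = table_dir / "current"
    current.mkdir(parents=True, exist_ok=True)
    shutil.copy2(table_dir / "version" / str(version) / TABLE_FILE, current / TABLE_FILE)
    return current / TABLE_FILE


def pages_to_download(version, kwic_dir="kwic"):
    """Steps 2 to 7: map each file in current/ to the page whose download produces it."""
    table = f"{SITE}/cf-standard-names/version/{version}/{TABLE_FILE}"
    kwic = (f"{SITE}/cf-standard-names/{kwic_dir}/index.html"
            f"?tableURI=../version/{version}/{TABLE_FILE}")
    return {"index.html": table, "kwic_index.html": kwic}


for filename, url in pages_to_download(88).items():
    print(f"{filename}: {url}")
```

The downloads themselves remain manual in this sketch, in keeping with the goal of using the web browser as the tool that builds the derived HTML.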

If everything looks OK, links would need to be updated in the CF Conventions website repository. The process would be almost exactly the same for any of the other tables/vocabularies, though they would not have a KWIC index.

Next Steps

  • PR "Migrate Vocabularies and Reorganize Repository Structure" #242: Once merged into the vocabularies repository, the new structure would function independently under https://cfconventions.org/vocabularies/ while we work on the final structure and procedures. It would not disrupt the current practices in the CF Conventions website repo. We would need to change the GitHub Pages config slightly for the vocabularies repo.
    • If a very different structure is desired, we should hold on merging this PR since it would live in the git history forever.
  • @japamment wanted to do an update walkthrough with me once v88 of the standard name table was actually pressed
  • The GRIB and AMIP columns of the standard name table are being removed soon, which will require some changes to the XML and stylesheets
  • Technical documentation for how this site works and instructions for updating should be written and included in the repository.
@JonathanGregory
Contributor

Dear Barna @DocOtak

Thanks for doing all this very useful and ingenious work. As I understand it, the main advantage is the saving of space, both for the GitHub repo and the Pages website. The latter is more urgent, and we have been reaching the limit. Space will be saved because we will not keep static copies of the HTML table and KWIC index for the previous versions, since JavaScript will make them as required from the XML. Keeping only the XML also avoids the small danger of the table and KWIC becoming inconsistent with the XML.

A couple of small comments occur to me:

  • /docs/docs looks potentially confusing to me. I'd suggest /docs/etc or /docs/misc.

  • In the other repos, guideline files are in /, along with README. That makes them more obvious, I think. I appreciate that we might want to show some guidelines on the Pages site as well, but perhaps we could arrange a way to mirror a file to /docs if necessary?

  • Is there a need for the /version subdirectory? We could have /88, /87, etc. directly under cf-standard-names, on the same level as /current. That seems logical to me.

Best wishes

Jonathan

@DocOtak
Member Author

DocOtak commented Dec 17, 2024

@JonathanGregory Thank you for this feedback!

Some clarification: JavaScript is only essential for the KWIC index, the search function on the standard name tables, and, purely for historical reasons, opening the standard name definitions. Everything else is a standard application of XML stylesheets, which produce the HTML in the browser. Removing the need for JavaScript to display the definitions in the standard name table could be done pretty easily, I think, but I didn't consider it at this point since my motivation was to remove the need for the image assets (the triangles that indicate open/closed).

For your bullet points, here are some of my reasonings:

@JonathanGregory
Contributor

@DocOtak

Oh yes, I forgot about the XML being automatically interpreted as HTML with reference to the stylesheet. In this way, the browser is saving us disk space.

Maybe we don't need /docs/docs? I think this concerns just two files: guidelines.html and standard-name-contributors.html - right? These are certainly files about the vocabularies, but perhaps we could keep them in the website repo (as they are now)? There are two other files in cf-standard-names/docs currently: standardized-region-names.html, which will go in its own directory structure, and CF_vocabs_JSON_links.html, which hasn't been updated for seven years and isn't linked on the Vocabularies page, so perhaps it's redundant.

If we do need a miscellaneous directory to be shown on Pages, I suggest /docs/etc. I think the repetition in /docs/docs could be suspected of being a mistake, and would thus cause confusion.

Yes, I understand that /version/86 is more self-explanatory than /86. At the moment, however, 86 and current are on the same level, beneath cf-convention.github.io/Data/cf-standard-names. This seems rather logical to me. Also, I admit to a personal preference for flatter directory structures! If we put current under version as well, I think version would be the only subdirectory of cf-standard-names; that strikes me as an unnecessary complexity. Maybe cf-standard-names/v86 or cf-standard-names/version86 would be OK, if it's not clear what just 86 means? I wonder what others think.

Best wishes

Jonathan
