Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Request for comments]: Site Versioning #1870

Draft
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

kaixin-hc
Copy link
Contributor

@kaixin-hc kaixin-hc commented Apr 2, 2022

What is the purpose of this pull request?
MVP for #1009

  • Documentation update
  • Bug fix
  • Feature addition or enhancement
  • Code maintenance
  • Others, please explain:

Overview of changes:

  • Add a command to archive a version under a given name. Archived versions are stored in HTML format, in a folder with the version name, which is inside the version folder(default name for the version folder is "version"). All links in the versioned site will link to other pages in the versioned site.

To summarize the current solution, what it does is build the website and place it into a folder within the repository for it to be versioned (default location is version/, but you can customise this). When the site is built, by default the build action copies the versioned site over into _site, and it is deployed. If the user decides to store versions in different folders, then the URL to the version will simply be different.

TODO:

  • Amend the way that links have their baseUrl changed such that it works even if baseUrl != ''
  • Need to test for performance. (only a small difference: on MarkBind its 1/4th of a second, from ~ 4.75s to 5s.
  • Multiple versions -- saving a subsequent version should not keep the version folders
  • Clean up code
  • Add documentation
  • Add tests (?) How to test.

Anything you'd like to highlight / discuss:

Benefits of the solution proposed in this PR

  • Simple & customisable
  • Since .html and asset files are kept, past changes in markbind version, assets used, etc will be no issue --> each version contains everything it needs to deploy as a site
  • (Relatively) performant. I don't have proof of this yet, but I believe copying the files of html folders/asset folders should be a relatively small cost, as compared to generating from markbind files.

Potential Drawbacks

  • The user does need to remember which folder they store their versions in, since they can change the name of the folder, etc.
  • Additional manual(?) cost in implementing frontend components and creating backups of their versions in branches/repositories if they happen to use git (could be mitigated by future PRs targetting this)

Thoughts

  • Is the site.json file copied over in the versioned files still being used? ETA: with further investigation, site.json is not copied over, a siteData.json is being created instead.

CAVEATS:

  • If the baseURL is changed, the baseURL of the versioned site does not change.

Testing instructions:

  • markbind archive <versionName> [versionFolderName] --> builds and saves the site in a given version and folder name.
  • markbind archive <versionName> builds and saves the site with a given version name in a folder called version

Access the site by manually appending <versionFolderName>/<version name> after the baseURL in the deployed site.

Proposed commit message: (wrap lines at 72 characters)

Implement a basic versioning CLI command

Site versioning is key for documentation use, and education websites
may want to keep past versions for archival purposes as well.

Let's implement a markbind-cli command, markbind archive, to allow
users to easily version their website. When markbind build/serve is run,
all links within the versioned site will point to their versioned equivalent.


Checklist: ☑️

  • Updated the documentation for feature additions and enhancements
  • Added tests for bug fixes or features
  • Linked all related issues
  • No unrelated changes

Stretch goals/future improvements

  • Additional CLI commands to support renaming, moving versions, and deleting versions, so that users do not ever need to modify versions.json manually.

  • Support live preview updating when the versions property of site.json is changed.

  • Editing of previous versions. My preferred solution is saving versions in separate branches, holding all files, except previously archived versions. Update the deployed files by navigating to a separate branch (and take advantage of git; like say if you fix a bug which affects all versions, you can cherrypick the commit over to the versioned branch). This will also support reverting to previous versions (working and deploying off of the versioned branch...?). caveat: this is dependent on git.

    • Further support updating gh-pages from a versioned branch to update past versions. I think this should be possible, as the ghpages extension supports an "add" which would not remove existing files. We could remove the existing "version" directory and import the files from the branch back into main/the current branch, from which you can deploy as normal.
    • Believe this solution can account for MarkBind versions changing over time, as the master only holds the HTML files of each versioned site. Might require building and deploying to be done using NPM to ensure appropriate version control
  • Front end component to easily navigate between versioned branches? Personally leaning towards a bannet, like in the 3281 site(note: banner component does not exist), but a component in the footer with a default message that you can change (like the "generated by markbind" message) might be sufficiently versatile to work as a default implementation. We could also have an auto-generated dropdown component which users can place in their navbar as desired. Key difficulty: making it versatile and customisable for all types of sites while also working "out of the box", since websites don't need to follow any set format, this may be difficult.

@jonahtanjz
Copy link
Contributor

jonahtanjz commented Apr 3, 2022

Nice work @kaixin-hc 👍

Just some initial thoughts.

Is the site.json file copied over in the versioned files still being used?

Can the user edit the archived site? From what I can see now, it only stores the built site (the .html files) so I am assuming once a site is archived, the user can't edit the site anymore? If that is the case, then I don't think the site.json needs to be copied over.

If the baseURL is changed, the baseURL of the versioned site does not change.

Will this cause any issues? If the versioned sites are contained within the main site, then I suspect this would cause some issues with the links within the versioned sites.

Archived versions are stored in HTML format, in a folder with the version name, which is inside the version folder(default name for the version folder is "version").

What happens when the user decides to store versions in different folders? We will need to keep track of where each versioned site is located. Maybe within the main site.json file? This will also help with the implementation of the frontend component to navigate between the versioned sites.

@jonahtanjz jonahtanjz requested a review from ryoarmanda April 3, 2022 05:58
@kaixin-hc
Copy link
Contributor Author

kaixin-hc commented Apr 4, 2022

Thanks for the review @jonahtanjz ! (also, theres been a lot of discussion in the issue #1009, but I'm not sure if I should reply mainly here or there) Some replies:

Will this cause any issues? If the versioned sites are contained within the main site, then I suspect this would cause some issues with the links within the versioned sites.

It does cause issues in terms of deployment - if the baseUrl of the current website changes (for example, it being moved to a repository with a different name and deployed with github pages), the links in the previous website do not change accordingly. I can't see any clean workaround besides navigating back to a past commit of that version or the person keeping another copy of that version, and having the person rebuild the files. Otherwise you'd need to parse the HTML intelligently to figure out which URLs to change the baseUrl of(intra-links), and which to keep static(external links), which is not trivial. However, the changing of the baseUrl is also not a frequently expected use-case.

We will need to keep track of where each versioned site is located.

I think we can track which folders are "version folders" in site.json - it will be necessary in order to ignore certain existing files when creating a new version. This can be done in two ways as @ang-zeyu suggested, a dedicated versions.json file like in docusaurus (simpler) or tracking the source and baseUrls for each version, if we need to change it in the future (developing it to allow versioned sites to point to other repos, etc -- cross origined versioning).

What happens when the user decides to store versions in different folders?

To summarize the current solution, what it does is build the website and place it into a folder within the repository for it to be versioned (default location is version/, but you can customise this). When the site is built, by default the build action copies the versioned site over into _site, and it is deployed. If the user decides to store versions in different folders, then the URL to the version will simply be different.

From my discussion with @damithc today, the following features are features to discuss in future commits

  • Front end UI component pointing to the version --> he prefers to leave it off for now, and leave it up to users to customize
  • Using git branches to store previous versions (my proposal in the PR description to "Saving versions in separate branches"). To avoid a reliance on git (as users might not be using git to version control or deploy), this is not desirable as a built in feature and may be considered as an addon
    • Because of this, in response to your question "Can we edit the versioned site", the answer is not conveniently/through MarkBind. It is of course possible to edit the past versions manually (navigate to desired commit, re-archive the site under the desired version name, moving the created folder to replace the past copy of the past version files). But I also realised that site.json isn't kept, exactly; it's transformed into siteData.json, which is used to generate the titles and index the pages for search, for example. So I think all those files need to be created.

@kaixin-hc kaixin-hc changed the title DRAFT, WIP: Site Versioning [Request for comments]: Site Versioning Apr 9, 2022
@kaixin-hc
Copy link
Contributor Author

@ang-zeyu @ryoarmanda, while I still need to write tests, the basic implementation is done - would appreciate a review if you have time so that I can check I'm on the right track!

const archivedBaseUrl = this.siteConfig.baseUrl === ''
? `/${versionPath}`
: `${this.siteConfig.baseUrl}/${versionPath}`;
this.siteConfig.baseUrl = archivedBaseUrl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks like on the right track, I think the biggest missing thing may be:

copy over the generated version of the website into a sub folder under the _sites folder for subsequent deployment

I realised with this note, perhaps we could also store the baseUrl used to archive the version (if we ever want to support changing baseurls) inside versions.json, as differing baseUrls shouldn't be copied over to _site.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "differing baseUrls sholudn't be copied over to _site" - from my understanding, the HTML files do away with references to baseURL by replacing references to baseURL with the baseURL. But yes, I can definitely store the archived site's baseUrl!

Copy link
Contributor

@jonahtanjz jonahtanjz Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "differing baseUrls sholudn't be copied over to _site"

I believe for now, if we do a markbind build after the archive command, the archived HTML files will be copied over to the _site folder (which assumes that they have the same baseUrl as the main site). If the archived sites have a different baseUrl, then their HTML files should not be copied over to _site folder since they are probably not going to be part of the same deployment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see...

It makes sense that differing baseUrls shouldn't be copied over (as they wouldn't be able to be deployed), but I wonder if there are any cases where the baseUrls might differ, but you might still want to include it as part of the same deployment and it would work (something like a root baseUrl being 'root' and the subsite baseUrl being "root/subsite" - it still could be part of the same deployment (?)).

I think copying it over is not a significant cost or a bug, and if the user wants to it is easy to manually exclude by indicating that folder as a folder to ignore. But it shouldn't be too hard to exclude the files as well 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I wonder if there are any cases where the baseUrls might differ, but you might still want to include it as part of the same deployment and it would work (something like a root baseUrl being 'root' and the subsite baseUrl being "root/subsite" - it still could be part of the same deployment (?)).

Subsites are meant to be entire sites able to be deployed independently.

Our support for subsites is more about content reuse https://markbind.org/userGuide/reusingContents.html#reusing-contents-across-sites, in particular revolving around resolving {{baseUrl}} properly

E.g.

Main site, baseUrl is '/cat'
Subsite , baseUrl is '/dog'

subsite index.md

[some link]({{baseUrl}}/foobar)

If you deploy the subsite independently, baseUrl resolves to /dog.

If you deploy the main site, baseUrl resolves to the main site base url + sub folder (i.e. /cat/subfolder/of/subsite).

This is important wherever there are links involved (link validaiton, relative link resolving, <include>s, ...).

by indicating that folder as a folder to ignore.

If I'm getting this part right (ignore as in the ignore property), the versioning system's copying mechanism needs to be independent of asset copying (the ignore property).

e.g. if you have

ignore: ["xx.png"]

but at the point of archiving, xx.png wasn't present (meaning the author did indeed want to copy it over). The copying for versioning should still copy xx.png over.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point on ignore however is this: If you had archived a site, say into versions/v1. Then, you ran markbind build/serve. MarkBind's asset copying would incorrectly / incompletely copy some parts of it over (since it is in the project source folder). This copying mechanism therefore needs to be implemented separately of this. The asset copying should also exclude said archive folders.

Copy link
Contributor Author

@kaixin-hc kaixin-hc Apr 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ang-zeyu Oh :0 So the desired default behaviour, is to serve only the current site, and not any archived sites, because of the cost in copying over large sites? That makes sense. I had assumed the opposite, which is that if the user had archived the site previously they would want to check it when building and serving.

Currently, my understanding/implementation is:

  • When the site is archived, the site is built with a changed baseUrl and placed into a certain folder, the location of which is tracked in versions.json - previously archived versions will not be re-archived.
    • (Currently working on subsite versions not being re-archived, should be done soon)
  • Every time the site is built/served, the contents of _site are wiped clean, and replaced by the newly built files + copied over assets (which now include the versioned sites).
    • To avoid deploying versioned sites, this is currently relying on the user manually specifying to ignore that folder.
    • Asset copying "correctly" moves over all the versioned sites, allowing the user to navigate to past versions in their live preview
  • Every time the site is deployed, the contents of _site and only those contents are transferred to gh-pages branch --> this will include all versioned sites which were not specifically ignored by the user.

And what I understand from your suggestion is that:

  • when archiving, also move the archived site files into a subfolder in _site
  • change the way the site is built and served such that only the current site is built and served, ignoring the new subfolders in _site which hold versions.
    • Hence we can't wipe _site clean, and will need to store the names of subfolders to "ignore"(either not overriding with versioning or not serving with markbind serve)
    • And I assume also adding options to allow the build and serve commands to also show archived sites, if specified.
      (One clarifying question: does this mean you suggest to store the archived site two places, both in _site and in the folder <archivePath>/<versionName> ?)

I can give it a shot, but do you think I can leave this as an improvement for another PR? I think it might be a little tricky because it affects big commands and features and might take more than a few days to implement carefully (might also make reviewing easier)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, nothing so complex, ignore the additional point on markbind serve first, I'll come back to that later.
Your current understanding is mostly already correct.

and replaced by the newly built files + copied over assets (which now include the versioned sites).

Asset copying "correctly" moves over all the versioned sites, allowing the user to navigate to past versions in their live preview

This is the main issue, it is incorrect to rely on asset copying for this. #1870 (comment)

You'll have to implement an independent copying mechanism.

If you need convincing, try archiving our documentation with markbind archive v1. Then, add appveyorAddNewProject.png to ignore, and run markbind build. The image will be missing from the archived site in versions/v1/images.

Copy link
Contributor

@ang-zeyu ang-zeyu Apr 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Please make sure you get the above comment before reading this comment, otherwise it'll only be more confusing

~in which case (for markbind serve), I think to avoid copying the entire archived site over every file change/add/delete (expensive to copy large site over), we could also aim to do this only once 🤔 (realised its not possible to preview archived sites in markbind serve with #267 not done yet) (might be possible)

Back to this, my concern here was just with this:

  1. author runs markbind serve
  2. your yet-to-implement copying mechanism copies the versioned sites over
  3. author edits files... continues with development
    3.1 Triggers your copying mechanism everytime (file change/addition/deletion). This is rather disk intensive / expensive and might slow the development experience a fair bit.

change the way the site is built and served such that only the current site is built and served, ignoring the new subfolders in _site which hold versions.

Not in the way you are thinking, but I realised with your comment you'll also need to add the versioned site folders to the pagesExclude property here.
https://markbind.org/userGuide/siteJsonFile.html#pagesexclude

This is as:

  • we support building html files as pages as well
  • .md files may be copied to output folders as an asset in some projects

I believe for now, if we do a markbind build after the archive command, the archived HTML files will be copied over to the _site folder (which assumes that they have the same baseUrl as the main site). If the archived sites have a different baseUrl, then their HTML files should not be copied over to _site folder since they are probably not going to be part of the same deployment.

Back to @jonahtanjz's comment here as well, you'll need to take note of this when implementing the copying as well. (i.e. don't copy archived sites with different baseurl from the current one in site.json)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining that again, and for the simple example - that's really useful to test my thoughts.

I hadn't realized that we "supported building html files as pages". I had the understanding that "building" a page was converting it from .md to a generated .html - will take a closer look at this.

Copy link
Contributor

@tlylt tlylt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kaixin-hc for making good progress on this PR! I have tested the CLI command and functionality wise it seems to work as expected. Left some comments for discussion.

For reviewers, the following sections of UG describe this new feature, can go through them before starting

docs/userGuide/cliCommands.md Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Show resolved Hide resolved
docs/userGuide/versioning.md Outdated Show resolved Hide resolved
docs/userGuide/versioning.md Outdated Show resolved Hide resolved
docs/userGuide/versioning.md Outdated Show resolved Hide resolved
packages/cli/index.js Outdated Show resolved Hide resolved
packages/cli/index.js Outdated Show resolved Hide resolved
packages/core/src/Site/index.js Outdated Show resolved Hide resolved
Copy link
Contributor

@tlylt tlylt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One point on versions.json for discussion:

  • Sample content of versions.json is as follows:
{
  "versions": [
    {
      "version_name": "v1.2.2",
      "build_ver": "3.1.1",
      "output": "version/v1.2.2"
    },
    {
      "version_name": "v1.2.3",
      "build_ver": "3.1.1",
      "output": "version/v1.2.3"
    }
  ]
}

Wonder if we should have a root level attribute "archivePath" to keep track of the output instead of duplicating it in individual "output" (assuming that all versions will share the same root path)
So something like this:

{
  "archivePath": "version",
  "versions": [
    {
      "version_name": "v1.2.2",
      "build_ver": "3.1.1",
    },
    {
      "version_name": "v1.2.3",
      "build_ver": "3.1.1",
    }
  ]
}

@jonahtanjz
Copy link
Contributor

Wonder if we should have a root level attribute "archivePath" to keep track of the output instead of duplicating it in individual "output" (assuming that all versions will share the same root path)

@tlylt I think since we may want to extend this to allow having versions in different repos or branches, it may be better to have them as separate entries.

@tlylt
Copy link
Contributor

tlylt commented Apr 11, 2022

Wonder if we should have a root level attribute "archivePath" to keep track of the output instead of duplicating it in individual "output" (assuming that all versions will share the same root path)

@tlylt I think since we may want to extend this to allow having versions in different repos or branches, it may be better to have them as separate entries.

In that case would a separate attribute "achivePath" within the individual object be cleaner? (instead of composing the output, list out the ingredients of the output path)

So something like:

{
  
  "versions": [
    {
      "version_name": "v1.2.2",
      "build_ver": "3.1.1",
	  "archivePath": "version",
    },
    {
      "version_name": "v1.2.3",
      "build_ver": "3.1.1",
	  "archivePath": "version",
    }
  ]
}

@kaixin-hc
Copy link
Contributor Author

In that case would a separate attribute "achivePath" within the individual object be cleaner?

Okay, I can do this! I was considering it originally, but thought output might be better as I think most of the time the use would use the composed output rather than the separate elements. Having the additional versatility is probably a good thing though

@damithc
Copy link
Contributor

damithc commented Apr 11, 2022

@kaixin-hc minor:

  • Is versions (instead of version) a better name for the folder?
  • Is the versions.json feature is already supported in this PR? If so, I assume versions.json is generated automatically? I didn't see any mention of it in the documentation pages.

@kaixin-hc
Copy link
Contributor Author

  • Is versions (instead of version) a better name for the folder?

I decided on version because this is used to auto-generate the URL, and I think having a url that is version/v1.1.1/ is nicer than versions/v1.1.1/

  • Is the versions.json feature is already supported in this PR? If so, I assume versions.json is generated automatically? I didn't see any mention of it in the documentation pages.

Yep, it is automatically generated and updated when new versions are created! You can find the reference to it in the documentation, but I'll be updating it with an example and perhaps a bit of information about the potential issues in modifying versions.json.

@ang-zeyu
Copy link
Contributor

ang-zeyu commented May 9, 2022

I've taken a bit to think about this, and I think I prefer the current implementation of 1) Specify desired versions in site.json 2) Override as specifically you want with the versions flag. Reasons being:

I think the main question here is "are there usecases where you might want to override the data in site.json"? and I think we have already thought of some, although they aren't the most common or typical

I'm not convinced implementing the reverse leads to that many convenience benefits

Thanks for the explanation, I think we are already on the same page. I'm also ok with site.json followed by --versions override.
Meant altering the defaults for the more common use case like how you're doing it for --versions currently:

// site.json
  "versions": null // by default (or if the property dosen't exist) -- deploy all versions.

From what I see in SiteConfig.js it currently seems to be [] by default. (no versions copied)

Any case where a site is archived but not intended to be deployed. I don't think this will be that common for documentation, but I could see people doing so for their own records, teachers doing so for versions of courses they are no longer teaching but think they might want to reference one day, read-only copies of notes

  • I don't think keeping this feature makes the overall implementation that much more complex.
    • The default use would just be to specify the desired version in site.json, and not think about it after that

ok 👍, convinced of the use case.

Though, to add some perspective on the second argument for generality, even "small" features like this add a commitment, which can introduce other non-implementation complexity issues down the line (deprecation, breaking changes, integration with some other feature, contradiction with other yet-to-be-added features). Which makes supporting rare use cases not too worthwhile in the first place. From a user standpoint, adding features does increase product complexity as well, in some cases that make things less appealing. On the contrary, an appealing feature should also not be forgone in view of implementation complexity, if possible with the resources.

The best example here might be bootstrap themes, which while initially simple in implementation, has many contradictions with #903, and a whole bunch of unforseen implementation issues that are too tedious to maintain / resolve.

Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you're working on the rest, some documentation nits + one more follow up on the above!

docs/userGuide/versioning.md Outdated Show resolved Hide resolved
docs/userGuide/versioning.md Outdated Show resolved Hide resolved

<box type="warning">

Modify versions.json with caution as it may result in unnecessary files being included or necessary files being excluded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, as its a little difficult to see how this will pan out, we could leave this to be entirely internally managed by the cli commands for now:

Suggested change
Modify versions.json with caution as it may result in unnecessary files being included or necessary files being excluded.
Avoid modifying versions.json as it may result in unnecessary files being included or necessary files being excluded.
// remove the rest

if we don't want support these, could also move these chunks on versions.json further below or as a separate section -- as "extra non user-facing information"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, I agree with you, its just that I haven't implemented CLI commands to manage this yet so it might become necessary to edit it as a stopgap measure (changing the name of versions, for example, or deleting versions). I think using CLI commands to support renaming, deletion and maybe moving would be ideal, but might be outside the scope of this PR? Though I don't think it would be too difficult!

docs/userGuide/versioning.md Outdated Show resolved Hide resolved
docs/userGuide/versioning.md Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
} else {
await this.addVersions(desiredVersions
.filter(vers => versionsToGenerate.includes(vers.versionName)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up on these for the --versions option to make things consistent between the flag and site.json:

Hmm I'm actually thinking we copy all versions over by default (since if the user archives a site, they probably want to include it in always given this PR assumes single deployment). The use case for -v might be to retroactively "hide" a version of the site. (though something like a future markbind archive delete command might also work). There's also this quote "Might help with the issue of copying over large site", which assumes all archived sites are copied over by default.

Meant altering the defaults for the more common use case like how you're doing it for --versions currently:

How about:

  • --versions - no versions copied, similar to how "versions": [] would work
  • --versions v1 v2 v3 - v1 v2 v3 deployed
  • --all-versions - to open the option of overriding site.json to deploy all versions (similar to how at the current there's no way to override site.json to deploy no versions) -- though, this use case should be quite rare, so we might get by with not supporting it for now

@kaixin-hc kaixin-hc requested a review from tlylt June 23, 2022 07:23
@kaixin-hc
Copy link
Contributor Author

Really sorry for the delay! @jonahtanjz @ang-zeyu pinging as you were previously involved in these conversations.

I have implemented tests for the new functionality (just unit tests) and I think this PR may be sufficient to stand alone. Other changes since the last review

  • Documentation changes as requested
  • Version flag for building and serving now leads to no sites being deployed.
  • Fixed bugs involving file paths/non posix style code for windows

I also updated the list of future-PRs for this feature for reference, these can be made into separate issues for other contributors to pick up. Thanks!

Copy link
Contributor

@tlylt tlylt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @kaixin-hc ! Left some documentation comments for consideration pls

docs/userGuide/versioning.md Outdated Show resolved Hide resolved
docs/userGuide/versioning.md Outdated Show resolved Hide resolved

<box type="warning">

Modify versions.json with caution as it may result in unnecessary files being included or necessary files being excluded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Modify versions.json with caution as it may result in unnecessary files being included or necessary files being excluded.
Avoid modifying `versions.json` as it may result in unnecessary files being included or necessary files being excluded.

Comment on lines +58 to +60
* You may safely change the `versionName` of a version, **provided that it is unique** in versions.json. If you have specified versions to deploy in `site.json`, make sure you update the [versions property](siteJsonFile.md#versions) there as well.

* The baseUrl is used when setting the intra-site links; if you later change the baseUrl, previously saved versions with the past baseUrl will not be built/deployed even if specified because it would be a broken implementation.
Copy link
Contributor

@tlylt tlylt Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps referring to the above comment thread from Ze Yu, we can make a new section of versions.json and move this down together with the versions.json example.

docs/userGuide/versioning.md Outdated Show resolved Hide resolved
<div id="archiveWarning">
<box type="warning">

Warning: If the folder at `<archivePath>` already exists, the contents will be overwritten and your previous files may be lost. Only do so if you need to replace all the archived files with the current site files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is referenced on the site versioning page, and the "only do so" is slightly unclear (do what?:-) reading it from https://deploy-preview-1870--markbind-master.netlify.app/userguide/versioning

docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
docs/userGuide/cliCommands.md Outdated Show resolved Hide resolved
Copy link
Contributor

@tlylt tlylt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further comments added for consideration pls

@@ -641,7 +648,7 @@ class Site {
* @param baseUrl user defined base URL (if exists)
* @returns {Promise}
*/
async generate(baseUrl) {
async generate(baseUrl, versionsToGenerate = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps can use a more boolean-like parameter name, e.g. shouldGenerateVersion ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current may be more suitable as it may be an array, not perfectly semantic but once we've converted this part to typescript it should be clearer too.

https://www.npmjs.com/package/commander ("variadic option")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, its not only a boolean so I ended up choosing this name. It's not perfect but I couldn't think of anything better - suggestions appreciated!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, understand that the variadic option is used here, just slightly uncomfortable with the options-boolean-or-value aspect 🤯. I think perhaps a suggestion would be to process the options.versions and split the versionTogenerate into two variables shouldGenerate and versions before passing them into the generate function. The existing solution is fine too.

A separate nit that I happen to realize: line 648, need to add the new parameter into the jsdoc comment for this generate function

packages/core/src/Site/index.js Outdated Show resolved Hide resolved
@@ -98,6 +98,8 @@ program
.option('-p, --port <port>', 'port for server to listen on (Default is 8080)')
.option('-s, --site-config <file>', 'specify the site config file (default: site.json)')
.option('-d, --dev', 'development mode, enabling live & hot reload for frontend source files.')
.option('-v, --versions [versionNames...]',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are no values passed in, there's no point in the flag? In that case, perhaps we can make the values compulsory? What do you think?
Also, may I check if there's a typo -> deploy no versions is it serve no versions? (Else, deploy seems a little ambiguous~).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant altering the defaults for the more common use case like how you're doing it for --versions currently

I think this part was missed out @kaixin-hc (more than likely if you archive a site you'll want to deploy it too. Also @tlylt's point above)

logically speaking, just this portion (SiteConfig.js)

this.versions = siteConfigJson.versions !== undefined
      ? siteConfigJson.versions
      : every version;

(implementation wise you might want to do the check here instead)

// change
if (versionsToGenerate === true) {
      // copy no versions if the version flag is passed without arguments
} else if (versionsToGenerate === false) {
      await this.copyVersions(desiredVersions
        .filter(vers => this.siteConfig.versions.includes(vers.versionName)));
} else {
     await this.copyVersions(desiredVersions
       .filter(vers => versionsToGenerate.includes(vers.versionName)));
}

// into 
if (versionsToGenerate === true) {
      // copy no versions if the version flag is passed without arguments
} else if (versionsToGenerate is an array) {
  // cli override takes precedence
  await this.copyVersions(desiredVersions
        .filter(vers => versionsToGenerate.includes(vers.versionName)))
} else if (/* versionsToGenerate === false (implied) && */ this.siteConfig.versions is an array) {
  // does site.json have the property? use it next if so
      await this.copyVersions(desiredVersions
        .filter(vers => this.siteConfig.versions.includes(vers.versionName)));
} else {
  // **the default** - deploy all previously archived versions versions
  await this.copyVersions(desiredVersions
      .filter(vers => versionsToGenerate.includes(vers.versionName)));
}

// and change (SiteConfig.js)
this.versions = siteConfigJson.versions !== undefined
    ? siteConfigJson.versions : [];

// into just 
this.versions = siteConfigJson.versions;

The flag behaviour looks good though!

@@ -298,6 +300,8 @@ program
.option('--baseUrl [baseUrl]',
'optional flag which overrides baseUrl in site.json, leave argument empty for empty baseUrl')
.option('-s, --site-config <file>', 'specify the site config file (default: site.json)')
.option('-v, --versions [versionNames...]',
'specify versions to be deployed. if flag is used without specification, deploy no versions')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

/**
* Filters the versions based on the options passed, and copies the desired versions for later deployment
*/
async copySpecifiedVersions(versionsToGenerate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to change how this function works if versionsToGenerate is updated based on the above comments.

packages/core/src/Site/index.js Outdated Show resolved Hide resolved
// Save version data
this.versionData = await this.writeVersionsFile(versionName, archivePath);
// Exclude versioned files from archiving.
this.ignoreVersionFiles('');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps can set '' as the default parameter value?

}

// Do not transfer the versions file into the archived site
// TODO: replace the thing in brackets with rootVersionPath and see if it still works
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what is "the thing in brackets" and see what still works? Slightly confusing about the context when reading this, some clarification might be helpful. What do you think?

fs.copy(path.format(path.parse(versionFolder)), path.join(SITE_FOLDER_NAME, versionFolder));
});
await Promise.all(versionFoldersArray);
logger.info('Versioned site(s) copied');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to find out how many so that we can give either site or sites? What do you think?

Copy link
Contributor

@ang-zeyu ang-zeyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing work on this! @kaixin-hc

I think we're almost there, a few more minor implementation details:

.description('archive a version of the site, which is not affected by later changes to the site')
.action((versionName, options) => {
const archivePath = options.archivePath || `version/${versionName}`;
const rootFolder = path.resolve(process.cwd());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder if we should add in archive [root] <versionName> to make things consistent with the other commands (cliUtil.findRootFolder), but it should be backward compatible in any case so we can decide on this later.

I think minimally we could add in a small note in the cli commands page that the site archived is the cwd's one.

@@ -98,6 +98,8 @@ program
.option('-p, --port <port>', 'port for server to listen on (Default is 8080)')
.option('-s, --site-config <file>', 'specify the site config file (default: site.json)')
.option('-d, --dev', 'development mode, enabling live & hot reload for frontend source files.')
.option('-v, --versions [versionNames...]',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant altering the defaults for the more common use case like how you're doing it for --versions currently

I think this part was missed out @kaixin-hc (more than likely if you archive a site you'll want to deploy it too. Also @tlylt's point above)

logically speaking, just this portion (SiteConfig.js)

this.versions = siteConfigJson.versions !== undefined
      ? siteConfigJson.versions
      : every version;

(implementation wise you might want to do the check here instead)

// change
if (versionsToGenerate === true) {
      // copy no versions if the version flag is passed without arguments
} else if (versionsToGenerate === false) {
      await this.copyVersions(desiredVersions
        .filter(vers => this.siteConfig.versions.includes(vers.versionName)));
} else {
     await this.copyVersions(desiredVersions
       .filter(vers => versionsToGenerate.includes(vers.versionName)));
}

// into 
if (versionsToGenerate === true) {
      // copy no versions if the version flag is passed without arguments
} else if (versionsToGenerate is an array) {
  // cli override takes precedence
  await this.copyVersions(desiredVersions
        .filter(vers => versionsToGenerate.includes(vers.versionName)))
} else if (/* versionsToGenerate === false (implied) && */ this.siteConfig.versions is an array) {
  // does site.json have the property? use it next if so
      await this.copyVersions(desiredVersions
        .filter(vers => this.siteConfig.versions.includes(vers.versionName)));
} else {
  // **the default** - deploy all previously archived versions versions
  await this.copyVersions(desiredVersions
      .filter(vers => versionsToGenerate.includes(vers.versionName)));
}

// and change (SiteConfig.js)
this.versions = siteConfigJson.versions !== undefined
    ? siteConfigJson.versions : [];

// into just 
this.versions = siteConfigJson.versions;

The flag behaviour looks good though!

@@ -114,6 +114,8 @@ class SiteConfig {
*/
this.plantumlCheck = siteConfigJson.plantumlCheck !== undefined
? siteConfigJson.plantumlCheck : true; // check PlantUML's prerequisite by default
this.versions = siteConfigJson.versions !== undefined
? siteConfigJson.versions : [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per above suggestion

@@ -641,7 +648,7 @@ class Site {
* @param baseUrl user defined base URL (if exists)
* @returns {Promise}
*/
async generate(baseUrl) {
async generate(baseUrl, versionsToGenerate = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current may be more suitable as it may be an array, not perfectly semantic but once we've converted this part to typescript it should be clearer too.

https://www.npmjs.com/package/commander ("variadic option")

.map(x => path.relative(pathToDirWithVersion, x)) // assumes versions files are in the 'root' of site
.map(x => path.dirname(x));

pathsToVersionFiles.forEach(p => this.ignoreVersionFiles(p));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the additional map referring to pathsToVersionFiles? I can't find anything out of the ordinary (let me know if im misunderstanding something too), but here's my take with baseUrlMap, iteratively.

// pathToRootDir parameter unneeded

this.baseUrlMap.forEach((p) => {
    // Just the top half of your existing implementation, with some tiny modifications
    // all pathToRootDir is changed to "p", the relative path from root to the (sub)site

    const pathToVersionFile = path.join(this.rootPath, p, VERSIONS_DATA_NAME);

    // find the versions json file and ignore all archives of the current site
    if (fs.pathExistsSync(pathToVersionFile)) {
      const versionFileJson = fs.readJSONSync(pathToVersionFile);

      versionFileJson.versions.forEach((vers) => {
        // throwing in the other **separate** comment on posix path handling, p may be a windows path
        const filePath = utils.ensurePosix(path.join(p, vers.archivePath));
        const filePathGlob = filePath + '/**';   // though, you might need to check if the '/' already exists
        this.siteConfig.ignore.push(filePathGlob );
      });
    }
   
   // posix path handling again
    this.siteConfig.ignore.push(ensurePosix(path.join(p, VERSIONS_DATA_NAME)));
})

Making sure ignoreVersionFiles is called after collectBaseUrl too.
Implementation wise you could specifically keep the this.buildManagers line in collectBaseUrl at that place (to prevent any regressions from reordering that), but reorder + extract the rest of it before ignoreVersionFiles.

.map(x => path.relative(pathToDirWithVersion, x)) // assumes versions files are in the 'root' of site
.map(x => path.dirname(x));

pathsToVersionFiles.forEach(p => this.ignoreVersionFiles(p));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(separate comment)

let's move the .ignore here to a separate property for clarity, maybe site.versionsIgnore, then add it in where .ignore was originally used.

e.g. (buildAssets function)

    const fileIgnore = ignore().add([...this.siteConfig.ignore, outputFolder]);
// into
    const fileIgnore = ignore().add([...this.siteConfig.ignore, ...this.versionsIgnore, outputFolder]);

this should also accomodate detecting changes in the ignore property properly during markbind serve. (live reload of that property is supported)

* Holds the work for generating a site from scratch.
*/
async buildSiteHelper() {
this.collectAddressablePages();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the way you are thinking, but I realised with your comment you'll also need to add the versioned site folders to the pagesExclude property here.
https://markbind.org/userGuide/siteJsonFile.html#pagesexclude

This is as:

we support building html files as pages as well
.md files may be copied to output folders as an asset in some projects

I think missed this part too (implementation is inside this function)
#1870 (comment)

},
];

// TODO:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a functional test with some "special" procedures for this (much like the special procedures for testing markbind convert vs other sites)

Thanks for this. I was actually just expecting functional tests test/e2e/functional which are easier to maintain.
For example,

// testSites.js

const testVersioningSites = [
  'test_site_versioning/1',
];

// test.js
testVersioningSites .forEach((siteName) => {
  console.log(`Running ${siteName} versioning tests`);
  const nonMarkBindSitePath = path.join(siteName, 'non_markbind_site');
  try {
    execSync(`node ../../index.js version versionName`, execOptions);
    compare(siteName, 'expected1', '_site');

    // The downside specific to MarkBind's functional tests is that we need the expected directory (which if there are too many of might begin to slow git a little)
   // But you don't have to create one site for every behaviour you want to test,
   // e.g. get creative with something like this

    execSync(`node ../../index.js version versionName --versions`, execOptions);
    expect versions dir to be empty

    ... more ...
    
  } catch (err) {
    printFailedMessage(err, siteName);
    cleanupConvert(path.resolve(__dirname, siteName));
    process.exit(1);
  }
  cleanupConvert(path.resolve(__dirname, siteName));
});

In general, note that we aim to test public interfaces instead of private ones. In the case of markbind (a cli app), especially about Site.js's functionalities which integrate multiple things, that's predominantly e2e behaviour.

You may have noticed the tests are quite brittle (e.g. what if we reordered the execution of readSiteConfig / writeVersionsFile), these are whimsical implementation details that once changed would require changing the tests too.

A lot of our existing unit tests frankly don't adhere to this well too, and definitely need improving.

Though for clarification -- this is not me making a case for e2e tests, just "which is easier to write and maintain" in this case (and still tests what we want to test), which varies from project to project. For example, deeper implementation details that are sufficiently complex and well encapsulated, e.g. a hypothetical pure function like replaceInvalidCharacters, can and probably should be unit-tested, if we expect them to work a certain way.

I'm ok with keeping the tests here since you've already done quite a bit of work here, but if you have the time, I think we can adapt these to functional tests. But we can definitely work on porting the tested behaviours otherwise in the future :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops... Maybe for a future PR 😓 I don't think I'll have the time in the coming months.

@ang-zeyu ang-zeyu mentioned this pull request Sep 22, 2022
11 tasks
@tlylt tlylt marked this pull request as draft April 10, 2023 10:36
@kaixin-hc
Copy link
Contributor Author

@MarkBind/active-devs @damithc Hi all! Sorry for no progress on this for so long. Can I check if this feature is still relevant to figure out if this should be revised or just closed?

@damithc
Copy link
Contributor

damithc commented Sep 23, 2023

I think it is still relevant, @kaixin-hc

@kaixin-hc
Copy link
Contributor Author

Okay! Just to update that I've revisited the code and these comments to some extent and I think the original principles were sound, but the refactoring done since when I first worked on this and now is making the merge quite painful, which is why there's been some delay; but I'll make the fixes either here or in a fresh PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants