Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version support and automatic CI build #2214

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

intelkevinputnam
Copy link

Description

replaces #1785

Version of docs built using CI: https://intelkevinputnam.github.io/scikit-learn-intelex/

There are two major changes in this PR:

  1. Version switching support
  2. CI deployment to GitHub pages via GitHub action

Version Switching Support

All published versions must be tracked in doc/versions.json as this will allow all subsequent versions of the docs to automatically be able to switch to the latest without needing to rebuild the docs.

The version is supplied by the installed version of scikit-learn-intelex (latest by default in the CI script). This way building the docs is not dependent on the oneDAL environment with the goal of making it easier to contribute to and edit the docs. It should also make builds faster.

Since the move to the UXL Foundation, version switching is broken in the latest version of the docs because of there is a hard coded path to the versions.json that no longer exists (it was intel.github.com/scikit-learn-intelex). It also is now settable in the conf.py (html_context)

CI Deployment to GitHub Pages via GitHub Action

GitHub pages will be updated automatically. A new directory will be added for each version of scikit-learn-intelex. When the same version is built twice, it is overwritten. See .github/workflows/publish.yml for the script. The branch will need to be changed to main before it will work on the main branch. I'm not sure it is necessary to have the docs built on every merge into main. Adding workflow_dispatch: to the list of conditions for running the action can provide a handy "deploy whenever you feel like it" option.

To build the docs locally, run ./build-docs.sh. Docs can be previewed locally by starting a web server (python3 -m http.server) in the _build directory and using localhost:8000/scikit-learn-intelex as the URL.


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

@david-cortes-intel
Copy link
Contributor

@intelkevinputnam I'm seeing several misrendered things in the docs from your branch.

For example:
image

This is how it looks for me in a local build:
image

make html

source ./set_version.sh
export SPHINXPROJ=scikit-learn-intelex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have daal4py with docs under doc/daal4py.

import subprocess
import os

def get_version():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icfaust Could you please look into whether this captures the version number correctly.

Copy link
Contributor

@icfaust icfaust Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer if we got rid of this file and set_version.sh entirely and replaced it with

DOC_VERSION=$(python -m pip freeze | grep -oP "(?<=scikit-learn-intelex==)\d*\.\d*")

doc/sources/_static/custom.css Show resolved Hide resolved
# -- Project information -----------------------------------------------------

project = "Intel(R) Extension for Scikit-learn*"
copyright = "Intel"
author = "Intel"

# The short X.Y version
version = "2025.0.0"
version = os.environ.get('DOC_VERSION','')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing the version to "latest" here when it doesn't find the env. variable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't a version, the build should fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean: would be ideal to have a version "latest" building the docs directly from the master branch, for unreleased versions. So in that case, the release versions would have the env. variable whereas the master branch wouldn't.

@@ -0,0 +1,19 @@
[
{
"name": "2025.0 (latest)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this file auto-generated from the releases that we have as github tags instead? It's BTW missing some version numbers, such as 2024.7.0:
https://github.com/uxlfoundation/scikit-learn-intelex/releases

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file will need to be manually curated, since it is the list of versions that populates the switcher for all published versions. You'll need to remove old versions and stipulate which one is latest. The build script shouldn't know which is latest, since you could possible publish an older version (that isn't latest)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of releases can be obtained programmatically. This repository keeps branches with "rls" in the name for all released versions, which can be obtained with a command like this from the root of the repository:

git branch -r | grep rls

And since it follows date-based versioning, the most up-to-date one will always be the last one in alphabetical order, as returned by that command. So it should be possible to auto-generate this file with all the releases that exist, and keep it auto-updated to point to the last when a new branch comes in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the issue would be that some versions are missing docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to agree with @david-cortes-intel on this one, we could technically have the (latest) tag be updated on a github create trigger when it has rls in it.

git config --global user.email "${GITHUB_ACTOR}@github.com"
git add .
git commit -m "latest HTML output"
git push -f https://${GITHUB_ACTOR}:${{secrets.GITHUB_TOKEN}}@github.com/${GITHUB_REPOSITORY}.git HEAD:gh-pages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: what happens if something goes wrong during the build? For example, we might run into errors such as failing to import the modules from which autodoc generates documentation, which would manifest as a "warning" being issued but some docs still being built.

Could there be a check for build warnings that would make this script fail (avoiding pushing anything to GH) if something goes wrong?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any errors in the github workflow will stop the docs from publishing. We could turn on "error on warning" in the sphinx build, if we wanted to gate against any issue in the Sphinx build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that'd be ideal, so that we don't accidentally end up with empty docs. But would be better if it could ignore some warnings selectively - currently there are some warnings when building daal4py which I'm not sure how to solve and which do not impact the resulting docs that get built.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example benign warnings when building daal4py:

/home/dcortes/repos/scikit-learn-intelex/doc/daal4py/docstring of daal4py._daal4py.classifier_prediction_result:1: WARNING: duplicate object description of daal4py.classifier_prediction_result, other instance in algorithms, use :noindex: for one of them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#example-1-passing-the-github_token-as-an-input let me know if i am missing something, sentence just before this link: "Commits pushed by a GitHub Actions workflow that uses the GITHUB_TOKEN do not trigger a GitHub Pages build."

@@ -0,0 +1,19 @@
[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "stable" version tag in addition to latest?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file needs to be manually curated, we can add any tags we want to it.

@david-cortes-intel
Copy link
Contributor

Perhaps this PR could be a good opportunity to address one topic around the version numbers: in open source projects, version "latest" or "dev" is typically meant for docs from "latest commit of the master branch", while the "last released version" is typically "stable".

This is the case for the docs of the scikit-learn package which this repository builds upon, and for most other machine-learning-related libraries such as gradient boosting frameworks and similar.

Could we make the switch here so that the freshest release would be "stable" and have "latest" as something built periodically from the main branch?

@intelkevinputnam
Copy link
Author

@intelkevinputnam I'm seeing several misrendered things in the docs from your branch.

For example: image

This is how it looks for me in a local build: image

Since the build script is dependent on the latest released version of the scikit-learn-intelex package, the changes must not have propagated to it as of the first build. The changes you mention appear to have propagated to the released package now, so the you'll see what you expect if you check again.

@@ -25,6 +25,8 @@

# -- Path setup --------------------------------------------------------------

import json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this import is not used.

@david-cortes-intel
Copy link
Contributor

Verified that we can indeed push changes to this PR (CC @yuejiaointel )

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my initial review, we probably need someone to review the js, since I have no idea on that.

- name: install dependencies
run: |
sudo apt update
sudo apt install -y pandoc git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sudo apt install -y pandoc git
sudo apt install -y pandoc

How come git is necessary?

Comment on lines +20 to +22
cd doc
source set_version.sh
echo Generating version $DOC_VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cd doc
source set_version.sh
echo Generating version $DOC_VERSION
DOC_VERSION=$(python -m pip freeze | grep -oP "(?<=scikit-learn-intelex==)\d*\.\d*")
echo Generating version $DOC_VERSION
echo "DOC_VERSION=${DOC_VERSION}" >> "$GITHUB_ENV"

Add it to the github environment to save some work/lines going forward.

Comment on lines +31 to +33
cd doc
source set_version.sh
cd ..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cd doc
source set_version.sh
cd ..

With the previous suggestion these 3 lines are unnecessary as DOC_VERSION will be propagated in the environment between steps.

Comment on lines +25 to +28
cd doc
chmod +x build-doc.sh
./build-doc.sh
mv _build/scikit-learn-intelex ../../output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cd doc
chmod +x build-doc.sh
./build-doc.sh
mv _build/scikit-learn-intelex ../../output
doc/build-doc.sh
mv doc/_build/scikit-learn-intelex ../output

Sorry since I am not so knowledgeable on the docs side. If build-doc.sh doesn't have the right permissions, we can set those in the repo to what is necessary (does it hurt to store build-doc.sh as 755 instead)?

mv -f ../output/$DOC_VERSION .
mv ../output/versions.json .
mv ../output/index.html .
rm -rf doc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rm wipe the git checkout of the build-doc.sh?

sudo apt update
sudo apt install -y pandoc git
pip3 install -r requirements-doc.txt
pip3 install scikit-learn-intelex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten to comment here but: we'd probably want to change this logic towards either building from source, or downloading artifacts from github - otherwise this wouldn't work out when pushing to release branches (would build docs for the wrong version), nor when trying to build docs for unreleased versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants