Skip to content
This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Commit

ADD notebook sections for dependency management
Signed-off-by: Francesco Murdaca <[email protected]>
Francesco Murdaca committed Sep 7, 2021
1 parent 0153ab5 commit ced8235
Showing 33 changed files with 218 additions and 17 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# Thoth Tutorial - manage your dependencies in Jupyter notebooks.

This tutorial is used to show how to manage dependencies for Jupyter Notebooks to allow reproducibility and shareability. In this way you learn how to move from local machine to cloud, running on [Operate First][1], and how you can enable contributions on your project from other team members.
This tutorial is used to show how to manage dependencies for Jupyter Notebooks to allow reproducibility and shareability. You will learn how to move from local to cloud development using [Operate First][1] environment and how you can enable contributions on your project.

Once the tutorial is completed, you will be able to run your work on [Project Meteor][2].

Binary file removed docs/images/GotoGitBoxPanel.png
Binary file not shown.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added docs/images/JupyterLabHorusAdd.png
Binary file added docs/images/JupyterLabHorusCheckAfterDiscover.png
Binary file added docs/images/JupyterLabHorusCheckInitial.png
Binary file added docs/images/JupyterLabHorusShowAfterAdd.png
File renamed without changes
Binary file added docs/images/JupyterLabRequirementsExtension.jpg
Binary file added docs/images/JupyterLabRequirementsExtensionMC.png
Binary file added docs/images/JupyterLabStartExistingNotebook.png
Binary file added docs/images/JupyterLabStartNewNotebook.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file modified docs/images/TakeLinkForkedRepo.png
18 changes: 9 additions & 9 deletions docs/push-changes.md
@@ -12,38 +12,38 @@ In order to do that from within JupyterHub using the [Jupyterlab Git extension](
1. Go to Git Box panel on the left to check what files have been changed.

<div style="text-align:center">
<img alt="Go to Git Box Panel" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/GotoGitBoxPanel.png">
<img alt="Go to Git Box Panel" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitBoxPanel.png">
</div>

2. Stage the files you want to push to your GitHub repo.

<div style="text-align:center">
<img alt="Stage the files" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/StageFiles.png">
<img alt="Stage the files" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitStageFiles.png">
</div>

3. Add a summary of your changes (a.k.a. commit message) and select Commit.

<div style="text-align:center">
<img alt="Commit Changes" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/CommitChanges.png">
<img alt="Commit Changes" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitCommitChanges.png">
</div>

NOTE: _If you are doing this for the first time, git requires the user email and user name to be set (the extension will open a dialog form to insert them)._

<div style="text-align:center">
<img alt="Insert User Name and Email" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/InsertUserNameEmail.png">
<img alt="Insert User Name and Email" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitInsertUserNameEmail.png">
</div>


4. Select Push Changes.

<div style="text-align:center">
<img alt="Use Button to Push Changes" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/UseButtonToPushChanges.png">
<img alt="Use Button to Push Changes" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitUseButtonToPushChanges.png">
</div>

5. Insert your GitHub account name and your GitHub token to push to the GitHub repo you cloned.

<div style="text-align:center">
<img alt="Push Changes with GitHub token" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/PushGitHubToken.png">
<img alt="Push Changes with GitHub token" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitPush.png">
</div>

## Push your changes using the terminal in JupyterLab
@@ -53,13 +53,13 @@ If you want to clone a repo and push changes through the Terminal, you can use t
1. Open terminal from the icon in the Launcher.

<div style="text-align:center">
<img alt="Open Terminal" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/OpenTerminal.png">
<img alt="Open Terminal" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabOpenTerminal.png">
</div>

2. Start using the git commands:

<div style="text-align:center">
<img alt="Use Git Commands" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/UseTerminal.png">
<img alt="Use Git Commands" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabUseTerminal.png">
</div>

- `git clone <repo>` to clone a new repo.
@@ -69,7 +69,7 @@ If you want to clone a repo and push changes through the Terminal, you can use t
- `git commit -m "<commit message>"` to commit the changes in stage. _NOTE: If you are doing this for the first time, git requires user email and user name to be set._

<div style="text-align:center">
<img alt="First commit" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/FirsCommit.png">
<img alt="First commit" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabTerminalFirstCommit.png">
</div>

- `git push origin <branch>` to push your changes.
4 changes: 2 additions & 2 deletions docs/setup-initial-environment.md
@@ -28,7 +28,7 @@ Once your image is ready and you are in the Jupyterlab UI, you can use the Git e
1. Click the Git extension button from Jupyterlab UI:

<div style="text-align:center">
<img alt="Look for Git extension button" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/ElyraGitExtension.png">
<img alt="Look for Git extension button" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitExtension.png">
</div>

2. Take the HTTPS link of the GitHub repo you want to clone; for this tutorial, use your fork of this repo:
@@ -40,7 +40,7 @@ Once your image is ready and you are in the Jupyterlab UI, you can use the Git e
3. Insert the link taken from your forked repo in the JupyterLab Git Extension: e.g. `https://github.com/AICoE/manage-dependencies-tutorial.git`

<div style="text-align:center">
<img alt="Clone your repo" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/CloneYourRepo.png">
<img alt="Clone your repo" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabCloneYourRepo.png">
</div>


155 changes: 155 additions & 0 deletions docs/start-notebook-and-manage-dependencies.md
@@ -0,0 +1,155 @@
# Reproducibility of Jupyter Notebooks

Reproducibility and shareability of notebooks are important if you want others to be able to repeat your experiments and to avoid issues caused by dependency management.
When you install packages with `pip install <package_name>`, it is not possible to verify which software stack was used to run the notebook, so another user cannot repeat the same experiment.
Check the video [here](https://www.youtube.com/watch?v=ifyQ2oSxjnU) if you want to know more.
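
The difference a lock file makes can be illustrated with a short sketch. A `Pipfile.lock` pins each package to an exact version and lists the hashes of the artifacts allowed for it, so an installer can reject anything that does not match. The function below is hypothetical and uses only the standard library; the `lock_entry` dictionary mimics the `version` and `hashes` fields of one `Pipfile.lock` package entry, and the artifact bytes are made-up stand-ins for a downloaded wheel.

```python
import hashlib


def artifact_matches_lock(artifact: bytes, lock_entry: dict) -> bool:
    """Return True if the artifact's SHA-256 digest is listed in the lock entry.

    `lock_entry` is assumed to look like one package entry of a Pipfile.lock,
    e.g. {"version": "==1.0.0", "hashes": ["sha256:<hex digest>", ...]}.
    """
    digest = "sha256:" + hashlib.sha256(artifact).hexdigest()
    return digest in lock_entry.get("hashes", [])


if __name__ == "__main__":
    # Hypothetical artifact contents standing in for a downloaded wheel.
    original = b"example wheel contents"
    tampered = b"example wheel contents, modified"

    entry = {
        "version": "==1.0.0",
        "hashes": ["sha256:" + hashlib.sha256(original).hexdigest()],
    }
    print(artifact_matches_lock(original, entry))  # True
    print(artifact_matches_lock(tampered, entry))  # False
```

A plain `pip install <package_name>` records neither the version nor the hashes, so there is nothing to compare against later.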

To avoid these issues, dependencies for the Jupyter notebooks in this tutorial are managed with the JupyterLab extension [jupyterlab-requirements][1].

You can use this extension with each of your notebooks to guarantee that it has the correct dependencies. The extension can add and remove dependencies, lock them, and store them in the notebook metadata. In this way, all the dependency information required to recreate the environment is shipped with the notebook.

In particular, in the notebook metadata you will find:

- requirements (`Pipfile`);

- requirements lock with all versions and hashes (`Pipfile.lock`);

- dependency resolution engine used (Thoth or Pipenv);

- `.thoth.yaml` configuration file (only for the Thoth resolution engine).

Together, this information makes the notebook reproducible and shareable.
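
Because a notebook is a plain JSON document, you can inspect this metadata with the standard library. The sketch below assumes the extension stores the four items under the top-level `metadata` keys `requirements`, `requirements_lock`, `dependency_resolution_engine`, and `thoth_config`; those key names are an assumption for illustration, so check them against a notebook you have locked.

```python
import json

# Assumed metadata keys for the four items listed above (not confirmed by this tutorial).
DEPENDENCY_KEYS = {
    "requirements": "Pipfile",
    "requirements_lock": "Pipfile.lock",
    "dependency_resolution_engine": "resolution engine",
    "thoth_config": ".thoth.yaml",
}


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON, from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def dependency_summary(notebook: dict) -> dict:
    """Map each expected dependency item to whether the notebook metadata contains it."""
    metadata = notebook.get("metadata", {})
    return {label: key in metadata for key, label in DEPENDENCY_KEYS.items()}


if __name__ == "__main__":
    # Minimal stand-in for notebooks/my-notebook.ipynb, which only carries kernel metadata.
    notebook = {"metadata": {"kernelspec": {"name": "python3"}}, "cells": []}
    for label, present in dependency_summary(notebook).items():
        print(f"{label}: {'present' if present else 'missing'}")
```

On a real file you would call `dependency_summary(load_notebook("notebooks/my-notebook.ipynb"))`.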


## Manage dependencies with the jupyterlab-requirements extension

There are three ways to interact with the [jupyterlab-requirements][1] JupyterLab extension:

- using `%horus` magic commands directly in your notebook's cells (the preferred approach). To learn more about the `%horus` magic commands, check out the guide [here](https://github.com/thoth-station/jupyterlab-requirements#horus-magic-command) or the video [here](https://www.youtube.com/watch?v=FjVxNTXO70I).

<div style="text-align:center">
<img alt="JupyterLab Requirements Horus magic commands" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabRequirementsExtensionMC.png">
</div>

- using the `horus` CLI directly from the terminal or integrated in pipelines ([check video](https://www.youtube.com/watch?v=fW0YKugL26g&t)).

<div style="text-align:center">
<img alt="JupyterLab Requirements Horus CLI" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabRequirementsExtensionCLI.png">
</div>

- using the `Manage Dependencies` button that appears in the notebook when it is opened:

<div style="text-align:center">
<img alt="JupyterLab Requirements UI" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabRequirementsExtension.jpg">
</div>


### Start working on a new notebook

1. Let's start a new notebook from JupyterLab console (or using the JupyterLab menu).

<div style="text-align:center">
<img alt="Start new notebook" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabStartNewNotebook.png">
</div>

2. Run a cell with `%horus check` to check the status of your notebook:

<div style="text-align:center">
<img alt="Horus check initial command" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabHorusCheckInitial.png">
</div>

As you can see, errors are reported initially because no Pipfile or Pipfile.lock exists for this notebook.

3. Run a cell with `%horus requirements --add <package-name>` to add a new package.

For example, let's add a package commonly used in ML projects: `%horus requirements --add boto3`

<div style="text-align:center">
<img alt="Horus requirements add" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabHorusAdd.png">
</div>

4. You can check the dependencies content of your notebook by running `%horus show`:

<div style="text-align:center">
<img alt="Horus show command after add" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabHorusShowAfterAdd.png">
</div>

5. Run `%horus lock` to lock dependencies using Thoth resolution engine.

To request a specific type of recommendation from Thoth, add `--recommendation-type <recommendation-type>`. The available types are `latest` (the default), `stable`, `performance`, and `security`.

By default, Thoth will discover the runtime environment you are running on. If you want to receive a recommendation for a specific runtime environment, you can use the following flags:

- `--os-name`
- `--os-version`
- `--python-version`
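
The options above can be combined into a single `%horus lock` line. The sketch below is a hypothetical helper, not part of jupyterlab-requirements: it validates the recommendation type against the four values listed above and, optionally, derives `--os-name`, `--os-version`, and `--python-version` from `/etc/os-release`-style text and a Python version tuple. Using the os-release `ID` and `VERSION_ID` fields as the values Thoth expects is an assumption.

```python
import sys

RECOMMENDATION_TYPES = ("latest", "stable", "performance", "security")


def parse_os_release(text: str) -> dict:
    """Parse KEY=VALUE lines (os-release format) into a dict, dropping surrounding quotes."""
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip("\"'")
    return fields


def build_lock_command(recommendation_type="latest", os_release_text=None, python_version=None):
    """Build a `%horus lock` line; runtime flags are added only when values are supplied."""
    if recommendation_type not in RECOMMENDATION_TYPES:
        raise ValueError(f"unknown recommendation type: {recommendation_type!r}")
    parts = ["%horus lock", "--recommendation-type", recommendation_type]
    if os_release_text is not None:
        fields = parse_os_release(os_release_text)
        # Assumption: os-release ID/VERSION_ID match Thoth's OS name/version.
        if "ID" in fields:
            parts += ["--os-name", fields["ID"]]
        if "VERSION_ID" in fields:
            parts += ["--os-version", fields["VERSION_ID"]]
    if python_version is not None:
        parts += ["--python-version", ".".join(str(n) for n in python_version[:2])]
    return " ".join(parts)


if __name__ == "__main__":
    # Without runtime flags, Thoth discovers the environment itself.
    print(build_lock_command())
    # Hypothetical target: Fedora 34 with the interpreter running this script.
    print(build_lock_command("stable", 'ID=fedora\nVERSION_ID="34"\n', sys.version_info))
```

Stating the runtime explicitly is useful when the notebook is locked on a different machine than the one it will eventually run on.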

6. Run a cell with `%horus check` to check the status of your notebook, or `%horus show` to show its dependency information.

To show a specific part of the dependency information stored in the notebook metadata, use the following flags:

- `--pipfile`
- `--pipfile-lock`
- `--thoth-config` (only if Thoth resolution engine was used)


### Bring your notebook and make it reproducible

1. Let's open the notebook `my-notebook` provided in the `notebooks` folder.

<div style="text-align:center">
<img alt="Start my notebook" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabStartExistingNotebook.png">
</div>

This is a notebook I was working on without stating any dependencies, and I want to make it reproducible.

NOTE: _If you have a notebook with `!pip install <package-name>` cells, they will be removed and converted to commands that allow reproducibility once you start the notebook._
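
To see which cells the note above applies to, you can scan a notebook's code cells for shell-style installs. This standard-library sketch was written for this tutorial and is not part of jupyterlab-requirements; it reports `!pip install` lines (and, as an extra assumption, `%pip install` lines) with their package arguments, but it does not perform the conversion the extension does.

```python
import shlex


def pip_install_lines(notebook: dict) -> list:
    """Return (cell index, packages) for each `!pip install` or `%pip install` line."""
    found = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped.startswith(("!pip", "%pip")):
                continue
            tokens = shlex.split(stripped)
            if len(tokens) < 2 or tokens[1] != "install":
                continue
            # Skip option flags; option values (e.g. the file after -r) are not handled.
            packages = [token for token in tokens[2:] if not token.startswith("-")]
            found.append((index, packages))
    return found


if __name__ == "__main__":
    notebook = {
        "cells": [
            {"cell_type": "code", "source": ["!pip install boto3 'pandas>=1.3'\n", "import boto3"]},
            {"cell_type": "markdown", "source": "!pip install ignored-in-markdown"},
            {"cell_type": "code", "source": "%pip install -q numpy"},
        ]
    }
    print(pip_install_lines(notebook))  # [(0, ['boto3', 'pandas>=1.3']), (2, ['numpy'])]
```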

2. Run `%horus discover` so that Thoth can discover the packages your notebook uses:

<div style="text-align:center">
<img alt="Horus discover command" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabHorusDiscover.png">
</div>

As you can see, the `%horus discover` command reads the content of your notebook and identifies the packages that are actually used in it. `numpy`, for example, is imported but never used, so it is not added to the requirements. The library that identifies these packages is called `invectio`; have a look [here](https://github.com/thoth-station/invectio) if you want to know more.

NOTE: _If you want to edit some dependencies, you can simply add them again with your specific requirements (`%horus requirements --add`)._
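
The idea behind `%horus discover` can be illustrated with Python's `ast` module: collect the names bound by import statements and keep only the modules whose names are referenced later. This is a simplified stand-in written for this tutorial, not how invectio is implemented. It returns import names, which do not always match PyPI package names (for example, `yaml` is provided by `PyYAML`), and it does not filter out standard-library modules.

```python
import ast


def used_imports(cell_sources):
    """Return top-level modules that are imported and later referenced by name."""
    # IPython magics (%) and shell escapes (!) are not valid Python, so drop those lines.
    lines = [
        line
        for source in cell_sources
        for line in source.splitlines()
        if not line.lstrip().startswith(("%", "!"))
    ]
    tree = ast.parse("\n".join(lines))

    bound = {}  # local name -> top-level module it was imported from
    referenced = set()  # names that are read somewhere in the code
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                top_level = alias.name.split(".")[0]
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                bound[alias.asname or top_level] = top_level
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                if alias.name != "*":
                    bound[alias.asname or alias.name] = node.module.split(".")[0]
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            referenced.add(node.id)

    return {module for local, module in bound.items() if local in referenced}


if __name__ == "__main__":
    # The two code cells of notebooks/my-notebook.ipynb.
    cells = ["import pandas as pd\nimport numpy as np", "df = pd.DataFrame()"]
    print(used_imports(cells))  # {'pandas'}
```

As with `%horus discover`, `numpy` is dropped because `np` is never read.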

3. Run a cell with `%horus check` to check the status of your notebook.

<div style="text-align:center">
<img alt="Horus check after discover" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabHorusCheckAfterDiscover.png">
</div>

4. Run `%horus lock` to lock dependencies using Thoth resolution engine.

<div style="text-align:center">
<img alt="Horus lock command" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabHorusLock.png">
</div>

To request a specific type of recommendation from Thoth, add `--recommendation-type <recommendation-type>`. The available types are `latest` (the default), `stable`, `performance`, and `security`.

By default, Thoth will discover the runtime environment you are running on. If you want to receive a recommendation for a specific runtime environment, you can use the following flags:

- `--os-name`
- `--os-version`
- `--python-version`

5. Run a cell with `%horus check` to check the status of your notebook, or `%horus show` to show its dependency information.

To show a specific part of the dependency information stored in the notebook metadata, use the following flags:

- `--pipfile`
- `--pipfile-lock`
- `--thoth-config` (only if Thoth resolution engine was used)


## Next Step

[Push changes to GitHub](./push-changes.md)

## References

* [jupyterlab-requirements][1]

[1]: https://github.com/thoth-station/jupyterlab-requirements
10 changes: 5 additions & 5 deletions docs/thoth-aicoe-services.md
@@ -14,15 +14,15 @@ Start by installing the Kebechet GitHub application, called Khebut by [following
Once the application is installed, you will need to add Thoth's bot (Sesheta) as a collaborator. Navigate to the repo where you installed Khebut. Under the repository's **Settings**, go to **Manage Access** and click on "Invite a collaborator" and add Thoth Bot Sesheta. Sesheta is a friendly Thoth bot who is used to help automate tasks. [Follow this link](https://github.com/AICoE/aicoe-ci/issues/new?assignees=goern%2Charshad16&labels=area%2Fcyborgs%2Cbot%2Csig%2Fcyborgs&template=request_sesheta.yaml&title=Help+with+Sesheta+invite) and fill out the form to have Sesheta accept your invitation. Please note: there is sometimes a delay in Sesheta's invite acceptance.

<div style="text-align:center">
<img alt="Invite Sesheta" src="https://raw.githubusercontent.com/aicoe/elyra-aidevsecops-tutorial/master/docs/images/InviteSesheta.png">
<img alt="Invite Sesheta" src="https://raw.githubusercontent.com/aicoe/elyra-aidevsecops-tutorial/master/docs/images/SeshetaInvite.png">
</div>

### 2. Enable issues on your fork

Sesheta, the bot that will assist you in this tutorial, communicates through issues. On your fork, the issues tab may not be enabled automatically. In order to enable issues, go to the **Settings** tab and check the box next to "Issues".

<div style="text-align:center">
<img alt="Enable Fork Issues" src="https://raw.githubusercontent.com/aicoe/elyra-aidevsecops-tutorial/master/docs/images/EnableForkIssues.png">
<img alt="Enable Fork Issues" src="https://raw.githubusercontent.com/aicoe/elyra-aidevsecops-tutorial/master/docs/images/SeshetaEnableForkIssues.png">
</div>

### 3. Add/Edit `.thoth.yaml`
@@ -118,11 +118,11 @@ Once you modify the `.aicoe.yaml` push the changes to your repo. Check [push cha
Some of the pipelines used in the Thoth project are maintained by bots. Therefore you can simply open an issue asking for a release (e.g. patch, minor, or major) and the bots will handle your request. Once the request is completed, the bot will also automatically close the issue, as you can see from the images below:

<div style="text-align:center">
<img alt="Open Issue Release" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/OpenIssueRelease.png">
<img alt="Open Issue Release" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/KhebutOpenIssueRelease.png">
</div>

<div style="text-align:center">
<img alt="Pull Request Release" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/PullRequestRelease.png">
<img alt="Pull Request Release" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/KhebutPullRequestRelease.png">
</div>

Fun fact: the `CHANGELOG` in the release is also created using an AI model that clusters pull requests. You can find more information about this model in our [glyph][3] project.
@@ -132,7 +132,7 @@ Once the issue is closed by the bot, a new tag is created in the GitHub repo. Th
Once the image has been created by the Tekton pipelines, you can find it in your registry (e.g. Quay):

<div style="text-align:center">
<img alt="Image on Registry" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/ImageRegistry.png">
<img alt="Image on Registry" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/QuayImageRegistry.png">
</div>

## Dependencies updates in the repo
46 changes: 46 additions & 0 deletions notebooks/my-notebook.ipynb
@@ -0,0 +1,46 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "a062c8a0-eb89-41f0-ab79-c91654996603",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8a51407-903e-4408-bfcf-b9a7a488c4f9",
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
