This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Add tutorial steps for manage dependencies #3

Merged
merged 29 commits into from
Sep 15, 2021
6d473e9
Add pre-requisite
Sep 7, 2021
5ec6b49
Add push changes
Sep 7, 2021
4e0fa18
Add Thoth and AICOE steps
Sep 7, 2021
cef4436
Setup initial environment docs
Sep 7, 2021
0153ab5
Update README for tutorial
Sep 7, 2021
ced8235
ADD notebook sections for dependency management
Sep 7, 2021
8b1c5ca
Adjust context
Sep 8, 2021
760c9ef
Add what you will learn section
Sep 8, 2021
64115b6
update pre-commit
Sep 8, 2021
f8c1ac4
Add %horus clean section for bring your own notebook
Sep 8, 2021
766c636
Separate bring your own notebook section with 2 use cases: no depende…
Sep 9, 2021
6afe91e
Add more images for explaining steps
Sep 9, 2021
34fec4b
Add file for version
Sep 9, 2021
4716c50
Improve section on setting aicoe-ci
Sep 9, 2021
787fdf8
Add share your work section and improve other parts
Sep 9, 2021
2f3a915
Add introduction on where to run this tutorial
Sep 10, 2021
967e022
Add project template link
Sep 10, 2021
e3a20ec
Add missing configs
Sep 13, 2021
598a4ec
Improve and correct README.md
Sep 14, 2021
880a291
Improve and correct README.md
Sep 14, 2021
838e1c2
Improve and correct README.md
Sep 14, 2021
9e8e852
Improve and correct README.md
Sep 14, 2021
19edbbf
Improve and correct README.md
Sep 14, 2021
bb7ce09
Adjust from reviews
Sep 14, 2021
ac5db4d
Update docs/share-your-work.md
Sep 14, 2021
86670e2
Update docs/share-your-work.md
Sep 14, 2021
8e5118c
Update docs/share-your-work.md
Sep 14, 2021
3fdef81
Update docs/start-notebook-and-manage-dependencies.md
Sep 14, 2021
6dda0c4
Add note on which approach to use with jl-req
Sep 14, 2021
21 changes: 8 additions & 13 deletions .aicoe-ci.yaml
@@ -1,14 +1,9 @@
# Setup and configuring aicoe-ci with configuration file `.aicoe-ci.yaml`
# Example `.aicoe-ci.yaml` with a full list of config options is available here: https://github.com/AICoE/aicoe-ci/blob/master/docs/.aicoe-ci.yaml
check:
# Uncomment following line to build a public image of this repo
# - thoth-build
# Uncomment following lines to build a public image of this repo
# build:
# build-stratergy: Source
# build-source-script: "image:///opt/app-root/builder"
# base-image: quay.io/thoth-station/s2i-custom-notebook:latest
# registry: quay.io
# registry-org: aicoe
# registry-project: <CHANGE-ME>
# registry-secret: aicoe-pusher-secret
- thoth-build
build:
base-image: "quay.io/thoth-station/s2i-thoth-ubi8-py38:v0.28.0"
build-stratergy: Source
registry: quay.io
registry-org: thoth-station
registry-project: manage-dependencies-tutorial
registry-secret: thoth-station-thoth-pusher-secret
16 changes: 2 additions & 14 deletions .pre-commit-config.yaml
@@ -12,6 +12,7 @@ repos:
- id: check-merge-conflict
- id: end-of-file-fixer
- id: name-tests-test
- id: check-added-large-files
- id: check-byte-order-marker
- id: check-case-conflict
- id: check-docstring-first
@@ -34,29 +35,16 @@ repos:
- id: end-of-file-fixer
- id: trailing-whitespace

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.902
hooks:
- id: mypy
exclude: '^(docs|tasks|tests)|setup\.py'
args: [--ignore-missing-imports]

- repo: https://github.com/psf/black
rev: 21.6b0
hooks:
- id: black

- repo: https://github.com/tomcatling/black-nb
rev: '0.5.0'
rev: "0.5.0"
hooks:
- id: black-nb

# Enable this in repositories with python packages.
# - repo: https://github.com/mgedmin/check-manifest
# rev: '0.39'
# hooks:
# - id: check-manifest

- repo: https://github.com/s-weigand/flake8-nb
rev: v0.3.0
hooks:
20 changes: 20 additions & 0 deletions .prow.yaml
@@ -18,3 +18,23 @@ presubmits:
limits:
memory: "1Gi"
cpu: "500m"

- name: thoth-mypy-py38
decorate: true
skip_report: false
always_run: true
context: aicoe-ci/prow/mypy
spec:
containers:
- image: quay.io/thoth-station/thoth-pytest-ubi8-py38:v0.12.5
command:
- "/usr/local/bin/mypy"
- "."
- "--ignore-missing-imports"
resources:
requests:
memory: "1Gi"
cpu: "300m"
limits:
memory: "2Gi"
cpu: "500m"
11 changes: 11 additions & 0 deletions Pipfile
@@ -0,0 +1,11 @@
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]

[requires]
python_version = "3.8"
20 changes: 20 additions & 0 deletions Pipfile.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

140 changes: 79 additions & 61 deletions README.md
@@ -1,61 +1,79 @@
project-template
==============================

template for the team to use

Project Organization
------------

├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── Pipfile <- Pipfile stating package configuration as used by Pipenv.
├── Pipfile.lock <- Pipfile.lock stating a pinned down software stack with as used by Pipenv.
├── README.md <- The top-level README for developers using this project.
├── data
│   ├── external <- Data from third party sources.
│   ├── interim <- Intermediate data that has been transformed.
│   ├── processed <- The final, canonical data sets for modeling.
│   └── raw <- The original, immutable data dump.
├── docs <- A default Sphinx project; see sphinx-doc.org for details
├── models <- Trained and serialized models, model predictions, or model summaries
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
├── references <- Data dictionaries, manuals, and all other explanatory materials.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures <- Generated graphics and figures to be used in reporting
├── requirements.txt <- The requirements file stating direct dependencies if a library
│ is developed.
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
├── src <- Source code for use in this project.
│   ├── __init__.py <- Makes src a Python module
│ │
│   ├── data <- Scripts to download or generate data
│   │   └── make_dataset.py
│ │
│   ├── features <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│ │
│   ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│ │
│   └── visualization <- Scripts to create exploratory and results oriented visualizations
│   └── visualize.py
├── .thoth.yaml <- Thoth's configuration file
├── .aicoe-ci.yaml <- AICoE CI configuration file (https://github.com/AICoE/aicoe-ci)
└── tox.ini <- tox file with settings for running tox; see tox.readthedocs.io


--------

<p><small>Project based on the <a target="_blank" href="https://drivendata.github.io/cookiecutter-data-science/">cookiecutter data science project template</a>. #cookiecutterdatascience</small></p>
# Thoth Tutorial - manage your dependencies in Jupyter notebooks.

This tutorial shows how to manage Python dependencies for Jupyter notebooks so that the notebooks are reproducible and shareable.

Many developers, including data scientists, focus on their core problem when working on an experiment, but one often overlooked aspect can make a project impossible to reuse: its dependencies.
One of the first steps in the development of a project is the selection of libraries, or dependencies. Someone who runs `pip install <package-name>` might not be aware that, along with the requested library (a direct dependency), many other libraries are installed on their machine: the so-called transitive dependencies. A change in any one of those dependencies can break the experiment. It is therefore fundamental to state all the dependencies used, together with the operating system, Python interpreter and hardware on which the experiment ran.
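
The difference between direct and transitive dependencies can be inspected with the Python standard library. The following is a minimal, best-effort sketch and not part of Thoth or jupyterlab-requirements: the helper names `direct_dependencies` and `all_dependencies` are hypothetical, requirement strings are parsed with a simple regular expression rather than a full parser, and the `toy_graph` data is invented for illustration.

```python
import re
from importlib import metadata


def direct_dependencies(package):
    """Return the names of the direct dependencies declared by an installed package."""
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return set()
    names = set()
    for requirement in requirements:
        # Best effort: skip dependencies that are only needed for optional extras.
        if "extra ==" in requirement:
            continue
        match = re.match(r"[A-Za-z0-9._-]+", requirement)
        if match:
            names.add(match.group(0).lower())
    return names


def all_dependencies(package, lookup=direct_dependencies):
    """Walk the dependency graph and return direct and transitive dependencies."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dependency in lookup(current):
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen


# Hypothetical dependency graph, used so the example does not depend on
# what happens to be installed on the machine running it.
toy_graph = {"my-notebook-lib": {"pandas"}, "pandas": {"numpy", "pytz"}}
print(sorted(all_dependencies("my-notebook-lib", lookup=lambda name: toy_graph.get(name, set()))))
# prints ['numpy', 'pandas', 'pytz']
```

Calling `all_dependencies("<package-name>")` without the `lookup` argument inspects the packages installed in the current environment instead, so its result varies from machine to machine, which is exactly the reproducibility problem described above.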

Dependency management is one of the most important requirements for reproducibility. Having dependencies clearly stated allows portability of notebooks, so they can be shared safely with others, reused in other projects or simply reproduced. If you want to know more about this issue in the data science domain, have a look at this [article](https://developers.redhat.com/blog/2021/03/19/managing-python-dependencies-with-the-thoth-jupyterlab-extension/) or this [video](https://www.youtube.com/watch?v=ifyQ2oSxjnU).

[Project Thoth][1] keeps dependencies up to date by giving recommendations through developers' daily tools. Thanks to this service, developers (including data scientists) do not have to worry about managing dependencies after selecting them, since conflicts can be handled by Thoth bots and automated pipelines. This AI support can benefit AI projects through better performance from optimized dependencies and additional security, since known insecure libraries can be kept out of the software stack. If you want to know more, have a look at [Thoth's website](https://thoth-station.ninja/docs/developers/adviser/integration.html).

Among the different Thoth integrations, this tutorial focuses on the JupyterLab extension for dependency management, which is called [jupyterlab-requirements][2].

You can use this extension in each of your notebooks to guarantee they have the correct dependencies. The extension can add and remove dependencies, lock them, and store them in the notebook metadata. In this way, all the dependency information required to recreate the environment is shipped with the notebook.

In particular, the following notebook metadata is created for you when you use Thoth's dependency management tool:

- `requirements` (Pipfile);

- `requirements locked` with all versions and hashes of libraries (direct and transitive ones) (Pipfile.lock);

- `dependency resolution engine` used (Thoth or Pipenv);

- `configuration file containing runtime environment` (only for Thoth resolution engine).

Together, this information makes the notebook reproducible and shareable.
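
Because a notebook is a JSON file, the stored dependency information can be read back with a few lines of Python. The sketch below is only an illustration under stated assumptions: the metadata key names in `DEPENDENCY_KEYS` are assumed rather than taken from this tutorial and should be checked against a notebook saved by the extension, and `example_notebook` is a hypothetical, heavily trimmed notebook.

```python
import json

# Assumed metadata key names; verify them against a notebook saved by the extension.
DEPENDENCY_KEYS = (
    "requirements",
    "requirements_lock",
    "dependency_resolution_engine",
    "thoth_config",
)


def dependency_metadata(notebook_json):
    """Return the dependency-related entries found in a notebook's metadata."""
    notebook = json.loads(notebook_json)
    metadata = notebook.get("metadata", {})
    return {key: metadata[key] for key in DEPENDENCY_KEYS if key in metadata}


# Hypothetical notebook content, used only for illustration.
example_notebook = json.dumps(
    {
        "cells": [],
        "metadata": {
            "dependency_resolution_engine": "thoth",
            "requirements": '{"packages": {"pandas": "*"}}',
            "kernelspec": {"name": "python3"},
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

found = dependency_metadata(example_notebook)
missing = [key for key in DEPENDENCY_KEYS if key not in found]
print(sorted(found))  # entries present in the notebook
print(missing)  # entries that still need to be generated
```

A check like this can be useful before sharing a notebook: if the locked requirements are missing, the recipient cannot recreate the exact environment.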


## What will you learn in this tutorial?

At the end of this tutorial you will be able to manage dependencies for your projects in Jupyter notebooks, enabling others to reproduce what you did and to contribute to it. The last section also shows how to enable [Kebechet Bot][5] to keep dependencies up to date automatically, and how to set up and use automatic pipelines from [AICoE CI][6] to create releases and images of your projects that you can easily share with others.


## Where will you run this tutorial?

[Operate First](https://www.operate-first.cloud/) is an open infrastructure environment started at Red Hat's Office of the CTO. It has been selected to run this tutorial since it is an open source initiative that fulfills all the requirements of the tutorial. Anyone with a Google account can log in and start developing. To learn more about Operate First, visit the [website](https://www.operate-first.cloud/) or [GitHub community](https://github.com/operate-first).

[Operate First](https://www.operate-first.cloud/) hosts [Open Data Hub][3], which provides the tools for data science projects (e.g. JupyterHub, Elyra, Kubeflow Pipelines, Seldon, Prometheus, Grafana, Superset) running on [Red Hat OpenShift][4].


## Why does the tutorial repository have this structure?

The project template used can be found here: [project template][7]. It shows the correlation between the needs of a data scientist (e.g. data, notebooks, models) and those of an AI DevOps engineer (e.g. manifests). Having structure in a project ensures all the pieces required for the ML and DevOps lifecycles are present and easily discoverable.


## Tutorial pre-requisites

0. [Pre-requisites](./docs/pre-requisite.md)

## Tutorial Steps

1. [Set up your initial environment](./docs/setup-initial-environment.md)

2. [Manage dependencies for your notebook](./docs/start-notebook-and-manage-dependencies.md)

3. [Push changes to GitHub](./docs/push-changes.md)

4. [Set up bots and pipelines to create releases, build images and enable automatic dependency management](./docs/thoth-aicoe-services.md)

5. [Share your work](./docs/share-your-work.md)


## References

* [Project Thoth][1]
* [jupyterlab-requirements][2]
* [Open Data Hub][3]
* [Red Hat OpenShift][4]
* [Kebechet Bot][5]
* [AICoE CI][6]
* [project template][7]

[1]: https://thoth-station.ninja/
[2]: https://github.com/thoth-station/jupyterlab-requirements
[3]: https://opendatahub.io/
[4]: https://www.openshift.com/
[5]: https://github.com/marketplace/khebhut
[6]: https://github.com/AICoE/aicoe-ci
[7]: https://github.com/aicoe-aiops/project-template
Binary file added docs/images/JupyterHubNewUI.png
Binary file added docs/images/JupyterLabCloneYourRepo.png
Binary file added docs/images/JupyterLabGitBoxPanel.png
Binary file added docs/images/JupyterLabGitCommitChanges.png
Binary file added docs/images/JupyterLabGitExtension.png
Binary file added docs/images/JupyterLabGitPush.png
Binary file added docs/images/JupyterLabGitStageFiles.png
Binary file added docs/images/JupyterLabHorusAdd.png
Binary file added docs/images/JupyterLabHorusCheckInitial.png
Binary file added docs/images/JupyterLabHorusCheckInitialPip.png
Binary file added docs/images/JupyterLabHorusClean.png
Binary file added docs/images/JupyterLabHorusShowAfterAdd.png
Binary file added docs/images/JupyterLabOpenTerminal.png
Binary file added docs/images/JupyterLabRequirementsExtension.jpg
Binary file added docs/images/JupyterLabStartExistingNotebook.png
Binary file added docs/images/JupyterLabStartNewNotebook.png
Binary file added docs/images/JupyterLabTerminalFirstCommit.png
Binary file added docs/images/JupyterLabUseTerminal.png
Binary file added docs/images/KhebutAutomaticUpdate.png
Binary file added docs/images/KhebutOpenIssueRelease.png
Binary file added docs/images/KhebutPullRequestRelease.png
Binary file added docs/images/QuayCreateNewRepository.png
Binary file added docs/images/QuayImageRegistry.png
Binary file added docs/images/QuayRepositorySettings.png
Binary file added docs/images/QuaySetPublicRepository.png
Binary file added docs/images/QuaySetRobotAccountRepository.png
Binary file added docs/images/SeshetaEnableForkIssues.png
Binary file added docs/images/SeshetaInvite.png
Binary file added docs/images/TakeLinkForkedRepo.png
37 changes: 37 additions & 0 deletions docs/pre-requisite.md
@@ -0,0 +1,37 @@
# Pre-requisites

This section lists what you need for the tutorial:

- GitHub account
- GitHub token
- JupyterLab environment with [jupyterlab-requirements][1] library
- [Open Data Hub][3]
- [Red Hat OpenShift][5]


## GitHub account

This tutorial is hosted on GitHub. If you don't have an account, create one by following this [link](https://docs.github.com/en/github/getting-started-with-github/signing-up-for-a-new-github-account).

## GitHub token

If you don't have a GitHub token, you can create one following GitHub docs: [create GitHub token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token).

## Operate First environment

In [Operate First][2] you can find all the components of [Open Data Hub][3], including [JupyterHub][4] to spawn images running on [Red Hat OpenShift][5]. For this tutorial you can select the image called [Experimental Elyra Notebook Image](https://github.com/operate-first/apps/blob/master/kfdefs/base/jupyterhub/notebook-images/experimental-elyra-notebook-imagestream.yaml), which has the library for dependency management already installed.


## References

* [jupyterlab-requirements][1]
* [Operate First][2]
* [Open Data Hub][3]
* [JupyterHub][4]
* [Red Hat OpenShift][5]

[1]: https://github.com/thoth-station/jupyterlab-requirements
[2]: https://www.operate-first.cloud/
[3]: https://opendatahub.io/
[4]: https://jupyter.org/hub
[5]: https://www.openshift.com/
87 changes: 87 additions & 0 deletions docs/push-changes.md
@@ -0,0 +1,87 @@
# Push your changes to your GitHub repo

This section shows how to push your changes to your GitHub repo directly from the [JupyterLab Git extension][1].
The only thing you need is a GitHub token. You can create one by following the GitHub docs: [create GitHub token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token).

## Push your changes using JupyterLab Git extension

If you are running this tutorial on Operate First, your work-in-progress notebooks can be saved in your JupyterHub persistent volume claim (PVC) by hitting the save button on the top panel.
Nevertheless, it is good practice to push your changes to the GitHub repo when you finish working on your project, so that all your work is saved.

To do that from within JupyterLab using the [JupyterLab Git extension][1]:

1. Go to the Git Box panel on the left to check which files have changed.

<div style="text-align:center">
<img alt="Go to Git Box Panel" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitBoxPanel.png">
</div>

2. Stage the files you want to push to your GitHub repo.

<div style="text-align:center">
<img alt="Stage the files" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitStageFiles.png">
</div>

3. Add a summary of your changes (a.k.a. the commit message) and select Commit.

<div style="text-align:center">
<img alt="Commit Changes" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitCommitChanges.png">
</div>

NOTE: _If you are doing this for the first time, git requires a user name and email to be set. (The extension will open a dialog form to enter them.)_

<div style="text-align:center">
<img alt="Insert User Name and Email" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitInsertUserNameEmail.png">
</div>


4. Select Push Changes.

<div style="text-align:center">
<img alt="Use Button to Push Changes" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitUseButtonToPushChanges.png">
</div>

5. Enter your GitHub account name and your GitHub token to push to the GitHub repo you cloned.

<div style="text-align:center">
<img alt="Push Changes with GitHub token" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabGitPush.png">
</div>

## Push your changes using the terminal in JupyterLab

If you want to clone a repo and push changes through the Terminal, you can use the following steps.

1. Open a terminal from the icon in the Launcher.

<div style="text-align:center">
<img alt="Open Terminal" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabOpenTerminal.png">
</div>

2. Start using the git commands:

<div style="text-align:center">
<img alt="Use Git Commands" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabUseTerminal.png">
</div>

- `git clone <repo>` to clone a new repo.

- `git add <file>` after you modify files, to stage them.

- `git commit -m "<commit message>"` to commit the staged changes. _NOTE: If you are doing this for the first time, git requires a user name and email to be set._

<div style="text-align:center">
<img alt="First commit" src="https://raw.githubusercontent.com/AICoE/manage-dependencies-tutorial/master/docs/images/JupyterLabTerminalFirstCommit.png">
</div>

- `git push origin <branch>` to push your changes.
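
If you prefer to script these steps, for example from a notebook cell, the same sequence can be assembled in Python. This is a minimal sketch rather than part of the tutorial's tooling: the helper names `git_push_commands` and `run`, the file path, the commit message and the branch name are all hypothetical. By default nothing is executed and the commands are only printed.

```python
import subprocess


def git_push_commands(files, message, branch, user_name=None, user_email=None):
    """Build the git commands that stage, commit and push the given files."""
    commands = []
    # On a first commit git needs an identity; set it only when provided.
    if user_name:
        commands.append(["git", "config", "user.name", user_name])
    if user_email:
        commands.append(["git", "config", "user.email", user_email])
    commands.append(["git", "add", "--", *files])
    commands.append(["git", "commit", "-m", message])
    commands.append(["git", "push", "origin", branch])
    return commands


def run(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for command in commands:
        if dry_run:
            # Display only: arguments are not shell-quoted here.
            print(" ".join(command))
        else:
            # check=True stops at the first failing command.
            subprocess.run(command, check=True)


# Hypothetical file, message and branch, shown as a dry run.
plan = git_push_commands(["notebooks/example.ipynb"], "Add example notebook", "main")
run(plan)
```

Passing the arguments as a list avoids shell quoting problems in commit messages. Call `run(plan, dry_run=False)` from inside the cloned repo to actually execute the commands; the push may still prompt for your GitHub account name and token.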


## Next Step

[Set up bots and pipelines to enable automatic dependency management and automatic build after release](./thoth-aicoe-services.md)

## References

* [JupyterLab Git extension][1]

[1]: https://github.com/jupyterlab/jupyterlab-git