This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Reproducibility of Jupyter Notebooks

Reproducibility and shareability of notebooks matter if you want others to be able to repeat your experiments without running into dependency management issues. When packages are installed with pip install <package_name>, it is not possible to verify which software stack was used to run the notebook, so another user cannot repeat the same experiment. Dependency management is therefore one of the most important requirements for reproducibility. Clearly stated dependencies make notebooks portable, so they can be shared safely with others, reused in other projects or simply reproduced. If you want to know more about this issue in the data science domain, have a look at this article or this video.

In order to help developers (including data scientists), dependencies for Jupyter notebooks in this tutorial are managed using the JupyterLab extension jupyterlab-requirements.

You can use this extension for each of your notebooks to guarantee they have the correct dependencies. The extension can add or remove dependencies, lock them and store them in the notebook metadata. In this way, all the dependency information required to recreate the environment is shipped with the notebook.

In particular, the following notebook metadata is created for you, when you use Thoth's dependency management tool:

  • requirements (Pipfile);

  • requirements lock with all versions and hashes (Pipfile.lock);

  • dependency resolution engine used (Thoth or Pipenv);

  • .thoth.yaml configuration file (only for Thoth resolution engine).

Together, this information makes the notebook reproducible and shareable.
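Because the dependency information lives in the notebook metadata, it can be inspected with nothing more than a JSON parser. The following is a minimal sketch, not the extension's own code: the metadata key names (`requirements`, `requirements_lock`, `dependency_resolution_engine`, `thoth_config`) are assumptions based on the list above and should be checked against a notebook actually saved by the extension. The helper names and the example notebook are hypothetical.

```python
import json
from pathlib import Path

# ASSUMPTION: these key names mirror the four items listed above; verify them
# against a notebook saved by jupyterlab-requirements before relying on them.
EXPECTED_KEYS = {
    "requirements": "requirements (Pipfile)",
    "requirements_lock": "requirements lock (Pipfile.lock)",
    "dependency_resolution_engine": "resolution engine (Thoth or Pipenv)",
    "thoth_config": ".thoth.yaml configuration (Thoth only)",
}


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON, into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def dependency_metadata_report(notebook):
    """Return {key: bool} telling which expected metadata entries are present."""
    metadata = notebook.get("metadata", {})
    return {key: key in metadata for key in EXPECTED_KEYS}


# Hypothetical in-memory notebook with placeholder values, used instead of a
# real file so that the sketch runs anywhere.
example_notebook = {
    "cells": [],
    "metadata": {
        "dependency_resolution_engine": "pipenv",
        "requirements": "<placeholder for Pipfile contents>",
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

report = dependency_metadata_report(example_notebook)
for key, present in report.items():
    status = "present" if present else "missing"
    print(f"{EXPECTED_KEYS[key]}: {status}")
```

For a notebook on disk, `dependency_metadata_report(load_notebook("my-notebook.ipynb"))` would give the same kind of report, where the file name is only an illustration.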

Manage dependencies with the jupyterlab-requirements extension

There are three ways to interact with the jupyterlab-requirements JupyterLab extension:

  • using %horus magic commands directly in your notebook's cells (preferred approach). To learn more about how to use the %horus magic commands, check out the guide here or the video here;
  • using the horus CLI directly from the terminal or integrated in pipelines (check the video or this link if you want to know more about it);
  • using the Manage Dependencies button that appears in the notebook when it is opened (check the link if you want to know more about it).

NOTE: In this tutorial we will focus on %horus magic commands.

Next steps

You can consider the use case you are interested in for managing dependencies:
