diff --git a/docs/0000-dependencies-management-jupyter-notebooks.md b/docs/0000-dependencies-management-jupyter-notebooks.md
index bbf0d8a3..83e9e6b5 100644
--- a/docs/0000-dependencies-management-jupyter-notebooks.md
+++ b/docs/0000-dependencies-management-jupyter-notebooks.md
@@ -7,7 +7,8 @@ How to guarantee reproducibility of Jupyter Notebooks?
 
 In order to allow any user to re run the notebook with similar behaviour, it's important that each notebook is shipped with dependencies requirements that include direct and transitive dependencies. This would also enforce and support security, reproducibility, traecability.
 
-Each notebook should be treated as single component/service that use its own dependencies, therefore when storing notebooks, they should be stored each in a specific repo.
+Notebooks should be treated as component/service that use their own dependencies, therefore when storing notebooks,
+they should be stored with dependencies so that an image can be built to run them or they can be shared and reused by others.
 
 ## Decision Drivers
 
@@ -18,19 +19,17 @@ Each notebook should be treated as single component/service that use its own dep
 ## Considered Options
 
 * 1. Jupyter notebook without dependencies (no reproducibility)
-* 2. Jupyter notebook with dependencies embedded in json file of the notebook (conflict with local requirements (Pipfile/Pipfile.lock))
-* 3. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
-* 4. Jupyter notebook with sha256 embedded in json file that matches Pipfile/Pipfile.lock sha256 always present (Jupyter notebook and requirements are coupled)
+* 2. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
+* 3. Jupyter notebook with dependencies embedded in json file of the notebook and Pipfile/Pipfile.lock present
 
 ## Decision Outcome
 
-The option select is 4. because:
+The option selected is 3. because:
 
-* avoid conflicts in dependencies and enforce security also
 * enforce reproducibility
 * enforce traceability between notebook and requirements
 
 ### Positive Consequences
 
 * Satisfy reproducibility, traecability, shareability.
-* Each notebook need to be treated as single service/task with its own dependencies.
+* Notebooks are coupled with dependencies in their metadata.
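
As an illustration of the selected option 3, here is a minimal sketch of how dependencies could be embedded in a notebook's JSON metadata while a Pipfile/Pipfile.lock is also present. The metadata key names `requirements` and `requirements_lock`, the helper names, and the sample package data are assumptions made for this example; the change above does not specify them.

```python
import copy
import json


def embed_dependencies(notebook: dict, pipfile_text: str, lock: dict) -> dict:
    """Return a copy of the notebook with dependencies stored in its metadata."""
    result = copy.deepcopy(notebook)
    metadata = result.setdefault("metadata", {})
    # Hypothetical key names, chosen for illustration only.
    metadata["requirements"] = pipfile_text
    metadata["requirements_lock"] = lock
    return result


def dependencies_match(notebook: dict, lock: dict) -> bool:
    """Check that the lock embedded in the notebook equals the local lock."""
    embedded = notebook.get("metadata", {}).get("requirements_lock")
    return embedded == lock


# A minimal nbformat 4 notebook and made-up sample dependency data.
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
pipfile_text = '[packages]\nrequests = "*"\n'
lock = {"default": {"requests": {"version": "==2.31.0"}}}

updated = embed_dependencies(notebook, pipfile_text, lock)
serialized = json.dumps(updated, indent=1)  # what would be written to the .ipynb
```

Comparing the embedded lock with the local Pipfile.lock, as `dependencies_match` does, is one possible way to keep the traceability between notebook and requirements that the decision outcome asks for.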