From 2c3820ad2f01b10be41be8d212fc6271b665d9af Mon Sep 17 00:00:00 2001
From: Francesco Murdaca
Date: Tue, 3 Nov 2020 12:24:07 +0100
Subject: [PATCH 1/5] Introduce ADR for dependencies management in Jupyter notebooks

Signed-off-by: Francesco Murdaca
---
 ...pendencies-management-jupyter-notebooks.md | 42 +++++++++++
 docs/template.md                              | 74 +++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 docs/0000-dependencies-management-jupyter-notebooks.md
 create mode 100644 docs/template.md

diff --git a/docs/0000-dependencies-management-jupyter-notebooks.md b/docs/0000-dependencies-management-jupyter-notebooks.md
new file mode 100644
index 00000000..fc918eef
--- /dev/null
+++ b/docs/0000-dependencies-management-jupyter-notebooks.md
@@ -0,0 +1,42 @@
+# [short title of solved problem and solution]
+
+* Status: [proposed | rejected | accepted | deprecated | … | superseded by [ADR-0005](0005-example.md)]
+* Deciders: [list everyone involved in the decision]
+* Date: [YYYY-MM-DD when the decision was last updated]
+
+Technical Story: [description | ticket/issue URL]
+
+## Context and Problem Statement
+
+How to guarantee reproducibility of Jupyter Notebooks?
+
+In order to allow any user to re run the notebook having similar behaviour, it's important that each notebook is shipped with dependencies requirements
+that include direct and transitive dependencies. This would also enforce and support security, reproducibility, traecability.
+
+Each notebook should be treated as single component/service that use its own dependencies, therefore when storing notebooks, they should be created in a specific repo.
+
+## Decision Drivers
+
+* user prospective
+* reproducibility
+* traecability
+
+## Considered Options
+
+* 1. Jupyter notebook without dependencies (no reproducibility)
+* 2. Jupyter notebook with dependencies embedded in json file of the notebook (conflict with local requirements (Pipfile/Pipfile.lock))
+* 3. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
+* 4. Jupyter notebook with sha256 embedded in json file that matches Pipfile/Pipfile.lock sha256 always present (Jupyter notebook and requirements are coupled)
+
+## Decision Outcome
+
+The option select is 4. because:
+
+* avoid conflicts in dependencies and enforce security also
+* enforce reproducibility
+* enforce traceability between notebook and requirements
+
+### Positive Consequences
+
+* Satisfy reproducibility, traecability, shareability.
+* Each notebook need to be treated as single service/task with its own dependencies.
diff --git a/docs/template.md b/docs/template.md
new file mode 100644
index 00000000..2121f62e
--- /dev/null
+++ b/docs/template.md
@@ -0,0 +1,74 @@
+# [short title of solved problem and solution]
+
+* Status: [proposed | rejected | accepted | deprecated | … | superseded by [ADR-0005](0005-example.md)]
+* Deciders: [list everyone involved in the decision]
+* Date: [YYYY-MM-DD when the decision was last updated]
+
+Technical Story: [description | ticket/issue URL]
+
+## Context and Problem Statement
+
+[Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.]
+
+## Decision Drivers
+
+* [driver 1, e.g., a force, facing concern, …]
+* [driver 2, e.g., a force, facing concern, …]
+* …
+
+## Considered Options
+
+* [option 1]
+* [option 2]
+* [option 3]
+* …
+
+## Decision Outcome
+
+Chosen option: "[option 1]", because [justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force force | … | comes out best (see below)].
+
+### Positive Consequences
+
+* [e.g., improvement of quality attribute satisfaction, follow-up decisions required, …]
+* …
+
+### Negative Consequences
+
+* [e.g., compromising quality attribute, follow-up decisions required, …]
+* …
+
+## Pros and Cons of the Options
+
+### [option 1]
+
+[example | description | pointer to more information | …]
+
+* Good, because [argument a]
+* Good, because [argument b]
+* Bad, because [argument c]
+* …
+
+### [option 2]
+
+[example | description | pointer to more information | …]
+
+* Good, because [argument a]
+* Good, because [argument b]
+* Bad, because [argument c]
+* …
+
+### [option 3]
+
+[example | description | pointer to more information | …]
+
+* Good, because [argument a]
+* Good, because [argument b]
+* Bad, because [argument c]
+* …
+
+## Links
+
+* [Link type] [Link to ADR]
+* …
+
\ No newline at end of file

From e3f1aab308b2101310c748302abec0d7aa344b8d Mon Sep 17 00:00:00 2001
From: Francesco Murdaca
Date: Tue, 3 Nov 2020 12:26:19 +0100
Subject: [PATCH 2/5] Add title

Signed-off-by: Francesco Murdaca
---
 docs/0000-dependencies-management-jupyter-notebooks.md | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/docs/0000-dependencies-management-jupyter-notebooks.md b/docs/0000-dependencies-management-jupyter-notebooks.md
index fc918eef..11ea91d3 100644
--- a/docs/0000-dependencies-management-jupyter-notebooks.md
+++ b/docs/0000-dependencies-management-jupyter-notebooks.md
@@ -1,10 +1,4 @@
-# [short title of solved problem and solution]
-
-* Status: [proposed | rejected | accepted | deprecated | … | superseded by [ADR-0005](0005-example.md)]
-* Deciders: [list everyone involved in the decision]
-* Date: [YYYY-MM-DD when the decision was last updated]
-
-Technical Story: [description | ticket/issue URL]
+# Dependencies management in Jupyter Notebooks
 
 ## Context and Problem Statement
 

From e17c0e1c73dc36673ffe739e3a253841f85e0308 Mon Sep 17 00:00:00 2001
From: Francesco Murdaca
Date: Tue, 3 Nov 2020 12:38:32 +0100
Subject: [PATCH 3/5] correct sentences

Signed-off-by: Francesco Murdaca
---
 docs/0000-dependencies-management-jupyter-notebooks.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/0000-dependencies-management-jupyter-notebooks.md b/docs/0000-dependencies-management-jupyter-notebooks.md
index 11ea91d3..bbf0d8a3 100644
--- a/docs/0000-dependencies-management-jupyter-notebooks.md
+++ b/docs/0000-dependencies-management-jupyter-notebooks.md
@@ -4,10 +4,10 @@
 
 How to guarantee reproducibility of Jupyter Notebooks?
 
-In order to allow any user to re run the notebook having similar behaviour, it's important that each notebook is shipped with dependencies requirements
+In order to allow any user to re run the notebook with similar behaviour, it's important that each notebook is shipped with dependencies requirements
 that include direct and transitive dependencies. This would also enforce and support security, reproducibility, traecability.
 
-Each notebook should be treated as single component/service that use its own dependencies, therefore when storing notebooks, they should be created in a specific repo.
+Each notebook should be treated as single component/service that use its own dependencies, therefore when storing notebooks, they should be stored each in a specific repo.
 
 ## Decision Drivers
 

From fd98d8a54e9834e856bc82eece8324f182688277 Mon Sep 17 00:00:00 2001
From: Francesco Murdaca
Date: Mon, 9 Nov 2020 11:10:15 +0100
Subject: [PATCH 4/5] Update adr options

Signed-off-by: Francesco Murdaca
---
 ...000-dependencies-management-jupyter-notebooks.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/0000-dependencies-management-jupyter-notebooks.md b/docs/0000-dependencies-management-jupyter-notebooks.md
index bbf0d8a3..83e9e6b5 100644
--- a/docs/0000-dependencies-management-jupyter-notebooks.md
+++ b/docs/0000-dependencies-management-jupyter-notebooks.md
@@ -7,7 +7,8 @@ How to guarantee reproducibility of Jupyter Notebooks?
 In order to allow any user to re run the notebook with similar behaviour, it's important that each notebook is shipped with dependencies requirements
 that include direct and transitive dependencies. This would also enforce and support security, reproducibility, traecability.
 
-Each notebook should be treated as single component/service that use its own dependencies, therefore when storing notebooks, they should be stored each in a specific repo.
+Notebooks should be treated as component/service that use their own dependencies, therefore when storing notebooks,
+they should be stored with dependencies so that an image can be built to run them or they can be shared and reused by others.
 
 ## Decision Drivers
 
@@ -18,19 +19,17 @@ Each notebook should be treated as single component/service that use its own dep
 ## Considered Options
 
 * 1. Jupyter notebook without dependencies (no reproducibility)
-* 2. Jupyter notebook with dependencies embedded in json file of the notebook (conflict with local requirements (Pipfile/Pipfile.lock))
-* 3. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
-* 4. Jupyter notebook with sha256 embedded in json file that matches Pipfile/Pipfile.lock sha256 always present (Jupyter notebook and requirements are coupled)
+* 2. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
+* 3. Jupyter notebook with dependencies embedded in json file of the notebook and Pipfile/Pipfile.lock present
 
 ## Decision Outcome
 
-The option select is 4. because:
+The option selected is 3. because:
 
-* avoid conflicts in dependencies and enforce security also
 * enforce reproducibility
 * enforce traceability between notebook and requirements
 
 ### Positive Consequences
 
 * Satisfy reproducibility, traecability, shareability.
-* Each notebook need to be treated as single service/task with its own dependencies.
+* Notebooks are coupled with dependencies in their metadata.

From e3c1165d240ee2b4667c7e6d0c80c4b2f8225da8 Mon Sep 17 00:00:00 2001
From: Francesco Murdaca
Date: Thu, 12 Nov 2020 11:13:54 +0100
Subject: [PATCH 5/5] Update

Signed-off-by: Francesco Murdaca
---
 docs/0000-dependencies-management-jupyter-notebooks.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/0000-dependencies-management-jupyter-notebooks.md b/docs/0000-dependencies-management-jupyter-notebooks.md
index 83e9e6b5..56a1117c 100644
--- a/docs/0000-dependencies-management-jupyter-notebooks.md
+++ b/docs/0000-dependencies-management-jupyter-notebooks.md
@@ -18,18 +18,19 @@ they should be stored with dependencies so that an image can be built to run the
 
 ## Considered Options
 
-* 1. Jupyter notebook without dependencies (no reproducibility)
-* 2. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
-* 3. Jupyter notebook with dependencies embedded in json file of the notebook and Pipfile/Pipfile.lock present
+* 1. Jupyter notebook without dependencies (no reproducibility at all)
+* 2. Jupyter notebook without dependencies embedded in json file but with Pipfile/Pipfile.lock always present (no reproducibility if I share the notebook)
+* 3. Jupyter notebook with dependencies embedded in json file of the notebook that can be optionally extracted if the user wants.
 
 ## Decision Outcome
 
 The option selected is 3. because:
 
 * enforce reproducibility
-* enforce traceability between notebook and requirements
+* enforce traceability between notebook
 
 ### Positive Consequences
 
 * Satisfy reproducibility, traecability, shareability.
 * Notebooks are coupled with dependencies in their metadata.
+* If more notebooks are present, a common Pipfile can be created with a button that can automatically extract from all notebook dependencies and new common Pipfile.lock will be created. This would allow creation of an image that can run the notebooks.
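For illustration, a minimal Python sketch of the chosen option 3 (dependencies embedded in the notebook JSON and optionally extracted) and of the common-Pipfile consequence. It is not the implementation this ADR refers to: the metadata key `dependencies`, the name-to-specifier layout, and the helpers `embed`, `extract`, `merge` and `render_pipfile` are hypothetical, and the merge only rejects conflicting specifiers instead of resolving them.

```python
"""Minimal sketch of option 3: dependencies live in the notebook's own JSON metadata.

Hypothetical choices (not from the ADR): the metadata key "dependencies",
storing Pipfile [packages] entries as a name -> specifier dict, and all helper names.
"""
from __future__ import annotations

import json

META_KEY = "dependencies"  # hypothetical key inside the notebook's top-level "metadata"


def embed(notebook: dict, packages: dict, lock: dict | None = None) -> dict:
    """Store Pipfile-style packages (and, optionally, Pipfile.lock data) in the notebook metadata."""
    notebook.setdefault("metadata", {})[META_KEY] = {
        "packages": dict(packages),
        "lock": lock,  # parsed Pipfile.lock content, if available
    }
    return notebook


def extract(notebook: dict) -> dict:
    """Return the embedded packages; an empty dict means the notebook ships none (option 1)."""
    embedded = notebook.get("metadata", {}).get(META_KEY, {})
    return dict(embedded.get("packages", {}))


def merge(notebooks: list) -> dict:
    """Union the packages of several notebooks, refusing conflicting specifiers."""
    common = {}
    for nb in notebooks:
        for name, spec in extract(nb).items():
            if name in common and common[name] != spec:
                raise ValueError(
                    f"conflicting specifiers for {name!r}: {common[name]!r} vs {spec!r}"
                )
            common[name] = spec
    return common


def render_pipfile(packages: dict) -> str:
    """Render a minimal Pipfile [packages] section.

    json.dumps produces valid TOML basic strings for ASCII names and specifiers,
    so keys are always quoted (this also covers names containing dots).
    """
    lines = ["[packages]"]
    lines += [f"{json.dumps(name)} = {json.dumps(spec)}" for name, spec in sorted(packages.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    empty_nb = lambda: {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    nb_a = embed(empty_nb(), {"pandas": "==1.1.4"})
    nb_b = embed(empty_nb(), {"numpy": "*", "pandas": "==1.1.4"})
    print(render_pipfile(merge([nb_a, nb_b])), end="")
```

Running it prints a `[packages]` section listing `numpy` and `pandas`. Generating the matching common Pipfile.lock is left to a resolver such as `pipenv lock`, not to this sketch.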