
Introduce ADR for dependencies management in Jupyter notebooks #282

Merged · 5 commits merged into thoth-station:master on Nov 12, 2020

Conversation

@pacospace (Contributor)

Signed-off-by: Francesco Murdaca [email protected]

@MichaelClifford @sophwats

@sesheta sesheta self-requested a review November 3, 2020 11:25
@sesheta sesheta added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 3, 2020
Francesco Murdaca added 2 commits November 3, 2020 12:26
Signed-off-by: Francesco Murdaca <[email protected]>
Signed-off-by: Francesco Murdaca <[email protected]>
@sesheta (Member) commented Nov 3, 2020

Pytest test failed:

```text
running test
Searching for setuptools>=40.3.0
Reading https://pypi.org/simple/setuptools/
Downloading https://files.pythonhosted.org/packages/6d/38/c21ef5034684ffc0412deefbb07d66678332290c14bb5269c85145fbd55e/setuptools-50.3.2-py3-none-any.whl#sha256=2c242a0856fbad7efbe560df4a7add9324f340cf48df43651e9604924466794a
Best match: setuptools 50.3.2
Processing setuptools-50.3.2-py3-none-any.whl
Installing setuptools-50.3.2-py3-none-any.whl to /workspace/repo/.eggs
writing requirements to /workspace/repo/.eggs/setuptools-50.3.2-py3.6.egg/EGG-INFO/requires.txt

Installed /workspace/repo/.eggs/setuptools-50.3.2-py3.6.egg
running egg_info
creating jupyter_nbrequirements.egg-info
writing jupyter_nbrequirements.egg-info/PKG-INFO
writing dependency_links to jupyter_nbrequirements.egg-info/dependency_links.txt
writing requirements to jupyter_nbrequirements.egg-info/requires.txt
writing top-level names to jupyter_nbrequirements.egg-info/top_level.txt
writing manifest file 'jupyter_nbrequirements.egg-info/SOURCES.txt'
reading manifest file 'jupyter_nbrequirements.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'CHANGELOG.md'
writing manifest file 'jupyter_nbrequirements.egg-info/SOURCES.txt'
running build_ext
Jupyter Require found itself running outside of Jupyter.
jupyter_nbrequirements (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: jupyter_nbrequirements (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: jupyter_nbrequirements
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/loader.py", line 462, in _find_test_path
    package = self._get_module_from_name(name)
  File "/usr/lib64/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/workspace/repo/jupyter_nbrequirements/__init__.py", line 38, in <module>
    from jupyter_require import execute as executejs
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/__init__.py", line 34, in <module>
    from .notebook import link_css
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/notebook.py", line 30, in <module>
    from .core import execute_with_requirements
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/core.py", line 279, in <module>
    require = RequireJS()
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/core.py", line 90, in __new__
    raise EnvironmentError(msg)
OSError: Jupyter Require found itself running outside of Jupyter.


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
```
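The failure is environmental rather than a test assertion: importing `jupyter_nbrequirements` pulls in `jupyter_require`, whose module-level `RequireJS()` instantiation raises when no Jupyter frontend is attached, so the test module cannot even be imported in CI. A minimal sketch of one possible guard follows (note that `EnvironmentError` is an alias of `OSError` in Python 3); the module and import are taken from the traceback above, but this patch itself is hypothetical, not necessarily the fix the maintainers chose:

```python
# Hypothetical guard for jupyter_nbrequirements/__init__.py: tolerate
# importing outside a running Jupyter frontend so CI can collect tests.
try:
    from jupyter_require import execute as executejs
except OSError:
    # jupyter_require raises EnvironmentError (== OSError) when it finds
    # itself running outside of Jupyter; disable notebook-only features.
    executejs = None
```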

2 similar comments

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@harshad16 (Member) left a comment

This seems to be a good approach to dealing with dependencies across the Pipfile and the notebook JSON.

@pacospace pacospace requested a review from harshad16 November 3, 2020 12:57
@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

2 similar comments

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@goern (Member) commented Nov 3, 2020

This seems to be a good decision: it supports humans and cyborgs, and it links the experiments in notebooks with their corresponding software stacks.

@goern (Member) commented Nov 3, 2020

/lgtm

> In order to allow any user to re-run the notebook with similar behaviour, it is important that each notebook ships with its dependency requirements, including both direct and transitive dependencies. This also enforces and supports security, reproducibility, and traceability.
>
> Each notebook should be treated as a single component/service that uses its own dependencies; therefore, when storing notebooks, each should be stored in a specific repo.

@MichaelClifford left a comment

Is this suggesting that each notebook should be stored in its own repo? This does not sound like a practical approach to me for many projects. But let me know if I'm misunderstanding the use case here :)

Is the assumption being made here that each notebook in a project would have such different dependencies that loading the shared set for the entire project for each notebook would be wasteful, or that there would be some incompatibilities when building an image from it? In my experience most of the packages are shared and used across multiple notebooks in a project. Perhaps notebooks would have unique dependencies if we had one notebook for collecting and processing data, one for training a model, and one for serving inferences and reporting metrics (but there would still be data and model dependencies they would need to share, and I don't think there would be any incompatibilities).

And I don't disagree that this approach can ensure reproducibility per notebook, but I'm not convinced the complexity associated with breaking a project up into multiple repos per notebook outweighs the benefit of absolute reproducibility.

How is this dependency issue managed for pure Python projects? I assume there is not a unique repo and requirements for each *.py file. :)

@pacospace (Contributor, Author) commented Nov 3, 2020

> Is this suggesting that each notebook should be stored in its own repo? This does not sound like a practical approach to me for many projects.

Sorry, maybe I should have said directory, not repo. What I meant is: if we use the aicoe-aiops templates, then in the notebooks directory, instead of having all notebooks directly, there are subdirectories, each with a notebook and its dependencies. This would enforce reproducibility and also bound the complexity of each notebook. Consider that you could have optimized images for each step: for example, pre-processing using Spark, training with TensorFlow on GPU, deployment with Seldon on an edge device. As with Python projects, it is better to break things into simple pieces for maintainability, readability, and simpler testing of the single parts, and we can then optimize each piece.

The software stack for each notebook would be smaller, build times would be shorter, images would contain fewer dependencies, and there would be less risk of incompatibilities. Instead of one large monolithic AI project, we separate the different tasks into different software stacks.

> Is the assumption being made here that each notebook in a project would have such different dependencies that loading the shared set for the entire project for each notebook would be wasteful, or that there would be some incompatibilities when building an image from it?

We should enforce creating a context for each notebook/step. Once you finish processing, you store the inputs for training, for example. In some cases you don't need the processing library any more (just assuming cases where this split can be done easily); you just create another step for post-processing. This would also be enforced by template notebooks, as we discussed before.

> And I don't disagree that this approach can ensure reproducibility per notebook, but I'm not convinced the complexity associated with breaking a project up into multiple repos per notebook outweighs the benefit of absolute reproducibility.

I think it can cut both ways, although the considerations on splitting software stacks into smaller pieces can also be important, as mentioned above. Maybe we can let the user decide: with jupyter-nbrequirements we could have a parameter setting the default place for dependencies, which the user can change if necessary: #276

> How is this dependency issue managed for pure Python projects? I assume there is not a unique repo and requirements for each *.py file. :)

Right, usually one expects a single Pipfile/Pipfile.lock, as we try to enforce in all repositories. But in some cases, like ML, it depends on how complex the application and each task are; for the reasons mentioned above, it might be more interesting to split and maintain the stacks separately. thoth-station/thamos#464

But this is just one perspective :) Thanks for the reviews @MichaelClifford

@fridex left a comment

> The software stack for each notebook would be smaller, build times would be shorter, images would contain fewer dependencies, and there would be less risk of incompatibilities.

I'm also not very happy about having a dir per notebook. It can easily explode into unnecessary dir traversals and a hard-to-maintain git structure.

Note that splitting software stacks does not necessarily reduce complexity. Pre-built container images with all the dependencies shipped (even though the software stack is not minimal) might give faster response times and less user time spent installing dependencies when opening a Jupyter notebook.

Hence I see two aspects of this:

1. Using jupyter-nbrequirements for managing dependencies in notebooks: this is easy to bootstrap and easy to start an experiment with. As a data scientist, I open a Jupyter notebook, start my experiments, and install whatever software is needed to experiment with the data. jupyter-nbrequirements should keep track of these dependencies.

2. Maintaining base container images with pre-built software stacks: this is something we can operate on; we can build and provide container images with a specific set of dependencies. Users can select a notebook with specific software (e.g. tensorflow+cuda) and run experiments (as done now on ODH).

The first story will need some work to integrate easily. Managing dependencies directly in the Jupyter notebook JSON files is not a nice solution, but managing Pipfile/Pipfile.lock plus .thoth.yaml in a separate directory per notebook does not sound like nice UX either.

jupyter-nbrequirements can still work with notebook requirements as it does now. The story I see here:

If I, as a data scientist, open a notebook, I use jupyter-nbrequirements to manage my dependencies. jupyter-nbrequirements keeps track of the dependencies inside the Jupyter notebook as metadata, for reproducibility. It can export them to Pipfile/Pipfile.lock if the user requests so, but only explicitly. Otherwise it should act just as a thin client that talks to PyPI/Thoth to resolve and install software. Once the work is done, the deps can be exported. To save devs time, a container image can be built with the desired set of dependencies (using the exported Pipfile plus .thoth.yaml managed inside a git repo).
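As an illustration of that explicit export story, here is a minimal sketch. It assumes the notebook embeds a Pipfile-like mapping under a hypothetical `requirements` metadata key with plain string version specifiers; the actual key and format used by jupyter-nbrequirements may differ:

```python
import json

# Minimal sketch of an explicit "export to Pipfile" action.
# Assumption: the notebook stores {"packages": {name: specifier}} under
# metadata["requirements"]; the real jupyter-nbrequirements layout may differ.
def export_pipfile(notebook_path: str, pipfile_path: str = "Pipfile") -> None:
    with open(notebook_path) as f:
        metadata = json.load(f)["metadata"]

    requirements = metadata.get("requirements")
    if requirements is None:
        raise ValueError(f"{notebook_path} has no embedded requirements")

    # Render a [packages] section; specifiers like "*" or ">=1.0" pass through.
    lines = ["[packages]"]
    for name, specifier in requirements.get("packages", {}).items():
        lines.append(f'{name} = "{specifier}"')

    with open(pipfile_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```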

@pacospace (Contributor, Author) commented Nov 4, 2020

> 2. Maintaining base container images with pre-built software stacks: this is something we can operate on; we can build and provide container images with a specific set of dependencies. Users can select a notebook with specific software (e.g. tensorflow+cuda) and run experiments (as done now on ODH).

If using JupyterHub, yes. But with Elyra things will be different: you have an AI pipeline where each notebook or Python script is a step, and for each step you need to select a runtime to be used when the AI pipeline runs. In this case I can choose not only the images existing on ODH but also my own images, created and available on some registry, to run my specific step. Maybe we built an optimized image for deployment, or some Thoth-optimized image for the performance of one step (training), which may conflict with a step that requires Dask for heavy data processing on huge datasets on a remote cluster (something that can also be done with Elyra and Kubeflow Pipelines, or with Jupyter Enterprise Gateway if that is planned).

@fridex left a comment

> If using JupyterHub, yes. But with Elyra things will be different: you have an AI pipeline where each notebook or Python script is a step, and for each step you need to select a runtime to be used when the AI pipeline runs.

Does the runtime environment need to be specified in the Jupyter notebook itself? Can we use runtime autodiscovery for this, as done in thamos?

BTW, handling requirements could also be discussed with Jupyter upstream. They could be interested in this functionality to provide a better notebook experience.

@pacospace (Contributor, Author) commented Nov 4, 2020

> Does the runtime environment need to be specified in the Jupyter notebook itself? Can we use runtime autodiscovery for this, as done in thamos?

You can create runtimes to be used in Kubeflow Pipelines using the elyra command line from the console, or from the UI button for runtimes. Once a runtime exists, you can basically submit a notebook.

> BTW, handling requirements could also be discussed with Jupyter upstream. They could be interested in this functionality to provide a better notebook experience.

Thanks @fridex, we will open an issue.

Signed-off-by: Francesco Murdaca <[email protected]>
@sesheta (Member) commented Nov 9, 2020: Pytest test failed (identical log to the one above).
@pacospace (Contributor, Author) commented

Updated


> 1. Jupyter notebook without dependencies (no reproducibility)
> 2. Jupyter notebook without dependencies embedded in the JSON file, but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
> 3. Jupyter notebook with dependencies embedded in the JSON file of the notebook, and Pipfile/Pipfile.lock present

@MichaelClifford left a comment

Does this mean that, for a repo, each notebook will live in its own dir with its own Pipfile/Pipfile.lock, as well as having its dependencies embedded in the JSON?

I'll admit I don't fully understand the dependency management process (and @fridex can probably answer this question better 😄), but isn't it redundant and potentially error-prone to maintain dependencies both in the notebook and as a Pipfile? Shouldn't the decision be one or the other? In which case, I think embedded would be the way to go for each notebook, with a single overarching project Pipfile for the whole repo (kind of how projects are set up currently). Or is that what this Option 3 is saying already?

@fridex left a comment

+1

We should keep the dependencies embedded in the notebook all the time. Having them aside should be an action triggered explicitly, when exporting them or when importing a dependency listing from Pipfile/Pipfile.lock.
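For illustration only, under the hypothetical metadata layout used by the sketches in this thread, a notebook with embedded dependencies might look like this (the actual jupyter-nbrequirements layout may differ):

```python
# Hypothetical shape of a notebook JSON with embedded requirements.
notebook = {
    "cells": [],
    "metadata": {
        "requirements": {
            "packages": {"numpy": "*", "pandas": "*", "matplotlib": "*"},
        },
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}
```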

@pacospace (Contributor, Author) commented

> Isn't it redundant and potentially error-prone to maintain dependencies both in the notebook and as a Pipfile? Shouldn't the decision be one or the other? In which case, I think embedded would be the way to go for each notebook, with a single overarching project Pipfile for the whole repo (kind of how projects are set up currently). Or is that what this Option 3 is saying already?

No: as we discussed at the last DS meetup, we decided not to consider the option of one repo per notebook. But thinking about what you and @fridex said, maybe we can restructure it as:

Jupyter notebook with dependencies embedded in the JSON file of the notebook, which can be optionally extracted.

But what about the main Pipfile/Pipfile.lock? If I work on three different notebooks, creating dependencies for each, they will be different.

If we want to create an image to run those notebooks, we need a single Pipfile/Pipfile.lock with the dependencies from all the notebooks.

How do we deal with having one single Pipfile/Pipfile.lock and different notebooks, each with their own dependencies? Maybe notebook 1 requires only numpy, pandas and matplotlib, but notebook 2 only tensorflow.

Do we need some way to merge them, syncing a common Pipfile/Pipfile.lock that can be used to run them all?

@fridex left a comment

> How do we deal with having one single Pipfile/Pipfile.lock and different notebooks, each with their own dependencies? Maybe notebook 1 requires only numpy, pandas and matplotlib, but notebook 2 only tensorflow.
>
> Do we need some way to merge them, syncing a common Pipfile/Pipfile.lock that can be used to run them all?

Yes, these files are just TOML and JSON files. We have tooling in thoth-python that can merge these files and keep them consistent (e.g. check the computed hash, avoid duplicates, ...). The workflow should include Thoth: a single Pipfile is created out of all the notebooks, and Thoth resolves the Pipfile.lock. The Thoth part is required because these dependencies can have issues between them.
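To make the merge step concrete, here is a minimal sketch under the same hypothetical metadata layout as the export sketch above. The real thoth-python tooling also keeps the Pipfile.lock hash consistent and lets Thoth resolve conflicts, whereas this sketch simply raises on conflicting specifiers:

```python
import json

# Minimal merge sketch. Assumption (hypothetical): each notebook embeds a
# Pipfile-like {"packages": {name: specifier}} dict under
# metadata["requirements"]. Conflicting specifiers raise here instead of
# being resolved by Thoth.
def merge_notebook_requirements(notebook_paths):
    merged = {}
    for path in notebook_paths:
        with open(path) as f:
            metadata = json.load(f)["metadata"]
        packages = metadata.get("requirements", {}).get("packages", {})
        for name, specifier in packages.items():
            if name in merged and merged[name] != specifier:
                raise ValueError(
                    f"{name}: conflicting specifiers {merged[name]!r} vs {specifier!r}"
                )
            merged[name] = specifier
    return merged

# Example: a notebook needing numpy/pandas/matplotlib merged with one
# needing tensorflow yields a single four-package set.
# merge_notebook_requirements(["processing.ipynb", "training.ipynb"])
```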

@pacospace (Contributor, Author) commented

Thanks @fridex, I will proceed in this way! I will update the ADR.

@MichaelClifford left a comment

> Jupyter notebook with dependencies embedded in the JSON file of the notebook, which can be optionally extracted.

Sounds good to me.

Maybe add a bit more specificity to it? "Jupyter notebook with dependencies embedded in the JSON file of the notebook that can be optionally extracted as a merged Pipfile via Thoth"

@pacospace (Contributor, Author) commented Nov 12, 2020

I think we will have two options:

- One in the notebook itself, to extract Pipfile/Pipfile.lock from that notebook.
- Another button, perhaps in the menu under the kernels tab, that would look at all notebooks and create a merged Pipfile and Pipfile.lock.

So:

Jupyter notebook with dependencies embedded in the JSON file of the notebook, which can be optionally extracted if the user wants.

- If more notebooks are present, a common Pipfile can be created with a button that automatically extracts the dependencies from all notebooks, and a new common Pipfile.lock will be created. This would allow the creation of an image that can run all the notebooks (see the sketch after this comment).

WDYT?
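One possible wiring of those two actions, purely as a hypothetical sketch on top of the helper functions sketched earlier in the thread; neither magic name exists in jupyter-nbrequirements today:

```python
from glob import glob

from IPython.core.magic import register_line_magic

# Hypothetical line magics for the two proposed actions; export_pipfile and
# merge_notebook_requirements are the sketches from the earlier comments.

@register_line_magic
def dep_export(line):
    """Extract this notebook's embedded requirements to a Pipfile."""
    export_pipfile(line.strip())

@register_line_magic
def dep_merge(line):
    """Build one merged dependency set from all notebooks matching a glob."""
    merged = merge_notebook_requirements(glob(line.strip() or "*.ipynb"))
    print(f"merged {len(merged)} packages")
```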


👍

Signed-off-by: Francesco Murdaca <[email protected]>
@sesheta (Member) commented Nov 12, 2020: Pytest test failed (identical log to the one above).

@pacospace (Contributor, Author) commented

I will also add this ADR to the JupyterLab extension, if that's all good with you @MichaelClifford @fridex. See: thoth-station/jupyterlab-requirements#5

@MichaelClifford left a comment

LGTM

@pacospace (Contributor, Author) commented

/approve

@sesheta (Member) commented Nov 12, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MichaelClifford, pacospace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sesheta sesheta added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 12, 2020
@sesheta sesheta merged commit a588482 into thoth-station:master Nov 12, 2020