
Introduce ADR for dependencies management in Jupyter notebooks #282

Merged · 5 commits merged into thoth-station:master on Nov 12, 2020

Conversation

@pacospace (Contributor)

Signed-off-by: Francesco Murdaca [email protected]

@MichaelClifford @sophwats

@sesheta sesheta self-requested a review November 3, 2020 11:25
@sesheta sesheta added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 3, 2020
Francesco Murdaca added 2 commits November 3, 2020 12:26
Signed-off-by: Francesco Murdaca <[email protected]>
Signed-off-by: Francesco Murdaca <[email protected]>
@sesheta (Member) commented Nov 3, 2020

Pytest test failed:

```text
running test
Searching for setuptools>=40.3.0
Reading https://pypi.org/simple/setuptools/
Downloading https://files.pythonhosted.org/packages/6d/38/c21ef5034684ffc0412deefbb07d66678332290c14bb5269c85145fbd55e/setuptools-50.3.2-py3-none-any.whl#sha256=2c242a0856fbad7efbe560df4a7add9324f340cf48df43651e9604924466794a
Best match: setuptools 50.3.2
Processing setuptools-50.3.2-py3-none-any.whl
Installing setuptools-50.3.2-py3-none-any.whl to /workspace/repo/.eggs
writing requirements to /workspace/repo/.eggs/setuptools-50.3.2-py3.6.egg/EGG-INFO/requires.txt

Installed /workspace/repo/.eggs/setuptools-50.3.2-py3.6.egg
running egg_info
creating jupyter_nbrequirements.egg-info
writing jupyter_nbrequirements.egg-info/PKG-INFO
writing dependency_links to jupyter_nbrequirements.egg-info/dependency_links.txt
writing requirements to jupyter_nbrequirements.egg-info/requires.txt
writing top-level names to jupyter_nbrequirements.egg-info/top_level.txt
writing manifest file 'jupyter_nbrequirements.egg-info/SOURCES.txt'
reading manifest file 'jupyter_nbrequirements.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'CHANGELOG.md'
writing manifest file 'jupyter_nbrequirements.egg-info/SOURCES.txt'
running build_ext
Jupyter Require found itself running outside of Jupyter.
jupyter_nbrequirements (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: jupyter_nbrequirements (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: jupyter_nbrequirements
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/loader.py", line 462, in _find_test_path
    package = self._get_module_from_name(name)
  File "/usr/lib64/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/workspace/repo/jupyter_nbrequirements/__init__.py", line 38, in <module>
    from jupyter_require import execute as executejs
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/__init__.py", line 34, in <module>
    from .notebook import link_css
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/notebook.py", line 30, in <module>
    from .core import execute_with_requirements
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/core.py", line 279, in <module>
    require = RequireJS()
  File "/opt/app-root/lib/python3.6/site-packages/jupyter_require/core.py", line 90, in __new__
    raise EnvironmentError(msg)
OSError: Jupyter Require found itself running outside of Jupyter.


----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
```
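The failure is environmental rather than a test assertion: importing `jupyter_nbrequirements` pulls in `jupyter_require`, whose module-level `RequireJS()` instantiation raises when no Jupyter frontend is attached, so the test module cannot even be imported in CI. A minimal sketch of one possible guard follows (note that `EnvironmentError` is an alias of `OSError` in Python 3); the module and import are taken from the traceback above, but this patch itself is hypothetical, not necessarily the fix the maintainers chose:

```python
# Hypothetical guard for jupyter_nbrequirements/__init__.py: tolerate
# importing outside a running Jupyter frontend so CI can collect tests.
try:
    from jupyter_require import execute as executejs
except OSError:
    # jupyter_require raises EnvironmentError (== OSError) when it finds
    # itself running outside of Jupyter; disable notebook-only features.
    executejs = None
```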

2 similar comments

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@harshad16 (Member) left a comment

This seems to be a good approach to dealing with dependencies across the Pipfile and the notebook JSON.

@pacospace pacospace requested a review from harshad16 November 3, 2020 12:57
@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

2 similar comments

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@sesheta (Member) commented Nov 3, 2020: Pytest test failed (identical log to the one above).

@goern (Member) commented Nov 3, 2020

This seems to be a good decision: it supports humans and cyborgs, and it links the experiments in notebooks with their corresponding software stacks.

@goern (Member) commented Nov 3, 2020

/lgtm

> In order to allow any user to re-run the notebook with similar behaviour, it is important that each notebook ships with its dependency requirements, including both direct and transitive dependencies. This also enforces and supports security, reproducibility, and traceability.
>
> Each notebook should be treated as a single component/service that uses its own dependencies; therefore, when storing notebooks, each should be stored in a specific repo.

@MichaelClifford left a comment

Is this suggesting that each notebook should be stored in its own repo? This does not sound like a practical approach to me for many projects. But let me know if I'm misunderstanding the use case here :)

Is the assumption being made here that each notebook in a project would have such different dependencies that loading the shared set for the entire project for each notebook would be wasteful, or that there would be some incompatibilities when building an image from it? In my experience most of the packages are shared and used across multiple notebooks in a project. Perhaps notebooks would have unique dependencies if we had one notebook for collecting and processing data, one for training a model, and one for serving inferences and reporting metrics (but there would still be data and model dependencies they would need to share, and I don't think there would be any incompatibilities).

And I don't disagree that this approach can ensure reproducibility per notebook, but I'm not convinced the complexity associated with breaking a project up into multiple repos per notebook outweighs the benefit of absolute reproducibility.

How is this dependency issue managed for pure Python projects? I assume there is not a unique repo and requirements for each *.py file. :)

@pacospace (Contributor, Author) commented Nov 3, 2020

> Is this suggesting that each notebook should be stored in its own repo? This does not sound like a practical approach to me for many projects.

Sorry, maybe I should have said directory, not repo. What I meant is: if we use the aicoe-aiops templates, then in the notebooks directory, instead of having all notebooks directly, there are subdirectories, each with a notebook and its dependencies. This would enforce reproducibility and also bound the complexity of each notebook. Consider that you could have optimized images for each step: for example, pre-processing using Spark, training with TensorFlow on GPU, deployment with Seldon on an edge device. As with Python projects, it is better to break things into simple pieces for maintainability, readability, and simpler testing of the single parts, and we can then optimize each piece.

The software stack for each notebook would be smaller, build times would be shorter, images would contain fewer dependencies, and there would be less risk of incompatibilities. Instead of one large monolithic AI project, we separate the different tasks into different software stacks.

> Is the assumption being made here that each notebook in a project would have such different dependencies that loading the shared set for the entire project for each notebook would be wasteful, or that there would be some incompatibilities when building an image from it?

We should enforce creating a context for each notebook/step. Once you finish processing, you store the inputs for training, for example. In some cases you don't need the processing library any more (just assuming cases where this split can be done easily); you just create another step for post-processing. This would also be enforced by template notebooks, as we discussed before.

> And I don't disagree that this approach can ensure reproducibility per notebook, but I'm not convinced the complexity associated with breaking a project up into multiple repos per notebook outweighs the benefit of absolute reproducibility.

I think it can cut both ways, although the considerations on splitting software stacks into smaller pieces can also be important, as mentioned above. Maybe we can let the user decide: with jupyter-nbrequirements we could have a parameter setting the default place for dependencies, which the user can change if necessary: #276

> How is this dependency issue managed for pure Python projects? I assume there is not a unique repo and requirements for each *.py file. :)

Right, usually one expects a single Pipfile/Pipfile.lock, as we try to enforce in all repositories. But in some cases, like ML, it depends on how complex the application and each task are; for the reasons mentioned above, it might be more interesting to split and maintain the stacks separately. thoth-station/thamos#464

But this is just one perspective :) Thanks for the reviews @MichaelClifford

@fridex left a comment

> The software stack for each notebook would be smaller, build times would be shorter, images would contain fewer dependencies, and there would be less risk of incompatibilities.

I'm also not very happy about having a dir per notebook. It can easily explode into unnecessary dir traversals and a hard-to-maintain git structure.

Note that splitting software stacks does not necessarily reduce complexity. Pre-built container images with all the dependencies shipped (even though the software stack is not minimal) might give faster response times and less user time spent installing dependencies when opening a Jupyter notebook.

Hence I see two aspects of this:

1. Using jupyter-nbrequirements for managing dependencies in notebooks: this is easy to bootstrap and easy to start an experiment with. As a data scientist, I open a Jupyter notebook, start my experiments, and install whatever software is needed to experiment with the data. jupyter-nbrequirements should keep track of these dependencies.

2. Maintaining base container images with pre-built software stacks: this is something we can operate on; we can build and provide container images with a specific set of dependencies. Users can select a notebook with specific software (e.g. tensorflow+cuda) and run experiments (as done now on ODH).

The first story will need some work to integrate easily. Managing dependencies directly in the Jupyter notebook JSON files is not a nice solution, but managing Pipfile/Pipfile.lock plus .thoth.yaml in a separate directory per notebook does not sound like nice UX either.

jupyter-nbrequirements can still work with notebook requirements as it does now. The story I see here:

If I, as a data scientist, open a notebook, I use jupyter-nbrequirements to manage my dependencies. jupyter-nbrequirements keeps track of the dependencies inside the Jupyter notebook as metadata, for reproducibility. It can export them to Pipfile/Pipfile.lock if the user requests so, but only explicitly. Otherwise it should act just as a thin client that talks to PyPI/Thoth to resolve and install software. Once the work is done, the deps can be exported. To save devs time, a container image can be built with the desired set of dependencies (using the exported Pipfile plus .thoth.yaml managed inside a git repo).
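As an illustration of that explicit export story, here is a minimal sketch. It assumes the notebook embeds a Pipfile-like mapping under a hypothetical `requirements` metadata key with plain string version specifiers; the actual key and format used by jupyter-nbrequirements may differ:

```python
import json

# Minimal sketch of an explicit "export to Pipfile" action.
# Assumption: the notebook stores {"packages": {name: specifier}} under
# metadata["requirements"]; the real jupyter-nbrequirements layout may differ.
def export_pipfile(notebook_path: str, pipfile_path: str = "Pipfile") -> None:
    with open(notebook_path) as f:
        metadata = json.load(f)["metadata"]

    requirements = metadata.get("requirements")
    if requirements is None:
        raise ValueError(f"{notebook_path} has no embedded requirements")

    # Render a [packages] section; specifiers like "*" or ">=1.0" pass through.
    lines = ["[packages]"]
    for name, specifier in requirements.get("packages", {}).items():
        lines.append(f'{name} = "{specifier}"')

    with open(pipfile_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```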

@pacospace (Contributor, Author) commented Nov 4, 2020

> 2. Maintaining base container images with pre-built software stacks: this is something we can operate on; we can build and provide container images with a specific set of dependencies. Users can select a notebook with specific software (e.g. tensorflow+cuda) and run experiments (as done now on ODH).

If using JupyterHub, yes. But with Elyra things will be different: you have an AI pipeline where each notebook or Python script is a step, and for each step you need to select a runtime to be used when the AI pipeline runs. In this case I can choose not only the images existing on ODH but also my own images, created and available on some registry, to run my specific step. Maybe we built an optimized image for deployment, or some Thoth-optimized image for the performance of one step (training), which may conflict with a step that requires Dask for heavy data processing on huge datasets on a remote cluster (something that can also be done with Elyra and Kubeflow Pipelines, or with Jupyter Enterprise Gateway if that is planned).

@fridex left a comment

> If using JupyterHub, yes. But with Elyra things will be different: you have an AI pipeline where each notebook or Python script is a step, and for each step you need to select a runtime to be used when the AI pipeline runs.

Does the runtime environment need to be specified in the Jupyter notebook itself? Can we use runtime autodiscovery for this, as done in thamos?

BTW, handling requirements could also be discussed with Jupyter upstream. They could be interested in this functionality to provide a better notebook experience.

@pacospace (Contributor, Author) commented Nov 4, 2020

> Does the runtime environment need to be specified in the Jupyter notebook itself? Can we use runtime autodiscovery for this, as done in thamos?

You can create runtimes to be used in Kubeflow Pipelines using the elyra command line from the console, or from the UI button for runtimes. Once a runtime exists, you can basically submit a notebook.

> BTW, handling requirements could also be discussed with Jupyter upstream. They could be interested in this functionality to provide a better notebook experience.

Thanks @fridex, we will open an issue.

Signed-off-by: Francesco Murdaca <[email protected]>
@sesheta (Member) commented Nov 9, 2020: Pytest test failed (identical log to the one above).
@pacospace (Contributor, Author) commented

Updated


> 1. Jupyter notebook without dependencies (no reproducibility)
> 2. Jupyter notebook without dependencies embedded in the JSON file, but with Pipfile/Pipfile.lock always present (Jupyter notebook and requirements are decoupled)
> 3. Jupyter notebook with dependencies embedded in the JSON file of the notebook, and Pipfile/Pipfile.lock present

@MichaelClifford left a comment

Does this mean that, for a repo, each notebook will live in its own dir with its own Pipfile/Pipfile.lock, as well as having its dependencies embedded in the JSON?

I'll admit I don't fully understand the dependency management process (and @fridex can probably answer this question better 😄), but isn't it redundant and potentially error-prone to maintain dependencies both in the notebook and as a Pipfile? Shouldn't the decision be one or the other? In which case, I think embedded would be the way to go for each notebook, with a single overarching project Pipfile for the whole repo (kind of how projects are set up currently). Or is that what this Option 3 is saying already?

@fridex left a comment

+1

We should keep the dependencies embedded in the notebook all the time. Having them aside should be an action triggered explicitly, when exporting them or when importing a dependency listing from Pipfile/Pipfile.lock.
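For illustration only, under the hypothetical metadata layout used by the sketches in this thread, a notebook with embedded dependencies might look like this (the actual jupyter-nbrequirements layout may differ):

```python
# Hypothetical shape of a notebook JSON with embedded requirements.
notebook = {
    "cells": [],
    "metadata": {
        "requirements": {
            "packages": {"numpy": "*", "pandas": "*", "matplotlib": "*"},
        },
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}
```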

@pacospace (Contributor, Author) commented

> Isn't it redundant and potentially error-prone to maintain dependencies both in the notebook and as a Pipfile? Shouldn't the decision be one or the other? In which case, I think embedded would be the way to go for each notebook, with a single overarching project Pipfile for the whole repo (kind of how projects are set up currently). Or is that what this Option 3 is saying already?

No: as we discussed at the last DS meetup, we decided not to consider the option of one repo per notebook. But thinking about what you and @fridex said, maybe we can restructure it as:

Jupyter notebook with dependencies embedded in the JSON file of the notebook, which can be optionally extracted.

But what about the main Pipfile/Pipfile.lock? If I work on three different notebooks, creating dependencies for each, they will be different.

If we want to create an image to run those notebooks, we need a single Pipfile/Pipfile.lock with the dependencies from all the notebooks.

How do we deal with having one single Pipfile/Pipfile.lock and different notebooks, each with their own dependencies? Maybe notebook 1 requires only numpy, pandas and matplotlib, but notebook 2 only tensorflow.

Do we need some way to merge them, syncing a common Pipfile/Pipfile.lock that can be used to run them all?

@fridex left a comment

> How do we deal with having one single Pipfile/Pipfile.lock and different notebooks, each with their own dependencies? Maybe notebook 1 requires only numpy, pandas and matplotlib, but notebook 2 only tensorflow.
>
> Do we need some way to merge them, syncing a common Pipfile/Pipfile.lock that can be used to run them all?

Yes, these files are just TOML and JSON files. We have tooling in thoth-python that can merge these files and keep them consistent (e.g. check the computed hash, avoid duplicates, ...). The workflow should include Thoth: a single Pipfile is created out of all the notebooks, and Thoth resolves the Pipfile.lock. The Thoth part is required because these dependencies can have issues between them.
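To make the merge step concrete, here is a minimal sketch under the same hypothetical metadata layout as the export sketch above. The real thoth-python tooling also keeps the Pipfile.lock hash consistent and lets Thoth resolve conflicts, whereas this sketch simply raises on conflicting specifiers:

```python
import json

# Minimal merge sketch. Assumption (hypothetical): each notebook embeds a
# Pipfile-like {"packages": {name: specifier}} dict under
# metadata["requirements"]. Conflicting specifiers raise here instead of
# being resolved by Thoth.
def merge_notebook_requirements(notebook_paths):
    merged = {}
    for path in notebook_paths:
        with open(path) as f:
            metadata = json.load(f)["metadata"]
        packages = metadata.get("requirements", {}).get("packages", {})
        for name, specifier in packages.items():
            if name in merged and merged[name] != specifier:
                raise ValueError(
                    f"{name}: conflicting specifiers {merged[name]!r} vs {specifier!r}"
                )
            merged[name] = specifier
    return merged

# Example: a notebook needing numpy/pandas/matplotlib merged with one
# needing tensorflow yields a single four-package set.
# merge_notebook_requirements(["processing.ipynb", "training.ipynb"])
```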

@pacospace (Contributor, Author) commented

Thanks @fridex, I will proceed in this way! I will update the ADR.

@MichaelClifford left a comment

> Jupyter notebook with dependencies embedded in the JSON file of the notebook, which can be optionally extracted.

Sounds good to me.

Maybe add a bit more specificity to it? "Jupyter notebook with dependencies embedded in the JSON file of the notebook that can be optionally extracted as a merged Pipfile via Thoth"

@pacospace (Contributor, Author) commented Nov 12, 2020

I think we will have two options:

- One in the notebook itself, to extract Pipfile/Pipfile.lock from that notebook.
- Another button, perhaps in the menu under the kernels tab, that would look at all notebooks and create a merged Pipfile and Pipfile.lock.

So:

Jupyter notebook with dependencies embedded in the JSON file of the notebook, which can be optionally extracted if the user wants.

- If more notebooks are present, a common Pipfile can be created with a button that automatically extracts the dependencies from all notebooks, and a new common Pipfile.lock will be created. This would allow the creation of an image that can run all the notebooks (see the sketch after this comment).

WDYT?
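One possible wiring of those two actions, purely as a hypothetical sketch on top of the helper functions sketched earlier in the thread; neither magic name exists in jupyter-nbrequirements today:

```python
from glob import glob

from IPython.core.magic import register_line_magic

# Hypothetical line magics for the two proposed actions; export_pipfile and
# merge_notebook_requirements are the sketches from the earlier comments.

@register_line_magic
def dep_export(line):
    """Extract this notebook's embedded requirements to a Pipfile."""
    export_pipfile(line.strip())

@register_line_magic
def dep_merge(line):
    """Build one merged dependency set from all notebooks matching a glob."""
    merged = merge_notebook_requirements(glob(line.strip() or "*.ipynb"))
    print(f"merged {len(merged)} packages")
```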


👍

Signed-off-by: Francesco Murdaca <[email protected]>
@sesheta (Member) commented Nov 12, 2020: Pytest test failed (identical log to the one above).

@pacospace (Contributor, Author) commented

I will also add this ADR to the JupyterLab extension, if that's all good with you @MichaelClifford @fridex. See: thoth-station/jupyterlab-requirements#5

@MichaelClifford left a comment

LGTM

@pacospace (Contributor, Author) commented

/approve

@sesheta (Member) commented Nov 12, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MichaelClifford, pacospace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sesheta sesheta added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 12, 2020
@sesheta sesheta merged commit a588482 into thoth-station:master Nov 12, 2020