Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ifp_egm] Tidy up EGM, remove jitclass #144

Merged
merged 21 commits into from
Mar 7, 2024
Merged

[ifp_egm] Tidy up EGM, remove jitclass #144

merged 21 commits into from
Mar 7, 2024

Conversation

jstac
Copy link
Contributor

@jstac jstac commented Mar 4, 2024

Replaces #130 and fixes #128

Copy link

netlify bot commented Mar 4, 2024

Deploy Preview for incomparable-parfait-2417f8 ready!

Name Link
🔨 Latest commit f1bf2a6
🔍 Latest deploy log https://app.netlify.com/sites/incomparable-parfait-2417f8/deploys/65e90944fbe36a0008ea6088
😎 Deploy Preview https://deploy-preview-144--incomparable-parfait-2417f8.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@jstac
Copy link
Contributor Author

jstac commented Mar 4, 2024

Replaces #130 and fixes #128

@shlff
Copy link
Member

shlff commented Mar 4, 2024

Hi @jstac I read the log file of the error and here are

nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
qe.tic()
a_star_egm_nb, σ_star_egm_nb = successive_approx_numba(model,
                                                        print_skip=100)
qe.toc()
------------------

and it is related to the Python buffer protocal

[0;31mBufferError�[0m: INVALID_ARGUMENT: Python buffer protocol is only defined for CPU buffers.

I guess we still need to specify the datatype for numba jit.

@jstac
Copy link
Contributor Author

jstac commented Mar 4, 2024

Hi @mmcky , this runs fine on my machine with the latest Anaconda and `pip install -U "jax[cpu]".

I guess we still need to specify the datatype for numba jit.

Can you please clarify your comment @shlff ? Or add a fix?

@mmcky
Copy link
Contributor

mmcky commented Mar 4, 2024

Python buffer protocol is only defined for CPU buffers.

@jstac this is a new one. It looks like some part of the code is only suitable for running on cpu? I will need to dig into this a bit more.

@jstac
Copy link
Contributor Author

jstac commented Mar 4, 2024

Thanks @mmcky :-)

It looks like some part of the code is only suitable for running on cpu?

Wow, I hope not. The biggest selling point of JAX is it's hardware agnostic...

@mmcky
Copy link
Contributor

mmcky commented Mar 4, 2024

Thanks @mmcky :-)

It looks like some part of the code is only suitable for running on cpu?

Wow, I hope not. The biggest selling point of JAX is it's hardware agnostic...

@jstac my suspicion is an incompatibility with CUDA=12.3.1 so I will reset our docker to be CUDA=12.1.0 with latest anaconda and see if that works.

This is the cause previously.

jax-ml/jax#16440

@jstac
Copy link
Contributor Author

jstac commented Mar 4, 2024

Thanks @mmcky, nice work.

We need to make sure it runs on colab too, but we should be okay because usually colab's cuda driver is a few generations old.

@mmcky
Copy link
Contributor

mmcky commented Mar 4, 2024

thanks @jstac good reminder re: colab.

@mmcky mmcky changed the title Tidy up EGM, remove jitclass [ifp_egm] Tidy up EGM, remove jitclass Mar 4, 2024
@mmcky
Copy link
Contributor

mmcky commented Mar 4, 2024

@jstac mystery deepens here. It is producing the same error on the previous environment using CUDA=12.1.0 as well -- so I will need to look a bit more closely at this code. Will report back.

@mmcky
Copy link
Contributor

mmcky commented Mar 4, 2024

@jstac this is confusing -- as the issue is with the numba code (on github actions) so probably unrelated to gpu as initially thought.

My local environment is running the code just fine.

My next guess is the list of arrays that is unpacked by the numba function - but I don't follow why it happens on github actions and not locally. John can you let me know your numba version -- I am using 0.58 locally and the cloud is using 0.59 (however my local runs on numba=0.59 as well)

@mmcky
Copy link
Contributor

mmcky commented Mar 5, 2024

@jstac while I was reviewing the code I thought c4531c5 would help to clarify which jit is being used. I was just rulling out if jax.jit was being used. I can revert if you would like.

@jstac
Copy link
Contributor Author

jstac commented Mar 5, 2024

I'm using the latest Anaconda, freshly downloaded, with numba=0.59.

@mmcky
Copy link
Contributor

mmcky commented Mar 5, 2024

I'm using the latest Anaconda, freshly downloaded, with numba=0.59.

So that's the same as the cloud. 😵‍💫

@mmcky
Copy link
Contributor

mmcky commented Mar 5, 2024

I'm using the latest Anaconda, freshly downloaded, with numba=0.59.

So that's the same as the cloud. 😵‍💫

@jstac can you let me know what python version you have

It looks like python==3.12 is on the cloud server instance

https://www.python.org/downloads/release/python-3120/

and the release notes add https://peps.python.org/pep-0688/ which is all about buffers so I will try and pin python==3.11

Not sure where it is coming from but anaconda should have Installer Python Version: 3.11.7

@jstac
Copy link
Contributor Author

jstac commented Mar 5, 2024

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux

@jstac
Copy link
Contributor Author

jstac commented Mar 5, 2024

@jstac while I was reviewing the code I thought c4531c5 would help to clarify which jit is being used. I was just rulling out if jax.jit was being used. I can revert if you would like.

It's a good idea to clarify. I usually use numba.jit and jax.jit if I want to be explicit (which is maybe clearer than nb.jit at first glance). It would be a good idea to add this here, and to all jax lectures that mix numba and jax...

@mmcky
Copy link
Contributor

mmcky commented Mar 5, 2024

thanks @jstac I will update.

The python=3.12 vs python=3.11 has been fixed and still failing. Currently out of ideas but will try and replicate on another machine next.

@mmcky
Copy link
Contributor

mmcky commented Mar 5, 2024

@jstac this is a gnarly one. It does appear to be an issue with this specific PR as the cache rebuilt last night (with this lecture using jitclass). I might give it a rest for today -- and see if I can think of what might be causing this with numba again tomorrow morning.

@jstac
Copy link
Contributor Author

jstac commented Mar 5, 2024

Sure @mmcky , a rest will probably help.

@mmcky
Copy link
Contributor

mmcky commented Mar 5, 2024

@jstac short of some experimental rewrites I have no idea why this doesn't work on the cloud computer and works on all my local machines (and yours!).

I am going to launch an instance and see if I can replicate there manually.

@mmcky
Copy link
Contributor

mmcky commented Mar 6, 2024

@jstac just a quick update. I have now tried running directly on the p2.xlarge instance and lecture-jax runs just fine there as well on ubuntu. I guess that leaves the docker layer, but it isn't clear why that would be causing the issue but running out of ideas on this one.

@mmcky
Copy link
Contributor

mmcky commented Mar 6, 2024

I have extracted the actual traceback from _build folder that is generated but it isn't a lot of help.

---------------------------------------------------------------------------
BufferError                               Traceback (most recent call last)
Cell In[13], line 2
      1 qe.tic()
----> 2 a_star_egm_nb, σ_star_egm_nb = successive_approx_numba(model,
      3                                                         print_skip=100)
      4 qe.toc()

Cell In[10], line 23, in successive_approx_numba(model, tol, max_iter, verbose, print_skip)
     20 error = tol + 1
     22 while i < max_iter and error > tol:
---> 23     a_new, σ_new = K_egm_nb(a_vec, σ_vec, constants, sizes, arrays)
     24     error = np.max(np.abs(σ_vec - σ_new))
     25     i += 1

File /opt/conda/envs/quantecon/lib/python3.11/site-packages/numba/core/dispatcher.py:733, in _DispatcherBase.typeof_pyval(self, val)
    730 # Not going through the resolve_argument_type() indirection
    731 # can save a couple µs.
    732 try:
--> 733     tp = typeof(val, Purpose.argument)
    734 except ValueError:
    735     tp = types.pyobject

File /opt/conda/envs/quantecon/lib/python3.11/site-packages/numba/core/typing/typeof.py:33, in typeof(val, purpose)
     31 # Note the behaviour for Purpose.argument must match _typeof.c.
     32 c = _TypeofContext(purpose)
---> 33 ty = typeof_impl(val, c)
     34 if ty is None:
     35     msg = _termcolor.errmsg(
     36         f"Cannot determine Numba type of {type(val)}")

File /opt/conda/envs/quantecon/lib/python3.11/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
    905 if not args:
    906     raise TypeError(f'{funcname} requires at least '
    907                     '1 positional argument')
--> 909 return dispatch(args[0].__class__)(*args, **kw)

File /opt/conda/envs/quantecon/lib/python3.11/site-packages/numba/core/typing/typeof.py:175, in _typeof_tuple(val, c)
    173 @typeof_impl.register(tuple)
    174 def _typeof_tuple(val, c):
--> 175     tys = [typeof_impl(v, c) for v in val]
    176     if any(ty is None for ty in tys):
    177         return

File /opt/conda/envs/quantecon/lib/python3.11/site-packages/numba/core/typing/typeof.py:175, in <listcomp>(.0)
    173 @typeof_impl.register(tuple)
    174 def _typeof_tuple(val, c):
--> 175     tys = [typeof_impl(v, c) for v in val]
    176     if any(ty is None for ty in tys):
    177         return

File /opt/conda/envs/quantecon/lib/python3.11/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
    905 if not args:
    906     raise TypeError(f'{funcname} requires at least '
    907                     '1 positional argument')
--> 909 return dispatch(args[0].__class__)(*args, **kw)

File /opt/conda/envs/quantecon/lib/python3.11/site-packages/numba/core/typing/typeof.py:46, in typeof_impl(val, c)
     41 @singledispatch
     42 def typeof_impl(val, c):
     43     """
     44     Generic typeof() implementation.
     45     """
---> 46     tp = _typeof_buffer(val, c)
     47     if tp is not None:
     48         return tp

File /opt/conda/envs/quantecon/lib/python3.11/site-packages/numba/core/typing/typeof.py:69, in _typeof_buffer(val, c)
     67 from numba.core.typing import bufproto
     68 try:
---> 69     m = memoryview(val)
     70 except TypeError:
     71     return

BufferError: INVALID_ARGUMENT: Python buffer protocol is only defined for CPU buffers.

@mmcky
Copy link
Contributor

mmcky commented Mar 6, 2024

@jstac gave this my all this morning -- but I am going to move on and work on closing a few other issues (so I feel somewhat productive today :-)). Do you need this merged soon?

Update:

  • tried using an ssh reverse tunnel action to halt execution to see what is going on at the server level but that action has issues so may need to use upterm directly to initiate a connect. (NEXT STEP)
  • established this is likely due to running inside of a Docker Container (rather than directly on a server)

@jstac
Copy link
Contributor Author

jstac commented Mar 6, 2024

Thanks @mmcky . Frustrating...

Sure, it can wait a few days.

@mmcky
Copy link
Contributor

mmcky commented Mar 6, 2024

thanks @jstac I have made the current main live in the meantime.

Copy link

github-actions bot commented Mar 6, 2024

@github-actions github-actions bot temporarily deployed to pull request March 6, 2024 22:15 Inactive
@mmcky
Copy link
Contributor

mmcky commented Mar 6, 2024

@jstac honestly I don't fully understand why this works 180b02e but there is something about the more complex interface to the numba function unpacking the passed tuples / lists.

  • try and pass the model (named tuple) in directly?

https://65e8eb0f61066165def8d8b7--incomparable-parfait-2417f8.netlify.app/ifp_egm.html

@github-actions github-actions bot temporarily deployed to pull request March 7, 2024 00:18 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 7, 2024 00:39 Inactive
@mmcky
Copy link
Contributor

mmcky commented Mar 7, 2024

@jstac this commit f1bf2a6 is my recommended fix.

The only change from your code is that it updates arrays with the np.arrays map

arrays = tuple(map(np.array, arrays))

which must help numba figure out types (or stops it from trying to autocast into np perhaps and it fails in some environments with the python buffer object).

@mmcky
Copy link
Contributor

mmcky commented Mar 7, 2024

@jstac
Copy link
Contributor Author

jstac commented Mar 7, 2024

Hats off to you @mmcky :-)

Well done 🥇

@jstac jstac merged commit 3a88350 into main Mar 7, 2024
7 checks passed
@jstac jstac deleted the egm_remove_jitclass branch March 7, 2024 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ifp_egm] use namedtuple
3 participants