Guidelines on run Pace v0.1.0 with GPU? #371

Closed
miaoneng opened this issue Oct 31, 2022 · 24 comments

miaoneng commented Oct 31, 2022

I am trying to set up a lab to replicate Pace v0.1: A Python-based Performance-Portable Implementation of the FV3 Dynamical Core. Due to #355, I cannot access the Docker environment provided in the docs (i.e., make dev doesn't work), so I tried to start from the provided Dockerfile.

I am using the v0.1.0 release because I assume this is the version used for the submission. I modified requirements_dev.txt to install gt4py with the cuda117 extra, i.e. gt4py[cuda117].

For the sake of simplicity, I started from Nvidia's Docker images. Here is my Dockerfile:

FROM nvidia/cuda:11.7.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y make \
    software-properties-common \
    libopenmpi3 \
    libopenmpi-dev \
    libboost-all-dev \
    libhdf5-serial-dev \
    netcdf-bin \
    libnetcdf-dev \
    python3 \
    python3-pip
RUN pip3 install --upgrade setuptools wheel pip packaging
COPY . /pace
RUN cd /pace && \
    pip3 install -r /pace/requirements_dev.txt && \
    python3 -m gt4py.gt_src_manager install
RUN rm -rf /pace
ENV OMPI_ALLOW_RUN_AS_ROOT=1
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

Then I modified driver/examples/configs/baroclinic_c12.yaml to change the backend to cuda (for which I believe cupy is invoked as the code generator). The top part of my modified file looks like this:

stencil_config:
  compilation_config:
    backend: cuda
    rebuild: true
    validate_args: true
    format_source: false
    device_sync: true

Then I ran the following command inside the Docker image:

mpirun --allow-run-as-root --mca btl_vader_single_copy_mechanism none --oversubscribe -n 6 python3 -m pace.driver.run driver/examples/configs/baroclinic_c12.yaml

After the kernels were compiled, the program crashed as follows.

[8181c3a7fe69:00151] Signal: Segmentation fault (11)
[8181c3a7fe69:00151] Signal code: Invalid permissions (2)
[8181c3a7fe69:00151] Failing at address: 0xb02920000
[8181c3a7fe69:00151] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f97d32ff520]
[8181c3a7fe69:00151] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x1a094d)[0x7f97d345d94d]
[8181c3a7fe69:00151] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so(+0x3244)[0x7f974f2df244]
[8181c3a7fe69:00151] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_send_request_start_prepare+0x44)[0x7f974f2b8784]
[8181c3a7fe69:00151] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so(mca_pml_ob1_isend+0x36d)[0x7f974f2b215d]
[8181c3a7fe69:00151] [ 5] /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Isend+0x12d)[0x7f9745f43b5d]
[8181c3a7fe69:00151] [ 6] /usr/local/lib/python3.10/dist-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0xf61da)[0x7f97463901da]
[8181c3a7fe69:00151] [ 7] python3(+0x15c8de)[0x55f4995a18de]
[8181c3a7fe69:00151] [ 8] python3(PyObject_Call+0xbb)[0x55f4995b075b]
[8181c3a7fe69:00151] [ 9] python3(_PyEval_EvalFrameDefault+0x2955)[0x55f49958cb25]
[8181c3a7fe69:00151] [10] python3(+0x16ab11)[0x55f4995afb11]
[8181c3a7fe69:00151] [11] python3(_PyEval_EvalFrameDefault+0x1a31)[0x55f49958bc01]
[8181c3a7fe69:00151] [12] python3(_PyFunction_Vectorcall+0x7c)[0x55f4995a212c]
[8181c3a7fe69:00151] [13] python3(_PyEval_EvalFrameDefault+0x816)[0x55f49958a9e6]
[8181c3a7fe69:00151] [14] python3(_PyFunction_Vectorcall+0x7c)[0x55f4995a212c]
[8181c3a7fe69:00151] [15] python3(_PyEval_EvalFrameDefault+0x816)[0x55f49958a9e6]
[8181c3a7fe69:00151] [16] python3(+0x16ab11)[0x55f4995afb11]
[8181c3a7fe69:00151] [17] python3(_PyEval_EvalFrameDefault+0x1a31)[0x55f49958bc01]
[8181c3a7fe69:00151] [18] python3(_PyFunction_Vectorcall+0x7c)[0x55f4995a212c]
[8181c3a7fe69:00151] [19] python3(_PyEval_EvalFrameDefault+0x816)[0x55f49958a9e6]
[8181c3a7fe69:00151] [20] python3(_PyFunction_Vectorcall+0x7c)[0x55f4995a212c]
[8181c3a7fe69:00151] [21] python3(_PyObject_FastCallDictTstate+0x16d)[0x55f4995975fd]
[8181c3a7fe69:00151] [22] python3(+0x166d74)[0x55f4995abd74]
[8181c3a7fe69:00151] [23] python3(_PyObject_MakeTpCall+0x1fc)[0x55f49959835c]
[8181c3a7fe69:00151] [24] python3(_PyEval_EvalFrameDefault+0x73b3)[0x55f499591583]
[8181c3a7fe69:00151] [25] python3(+0x16ab11)[0x55f4995afb11]
[8181c3a7fe69:00151] [26] python3(_PyEval_EvalFrameDefault+0x1a31)[0x55f49958bc01]
[8181c3a7fe69:00151] [27] python3(+0x16ab11)[0x55f4995afb11]
[8181c3a7fe69:00151] [28] python3(_PyEval_EvalFrameDefault+0x1a31)[0x55f49958bc01]
[8181c3a7fe69:00151] [29] python3(_PyFunction_Vectorcall+0x7c)[0x55f4995a212c]
[8181c3a7fe69:00151] *** End of error message ***
python3(_PyEval_EvalFrameDefault+0x1a31)[0x5594b79d7c01]
[8181c3a7fe69:00152] [27] python3(+0x16ab11)[0x5594b79fbb11]
[8181c3a7fe69:00152] [28] python3(_PyEval_EvalFrameDefault+0x1a31)[0x5594b79d7c01]
[8181c3a7fe69:00152] [29] python3(_PyFunction_Vectorcall+0x7c)[0x5594b79ee12c]
[8181c3a7fe69:00152] *** End of error message ***

BTW, the numpy backend works. However, because there is no written guideline for running with a GPU backend, I am not sure whether I am doing this the right way.

Could you help me to triage the issue or provide any additional instructions?

Thank you.

miaoneng commented Nov 1, 2022

I think the issue above occurs because the OpenMPI installed from apt is not CUDA-aware, and the cupy-cuda117 runtime may also be incompatible.
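One way to check the first suspicion is to ask Open MPI itself whether it was built with CUDA support. The sketch below is not part of Pace; it wraps the `ompi_info --parsable --all` query described in the Open MPI documentation and looks for the `mpi_built_with_cuda_support` line. The function names are made up for illustration, and the parsing assumes the usual colon-separated output format.

```python
import shutil
import subprocess


def parse_cuda_support(ompi_info_output):
    """Return True/False if the CUDA-support line is present, else None.

    Assumes the parsable format, e.g.
    mca:mpi:base:param:mpi_built_with_cuda_support:value:true
    """
    for line in ompi_info_output.splitlines():
        if "mpi_built_with_cuda_support:value:" in line:
            # The last colon-separated field carries the boolean.
            return line.strip().rsplit(":", 1)[-1].lower() == "true"
    return None


def openmpi_is_cuda_aware():
    """Query the local Open MPI install; None means "could not determine"."""
    if shutil.which("ompi_info") is None:
        return None  # Open MPI command-line tools are not on PATH
    result = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_cuda_support(result.stdout)
```

Calling `openmpi_is_cuda_aware()` inside the container should return `False` if the apt-installed build lacks CUDA support, which would be consistent with the segfault inside `MPI_Isend` shown in the trace above; `None` means the check itself was inconclusive.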

I recompiled OpenMPI and cupy on an AWS p3 instance, and the run now makes progress. However, I am not sure whether Pace is actually executing anything, because I keep seeing compilation output like the excerpt below (only a fraction of the full output).

How can I tell when Pace has finished compiling and is running the dycore/physics? Do I need 1 GPU per rank on each node to make it work?

            instantiation of "void gridtools::stencil::gpu_backend::launch_kernel_impl_::zero_extent_wrapper<NumThreads,BlockSizeI,BlockSizeJ,Fun>(Fun, gridtools::int_t, gridtools::int_t) [with NumThreads=512UL, BlockSizeI=64, BlockSizeJ=8, Fun=set_k0_and_calc_wk_impl_::kernel_139783429400624_f<set_k0_and_calc_wk_impl_::loop_139783429588208_f<gridtools::sid::composite::keys<set_k0_and_calc_wk_impl_::tag::pk3, set_k0_and_calc_wk_impl_::tag::wk, set_k0_and_calc_wk_impl_::tag::pp, set_k0_and_calc_wk_impl_::tag::top_value>::values<gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 2>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<set_k0_and_calc_wk_impl_::i_block_size_t, set_k0_and_calc_wk_impl_::j_block_size_t>>, 
gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 3>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<set_k0_and_calc_wk_impl_::i_block_size_t, set_k0_and_calc_wk_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 1>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, 
gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<set_k0_and_calc_wk_impl_::i_block_size_t, set_k0_and_calc_wk_impl_::j_block_size_t>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::stencil::global_parameter_impl_::global_parameter<double> &, gridtools::tuple<>, gridtools::tuple<>>>>>]"
/home/ubuntu/pace/external/gt4py/src/gt4py/_external_src/gridtools/include/gridtools/stencil/gpu/launch_kernel.hpp(192): here
            instantiation of "void gridtools::stencil::gpu_backend::launch_kernel_impl_::launch_kernel<Extent,BlockSizeI,BlockSizeJ,Fun,<unnamed>>(gridtools::int_t, gridtools::int_t, gridtools::uint_t, Fun, size_t) [with Extent=gridtools::stencil::extent<0, 0, 0, 0, 0, 0>, BlockSizeI=64, BlockSizeJ=8, Fun=set_k0_and_calc_wk_impl_::kernel_139783429400624_f<set_k0_and_calc_wk_impl_::loop_139783429588208_f<gridtools::sid::composite::keys<set_k0_and_calc_wk_impl_::tag::pk3, set_k0_and_calc_wk_impl_::tag::wk, set_k0_and_calc_wk_impl_::tag::pp, set_k0_and_calc_wk_impl_::tag::top_value>::values<gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 2>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<set_k0_and_calc_wk_impl_::i_block_size_t, set_k0_and_calc_wk_impl_::j_block_size_t>>, 
gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 3>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<set_k0_and_calc_wk_impl_::i_block_size_t, set_k0_and_calc_wk_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 1>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, 
gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<set_k0_and_calc_wk_impl_::i_block_size_t, set_k0_and_calc_wk_impl_::j_block_size_t>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::stencil::global_parameter_impl_::global_parameter<double> &, gridtools::tuple<>, gridtools::tuple<>>>>>, <unnamed>=0]"
(220): here

.gt_cache_000000/py310_1013/cuda/pace/fv3core/stencils/nh_p_grad/calc_v/m_calc_v__cuda_6b40335142_pyext_BUILD/computation.hpp(79): warning #430-D: returning reference to local temporary
          detected during:
            instantiation of function "lambda [](auto, auto, auto)->auto && [with <auto-1>=gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>, <auto-2>=gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>, <auto-3>=gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>]"
(161): here
            instantiation of "void calc_v_impl_::loop_139707187091136_f<Sid>::operator()(int, int, Validator) const [with Sid=gridtools::sid::composite::keys<calc_v_impl_::tag::pp, calc_v_impl_::tag::pk3, calc_v_impl_::tag::gz, calc_v_impl_::tag::rdy, calc_v_impl_::tag::wk1, calc_v_impl_::tag::wk, calc_v_impl_::tag::v, calc_v_impl_::tag::dt>::values<gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 6>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, 
gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 5>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 4>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, 
gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::rename_dimensions_impl_::renamed_sid<gridtools::meta::list<gridtools::meta::list<gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>, gridtools::stencil::dim::i>, gridtools::meta::list<gridtools::integral_constant<int, 1>, gridtools::stencil::dim::j>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<2UL, gridtools::integral_constant<int, 7>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 2UL>, gridtools::array<size_t, 2UL>>> &, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<size_t, size_t>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<size_t, size_t>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, 
gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 3>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 2>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, 
gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 1>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::stencil::global_parameter_impl_::global_parameter<double> &, gridtools::tuple<>, gridtools::tuple<>>>, Validator=gridtools::stencil::gpu_backend::launch_kernel_impl_::dummy_validator_f]"
(179): here
            instantiation of "void calc_v_impl_::kernel_139707187100112_f<Loop139707187091136>::operator()(int, int, Validator) const [with Loop139707187091136=calc_v_impl_::loop_139707187091136_f<gridtools::sid::composite::keys<calc_v_impl_::tag::pp, calc_v_impl_::tag::pk3, calc_v_impl_::tag::gz, calc_v_impl_::tag::rdy, calc_v_impl_::tag::wk1, calc_v_impl_::tag::wk, calc_v_impl_::tag::v, calc_v_impl_::tag::dt>::values<gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 6>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, 
gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 5>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 4>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 
3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::rename_dimensions_impl_::renamed_sid<gridtools::meta::list<gridtools::meta::list<gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>, gridtools::stencil::dim::i>, gridtools::meta::list<gridtools::integral_constant<int, 1>, gridtools::stencil::dim::j>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<2UL, gridtools::integral_constant<int, 7>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 2UL>, gridtools::array<size_t, 2UL>>> &, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<size_t, size_t>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<size_t, size_t>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, 
gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 3>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 2>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, 
gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 1>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::stencil::global_parameter_impl_::global_parameter<double> &, gridtools::tuple<>, gridtools::tuple<>>>>, Validator=gridtools::stencil::gpu_backend::launch_kernel_impl_::dummy_validator_f]"
/home/ubuntu/pace/external/gt4py/src/gt4py/_external_src/gridtools/include/gridtools/stencil/gpu/launch_kernel.hpp(126): here
            instantiation of "void gridtools::stencil::gpu_backend::launch_kernel_impl_::zero_extent_wrapper<NumThreads,BlockSizeI,BlockSizeJ,Fun>(Fun, gridtools::int_t, gridtools::int_t) [with NumThreads=512UL, BlockSizeI=64, BlockSizeJ=8, Fun=calc_v_impl_::kernel_139707187100112_f<calc_v_impl_::loop_139707187091136_f<gridtools::sid::composite::keys<calc_v_impl_::tag::pp, calc_v_impl_::tag::pk3, calc_v_impl_::tag::gz, calc_v_impl_::tag::rdy, calc_v_impl_::tag::wk1, calc_v_impl_::tag::wk, calc_v_impl_::tag::v, calc_v_impl_::tag::dt>::values<gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 6>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, 
gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 5>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 4>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, 
gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::rename_dimensions_impl_::renamed_sid<gridtools::meta::list<gridtools::meta::list<gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>, gridtools::stencil::dim::i>, gridtools::meta::list<gridtools::integral_constant<int, 1>, gridtools::stencil::dim::j>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<2UL, gridtools::integral_constant<int, 7>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 2UL>, gridtools::array<size_t, 2UL>>> &, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<size_t, size_t>, gridtools::hymap::keys<gridtools::stencil::dim::i, 
gridtools::stencil::dim::j>::values<size_t, size_t>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 3>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, 
gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 2>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 1>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, 
gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::stencil::global_parameter_impl_::global_parameter<double> &, gridtools::tuple<>, gridtools::tuple<>>>>>]"
/home/ubuntu/pace/external/gt4py/src/gt4py/_external_src/gridtools/include/gridtools/stencil/gpu/launch_kernel.hpp(192): here
            instantiation of "void gridtools::stencil::gpu_backend::launch_kernel_impl_::launch_kernel<Extent,BlockSizeI,BlockSizeJ,Fun,<unnamed>>(gridtools::int_t, gridtools::int_t, gridtools::uint_t, Fun, size_t) [with Extent=gridtools::stencil::extent<0, 0, 0, 0, 0, 0>, BlockSizeI=64, BlockSizeJ=8, Fun=calc_v_impl_::kernel_139707187100112_f<calc_v_impl_::loop_139707187091136_f<gridtools::sid::composite::keys<calc_v_impl_::tag::pp, calc_v_impl_::tag::pk3, calc_v_impl_::tag::gz, calc_v_impl_::tag::rdy, calc_v_impl_::tag::wk1, calc_v_impl_::tag::wk, calc_v_impl_::tag::v, calc_v_impl_::tag::dt>::values<gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 6>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, 
gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 5>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 4>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, 
gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::rename_dimensions_impl_::renamed_sid<gridtools::meta::list<gridtools::meta::list<gridtools::integral_constant<gridtools::literals::literals_impl_::literal_int_t, 0>, gridtools::stencil::dim::i>, gridtools::meta::list<gridtools::integral_constant<int, 1>, gridtools::stencil::dim::j>>, gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 2UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<2UL, gridtools::integral_constant<int, 7>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 2UL>, gridtools::array<size_t, 2UL>>> &, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<size_t, size_t>, gridtools::hymap::keys<gridtools::stencil::dim::i, 
gridtools::stencil::dim::j>::values<size_t, size_t>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 3>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, 
gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 2>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, gridtools::sid::block_impl_::blocked_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::sid::synthetic_impl_::synthetic<gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::upper_bounds, gridtools::array<size_t, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::lower_bounds, gridtools::array<gridtools::integral_constant<size_t, 0UL>, 3UL>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides_kind, gridtools::python_sid_adapter_impl_::kind<3UL, gridtools::integral_constant<int, 1>>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::strides, gridtools::tuple<gridtools::integral_constant<pybind11::ssize_t, 1L>, std::size_t, std::size_t>>, gridtools::sid::synthetic_impl_::unique_mixin<gridtools::sid::property::origin, gridtools::sid::host_device::simple_ptr_holder<double *>>>, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>> &, gridtools::array<size_t, 3UL>, gridtools::array<size_t, 3UL>>, gridtools::hymap::keys<gridtools::stencil::dim::i, gridtools::stencil::dim::j>::values<calc_v_impl_::i_block_size_t, calc_v_impl_::j_block_size_t>>, 
gridtools::sid::shift_sid_origin_impl_::shifted_sid<gridtools::stencil::global_parameter_impl_::global_parameter<double> &, gridtools::tuple<>, gridtools::tuple<>>>>>, <unnamed>=0]"
(265): here

@jdahm
Contributor

jdahm commented Nov 3, 2022

Hi @miaoneng,

How can I tell when pace completed compiling and run dycore/phys?

That information will be in the log. Some ranks may still be compiling code after the first timestep starts, but by the end of the first step you can be assured that all code is compiled.

In a more concrete sense, all the compilation happens in the __init__ methods of the objects, so once the pace.driver.Driver is initialized, compilation is done on that Python instance (other ranks might still be compiling).
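As a rough way to check this from the log, here is a minimal sketch that lists which ranks have finished initialization. It assumes log lines of the form `<timestamp> [INFO] (rank N) <module>:<message>` and treats the message "integrating driver forward in time" as the first thing a rank logs after the driver is initialized. The helper name `ranks_past_init` is made up for illustration and is not part of pace.

```python
import re

# Matches "(rank N) some.module:message" anywhere in a log line.
LOG_LINE = re.compile(r"\(rank (\d+)\) ([\w.]+):(.*)$")


def ranks_past_init(lines, marker="integrating driver forward in time"):
    """Return the set of ranks that have logged the post-init marker.

    Assumption: a rank that logs `marker` has finished Driver.__init__,
    and therefore finished compiling its stencils.
    """
    done = set()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(3).strip() == marker:
            done.add(int(match.group(1)))
    return done


if __name__ == "__main__":
    sample = [
        "2022-11-04 13:03:20 [INFO] (rank 0) pace.driver.driver:initializing driver",
        "2022-11-04 13:03:47 [INFO] (rank 5) pace.driver.driver:integrating driver forward in time",
        "2022-11-04 13:03:47 [INFO] (rank 0) pace.driver.driver:integrating driver forward in time",
    ]
    print(sorted(ranks_past_init(sample)))
```

Comparing the returned set against the full set of ranks in the run shows which ranks, if any, are still compiling.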

Do I need 1 gpu per node per rank to make it work?

This is how we currently run it, but I believe it is probably not required to be that way. If you have CUDA_VISIBLE_DEVICES correctly configured so that each rank sees a unique GPU as device 0, then that should be equivalent.
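For illustration only, one way to set that up from inside each Python process is sketched below. It assumes Open MPI, which exports `OMPI_COMM_WORLD_LOCAL_RANK` for every process it launches; the helper names `visible_device_for_rank` and `pin_gpu` and the `gpus_per_node` parameter are hypothetical and not part of pace. The variable has to be set before anything initializes CUDA (for example, before importing cupy).

```python
import os


def visible_device_for_rank(local_rank: int, gpus_per_node: int) -> str:
    """Pick the physical GPU index a node-local rank should see."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    # Ranks wrap around if there are more ranks than GPUs on the node,
    # in which case several ranks share one device (an assumption of this
    # sketch, not a configuration described above).
    return str(local_rank % gpus_per_node)


def pin_gpu(env=os.environ, gpus_per_node: int = 1) -> str:
    """Set CUDA_VISIBLE_DEVICES so the chosen GPU appears as device 0."""
    # OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI's launcher; default to
    # rank 0 when running without MPI.
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    env["CUDA_VISIBLE_DEVICES"] = visible_device_for_rank(local_rank, gpus_per_node)
    return env["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =", pin_gpu(gpus_per_node=1))
```

Because each process then exposes exactly one GPU, that GPU is always device 0 from the process's point of view.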

Let me know if you still run into issues getting going with it. Happy to help!

@miaoneng
Author

miaoneng commented Nov 4, 2022

Hi @jdahm. Thank you for your kind response. I am running the c12 test on one p3.2xlarge node, and I am not sure whether 16 GB of GPU memory is sufficient. I will let it compile overnight and see what it reports tomorrow.

Thank you.

@miaoneng
Author

miaoneng commented Nov 4, 2022

I left the machine running overnight. I think it completed compilation but failed at execution. I restarted it, and I think it reuses the cached binaries, so startup is much faster. However, it still fails at an early stage of execution.

2022-11-04 13:03:20 [INFO] (rank 0) pace.driver.driver:initializing driver
2022-11-04 13:03:47 [INFO] (rank 0) pace.driver.driver:running on rank 0 with subtile location {'north': True, 'south': True, 'east': True, 'west': True}
2022-11-04 13:03:47 [INFO] (rank 3) pace.driver.driver:running on rank 3 with subtile location {'north': True, 'south': True, 'east': True, 'west': True}
2022-11-04 13:03:47 [INFO] (rank 5) pace.driver.driver:running on rank 5 with subtile location {'north': True, 'south': True, 'east': True, 'west': True}
2022-11-04 13:03:47 [INFO] (rank 1) pace.driver.driver:running on rank 1 with subtile location {'north': True, 'south': True, 'east': True, 'west': True}
2022-11-04 13:03:47 [INFO] (rank 2) pace.driver.driver:running on rank 2 with subtile location {'north': True, 'south': True, 'east': True, 'west': True}
2022-11-04 13:03:47 [INFO] (rank 4) pace.driver.driver:running on rank 4 with subtile location {'north': True, 'south': True, 'east': True, 'west': True}
2022-11-04 13:03:47 [INFO] (rank 5) pace.driver.driver:integrating driver forward in time
2022-11-04 13:03:47 [INFO] (rank 3) pace.driver.driver:integrating driver forward in time
2022-11-04 13:03:47 [INFO] (rank 0) pace.driver.driver:integrating driver forward in time
2022-11-04 13:03:47 [INFO] (rank 4) pace.driver.driver:integrating driver forward in time
2022-11-04 13:03:47 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:FV Setup
2022-11-04 13:03:47 [INFO] (rank 1) pace.driver.driver:integrating driver forward in time
2022-11-04 13:03:47 [INFO] (rank 2) pace.driver.driver:integrating driver forward in time
2022-11-04 13:03:47 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:Adjust pt
2022-11-04 13:03:47 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:DynCore
2022-11-04 13:03:47 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:TracerAdvection
2022-11-04 13:03:47 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:Remapping
2022-11-04 13:03:48 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:Omega
2022-11-04 13:03:48 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:Del2Cubed
2022-11-04 13:03:48 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:Neg Adj 3
2022-11-04 13:03:48 [INFO] (rank 0) pace.fv3core.stencils.fv_dynamics:CubedToLatLon
2022-11-04 13:03:48 [INFO] (rank 3) pace.driver.driver:cleaning up driver
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 96, in <module>
    command_line()
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 84, in command_line
    main(driver_config=driver_config)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 90, in main
    driver.step_all()
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 442, in step_all
    self._critical_path_step_all(
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 436, in _critical_path_step_all
    self._step_physics(timestep=dt)
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 466, in _step_physics
    self.physics(self.state.physics_state, timestep=float(timestep))
  File "/home/ubuntu/pace/physics/pace/physics/stencils/physics.py", line 295, in __call__
    self._microphysics(physics_state.microphysics, timestep=timestep)
  File "/home/ubuntu/pace/physics/pace/physics/stencils/microphysics.py", line 2317, in __call__
    self._warm_rain(
  File "/home/ubuntu/pace/dsl/pace/dsl/stencil.py", line 411, in __call__
    self.stencil_object(
  File "/home/ubuntu/pace/.gt_cache_000003/py310_1013/cuda/pace/physics/stencils/microphysics/warm_rain/m_warm_rain__cuda_480356a820.py", line 106, in __call__
    self._call_run(
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 521, in _call_run
    self._validate_args(field_args, parameter_args, domain, origin)
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 431, in _validate_args
    raise TypeError(
TypeError: The type of parameter 'crevp_0' is '<class 'cupy.ndarray'>' instead of 'float64'
2022-11-04 13:03:48 [INFO] (rank 5) pace.driver.driver:cleaning up driver
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 96, in <module>
    command_line()
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 84, in command_line
    main(driver_config=driver_config)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 90, in main
    driver.step_all()
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 442, in step_all
    self._critical_path_step_all(
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 436, in _critical_path_step_all
    self._step_physics(timestep=dt)
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 466, in _step_physics
    self.physics(self.state.physics_state, timestep=float(timestep))
  File "/home/ubuntu/pace/physics/pace/physics/stencils/physics.py", line 295, in __call__
    self._microphysics(physics_state.microphysics, timestep=timestep)
  File "/home/ubuntu/pace/physics/pace/physics/stencils/microphysics.py", line 2317, in __call__
    self._warm_rain(
  File "/home/ubuntu/pace/dsl/pace/dsl/stencil.py", line 411, in __call__
    self.stencil_object(
  File "/home/ubuntu/pace/.gt_cache_000005/py310_1013/cuda/pace/physics/stencils/microphysics/warm_rain/m_warm_rain__cuda_480356a820.py", line 106, in __call__
    self._call_run(
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 521, in _call_run
    self._validate_args(field_args, parameter_args, domain, origin)
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 431, in _validate_args
    raise TypeError(
TypeError: The type of parameter 'crevp_0' is '<class 'cupy.ndarray'>' instead of 'float64'
2022-11-04 13:03:48 [INFO] (rank 2) pace.driver.driver:cleaning up driver
2022-11-04 13:03:48 [INFO] (rank 4) pace.driver.driver:cleaning up driver
2022-11-04 13:03:48 [INFO] (rank 0) pace.driver.driver:cleaning up driver
2022-11-04 13:03:48 [INFO] (rank 1) pace.driver.driver:cleaning up driver
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 96, in <module>
    command_line()
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 84, in command_line
    main(driver_config=driver_config)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 90, in main
    driver.step_all()
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 442, in step_all
    self._critical_path_step_all(
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 436, in _critical_path_step_all
    self._step_physics(timestep=dt)
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 466, in _step_physics
    self.physics(self.state.physics_state, timestep=float(timestep))
  File "/home/ubuntu/pace/physics/pace/physics/stencils/physics.py", line 295, in __call__
    self._microphysics(physics_state.microphysics, timestep=timestep)
  File "/home/ubuntu/pace/physics/pace/physics/stencils/microphysics.py", line 2317, in __call__
    self._warm_rain(
  File "/home/ubuntu/pace/dsl/pace/dsl/stencil.py", line 411, in __call__
    self.stencil_object(
  File "/home/ubuntu/pace/.gt_cache_000002/py310_1013/cuda/pace/physics/stencils/microphysics/warm_rain/m_warm_rain__cuda_480356a820.py", line 106, in __call__
    self._call_run(
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 521, in _call_run
    self._validate_args(field_args, parameter_args, domain, origin)
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 431, in _validate_args
    raise TypeError(
TypeError: The type of parameter 'crevp_0' is '<class 'cupy.ndarray'>' instead of 'float64'
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 96, in <module>
    command_line()
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 84, in command_line
    main(driver_config=driver_config)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 90, in main
    driver.step_all()
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 442, in step_all
    self._critical_path_step_all(
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 436, in _critical_path_step_all
    self._step_physics(timestep=dt)
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 466, in _step_physics
    self.physics(self.state.physics_state, timestep=float(timestep))
  File "/home/ubuntu/pace/physics/pace/physics/stencils/physics.py", line 295, in __call__
    self._microphysics(physics_state.microphysics, timestep=timestep)
  File "/home/ubuntu/pace/physics/pace/physics/stencils/microphysics.py", line 2317, in __call__
    self._warm_rain(
  File "/home/ubuntu/pace/dsl/pace/dsl/stencil.py", line 411, in __call__
    self.stencil_object(
  File "/home/ubuntu/pace/.gt_cache_000004/py310_1013/cuda/pace/physics/stencils/microphysics/warm_rain/m_warm_rain__cuda_480356a820.py", line 106, in __call__
    self._call_run(
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 521, in _call_run
    self._validate_args(field_args, parameter_args, domain, origin)
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 431, in _validate_args
    raise TypeError(
TypeError: The type of parameter 'crevp_0' is '<class 'cupy.ndarray'>' instead of 'float64'
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 96, in <module>
    command_line()
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 84, in command_line
    main(driver_config=driver_config)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 90, in main
    driver.step_all()
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 442, in step_all
    self._critical_path_step_all(
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 436, in _critical_path_step_all
    self._step_physics(timestep=dt)
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 466, in _step_physics
    self.physics(self.state.physics_state, timestep=float(timestep))
  File "/home/ubuntu/pace/physics/pace/physics/stencils/physics.py", line 295, in __call__
    self._microphysics(physics_state.microphysics, timestep=timestep)
  File "/home/ubuntu/pace/physics/pace/physics/stencils/microphysics.py", line 2317, in __call__
    self._warm_rain(
  File "/home/ubuntu/pace/dsl/pace/dsl/stencil.py", line 411, in __call__
    self.stencil_object(
  File "/home/ubuntu/pace/.gt_cache_000001/py310_1013/cuda/pace/physics/stencils/microphysics/warm_rain/m_warm_rain__cuda_480356a820.py", line 106, in __call__
    self._call_run(
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 521, in _call_run
    self._validate_args(field_args, parameter_args, domain, origin)
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 431, in _validate_args
    raise TypeError(
TypeError: The type of parameter 'crevp_0' is '<class 'cupy.ndarray'>' instead of 'float64'
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 96, in <module>
    command_line()
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/pace/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 84, in command_line
    main(driver_config=driver_config)
  File "/home/ubuntu/pace/driver/pace/driver/run.py", line 90, in main
    driver.step_all()
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 442, in step_all
    self._critical_path_step_all(
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 436, in _critical_path_step_all
    self._step_physics(timestep=dt)
  File "/home/ubuntu/pace/driver/pace/driver/driver.py", line 466, in _step_physics
    self.physics(self.state.physics_state, timestep=float(timestep))
  File "/home/ubuntu/pace/physics/pace/physics/stencils/physics.py", line 295, in __call__
    self._microphysics(physics_state.microphysics, timestep=timestep)
  File "/home/ubuntu/pace/physics/pace/physics/stencils/microphysics.py", line 2317, in __call__
    self._warm_rain(
  File "/home/ubuntu/pace/dsl/pace/dsl/stencil.py", line 411, in __call__
    self.stencil_object(
  File "/home/ubuntu/pace/.gt_cache_000000/py310_1013/cuda/pace/physics/stencils/microphysics/warm_rain/m_warm_rain__cuda_480356a820.py", line 106, in __call__
    self._call_run(
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 521, in _call_run
    self._validate_args(field_args, parameter_args, domain, origin)
  File "/home/ubuntu/pace/external/gt4py/src/gt4py/stencil_object.py", line 431, in _validate_args
    raise TypeError(
TypeError: The type of parameter 'crevp_0' is '<class 'cupy.ndarray'>' instead of 'float64'
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9399,1],1]
  Exit code:    1
--------------------------------------------------------------------------

Could you provide any suggestions?

@jdahm
Contributor

jdahm commented Nov 8, 2022

@miaoneng,

Apologies for the delay. Is this on a recent commit on main?

There is a check for a "GPU" backend internally that does not catch cuda, since we deprecated/no longer test it. Can you try the gt:gpu backend?

Hope this helps!

@mcgibbon
Collaborator

mcgibbon commented Nov 8, 2022

It looks like this scalar is initialized using a dynamic numpy module which should be cupy for gpu backends - the numpy module is selected by

        if (
            stencil_factory.config.is_gpu_backend
            and stencil_factory.config.dace_config.is_dace_orchestrated()
        ):
            self.gfdl_cloud_microphys_init(namelist.dt_atmos, cp)
        else:
            self.gfdl_cloud_microphys_init(namelist.dt_atmos, np)

in physics/pace/physics/stencils/microphysics.py. Because we don't use the cuda backend, is_gpu_backend does not properly interpret "cuda" as a GPU backend, and numpy is used to initialize this memory instead. At the least, using the gt:gpu backend should fix the bug you're currently seeing.

We're looking at updating the code so that an exception is raised if an unexpected backend is used.
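
A minimal sketch of the kind of check described here, for illustration only. The names `GPU_BACKENDS`, `CPU_BACKENDS`, and `is_gpu_backend`, and the backend lists themselves, are assumptions and are not taken from Pace's code:

```python
# Hypothetical sketch, not Pace's implementation: the backend names listed
# here are assumptions chosen for illustration.
GPU_BACKENDS = {"gt:gpu"}
CPU_BACKENDS = {"numpy", "gt:cpu_ifirst", "gt:cpu_kfirst"}


def is_gpu_backend(backend: str) -> bool:
    """Return True for GPU backends, False for CPU ones, raise otherwise."""
    if backend in GPU_BACKENDS:
        return True
    if backend in CPU_BACKENDS:
        return False
    # Failing loudly avoids silently treating e.g. "cuda" as a CPU backend.
    raise ValueError(f"unsupported backend: {backend!r}")
```

With a check like this, an untested backend such as cuda would fail at configuration time rather than deep inside a stencil call.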

@miaoneng
Author

miaoneng commented Nov 8, 2022

Thank you for your valuable response. I just changed the backend to gt:gpu. It now starts recompiling, lol.

It usually takes a whole night to compile it.

I am currently using the v0.1 GMD tagged version. Should I switch to main?

Best

@mcgibbon
Collaborator

mcgibbon commented Nov 9, 2022

Yes, compilation does take quite a while. For the full model I'd expect an hour or two, but it depends on your system. It should run much faster once you have a cache of compiled code. We have systems in place to compile all configurations using only 9 ranks before we submit many-node performance runs, for this reason.

Whether to switch to main is up to you. Without knowing anything about your use of the code, I would suggest sticking to the tagged version. If you are mainly looking to reproduce the paper results and check out the model, this should work best.

If you want to use this code more extensively for a project I'd be happy to meet and discuss our current development plans and how best to keep you updated on our latest changes. The APIs are all still subject to change, but some features are more stable than others.

@miaoneng
Author

miaoneng commented Nov 9, 2022

Looks like I am still hitting the same error. Here is the top section of the config

stencil_config:
  compilation_config:
    backend: gt:gpu
    rebuild: false
    validate_args: true
    format_source: false
    device_sync: false

I deleted .gt_cache* to make sure I have a clean cache; however, I think the program stops at roughly the same location.

I put the whole log here https://pastebin.com/WS4iafRS
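
For reference, a small sketch of clearing the per-rank caches before a rebuild. It assumes the caches are the `.gt_cache_*` directories created in the working directory, as the paths in the tracebacks above suggest; `clear_gt_caches` is a hypothetical helper, not part of Pace or gt4py:

```python
import shutil
from pathlib import Path


def clear_gt_caches(root: str = ".") -> list[str]:
    """Delete every `.gt_cache*` directory directly under `root`.

    Hypothetical helper: the cache location is an assumption based on the
    paths that appear in the tracebacks.
    """
    removed = []
    for path in sorted(Path(root).glob(".gt_cache*")):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path.name)
    return removed
```

Calling `clear_gt_caches("/home/ubuntu/pace")` would return the names of the directories it removed, which makes it easy to confirm that every rank's cache was actually cleared.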

@mcgibbon
Collaborator

mcgibbon commented Nov 9, 2022

That's odd, I'm starting to suspect this is a real bug and not an issue with how you're running the code. In a PR I am running into an issue with the same section of code for different reasons. I should be able to investigate this today and get back to you.

@mcgibbon
Collaborator

mcgibbon commented Nov 9, 2022

I wasn't able to reproduce your errors because of cupy not importing inside the docker image you provided, which I think is why you moved to working baremetal. If you can get something running in Docker I can try to debug your issue directly. Otherwise, I'll let you know when this section of the code is updated and your problem will likely be solved (though I won't be able to confirm by testing it myself).

@miaoneng
Author

miaoneng commented Nov 9, 2022

I can definitely work towards a Dockerfile so we can see the same environment. I will post it here once I have it.

@miaoneng
Author

Here is my Dockerfile:

FROM nvidia/cuda:11.7.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y make \
    software-properties-common \
    libboost-all-dev \
    libhdf5-serial-dev \
    netcdf-bin \
    libnetcdf-dev \
    python3 \
    python3-pip \
    wget

RUN apt-get purge -y "*openmpi*"

RUN pip3 install --upgrade setuptools wheel pip packaging

RUN mkdir -p /src && \
    cd /src && \
    wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.4.tar.gz && \
    tar zxf openmpi-4.1.4.tar.gz && \
    cd openmpi-4.1.4 && \
    ./configure --with-cuda --disable-builtin-atomics && \
    make -j 8 && \
    make install
    
COPY . /pace

RUN cd /pace && \
    pip3 install --verbose cupy

ENV LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64

RUN cd /pace && \
    pip3 install -r /pace/requirements_dev.txt && \
    python3 -m gt4py.gt_src_manager install

ENV OMPI_ALLOW_RUN_AS_ROOT=1
ENV OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

I think I am able to run the application, and it stops at (probably) the same location.

@mcgibbon
Collaborator

mcgibbon commented Nov 14, 2022

I had to add an apt-get install of git to your dockerfile to get it to build, otherwise it fails on the gt4py gt_src_manager install. Once I did that, I built the image (on tag v0.1.0), entered it with make enter, and ran cd /pace/driver followed by mpirun -n 6 --oversubscribe python3 -m pace.driver.run examples/configs/baroclinic_c12.yaml. That gave me the following error I can reproduce by just importing cupy:

(base) mcgibbon@jeremy-vm-gpu:~/python/pace$ make enter
docker run --rm -it \
	--network host \
	 -v /home/mcgibbon/python/pace:/pace \
us.gcr.io/vcm-ml/pace bash
root@jeremy-vm-gpu:/# python3
Python 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cupy
/usr/local/lib/python3.10/dist-packages/cupy/_environment.py:437: UserWarning:
--------------------------------------------------------------------------------

  CuPy may not function correctly because multiple CuPy packages are installed
  in your environment:

    cupy, cupy-cuda117

  Follow these steps to resolve this issue:

    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:

         $ pip uninstall <package_name>

      If you previously installed CuPy via conda, also run the following:

         $ conda uninstall cupy

    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.

         https://docs.cupy.dev/en/stable/install.html

--------------------------------------------------------------------------------

  warnings.warn(f'''
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 18, in <module>
    from cupy import _core  # NOQA
  File "/usr/local/lib/python3.10/dist-packages/cupy/_core/__init__.py", line 1, in <module>
    from cupy._core import core  # NOQA
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 20, in <module>
    raise ImportError(f'''
ImportError:
================================================================
Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
  https://docs.cupy.dev/en/latest/install.html

Original error:
  ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
================================================================

If I follow what it says and pip uninstall cupy, the exception on import disappears, but I get the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/pace/driver/pace/driver/__init__.py", line 1, in <module>
    from .comm import (
  File "/pace/driver/pace/driver/comm.py", line 7, in <module>
    import pace.dsl
  File "/pace/dsl/pace/dsl/__init__.py", line 3, in <module>
    from pace.util.mpi import MPI
  File "/pace/util/pace/util/__init__.py", line 1, in <module>
    from . import testing
  File "/pace/util/pace/util/testing/__init__.py", line 3, in <module>
    from .dummy_comm import ConcurrencyError, DummyComm
  File "/pace/util/pace/util/testing/dummy_comm.py", line 1, in <module>
    from ..local_comm import ConcurrencyError  # noqa
  File "/pace/util/pace/util/local_comm.py", line 6, in <module>
    from .utils import ensure_contiguous, safe_assign_array
  File "/pace/util/pace/util/utils.py", line 16, in <module>
    except cp.cuda.runtime.CUDARuntimeError:
AttributeError: module 'cupy' has no attribute 'cuda'

Do you have any ideas why this image is behaving differently on my machine?
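
A quick way to confirm the duplicate-package warning from Python itself is to list the installed distributions whose names start with "cupy". This is only a sketch: `installed_cupy_packages` is a hypothetical helper, and it uses the standard-library `importlib.metadata` rather than anything from CuPy or Pace:

```python
from importlib.metadata import distributions


def installed_cupy_packages() -> list[str]:
    """Return the sorted names of installed distributions starting with 'cupy'."""
    names = set()
    for dist in distributions():
        # .get() avoids an error if a distribution has no Name field.
        name = dist.metadata.get("Name", "") or ""
        if name.lower().startswith("cupy"):
            names.add(name)
    return sorted(names)
```

If this returns more than one entry (for example both cupy and cupy-cuda117), the environment has the conflict that the CuPy warning describes.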

@miaoneng
Author

I am not 100% sure. Let me relinquish this instance, get a whole new p3d instance, and retry the Docker image. I think that installing libboost-all-dev should install git along with it.

Here is the env after entering the docker image.

ubuntu@ip-172-31-93-73:~$ sudo docker run -ti -v $HOME/pace:/pace --gpus '"device=0"' --rm miaoneng/pace:gpu /bin/bash -i
root@4cf7a2d14db3:/# python
bash: python: command not found
root@4cf7a2d14db3:/# python3
Python 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cupy
>>> 
root@4cf7a2d14db3:/# export
declare -x CUDA_VERSION="11.7.0"
declare -x HOME="/root"
declare -x HOSTNAME="4cf7a2d14db3"
declare -x LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib64"
declare -x LIBRARY_PATH="/usr/local/cuda/lib64/stubs"
declare -x LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:"
declare -x NCCL_VERSION="2.13.4-1"
declare -x NVARCH="x86_64"
declare -x NVIDIA_DRIVER_CAPABILITIES="compute,utility"
declare -x NVIDIA_REQUIRE_CUDA="cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511"
declare -x NVIDIA_VISIBLE_DEVICES="0"
declare -x NV_CUDA_COMPAT_PACKAGE="cuda-compat-11-7"
declare -x NV_CUDA_CUDART_DEV_VERSION="11.7.60-1"
declare -x NV_CUDA_CUDART_VERSION="11.7.60-1"
declare -x NV_CUDA_LIB_VERSION="11.7.0-1"
declare -x NV_LIBCUBLAS_DEV_PACKAGE="libcublas-dev-11-7=11.10.1.25-1"
declare -x NV_LIBCUBLAS_DEV_PACKAGE_NAME="libcublas-dev-11-7"
declare -x NV_LIBCUBLAS_DEV_VERSION="11.10.1.25-1"
declare -x NV_LIBCUBLAS_PACKAGE="libcublas-11-7=11.10.1.25-1"
declare -x NV_LIBCUBLAS_PACKAGE_NAME="libcublas-11-7"
declare -x NV_LIBCUBLAS_VERSION="11.10.1.25-1"
declare -x NV_LIBCUSPARSE_DEV_VERSION="11.7.3.50-1"
declare -x NV_LIBCUSPARSE_VERSION="11.7.3.50-1"
declare -x NV_LIBNCCL_DEV_PACKAGE="libnccl-dev=2.13.4-1+cuda11.7"
declare -x NV_LIBNCCL_DEV_PACKAGE_NAME="libnccl-dev"
declare -x NV_LIBNCCL_DEV_PACKAGE_VERSION="2.13.4-1"
declare -x NV_LIBNCCL_PACKAGE="libnccl2=2.13.4-1+cuda11.7"
declare -x NV_LIBNCCL_PACKAGE_NAME="libnccl2"
declare -x NV_LIBNCCL_PACKAGE_VERSION="2.13.4-1"
declare -x NV_LIBNPP_DEV_PACKAGE="libnpp-dev-11-7=11.7.3.21-1"
declare -x NV_LIBNPP_DEV_VERSION="11.7.3.21-1"
declare -x NV_LIBNPP_PACKAGE="libnpp-11-7=11.7.3.21-1"
declare -x NV_LIBNPP_VERSION="11.7.3.21-1"
declare -x NV_NVML_DEV_VERSION="11.7.50-1"
declare -x NV_NVPROF_DEV_PACKAGE="cuda-nvprof-11-7=11.7.50-1"
declare -x NV_NVPROF_VERSION="11.7.50-1"
declare -x NV_NVTX_VERSION="11.7.50-1"
declare -x OLDPWD
declare -x OMPI_ALLOW_RUN_AS_ROOT="1"
declare -x OMPI_ALLOW_RUN_AS_ROOT_CONFIRM="1"
declare -x PATH="/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/"
declare -x SHLVL="1"
declare -x TERM="xterm"
root@4cf7a2d14db3:/# 

@mcgibbon
Collaborator

To recap, I checked out the v0.1.0 tag:

(base) mcgibbon@jeremy-vm-gpu:~/python/pace$ git status
HEAD detached at v0.1.0

I modified requirements_dev.txt with:

(base) mcgibbon@jeremy-vm-gpu:~/python/pace$ git diff requirements_dev.txt
diff --git a/requirements_dev.txt b/requirements_dev.txt
index 6a4386b8..63897b71 100644
--- a/requirements_dev.txt
+++ b/requirements_dev.txt
@@ -12,7 +12,7 @@ fv3config>=0.9.0
 dace>=0.14
 f90nml>=1.1.0
 numpy>=1.15
--e external/gt4py
+-e external/gt4py[cuda117]
 -e util
 -e stencils
 -e dsl

I modified the Dockerfile by replacing it with your copy. I modified the baroclinic_c12.yaml with

(base) mcgibbon@jeremy-vm-gpu:~/python/pace$ git diff driver
diff --git a/driver/examples/configs/baroclinic_c12.yaml b/driver/examples/configs/baroclinic_c12.yaml
index 6aa295b8..b1a2c6bc 100644
--- a/driver/examples/configs/baroclinic_c12.yaml
+++ b/driver/examples/configs/baroclinic_c12.yaml
@@ -1,6 +1,6 @@
 stencil_config:
   compilation_config:
-    backend: numpy
+    backend: gt:gpu
     rebuild: false
     validate_args: true
     format_source: false

I forced a rebuild of the image and entered the interactive docker environment with

(base) mcgibbon@jeremy-vm-gpu:~/python/pace$ make _force_build enter

While doing this, the image failed to build on the third-to-last step, with this error tail:

#12 121.8     self._execute_child(args, executable, preexec_fn, close_fds,
#12 121.8   File "/usr/lib/python3.10/subprocess.py", line 1845, in _execute_child
#12 121.8     raise child_exception_type(errno_num, err_msg, err_filename)
#12 121.8 FileNotFoundError: [Errno 2] No such file or directory: 'git'
#12 121.8 Getting GridTools C++ sources...
#12 121.8 $ git clone --depth 1 -b v2.2.0 https://github.com/GridTools/gridtools.git /pace/external/gt4py/src/gt4py/_external_src/gridtools

I then added the following line to the Dockerfile just before that step:

RUN apt-get install -y git

And I re-ran make _force_build enter. The image then built and I entered an interactive session. From there, if I run python3 -c "import cupy", I get the aforementioned ImportError: libcuda.so.1: cannot open shared object file: No such file or directory, with a warning that both cupy and cupy-cuda117 are installed.

From there, I run pip uninstall cupy, after which python3 -c "import cupy" runs and exits without error. Then when I try to run the model, I get this different error (it appears 6 times because of parallelism; only one copy is shown):

root@jeremy-vm-gpu:/# mpirun -n 6 --oversubscribe python3 -m pace.driver.run /pace/driver/examples/configs/baroclinic_c12.yaml
Traceback (most recent call last):
Traceback (most recent call last):
  File "/pace/util/pace/util/utils.py", line 14, in <module>
    cp.cuda.runtime.deviceSynchronize()
AttributeError: module 'cupy' has no attribute 'cuda'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/pace/driver/pace/driver/__init__.py", line 1, in <module>
    from .comm import (
  File "/pace/driver/pace/driver/comm.py", line 7, in <module>
    import pace.dsl
  File "/pace/dsl/pace/dsl/__init__.py", line 3, in <module>
    from pace.util.mpi import MPI
  File "/pace/util/pace/util/__init__.py", line 1, in <module>
    from . import testing
  File "/pace/util/pace/util/testing/__init__.py", line 3, in <module>
    from .dummy_comm import ConcurrencyError, DummyComm
  File "/pace/util/pace/util/testing/dummy_comm.py", line 1, in <module>
    from ..local_comm import ConcurrencyError  # noqa
  File "/pace/util/pace/util/local_comm.py", line 6, in <module>
    from .utils import ensure_contiguous, safe_assign_array
  File "/pace/util/pace/util/utils.py", line 16, in <module>
    except cp.cuda.runtime.CUDARuntimeError:
AttributeError: module 'cupy' has no attribute 'cuda'
Traceback (most recent call last):
  File "/pace/util/pace/util/utils.py", line 14, in <module>
    cp.cuda.runtime.deviceSynchronize()
AttributeError: module 'cupy' has no attribute 'cuda'

With all of these differences, I don't think I am running the same docker image you're running. Are you positive what you pasted is exactly the same as what you built? If so, is there some way you can provide me with your pre-built image so I can run that instead?

@mcgibbon
Collaborator

OK, I was finally able to resolve the "no attribute 'cuda'" issue. It was because I didn't include your --gpus flag in my docker run command. I edited the Makefile:

(base) mcgibbon@jeremy-vm-gpu:~/python/pace$ git diff Makefile
diff --git a/Makefile b/Makefile
index e17d570f..81b8fff0 100644
--- a/Makefile
+++ b/Makefile
@@ -91,6 +91,7 @@ enter:
        docker run --rm -it \
                --network host \
                $(VOLUMES) \
+               --gpus '"device=0"' \
        $(PACE_IMAGE) bash

 dev:

And also removed the pip install cupy from your Dockerfile (instead of uninstalling it interactively). I re-ran the mpirun command above, and the model appears to be compiling. Hopefully it will error out on your issue while I have it running overnight.
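
As a sanity check for this kind of problem, the sketch below tests whether the CUDA driver library can be loaded inside the container. It assumes that libcuda.so.1 is only made available when the container is started with GPU access (as with the --gpus flag above); `cuda_driver_visible` is a hypothetical helper built on the standard-library `ctypes` module:

```python
import ctypes


def cuda_driver_visible(libname: str = "libcuda.so.1") -> bool:
    """Return True if the CUDA driver library can be loaded in this process."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        # The dynamic loader could not find or open the library.
        return False
    return True
```

Running `python3 -c` with this function before launching mpirun gives a fast answer, which is cheaper than waiting for cupy to fail at import time.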

@mcgibbon
Collaborator

mcgibbon commented Nov 22, 2022

@miaoneng sorry for the lack of progress on this over the past week; I came down with covid and have not gotten any work done. I will be away starting Thursday for essentially the remainder of AI2's participation in the Pace project (barring paper revisions), which is being handed over to GFDL at the end of this year. @jdahm will be taking charge of helping you with this issue as much as we can with the time available to us.

@jdahm
Contributor

jdahm commented Nov 23, 2022

Hi @miaoneng, Using the Dockerfile above, and the Makefile change from @mcgibbon, I was able to get this to work:

system $ make _force_build enter  # builds docker image, starts container, and attaches shell
container $ apt install -y --no-install-recommends git  # this was still required for some reason
container $ mpirun -n 6 --oversubscribe python3 -m pace.driver.run examples/configs/baroclinic_c12.yaml

Did that work for you?

@miaoneng
Author

Hi @jdahm.

Could you kindly post the Dockerfile and the change to Makefile so I can double confirm everything is identical? If you're able to get this to work, I would suspect the root cause is still environment setup.

@miaoneng
Author

@mcgibbon I am sorry to hear that and truly hope you have a fast recovery. Thank you for your time and I will follow your posts to recreate the environment and give it a try again.

@jdahm
Contributor

jdahm commented Nov 27, 2022

Happy Thanksgiving! Apologies for the delay. I ran this on e7a0ede (from the main branch past the release tag), with this diff. Let me know if you have issues.

@jdahm
Contributor

jdahm commented Dec 14, 2022

Closing this for now. Let us know if you still have questions!

@jdahm jdahm closed this as completed Dec 14, 2022
@miaoneng
Author

Thank you, John. I actually didn't have time to work on it at all in the past couple of weeks. I will reopen this in the future if I encounter any issues.
