ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE error while running the test program #37

Open
shibdas opened this issue Nov 8, 2024 · 0 comments

Hi,
I'm running the test program from the README after compiling and installing the extension, but it crashes (abort with core dump) right after a Level Zero error. I turned on several debug environment variables, and the Level Zero GPU driver appears to load and initialize successfully, yet the call to zeModuleCreate at the following location still fails:

https://github.com/intel/intel-extension-for-openxla/blob/main/xla/stream_executor/sycl/sycl_driver.cc#L392

python test.py
DEBUG:jax._src.xla_bridge:Discovered path based JAX plugin: jax_plugins.intel_extension_for_openxla
DEBUG:jax._src.xla_bridge:Loading plugin module jax_plugins.intel_extension_for_openxla
WARNING:jax_plugins.intel_extension_for_openxla:INFO: Intel Extension for OpenXLA version: 0.4.0, commit: eb3d812a
DEBUG:jax._src.xla_bridge:registering PJRT plugin xpu from /home/test/.local/lib/python3.10/site-packages/jax_plugins/intel_extension_for_openxla/pjrt_plugin_xpu.so
DEBUG:jax._src.xla_bridge:Initializing backend 'cpu'
DEBUG:jax._src.xla_bridge:Backend 'cpu' initialized
DEBUG:jax._src.xla_bridge:Initializing backend 'cuda'
INFO:jax._src.xla_bridge:Unable to initialize backend 'cuda':
DEBUG:jax._src.xla_bridge:Initializing backend 'rocm'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
DEBUG:jax._src.xla_bridge:Initializing backend 'tpu'
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:jax._src.xla_bridge:Platform 'xpu' is experimental and not all JAX functionality may be correctly supported!
DEBUG:jax._src.xla_bridge:Initializing backend 'xpu'
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:362: Found OCL_ICD_FILENAMES environment variable.
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:71: attempting to add vendor /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so...
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:197: successfully added vendor /opt/intel/oneapi/compiler/2024.2/lib/libintelocl.so with suffix INTEL
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:71: attempting to add vendor /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so...
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:197: successfully added vendor /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so with suffix INTEL
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:71: attempting to add vendor /opt/intel//oneapi/compiler/latest/lib/libintelocl.so...
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/icd.c:86: already loaded vendor /opt/intel//oneapi/compiler/latest/lib/libintelocl.so, nothing to do here
KHR ICD trace at /netbatch/donb66987_00/dir/workspace/NIT/xmain-rel/LX/xmainefi2linux_release/ws/icsws/builds/xmainefi2linux_pgouse_sreleaseusingrelease/llvm/_deps/ocl-icd-src/loader/linux/icd_linux.c:150: Failed to open path /etc/OpenCL/layers, continuing
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path:
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu.so.1
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_gpu_legacy1.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_gpu_legacy1.so.1 failed with libze_intel_gpu_legacy1.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_vpu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_vpu.so.1 failed with libze_intel_vpu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Loading Driver libze_intel_npu.so.1
ZE_LOADER_DEBUG_TRACE:Load Library of libze_intel_npu.so.1 failed with libze_intel_npu.so.1: cannot open shared object file: No such file or directory
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
ZE_LOADER_DEBUG_TRACE:check_drivers(flags=ZE_INIT_FLAG_GPU_ONLY)
ZE_LOADER_DEBUG_TRACE:init driver libze_intel_gpu.so.1 zeInit(ZE_INIT_FLAG_GPU_ONLY) returning ZE_RESULT_SUCCESS
DEBUG:jax._src.xla_bridge:Backend 'xpu' initialized
jax.local_devices(): [xpu(id=0)]
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00041031837463378906 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00033354759216308594 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00041556358337402344 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00045800209045410156 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _uniform for pjit in 0.004739522933959961 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003650188446044922 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0003840923309326172 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00032138824462890625 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _uniform for pjit in 0.0030584335327148438 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0004353523254394531 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0003151893615722656 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.000331878662109375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _uniform for pjit in 0.0031616687774658203 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00046443939208984375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming relu for pjit in 0.0011076927185058594 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.000347137451171875 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming lax_conv for pjit in 0.016260147094726562 sec
DEBUG:jax._src.interpreters.pxla:Compiling lax_conv for with global shapes and types []. Argument mapping: [].
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0004506111145019531 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_seed for pjit in 0.0017113685607910156 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming ravel for pjit in 0.00023245811462402344 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming threefry_2x32 for pjit in 0.001512765884399414 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_random_bits_original for pjit in 0.002304553985595703 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003342628479003906 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00028896331787109375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.00030732154846191406 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0002868175506591797 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.00042128562927246094 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming ravel for pjit in 0.00017571449279785156 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming threefry_2x32 for pjit in 0.0012390613555908203 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_random_bits_original for pjit in 0.001980304718017578 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0004525184631347656 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0002968311309814453 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0002846717834472656 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003006458282470703 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming ravel for pjit in 0.0002028942108154297 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming threefry_2x32 for pjit in 0.0018062591552734375 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming _threefry_random_bits_original for pjit in 0.002747058868408203 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.000431060791015625 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming fn for pjit in 0.0003037452697753906 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0003323554992675781 sec
DEBUG:jax._src.dispatch:Finished tracing + transforming for pjit in 0.0002956390380859375 sec
DEBUG:jax._src.dispatch:Finished jaxpr to MLIR module conversion jit(lax_conv) in 0.1102445125579834 sec
DEBUG:jax._src.compiler:get_compile_options: num_replicas=1 num_partitions=1 device_assignment=[[xpu(id=0)]]
DEBUG:jax._src.compiler:get_compile_options XLA-AutoFDO profile: using XLA-AutoFDO profile version -1
DEBUG:jax._src.dispatch:Finished XLA compilation of jit(lax_conv) in 0.5001187324523926 sec
2024-11-08 10:21:43.152872: F xla/stream_executor/sycl/sycl_driver.cc:402] L0 error 1879179264:
Aborted (core dumped)
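
The abort message above prints the Level Zero result code in decimal, which makes it hard to match against the `ze_result_t` names. A minimal sketch of decoding it follows; the lookup table is a small hand-copied subset of the Level Zero header values, and `decode_ze_result` is a hypothetical helper name, not part of any library.

```python
# Small subset of ze_result_t values, copied by hand from the Level Zero
# headers. This table is illustrative, not exhaustive.
ZE_RESULT_NAMES = {
    0x00000000: "ZE_RESULT_SUCCESS",
    0x70000004: "ZE_RESULT_ERROR_MODULE_BUILD_FAILURE",
    0x70020000: "ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE",
}


def decode_ze_result(code: int) -> str:
    """Format a decimal Level Zero result code as hex plus its enum name."""
    name = ZE_RESULT_NAMES.get(code, "unknown ze_result_t value")
    return f"0x{code:08X} ({name})"


if __name__ == "__main__":
    # Value taken from the "L0 error 1879179264" line in the log.
    print(decode_ze_result(1879179264))
    # → 0x70020000 (ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE)
```

This is how the decimal value in the log maps to the error named in the issue title.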

#clinfo -l

Platform #0: Intel(R) OpenCL
-- Device #0: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
Platform #1: Intel(R) OpenCL Graphics
-- Device #0: Intel(R) Data Center GPU Flex 140

AFAICT I have installed all the packages for the Intel Flex GPUs, and I have oneAPI 2024.2 installed. Can anyone please help with this? Thanks!
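
The Level Zero specification describes `ZE_RESULT_ERROR_DEPENDENCY_UNAVAILABLE` as an external required dependency being unavailable or missing. One way to narrow this down, sketched below, is to check which shared libraries the dynamic loader can actually open. `libze_intel_gpu.so.1` comes from the loader trace in the log; the other names (`libze_loader.so.1`, `libigc.so.1`, `libigdfcl.so.1`) are assumptions about which libraries the driver stack might need and may differ by driver version. The log does not confirm that any of them is the missing dependency.

```python
import ctypes

# Libraries to probe. Only libze_intel_gpu.so.1 appears in the log above;
# the remaining names are assumptions and may not match every driver release.
CANDIDATE_LIBS = [
    "libze_intel_gpu.so.1",  # from the ZE_LOADER_DEBUG_TRACE output
    "libze_loader.so.1",     # assumption: Level Zero loader
    "libigc.so.1",           # assumption: Intel graphics compiler
    "libigdfcl.so.1",        # assumption: graphics compiler front end
]


def probe_libraries(names):
    """Try to dlopen each library; map name -> None on success, error text on failure."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe_libraries(CANDIDATE_LIBS).items():
        status = "ok" if error is None else f"FAILED: {error}"
        print(f"{name}: {status}")
```

A library reported as failed here is only a lead to follow up on (for example by checking the installed driver packages), not proof of the root cause.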
