v1.17.0
1.17.0 (June 13, 2024)
Features:
UCP
Improved the accuracy of rendezvous protocol performance estimation
Enabled short protocol for non-host memory types on empty messages
Improved the accuracy of performance estimation for empty messages by removing non-relevant overheads
Added RMA_ZCOPY_MAX_SEG_SIZE configuration parameter to allow modifying segment size for RMA-ZCOPY protocols
Added support for separate intra/inter-node rendezvous thresholds
Added support for minimal fragment size in rendezvous protocol
Added support for resetting request during send operation
Added UCX_PROTO_OVERHEAD configuration variable to allow setting protocol overheads
Improved performance for combined Active Message/RMA scenarios by separating them into different lanes
Added support for device staging buffers in pipeline protocols
Enabled on-demand paging for NVIDIA Grace platforms by default
RDMA CORE (IB, ROCE, etc.)
Introduced the UCX_REVERSE_SL environment variable to configure the reverse SL for DC transport. By default, it uses the value of UCX_IB_SL.
Added support for GID auto-detection in Floating LID based routing
Added support for multithreading KSM registration of unaligned buffers
Added IB_SEND_OVERHEAD and MM_[SEND|RECV]_OVERHEAD configuration variables
GPU (CUDA, ROCM)
Added support for oneAPI Level-Zero library for Intel GPUs
UCS
Added support for rcache dynamic region alignment
Added dynamic bitmap data structure
Added support for advanced key-value parsing for UCX configuration
Added piecewise linear function data structure
Added support for allocating dynamic arrays on stack
Tools
Added support for device memory allocation in UCX perftest
Added a script for squashing commits after PR approval
Added support for DPU cross-gvmi daemon in UCX perftest
Java
Added support for EP local socket address API in JUCX
Build
Added address sanitizer support
Added a helper shell script to run static checks
AZP
Replaced Valgrind tests with address sanitizer tool
Added Ubuntu 22.04 docker image testing
Configuration
Added support for filtering configuration sections by platform type
Added configuration file with section for Grace Hopper
Bugfixes:
UCP
Fixed crash due to incorrect lane selection when active message is disabled
Fixed RMA lane selection issue due to wrong bandwidth calculation
Fixed rendezvous protocol information in protocol details table
Fixed endpoint reconfiguration issue due to wrong bandwidth calculation
Fixed Active Message handlers issue due to out of order registration
Fixed registration of memh events for imported memory key
Fixed sockaddr unreachable destination error handling
Fixed uninitialized memory issue in new protocols infrastructure
Fixed race condition when using strong fence by flushing all endpoints
Fixed incorrect RMA message size on immediate completion with no datatype
Fixed incorrect performance estimation due to fp8 pack/unpack issue
Fixed remote access error when rcache memory is not registered with atomic access
Fixed assertion failure when rcache fails during memh allocation
Fixed atomic device selection issue
Fixed worker interface deactivation while still in use by endpoints
Fixed wire compatibility issue due to mismatched lane selection
RDMA CORE (IB, ROCE, etc.)
Disabled device memory if atomics are not available
Fixed indirect key creation for MT registered memory
Fixed KSM start address value when creating export key
Fixed DCI pool index to support maximum of 16 pools
Fixed atomic rkey issue when using imported memory
Fixed crash due to unsupported SRQ capability
GPU (CUDA, ROCM)
Removed unused environment variable RCACHE_ADDR_ALIGN from ROCm transport
Fixed usage of CUDA device 0 when no context is active
Removed error handling support from CUDA IPC transport
Fixed allocation of unaligned CUDA memory
Shared Memory
Fixed occasional crash when shm_unlink fails during interface initialization
UCS
Fixed system device distance calculation for devices on different PCIe roots
Fixed support for large size arrays in ucs_array
Fixed synchronization issue in rcache
Fixed uninitialized variable access in rcache
Tests
Fixed test failures when GPU is present but disabled
Fixed Active Message hanging issue in ucp_client_server
Fixed potential crash due to redundant munmap call in ucp mmap tests
Fixed a crash when running CUDA gtest under Valgrind
Fixed UD endpoint timeout issue under Valgrind
Java
Fixed failures in Java tests by waiting for send requests to complete
Fixed JVM segfault in Java tests when gdrcopy driver is not loaded
Fixed Go build and Go test failures
Packaging
Disabled Go bindings in Debian package