Version 23.06.0 (June 28, 2023)
elliottslaughter released this 27 Jun 17:56
- Build
- Fixes for CMake build on macOS
- Fixes for HIP build when arch is specified
- Realm
- Support for better backtraces via libdw and libunwind
- Improve scalability and performance in task spawning by caching the triggering operation of an event if one is provided
- Fix a minor issue with affinity queries so that the user-provided vector is cleared before it is populated
- Add more accurate GPU memory bandwidth affinity calculations if NVML is available
- Refactor CPU core topology enumeration to support systems without NUMA capabilities (such as Jetson ARM systems)
- Improve scalability and performance of task spawning by moving event reuse freelists to be per-processor, reducing lock contention
- Add a microbenchmark for measuring task throughput more accurately
- Add a series of Realm API tutorials
- Replace CU_EVENT_DEFAULT with CU_EVENT_DISABLE_TIMING for better performance of CUDA events
- Support Kokkos interop for the HIP module
- Fixes for Realm tests on macOS
- Tools
- Legion Prof now supports search in the new profiler UI
- Legion Prof now supports an HTTP client/server interface. Launch the server with --serve (on port 8080 by default) and attach a client to it with --attach http://127.0.0.1:8080
- Legion Prof now supports a new archival mode via the --archive flag. Generate an offline profile and view it either via --attach or by uploading it to a server and navigating to https://legion.stanford.edu/prof-viewer/?url=...
- Legion Prof modes (client/server/viewer) are now parallel by default, and perform heavy computations off the UI thread for better responsiveness
- Add support for rendering indirect copies (i.e., gather/scatter)
- Fix rendering of profiles over HTTP with old profiler UI
- Fix profiling of copies with different numbers of hops between instances
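
The Realm entry above about moving event reuse freelists to be per-processor can be illustrated with a small sketch. This is not Realm's actual code: the names `EventSlot` and `PerProcessorFreelists` are hypothetical, and the sketch only assumes the general idea stated in the notes, namely that giving each processor its own freelist (and its own lock) avoids contention on a single shared lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reusable event record; not a Realm type.
struct EventSlot {
  std::size_t id;
};

// One freelist per processor, each guarded by its own mutex, so threads
// working on behalf of different processors never take the same lock.
class PerProcessorFreelists {
public:
  explicit PerProcessorFreelists(std::size_t num_procs) : shards_(num_procs) {}

  // Reuse a slot from this processor's freelist, or allocate a fresh one
  // if the list is empty.
  std::unique_ptr<EventSlot> acquire(std::size_t proc) {
    Shard &shard = shards_.at(proc);
    std::lock_guard<std::mutex> guard(shard.lock);
    if (!shard.free.empty()) {
      std::unique_ptr<EventSlot> slot = std::move(shard.free.back());
      shard.free.pop_back();
      return slot;
    }
    return std::make_unique<EventSlot>(EventSlot{next_id_.fetch_add(1)});
  }

  // Return a slot to the freelist of the processor that releases it.
  void release(std::size_t proc, std::unique_ptr<EventSlot> slot) {
    Shard &shard = shards_.at(proc);
    std::lock_guard<std::mutex> guard(shard.lock);
    shard.free.push_back(std::move(slot));
  }

private:
  struct Shard {
    std::mutex lock;                               // protects only this shard
    std::vector<std::unique_ptr<EventSlot>> free;  // slots ready for reuse
  };

  std::vector<Shard> shards_;            // sized once, never resized
  std::atomic<std::size_t> next_id_{0};  // source of ids for fresh slots
};
```

In this sketch a slot released on processor 0 is handed back by the next `acquire(0)`, while `acquire(1)` allocates a fresh slot without touching processor 0's lock. A single shared freelist would serialize both calls on one mutex, which is the contention the release note describes reducing.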