Releases: halide/Halide
Halide 13.0.3
This is a patch release with some added build system capabilities and a handful of backported stability improvements. Please see the PR list below for more details.
What's changed
- Build system
- The Mullapudi 2016 autoscheduler no longer assert-rejects unsupported targets. #6520
- Fixed invalid headers in the linear algebra app on RISC-V. #6503
- Fixed CMake export bug when custom-built LLVM has multiple include directories. #6519
- Python artifacts will be installed when built, in the `Halide_Python` CPack component. Targets are not (yet) exported. #6530 #6523
- Added SOVERSION override for libHalide to support advanced package maintenance workflows. #6534
- Stability improvements
- Other changes
Full Changelog: v13.0.2...v13.0.3
Halide 13.0.2
This is a patch release to support official Debian packaging. No changes have been made to the compiler library or runtime.
Apps
- Linear algebra app now correctly checks for the availability of SSE/AVX headers. #6471
Halide 13.0.1
Halide 13.0.0
We are pleased to announce the release of Halide 13.0.0!
This is a major release. Most notably, Halide now requires C++17 (or higher).
You can download one of our binary releases here, or check one of the following package repositories (they might take some time to be updated):
- Vcpkg: https://github.com/microsoft/vcpkg/tree/master/ports/halide
- Homebrew: https://formulae.brew.sh/formula/halide
Language and Compiler
- The compiler now requires C++17 or higher. (#5282)
- Overloads of `realize()` that were deprecated in Halide 12 are now removed. (#6122, #6162)
- Added new predicated tail strategies for `split` loops. (#6126)
- Added a more fine-grained `prefetch` directive. (#6155)
- Compiler now always runs in a separate 32MB stack on all platforms. (#6239)
- Fixed a semantics bug where data-dependent loads might be uninitialized on over-compute. (#6294)
- Using MemoryType::Stack may now trigger a real stack allocation for dynamically-sized allocations discovered to be small at runtime (#6289)
Backends
- Simplifier improvements reduced peak memory usage by more than 10% in many apps, including `camera_pipe`, `harris`, `nl_means`, and `stencil_chain`. (#6174)
- The ARM backend now supports native 16-bit float instructions (#6102)
- Division by non-power-of-two unsigned constants is now faster on X86 (#6322)
- The WebAssembly backend is mature enough for significant production use (See https://web.dev/ps-on-the-web/)
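The X86 item above does not say how #6322 achieves its speedup. A common general technique for dividing by a fixed unsigned constant is to multiply by a precomputed reciprocal and shift, and the Python sketch below illustrates that idea only. The function names `magic_for_divisor` and `divide_by_constant` are hypothetical and are not part of Halide; the code makes no claim about what the X86 backend actually emits.

```python
from typing import Tuple


def magic_for_divisor(d: int, bits: int = 32) -> Tuple[int, int]:
    """Return (multiplier, shift) such that (n * multiplier) >> shift == n // d
    for every unsigned n that fits in `bits` bits.

    Hypothetical helper for illustration; not a Halide API.
    """
    if d <= 0:
        raise ValueError("divisor must be a positive integer")
    # ceil(log2(d)): (d - 1).bit_length() gives exactly that for d >= 1.
    log2_ceil = (d - 1).bit_length()
    shift = bits + log2_ceil
    # Round the scaled reciprocal 2**shift / d up to the next integer.
    multiplier = -(-(1 << shift) // d)
    return multiplier, shift


def divide_by_constant(n: int, multiplier: int, shift: int) -> int:
    """Compute n // d with one multiplication and one shift, no division."""
    return (n * multiplier) >> shift
```

The multiplier can be one bit wider than the operand, which is why real code generators usually need an extra add-and-shift fix-up step for some divisors; Python's arbitrary-precision integers hide that detail here.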
Build
- Fixed an issue with `add_halide_library` on Xcode, which requires at least one source file for every target. (#6175)
- Added a watchdog timer to the Halide generator executables (i.e. `GenGen.cpp`). (#6184, #6240)
- Fixed a missing dependency on `Threads::Threads` in CMake (#6257)
- The tutorials and readmes are now packaged to the doc dir. The documentation has been moved one level deeper to `share/doc/Halide/html` (#6267)
Halide 12.0.1
Halide 12.0.0
We are pleased to announce the release of Halide 12.0.0!
This is mostly a quality of life and bugfix release to set the stage for larger changes in Halide 13 (which will require C++17).
You can download one of our binary releases here, or check one of the following package repositories:
- Vcpkg: https://github.com/microsoft/vcpkg/tree/master/ports/halide
- Homebrew: https://formulae.brew.sh/formula/halide
Language and Compiler
- Added `align_extent` scheduling directive #5829
- Added `TailStrategy::Predicate` as an alternative to `TailStrategy::GuardWithIf` to use predicated loops unconditionally #5856
- Added `scatter()` and `gather()` expressions to support reading from and writing to multiple locations in update definitions #5553
- Added internal memoization to Adams2019 autoscheduler (performance improvement) #5697 #5654
- Removed old-style `realize()` methods which had been deprecated #5676
- Removed deprecated scheduling directive overloads #5656
- Many simplifier and bounds inference improvements and bugfixes #5615 #5618 #5895 #6002
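To make the tail-strategy item above more concrete, here is a rough Python model of what splitting a loop with a guarded tail means when the extent is not a multiple of the split factor. It is a sketch under assumptions, not Halide code: `split_guard_with_if` and `split_predicated` are hypothetical names, and the second function only loosely models predication as a per-lane mask handed to a vector-style body.

```python
from typing import Callable, List


def split_guard_with_if(extent: int, factor: int,
                        body: Callable[[int], None]) -> None:
    """Split a loop of `extent` iterations by `factor`, guarding each
    inner iteration with an `if` so nothing runs past the extent."""
    outer_extent = (extent + factor - 1) // factor  # round up
    for xo in range(outer_extent):
        for xi in range(factor):
            x = xo * factor + xi
            if x < extent:  # the guard that protects the tail
                body(x)


def split_predicated(extent: int, factor: int,
                     body_vec: Callable[[List[int], List[bool]], None]) -> None:
    """Same split, but pass a whole group of lanes plus a mask to the body,
    which is assumed to skip any lane whose mask entry is False."""
    outer_extent = (extent + factor - 1) // factor
    for xo in range(outer_extent):
        lanes = [xo * factor + xi for xi in range(factor)]
        mask = [x < extent for x in lanes]
        body_vec(lanes, mask)
```

In both versions every index below the extent is visited exactly once; the difference is only whether the bounds check wraps each scalar iteration or travels with the group of lanes as a mask.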
Backends
- Added support for AVX512 VNNI instructions #5725 #5807
- Removed OpenGL/GLSL backend #5626
- Fixed various errors with `large_buffers` #5716 #5940
- Improved support for `sdot` and `udot` instructions on ARM (where supported) #5954
- Improved support for WebAssembly SIMD ops, when compiling with LLVM 13 #5849 #5850 #5853 #5854 #5861 #5863
- PyStub generators must now choose to use either only positional arguments or only keyword arguments. This is an ABI break #5761
Build
- Added scripts to create Ubuntu packages #5754 #5967
- Added experimental support for ClangCL on Windows #5876
- Added support and pre-built binaries for macOS ARM64
- Halide headers no longer inject stack space linker flags on Windows; now, the compiler runs on a fiber with enough stack space #5873
- Halide shared library no longer exposes LLVM symbols on macOS and Linux. Help wanted for Windows! #5659
Halide 11.0.1
Halide 11.0.0
We are pleased to announce the release of Halide 11.0.0!
This release comes with many backend improvements and some notable deprecations. HVX 64 support has been removed, and OpenGL support has been deprecated (and has been removed from upstream).
You can download one of our binary releases here, or check one of the following package repositories:
- Vcpkg: https://github.com/microsoft/vcpkg/tree/master/ports/halide
- Homebrew: https://formulae.brew.sh/formula/halide
Language and Compiler
- Scheduling
- Bounds inference
- Various bugfixes
- An integer-sign bug in `lossless_cast` was fixed #5459
Backends
- ARM64 Windows is now supported, along with Direct3D 12. #5544
- OpenGL (not OpenGL Compute) has been deprecated in this release and will be removed in Halide 12. You will see deprecation messages during your builds. #5475 #5551
- CUDA
- Metal
- Thread limits are now checked correctly #5588
- Hexagon
Build
- Dependencies
- Upgraded pybind11 dependency to 2.6.1 #5644
- CMake
- Bugfixes
Halide 10.0.1
We are pleased to announce the release of Halide 10.0.1!
The main change is that LLVM 10.0.1 is now the bundled version (it had previously been 10.0.0).
- Fixed target detection for i686 in CMake #5675
- Upgraded pybind11 to 2.6.1 #5644
- Fixed missing newline bug in OpenCL backend #5277
- Improved performance in Direct3D 12 backend #5293 #5298
- Fixed minor bug in loop partitioning #5355
- Fixed linking to shared LLVM from CMake #5308
- Fixed imprecisions in bounds inference for integer div and mod #5331 #5350
- Fixed various issues in documentation #5330
Halide 10.0.0
We are pleased to announce the release of Halide 10.0.0!
This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.
What happened to version 9?
For major version numbers, we now use the included LLVM version. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).
Autoschedulers
- There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
- The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
- The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.
Build
- The CMake build has been rewritten. See `README_cmake.md` for details.
- The minimum CMake version is now 3.16
- The old `halide.cmake` module has been removed in favor of `find_package(Halide)`.
- We no longer support the MinGW toolchain.
Language features
- Added the `atomic` scheduling directive, which gives you another way to parallelize associative reductions (e.g. histograms, or summations) by emitting atomic instructions when available (and compare-and-swap loops or locks when not).
- Support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, via combining the `vectorize` and `atomic` directives
- Integer division or mod by zero now returns zero instead of being undefined behavior.
- The simplifier is now formally verified.
- You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
- Allocation size inference is more precise in a variety of cases.
- Various bugfixes for `compute_with`.
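As an illustration of the kind of reduction the atomic directive targets, the sketch below builds a histogram from several threads in plain Python. It models only the lock-based fallback mentioned in the list above, since Python has no user-level atomic add; `parallel_histogram` is a hypothetical name and none of this is Halide API.

```python
import threading
from typing import List, Sequence


def parallel_histogram(values: Sequence[int], bins: int,
                       workers: int = 4) -> List[int]:
    """Count occurrences of each value in range(bins) using several threads.

    Every update to the shared histogram happens under a lock, so
    concurrent increments to the same bin cannot be lost.
    """
    hist = [0] * bins
    lock = threading.Lock()

    def work(chunk: Sequence[int]) -> None:
        for v in chunk:
            with lock:  # stands in for an atomic increment
                hist[v] += 1

    # Give each worker an interleaved slice of the input.
    chunks = [values[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return hist
```

Because addition is associative and commutative, the order in which threads apply their increments does not change the final counts, which is what makes this reduction safe to parallelize once each individual update is protected.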
Backends and targets
- Better Direct3D 12 support
- Added support for macOS and Windows on ARM.
- We no longer support the legacy `buffer_t` type.
- Explicit support for Volta, Turing, Ampere GPUs