Releases: halide/Halide

Halide 13.0.3

06 Jan 10:05

This is a patch release with some added build system capabilities and a handful of backported stability improvements. Please see the PR list below for more details.

What's changed

  • Build system
    • The Mullapudi 2016 autoscheduler no longer assert-rejects unsupported targets. #6520
    • Fixed invalid headers in the linear algebra app on RISC-V. #6503
    • Fixed CMake export bug when custom-built LLVM has multiple include directories. #6519
    • Python artifacts will be installed when built, in the Halide_Python CPack component. Targets are not (yet) exported. #6530 #6523
    • Added SOVERSION override for libHalide to support advanced package maintenance workflows. #6534
  • Stability improvements
    • Fixed a missing clamp on inputs which might be read out of bounds via undefined overcompute values. #6352 #6508
    • Fixed an internal use-after-free bug. #6527
    • Fixed a free-order bug in Halide::Runtime that affected CUDA targets. #6511
    • Python bindings now correctly acquire/release the GIL. #6525 #6537
  • Other changes
    • Functions are tagged with the LLVM MSAN attribute when MSAN feature is enabled. #6516
    • CMake documentation has been updated. #6535
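
To make the first stability fix above concrete, here is a minimal sketch of clamping a data-dependent index before it is used for a load. It is plain C++, not Halide code and not the actual patch; `clamped_load` is a hypothetical name used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: load input[idx], clamping idx into [0, size - 1] first.
// If idx comes from a region that was over-computed and never initialized,
// the clamp keeps the load in bounds even though the value is meaningless.
int clamped_load(const std::vector<int> &input, int idx) {
    assert(!input.empty());
    const int last = static_cast<int>(input.size()) - 1;
    const int safe_idx = std::max(0, std::min(idx, last));
    return input[safe_idx];
}
```

With `input = {10, 20, 30, 40}`, an index of `-5` reads element 0 and an index of `99` reads element 3, while in-range indices are unaffected.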

Full Changelog: v13.0.2...v13.0.3

Halide 13.0.2

10 Dec 18:49

This is a patch release to support official Debian packaging. No changes have been made to the compiler library or runtime.

Apps

  • Linear algebra app now correctly checks for the availability of SSE/AVX headers. #6471

Halide 13.0.1

16 Nov 19:36

This is a hotfix for v13.0.0.

Bugs fixed:

  • Fix obscure bug in widening let substitution. #6405
  • x86_cpuid_halide must preserve all 64 bits of rbx/rsi. #6409

Halide 13.0.0

02 Nov 17:24
c3641b6

We are pleased to announce the release of Halide 13.0.0!

This is a major release. Most notably, Halide now requires C++17 (or higher).

You can download one of our binary releases here, or check one of the following package repositories (they might take some time to be updated):

Language and Compiler

  • The compiler now requires C++17 or higher. (#5282)
  • Overloads of realize() that were deprecated in Halide 12 are now removed. (#6122, #6162)
  • Added new predicated tail strategies for split loops. (#6126)
  • Added a more fine-grained prefetch directive. (#6155)
  • Compiler now always runs in a separate 32MB stack on all platforms. (#6239)
  • Fixed a semantics bug where data-dependent loads might be uninitialized on over-compute. (#6294)
  • Using MemoryType::Stack may now trigger a real stack allocation for dynamically-sized allocations discovered to be small at runtime (#6289)

Backends

  • Simplifier improvements yielded a >10% reduction in peak memory usage in many apps, including camera_pipe, harris, nl_means, and stencil_chain. (#6174)
  • The ARM backend now supports native 16-bit float instructions (#6102)
  • Division by non-power-of-two unsigned constants is now faster on X86 (#6322)
  • The WebAssembly backend is mature enough for significant production use (See https://web.dev/ps-on-the-web/)
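
For readers unfamiliar with how division by a non-power-of-two constant can avoid a divide instruction, the following sketch shows the general multiply-and-shift technique for dividing by 3. It is a plain C++ illustration of the idea, not necessarily the instruction sequence Halide emits on x86; `div_by_3` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Divide a 32-bit unsigned value by 3 using a multiply and a shift.
// 0xAAAAAAAB is ceil(2^33 / 3), so the 64-bit product shifted right by 33
// equals x / 3 for every 32-bit unsigned x.
inline uint32_t div_by_3(uint32_t x) {
    const uint64_t magic = 0xAAAAAAABull;
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * magic) >> 33);
}
```

The same approach works for other constants with a different multiplier and shift; some divisors need an extra correction step, which this sketch does not cover.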

Build

  • Fixed an issue with add_halide_library on Xcode, which requires at least one source file for every target. (#6175)
  • Added a watchdog timer to the Halide generator executables (i.e. GenGen.cpp). (#6184, #6240)
  • Fixed a missing dependency on Threads::Threads in CMake (#6257)
  • The tutorials and readmes are now packaged to the doc dir. The documentation has been moved one level deeper to share/doc/Halide/html (#6267)

Halide 12.0.1

21 May 03:38

This is a hotfix for v12.0.0

Bugs fixed:

  • Don't emit aligned loads to unaligned addresses in certain strided scenarios. #6046 #6047

Halide 12.0.0

20 May 09:11
b5a34c3

We are pleased to announce the release of Halide 12.0.0!

This is mostly a quality of life and bugfix release to set the stage for larger changes in Halide 13 (which will require C++17).

You can download one of our binary releases here, or check one of the following package repositories:

Language and Compiler

  • Added align_extent scheduling directive #5829
  • Added TailStrategy::Predicate as an alternative to TailStrategy::GuardWithIf to use predicated loops unconditionally #5856
  • Added scatter() and gather() expressions to support reading from and writing to multiple locations in update definitions #5553
  • Added internal memoization to Adams2019 autoscheduler (performance improvement) #5697 #5654
  • Removed old-style realize() methods which had been deprecated #5676
  • Removed deprecated scheduling directive overloads #5656
  • Many simplifier and bounds inference improvements and bugfixes #5615 #5618 #5895 #6002
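
The tail strategies mentioned above decide what happens when a loop's extent is not a multiple of the split factor. As a rough model in plain C++ (not Halide API calls and not generated code), a split loop with a guarded tail looks like the sketch below; `fill_split_guarded` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Model of a loop over [0, extent) split into outer/inner loops by `factor`.
// The outer extent is rounded up, and the body is guarded so that no
// iteration touches an index at or beyond the original extent.
std::vector<int> fill_split_guarded(int extent, int factor) {
    assert(extent >= 0 && factor > 0);
    std::vector<int> out(extent, -1);
    const int outer_extent = (extent + factor - 1) / factor;
    for (int xo = 0; xo < outer_extent; ++xo) {
        for (int xi = 0; xi < factor; ++xi) {
            const int x = xo * factor + xi;
            if (x < extent) {  // guard: skip the out-of-range tail
                out[x] = 2 * x;
            }
        }
    }
    return out;
}
```

With an extent of 10 and a factor of 4, the outer loop runs three times and the guard skips the last two inner iterations, so exactly ten elements are written.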

Backends

  • Added support for AVX512 VNNI instructions #5725 #5807
  • Removed OpenGL/GLSL backend #5626
  • Fixed various errors with large_buffers #5716 #5940
  • Improved support for sdot and udot instructions on ARM (where supported) #5954
  • Improved support for WebAssembly SIMD ops, when compiling with LLVM 13 #5849 #5850 #5853 #5854 #5861 #5863
  • PyStub generators must now choose to use either only positional arguments or only keyword arguments. This is an ABI break #5761

Build

  • Added scripts to create Ubuntu packages #5754 #5967
  • Added experimental support for ClangCL on Windows #5876
  • Added support and pre-built binaries for macOS ARM64
  • Halide headers no longer inject stack space linker flags on Windows; now, the compiler runs on a fiber with enough stack space #5873
  • Halide shared library no longer exposes LLVM symbols on macOS and Linux. Help wanted for Windows! #5659

Halide 11.0.1

19 Feb 04:29

This is a small bugfix release over Halide 11.0.0.

Build

  • Fixed a build failure when Hexagon is disabled. #5745
  • Removed the dependence on LLVM having the ARM and AArch64 backends. #5745

Halide 11.0.0

15 Feb 23:18

We are pleased to announce the release of Halide 11.0.0!

This release comes with many backend improvements and some notable deprecations. HVX 64 support has been removed, and OpenGL support has been deprecated (and has been removed from upstream).

You can download one of our binary releases here, or check one of the following package repositories:

Language and Compiler

  • Scheduling
    • The memoize directive gained a new EvictionKey parameter to schedule removal of particular entries from the cache. #5510
    • Added support for multi-dimensional vectorization #4873
  • Bounds inference
    • Left shifts could have incorrect bounds #5477
    • Analysis of comparisons (<,<=) and max/min could have incorrect bounds #5438
    • Integer division analysis was improved #5407
  • Various bugfixes
    • An integer-sign bug in lossless_cast was fixed #5459

Backends

  • ARM64 Windows is now supported, along with Direct3D 12. #5544
  • OpenGL (not OpenGL Compute) has been deprecated in this release and will be removed in Halide 12. You will see deprecation messages during your builds. #5475 #5551
    • We still welcome PRs to release/11.x from users who cannot move off the OpenGL backend.
    • Several bugs with EGL and OpenGL ES were fixed. #5730 #5619
    • Several bugs with plain OpenGL were fixed. #5545
  • CUDA
    • A bug with warp shuffles with narrow types was fixed #5624 #5669
  • Metal
    • Thread limits are now checked correctly #5588
  • Hexagon
    • Several bugs were fixed in #5570
    • Gained support for saturating vdmpy and vtmpy instructions #5424
    • Removed support for HVX_64 #5365 #3925

Build

  • Dependencies
    • Upgraded pybind11 dependency to 2.6.1 #5644
  • CMake
    • CMake rules learned about ppc64le targets #5558
    • CMake presets are available for users on 3.19+ #5508
    • add_halide_library gained a NAMESPACE argument to improve readability when using C++ name mangling. #5467
  • Bugfixes
    • Corrected Makefile warnings that MinGW is not supported. #5580
    • Incorrect system headers on FreeBSD/powerpc64 were replaced #5572
    • An error is now emitted when trying to link to static LLVM while lldWasm was linked to shared LLVM. #5472
    • CMake rules fixed for i686 systems #5675

Halide 10.0.1

15 Feb 22:23

We are pleased to announce the release of Halide 10.0.1!

The main change is that LLVM 10.0.1 is now the bundled version (it had previously been 10.0.0).

  • Fixed target detection for i686 in CMake #5675
  • Upgraded pybind11 to 2.6.1 #5644
  • Fixed missing newline bug in OpenCL backend #5277
  • Improved performance in Direct3D 12 backend #5293 #5298
  • Fixed minor bug in loop partitioning #5355
  • Fixed linking to shared LLVM from CMake #5308
  • Fixed imprecisions in bounds inference for integer div and mod #5331 #5350
  • Fixed various issues in documentation #5330

Halide 10.0.0

16 Sep 20:17

We are pleased to announce the release of Halide 10.0.0!

This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.

What happened to version 9?

For major version numbers, we now use the included LLVM version. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).

Autoschedulers

  • There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
  • The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
  • The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.

Build

  • The CMake build has been rewritten. See README_cmake.md for details.
  • The minimum CMake version is now 3.16
  • The old halide.cmake module has been removed in favor of find_package(Halide).
  • We no longer support the MinGW toolchain.

Language features

  • Added the atomic scheduling directive, which gives you another way to parallelize associative reductions (e.g. histograms, or summations) by emitting atomic instructions when available (and compare-and-swap loops or locks when not).
  • Added support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, via combining the vectorize and atomic directives.
  • Integer division or mod by zero now returns zero instead of being undefined behavior.
  • The simplifier is now formally verified.
  • You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
  • Allocation size inference is more precise in a variety of cases.
  • Various bugfixes for compute_with.
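
The division rule in the list above can be modeled in plain C++ as follows. This is only a sketch of the zero-denominator behaviour, not Halide code: `div_or_zero` and `mod_or_zero` are hypothetical names, and for nonzero denominators they use C++'s truncating operators, whose results for negative operands may differ from Halide's rounding.

```cpp
#include <cassert>
#include <cstdint>

// Model of "integer division or mod by zero returns zero".
// Only the b == 0 case reflects the release note; other cases use plain
// C++ semantics (and, as in C++, INT32_MIN / -1 remains undefined here).
inline int32_t div_or_zero(int32_t a, int32_t b) {
    return b == 0 ? 0 : a / b;
}

inline int32_t mod_or_zero(int32_t a, int32_t b) {
    return b == 0 ? 0 : a % b;
}
```

Defining the result removes a source of undefined behavior, at the cost of a check wherever the denominator cannot be proven nonzero.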

Backends and targets

  • Better Direct3D 12 support
  • Added support for macOS and Windows on ARM.
  • We no longer support the legacy buffer_t type.
  • Explicit support for Volta, Turing, Ampere GPUs