Introduction
The TVM community has worked since the last release to deliver the following exciting improvements!
The main categories are listed below (bold text indicates areas with significant progress):
- Frontend: PyTorch's ExportedProgram is supported in the relax frontend (#17346)
- Community, RFCs
- AOT, Hexagon, OpenCL & CLML, Web, Metal
- Relax, Dlight, Disco
- TIR, TVMScript
- Docs, Docker, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.18.dev0...v0.18.0.rc0.
Community
- #17450 - update contributors
RFCs
This RFC introduces the Android Neural Networks API (NNAPI) as a new BYOC backend. NNAPI is a graph-level neural network inference API provided by the Android runtime. Prior to this RFC, TVM on Android mobile devices relied mainly on OpenCL for GPU acceleration. The RFC adds a new codegen and a runtime via the BYOC framework, enabling execution on custom accelerators from SoC vendors on mobile devices.
- #109 - [RFC] NNAPI Integration via BYOC
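The partitioning idea behind BYOC can be pictured with a toy sketch. This is illustrative Python only, not the actual TVM NNAPI API: the backend name, op names, and the `partition` helper are all hypothetical, standing in for the real graph partitioner that hands supported subgraphs to the vendor codegen.

```python
# Toy illustration of BYOC partitioning (hypothetical names, not TVM API):
# operators the external backend supports are offloaded to it; the rest
# fall back to TVM's default codegen.
SUPPORTED_BY_NNAPI = {"conv2d", "relu", "add"}

def partition(ops):
    """Split an op sequence into NNAPI-offloaded and fallback groups."""
    offloaded = [op for op in ops if op in SUPPORTED_BY_NNAPI]
    fallback = [op for op in ops if op not in SUPPORTED_BY_NNAPI]
    return offloaded, fallback

offloaded, fallback = partition(["conv2d", "relu", "softmax"])
print(offloaded, fallback)
```

In the real flow the unit of offload is a fused subgraph rather than a single op, but the supported/unsupported split is the same idea.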
BYOC
- #17385 - [NNAPI] Add NNAPI backend for BYOC
BugFix
- #17440 - [TIR][Schedule] TileWithTensorIntrin skip ComputeInline if bu…
- #17419 - [FFI] Grab GIL when checking env signals
- #17403 - [Fix][LLVM] Fix getHostCPUFeatures LLVM version cutoff
- #17383 - [ONNX] Skip constant If node generated by PyTorch
- #17360 - [FIX] fix bug when normalize iter with different lower bounds
- #17148 - [Relax] Preserve existing DataflowBlock in ConvertToDataflow
- #17345 - [Fix][Relax] Add the missing tree-attn func arg for KV cache creation
- #17073 - [Relax] FCallPacked not checked in CodegenVMTIR
- #17315 - [MSC] Bugfix for strided_slice op
- #17335 - [Relax][PyTorch][Fix] Use `_convert_torch_tensor_to_relax()` where possible
- #17330 - [Relax][PyTorch] Update `layer_norm` converter to support `immutable_list` for `normalized_shape`
- #17324 - [Fix] Remove `tvm.` prefix from image name when `./docker/build.sh`
- #17308 - [TVM4J] Fix unhandled return type in JNI
- #17307 - [Fix][TIR] LowerThreadAllreduce warp reduction mask
- #17312 - [Relax] Infer TIR values from shapes inside a tuple
- #17292 - [Relax] Support torch.unbind op and fix bugs for expand && split
- #17263 - [Relax] Preserve dtype in ToMixedPrecision for kNever ops
- #17229 - [Cutlass] Fix cutlass instantiate attention template bugs
- #17121 - [Relax] Fix a bug about the IR construction in test file
- #17142 - Allow import of TVM when current directory is read-only
CI
- #17444 - [Docs] Upgrade Sphinx
- #17425 - Upgrade CI to Python 3.9
- #17410 - Upgrade unity image tag to `20240917-153130-9f281758`
- #17409 - [Windows] Workaround for error in FindLLVM
- #17397 - Update image tag to 20240917-153130-9f281758
- #17338 - Upgrade PyTorch to 2.4.1
- #17337 - Disable NNPACK build and fix error on Android SDK installation
- #17355 - Upgrade github upload-artifact action
- #17334 - [Hexagon] Forward gtest tests into pytest as separate tests
- #17271 - Resolve CI compilation failures on MacOSX
- #17221 - Reduce logging level when checking if docker image exists
- #17206 - Update dummy-variable regex for pylint
- #17117 - [CLML] Fix for a few CLML regression issues
- #17155 - Remove lint step from `unity/pr-head` step
Disco
- #17398 - Enable float8 data type in disco
- #17275 - Fix double free of nccl communicator
- #17264 - Disable splitting nccl communicator in single-group
- #17182 - Implement SocketSession
- #17191 - Cross-group and p2p send/receive primitives
- #17180 - Group-wise operation
Dlight
- #17430 - [GPU] Improve matmul schedule for adreno
- #17363 - Fix Matmul rule for Conv3D
- #17259 - [ADRENO] Fix for opencl adreno matmul schedule
- #17187 - [GPU] Add OpenCL dequant matmul schedule
Docker
- #17433 - [CI] Add NNEF dependency to CI images
Docs
- #17436 - [Relax][PyTorch] Use `torch.export` instead of `fx.symbolic_trace` for tutorial
- #17402 - [Doc] Update Architecture Overview
- #17382 - More clarity on security model of RPC server
- #17380 - [Doc] Relax Deep Dive
- #17377 - Update document to include security model of RPC server
- #17378 - Link to project-specific security page
- #17352 - TVM pip Installation fix
- #17343 - Minor fix typo in developer howto guide
- #17328 - [Doc] Deep Dive TensorIR
- #17327 - [Doc] How to Optimize a Language Model
- #17320 - [Doc] Customize Optimization
- #17319 - [Doc] Fix doc build error in e2e_opt_model.py
- #17306 - [Doc] Refactor How-To
- #17296 - [Doc] Overview
- #17298 - [Doc] IRModule
- #17286 - Introduce Relax API and move legacy part to standalone page
- #17289 - [Doc] Quick Start
- #17287 - [Doc] Refactor install docs
Frontend
- #17431 - [Relax][Onnx] Add support for pad-2
- #17447 - [ONNX] Move relax related tests to the correct file
- #17427 - [Relax][ONNX] Expand op support for ONNX frontend
- #17429 - [Relax][PyTorch] Support tensor manipulation and creation ops for ExportedProgram importer
- #17426 - [Relax][PyTorch] Support neural network ops for ExportedProgram importer
- #17424 - [Relax][PyTorch] Support binary, statistical and search ops for ExportedProgram importer
- #17421 - [Relax][PyTorch] Support more unary ops for ExportedProgram importer
- #17396 - [Relax][PyTorch] Add support for `torch.export.ExportedProgram` in Relax PyTorch Frontend
- #17379 - [Relax][PyTorch] Fix output shape of `torch.nn.functional.scaled_dot_product_attention`
- #17376 - [Relax][PyTorch] Cleanup Tensor Manipulation and Creation op converters
- #17372 - [Relax][PyTorch] Cleanup Statistical, Search and DataType op converters
- #17369 - [Relax][PyTorch] Cleanup Neural Network op converters
- #17366 - [Relax][PyTorch] Cleanup binary op converters
- #17356 - [Relax][PyTorch] Cleanup unary op converters
- #17350 - [Relax][Onnx] fix params name bug in onnx frontend
- #17342 - [Relax][PyTorch] Add support for `torch.ops.aten.sym_size.int`
- #17300 - [Relax][PyTorch] Add support for torchvision.ops.stochastic_depth
- #17325 - [Relax][PyTorch] Add support for `torch.nn.functional.conv*`
- #17309 - [Relax][Onnx] fix expand bug in onnx frontend
- #17304 - [Relax][PyTorch] Add support for torch.repeat
- #17291 - [Relax][PyTorch] Add support for torch.tile
- #17277 - [Relay][Pytorch] Add support for `aten::tile`
- #17228 - [Unity] Add Sqrt Op
- #17189 - [Relax][PyTorch] Add support for `torch.nn.functional.max_pool2d`
- #17186 - [Relax][PyTorch] Add support for torch.einsum
- #17184 - [Relax][PyTorch] Add support for torch.permute
- #17167 - [Relax] [ONNX] Add support for Sign and Not
Hexagon
- #17204 - Fix LWP assembly handler (predicate register)
- #17169 - [CMake] Fix v66 build issue
- #17162 - Support RPC execution of existing shared lib
LLVM
- #17347 - [RUNTIME] Fix RISC-V CodeModel propagation to ORCJIT runtime executor
- #17199 - Fix for getHostCPUFeatures API change
MetaSchedule
- #17166 - Replace `xgboost.rabit` with `xgboost.collective` because it's deprecated
- #17171 - Add a testcase for padded conv2d in meta_schedule
OpenCL & CLML
- #17273 - [CODEGEN][OPENCL] Fix opencl codegen for few ops
Relax
- #17449 - Add scatter_nd op support
- #17453 - Add NonZero op
- #17448 - Support left_shift and right_shift op
- #17432 - [KVCACHE] Improved schedule for prefill attention
- #17428 - Introduce static shape tuning pipeline
- #17401 - [KVCache] Attention func accepting over-padded qkv and output NDArray
- #17331 - Validate StructInfo annotations in well-formed check
- #17368 - [Transform] Add SelectNode handling in SymbolicMatcher
- #17353 - Fix BYOC removing existing ext mods
- #17359 - Add new NN allgather operator
- #17362 - [KV Cache] Refactor `_attention_sequence_prefill` function to …
- #17332 - Validate StructInfo of variable bindings
- #17354 - Fix inline source module cause path too long error
- #17213 - Refactor RealizeVDevice to remove in-place mutation
- #17253 - [Transform] Handle tuple return in RemoveUnusedOutputs
- #17285 - Require correct input/output shapes for `R.call_tir`
- #17202 - Update GlobalVar name in AttachGlobalSymbol
- #17218 - Allow dynamic shape argument to R.reshape
- #17326 - [KVCache] Add tree attention with paged cache support
- #17314 - [Transform] Compose preproc functions in LiftTransformParams
- #17313 - Identify tuple unpack/repack in CanonicalizeBindings
- #17305 - [Python] Rotary positional embedding scaling
- #17243 - Avoid wrapping TupleStructInfo into a Tuple for R.call_tir
- #17224 - [Analysis] Handle recursive functions in CollectVarUsage
- #17280 - [KVCache] Increase coalesce threshold
- #17261 - Add KVCache Interface for Relax NNModule
- #17145 - Implement R.ensure_zero_offset and update memory planning for R.view
- #17242 - Remove segfault in R.call_tir_inplace validation
- #17234 - FuseTransposeMatmul Pass
- #17226 - Fix segfault in rewrite_bindings for MatchCast node
- #17220 - Handle presence of R.call_tir in MergeCompositeFunctions
- #17201 - [Transform] Handle `is_group` argument in IPC AllReduce
- #17198 - Disable fusion for fetching from the packed params in FuseOps
- #17149 - Implement Rewriter class for pattern-rewrite
- #17192 - [KVCache] Partial layers support
- #17157 - Integrate cuDNN attention
- #17160 - Fix fuseOps via pattern
Relay
- #17339 - [qnn]: Fix qnn.avg_pool2d layout inference
- #17177 - [FQ2I]: Use appropriate dtype while quantizing relay.op.nn.pad…
Runtime
- #17407 - Add property Module.is_device_module
- #17294 - Support KV cache with RoPE extension factor array
- #17240 - [FFI] Use TVMValue::v_int64 to represent boolean values
- #17252 - Revert "[FFI] Introduce runtime boxed types for int/float/bool"
- #16183 - [FFI] Introduce runtime boxed types for int/float/bool
- #17237 - Reorganize PagedKVCache attn kernel invocation
- #17227 - Allow aborting fetchWithCache through AbortSignal
- #17208 - Allow aborting fetchNDArray through AbortSignal
TIR
- #17443 - Add `is_vector` method to the DataType class and update usages across the codebase
- #17411 - [NarrowDataType] BufferLoad's index should not inherit bits constraint of value
- #17219 - Validate tir::Buffer axis_separators on construction
- #17158 - [Analyzer] Simplify `x==x` expressions for all dtypes
TOPI
- #17274 - [ADRENO] Add Group Conv2d texture schedule
TVMScript
- #17435 - Enable T.macro decorating class method
- #17434 - [TIR] Add source kernel integration via call_kernel
- #17395 - [TIR, TVMScript] Add TIR - Triton integration
- #17131 - [Relax] Allow return statement in DataflowBlock
- #17373 - Avoid segfault from invalid TVMScript
cuda & cutlass & tensorrt
- #17408 - [CUTLASS] Add FP8 gemm kernels
web
- #17420 - Allow deprecated API requestAdapterInfo with any cast
- #17404 - [WASM] Implement concat embeddings
- #17251 - Add TVMArgBool to ArgTypeCode
Misc
- #17457 - Try to fix windows CI conda build issue
- #17415 - [NVSHMEM] Enable nvshmem memory allocation
- #17422 - [CMake] Add NCCL/RCCL header directory to include path
- #17405 - [TVMjs] Modify web package description
- #17400 - [3rdparty] Bump FlashInfer for tmp workspace reduction
- #17394 - [MSC] Support concat with constant inputs
- #17351 - [MSC][Refactor] Support dynamic shape
- #17371 - [WEBGPU] Update runtime to remove deprecated API
- #17361 - [IR] Expose ReplaceGlobalVars utility in the Python API
- #17358 - Update tvmc_command_line_driver.py, modify the sentence, remove the duplicate "as"
- #17344 - [MSC] Reconstruct tensorrt module
- #17297 - [Apps] Remove mxnet dependency from /apps/android_camera/models
- #17299 - [Apps] Remove mxnet dependency from /apps/ios_rpc
- #17293 - [Rust] Remove mxnet dependency and re-enable rust example
- #17321 - [Target] Refine equality check on TargetKind instances
- #17317 - Add NVSHMEM support
- #17301 - [TE][CreatePrimFunc] Fix create reduce block with spatial iter dependent init value
- #17284 - [Support] Fix the Read/Write of socket stream
- #17302 - [Codegen][WebGPU] LetNode common subexpr override
- #17246 - [Cleanup] Remove `using namespace tvm::runtime` from headers
- #17278 - [Codegen] Emit `tir::Let` as var assignment explicitly
as var assignment explicitly - #17260 - [WINDOWS] Compiler options for non x86 targets
- #17249 - [IR] Handle NaN in StructuralEqual and StructuralHash
- #17257 - [FFI] Re-introduce the boxed primitive values
- #17265 - [CompileBugfix][contrib] Fix "base64.h: No such file or directory" and "'tvm::runtime::vm::AllocatorType' has not been declared" errors while compiling
- #17214 - Replacing unary ops with LookUpTable and Take op to improve performance
- #17250 - [WebGPU] Fix unexpected device lost error when intentional dispose
- #17236 - [3rdparty] Bump FlashInfer
- #17233 - [Runtime Patch] Add AbortSignal to fetchWithCache in ArtifactCacheTemplate interface
- #17190 - [Cython][FFI] Fix crash when call del operator for handle
- #17170 - Pass to eliminate redundant branch and overcompute
- #17185 - Remove and replace deprecated `distutils.util.strtobool()`
- #17188 - Add `packaging` to `python/gen_requirements.py`
- #17181 - [FFI] Add python signal handler for ctypes FFI
- #17173 - Use `packaging.version.parse` instead of `distutils.version.LooseVersion`
- #17174 - [TVMJS] Check DataType.NUMPY2STR when saving array
- #17168 - [Meta Schedule][XGBoost] enable custom callback func test with xgboost>=1.6.0
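For downstream scripts affected by the `distutils.util.strtobool()` removal noted in #17185, a stdlib-only replacement can look like the sketch below. This mirrors the truthy/falsy spellings distutils recognized; the exact helper TVM adopted may differ.

```python
def strtobool(val: str) -> bool:
    """Minimal replacement for the removed distutils.util.strtobool.

    Accepts the same truthy/falsy spellings distutils recognized and
    raises ValueError on anything else.
    """
    val = val.strip().lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return True
    if val in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(f"invalid truth value {val!r}")

print(strtobool("Yes"), strtobool("0"))
```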
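Similarly, the `LooseVersion` to `packaging.version.parse` migration (#17173) matters beyond the deprecation itself: `packaging` implements PEP 440 ordering, so pre-releases compare correctly, which `LooseVersion` did not guarantee. A minimal illustration (assumes the third-party `packaging` package is installed):

```python
from packaging.version import parse

# PEP 440 ordering: a release candidate sorts before the final release,
# and multi-digit components compare numerically, not lexically.
assert parse("2.4.0rc1") < parse("2.4.0")
assert parse("1.10.0") > parse("1.9.2")
print(parse("2.4.0rc1"))
```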