Apache TVM v0.12.0
Introduction
The TVM community has worked since the v0.11.1 release to deliver the following exciting improvements! The main tags are below (bold text marks areas with lots of progress):
- Community, RFC;
- Runtime: ACL(ArmComputeLibrary), Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, CRT, Hexagon, Metal, Web & WASM, others about runtime;
- Frontend: TensorFlow/tflite, Pytorch/Torch, Paddle, OneFlow, keras;
- TE, Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule, Schedule;
- CI, Tests, BugFix, Docs, Docker, Build;
- Android, microTVM, Target, AutoTVM, AOT, LLVM.
Please visit the full listing of commits for a complete view: v0.11.1...v0.12.0.
Thanks to @ysh329 for the great effort on the release process as release manager.
Community
- Reviewer
- Committer
- PMC
RFC
- [RFC] Introduce PresburgerSet (#99) (e17994b)
- [RFC] Further Unify Packed and Object in TVM Runtime (#97) (d646a22)
Runtime
ArmComputeLibrary
- [ACL][TESTING] Use pytest.mark.parametrize in ACL conv2d tests
- [ACL] Prevent offloading of per-channel quantized operators
- [CL] Update Compute Library from v22.11 to v23.02.1
Adreno
- [Adreno] Extend pack_filter for HWIO layout
- [Adreno] Update interface of AnnotateMemoryScope pass
- [Adreno] Optimize reduction schedule
- [BENCHMARK][ADRENO] Adreno Benchmarks with texture
- [BENCHMARKS][CLML] Adreno benchmarks with CLML BYOC path added
- [BENCHMARKS][ADRENO] Documentation for Adreno (Texture) benchmarks
- [DOCS][ADRENO] Improved Adreno documentation
OpenCL & CLML
- OpenCL
- CLML
- [CLML][RUNTIME] Enable more ops in CLML runtime
- [CLML][RELAY] Enable Pad and Conv2d layer fusion
- [CLML][CODEGEN] CLML native codegen utility
- [CLML] Version compatibility and various test cases
- [CLML] Changes corresponding to OpenCL workspace refactorization
- [RUNTIME][CLML] OpenCLML tuning and profiling enhanced
ROCm
CMSIS-NN
- [CMSIS-NN] Global function that provides range based on dtype
- [CMSIS-NN] Add int16 add and mul operator support
- [CMSIS-NN] Add a runtime error message
- [CMSIS-NN] Reduction in code size of AOT test runner binary
- [CMSIS-NN] Remove support for the old CMSIS NN project
- [CMSIS-NN] Support CMSIS NN from new GitHub location
- [CMSIS-NN] Add Cortex-M85 support
CUDA & CUTLASS & TensorRT
- [CUDA][Schedule] Better Layout Transform Schedules
- [Profiler] Allow user to flush L2 cache in `time_evaluator` function for profiling CUDA kernels
- [Codegen][CUDA] Add error message for missing fragment info
- [CUTLASS][Ansor] Combine CUTLASS and Ansor
- [TensorRT] Fix BiasAdd with correct axis attribute
- [TRT][BYOC] allow strided_slice ops on selected dimensions (#14142)
Ethosn
- [ETHOSN] Update driver stack version to 22.11
- [ETHOSN] Support for addition with constant input
- [ETHOSN] Apply FoldConstant before NPU partitioning
- [ETHOSN] Remove support for NPU driver 22.08
- [ETHOSN] Fix for the mock inference after NPU driver update
- [ETHOSN] Remove requantize dependency on resize
- [ETHOSN] Add support for experimental compiler option
CRT
- [CRT] USE CMake for CRT standalone libraries
- [CRT][microTVM] Enable USMP by default for AoTExecutor + CRT runtime
- [CRT]Cleanup unused macros in crt_config.h.template
Hexagon
- [Hexagon][TOPI] Use IndexMap axis separator instead of TE
- [Hexagon] Add concept of DMA groups
- [Hexagon] Improve cache management strategy for HexagonBuffer
- [Hexagon] Denote DMA cache bypass as experimental feature
- [Hexagon] Adapt some intrinsics for high vector lanes
- Hexagon compilation on MacOS system
- [Hexagon] Enable depthwise conv2d NHWC with an HWIO kernel layout
- [Hexagon][QNN] Improve performance wo QNN canonicalization
- [Hexagon][Metaschedule] Add timeout_sec arg to get_hexagon_local_builder
- [Hexagon] Fix deprecated call for data layout size in bits
- [Hexagon] Allow scalar tensors to have null shape during allocation
- [Hexagon][runtime] Make HexagonThreadManager::CheckSemaphore thread safe
- [Hexagon] Float and quantized dense operators with schedules
- [Hexagon][CI] Updated sha for builder LLVM
- [Hexagon][CI] Update the docker image ID to reflect newer LLVM
- [Hexagon] Switch from default_rng to random in Hexagon tests
- [Hexagon] Add hexagon user DMA intrins for tensorization
- [hexagon] Hexagon inference fix
Metal
- [METAL][CODEGEN] testcase for ramp codegen
- [CODEGEN][METAL] Fix unaligned vector load
- [CODEGEN][METAL] Fix ramp codegen
MicroNPU
- [microNPU] Sum legalization support
- [microNPU] Add rescale parameters for binary elementwise
- [microNPU] Add hardware constraints for binary elementwise
- [microNPU] Add support for TFLite PAD
- [microNPU] Upgrade Vela to v3.7.0
- [microNPU] Merge LUT activation with binary elementwise operation
- [microNPU] Upgrade to 22.08 version of Arm(R) Ethos(TM)-U NPU drivers
- [microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U
- [microNPU] Add a legalization test for TFLite PAD
- [microNPU] Disable copying weights to SRAM for FullyConnected ops in CopyConstants scheduler
- [microNPU] Add support for ResizeNearestNeighbor with half_pixel_centers=True
Web & WASM
- [Web] Try to upgrade WebGPU API usage to the latest
- [WEB] Reduce memleak in web runtime
- [WEB] WebGPU Codegen
- [WEB] Update web runtime to support latest emcc
- [WASM][FIX] test tests/node/websock_rpc_test.py
Others about Runtime
- [FIX][RUNTIME] Convert container with function value type
- [RUNTIME] Fix the manual determination of cores in FillDataForMeasure
- [RUNTIME] Fix determination of big/little cores domains
- [Runtime] Fix Potential DeviceAPIManager Memory Bug
- [Runtime] Fix high RAM usage when saving / loading parameters of big models
- [Runtime] Runtime module property mask for Metal and Vulkan
- [Runtime] Introduce runtime module property
- [Runtime] Add missing Type2Str for TVMByteArray
Android
- [Android] Fix using system libraries in Android apps
- [TOOL][NATIVE] Android native application for deploy and run
AOT
- [AOT] Added a test for detecting output size post MLF export
- [AOT]Aot module post-test error workaround
- [AOT]Raise error when input name is not valid
- [AoT]Add get_input_name function to AoT Module
Arith
- [Arith] Simplifications for floormod(x
- [Arith] Implemented PMatchesOneOf and matches_one_of
- [Arith][UnitTest] Parametrize tests of RewriteSimplifier
- [Arith] Use ConstIntBound to remove negative numerator when lowering
- [Arith][Bugfix] Simplify "x - 1 < y" into "x <= y"
- [Arith] Add simplification rule for `x - max(x+y
- [Arith] Updated incorrect simplification rule
- [Arith] Allow const folding on fp16 involving one and zero
- [ARITH] Enhance CanProve to handle symbolic bound
- [ARITH] support floordiv in deduce bound
- [Arith] Support eq in detect_clip_bound
- [Fix][Arith] Analyzer simplification starts with canonical
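The "x - 1 < y" into "x <= y" rewrite in the list above is valid for integers but not for real numbers. The following is a minimal brute-force sketch in plain Python that illustrates this; it does not call TVM's simplifier, and the function name `rewrite_holds` is hypothetical.

```python
from itertools import product


def rewrite_holds(lo: int = -8, hi: int = 8) -> bool:
    """Return True if (x - 1 < y) == (x <= y) for every integer pair in [lo, hi]."""
    return all(
        (x - 1 < y) == (x <= y)
        for x, y in product(range(lo, hi + 1), repeat=2)
    )


# Exhaustive check over a small integer range.
print(rewrite_holds())

# Over the reals the two predicates can differ: with x = 0.5 and y = 0,
# x - 1 < y is true while x <= y is false.
print((0.5 - 1 < 0) == (0.5 <= 0))
```

The second check shows why a rule of this kind has to be restricted to integer operands.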
AutoTVM
BugFix
- [BugFix][UMA] Protect target registration
- [BugFix][Runtime] Add missing check for `PackedFunc`
- [Bugfix][TIR] Fix version conflict with typing for Python 3.8.0
- Fix build platform environment variable
- [BugFix][TVMScript] Fix the roundtripability of intrinsic pow
- [BugFix] Pylance emits the warning 'Code is unreachable'
- [BugFix][TVMScript]fix var capturing order error
- [BugFix][TVMScript] Parser crash
- [Bugfix][TVMScript] Handle LetStmt for `var1 = var2` expressions
- [Bug][CodeGen,Cuda]fix cast fp16 to int8/uint8 in cuda
- [fix] MXNet dot for all tensor dimensions
- [Bugfix] Conv1Dtranspose default kernel layout should be IOW
- [Bugfix] Conv3Dtranspose default kernel layout should be IODHW
- [BugFix] Support rewrite_once when the number of callbacks > 1
- [Bugfix][TIR] Fix version conflict with typing for different Python versions (3.8.0-3.10.0)
- Fix out of bound enum conversion
- [bugfix] Fix the write buffer scope of `mma_store_impl`
- [BugFix][Runtime] Fix Incorrect node information
Build
- [Build] Expose missing USE_VERILATOR in cmake
- [Build] Fix find_include_path when using TVM python package
- [Build] Fix misleading error messages
- [Build][Bugfix] Use CMAKE_ prefix for _COMPILER_LAUNCHER
BYOC
CI
- [CI][microTVM] Enable USE_MICRO for mac and windows CI builds
- [CI] Pass the 'path' parameter passed to cmake_build to the task_build.py script
- [CI][EZ] Upgrade CI Lint Image
- [CI][Lint] Update black
- [CI][Flaky] Skip zephyr_qemu-x86 tests that are part of task_python_microTVM
- [CI] Fix for NNPack error due to misalignment with pthreadpool library
- [ci] Disable Windows-Static-Runtime
- [ci][docker] Make branch names valid before using them as tags
- [CI] Cross-compile libtvm_runtime to Aarch64 and run tests
- [CI] Include static builds of the runtime as part of CI
- [CI] Update rerun list for tvm-bot
- [CI] Update ci_minimal docker image to cross-compile TVM to aarch64
- [CI] Update ci_arm docker image to have LLVM 15
- [CI] Update Compute Library to v22.11
- [CI] Fix broken model link
- [CI][ETHOSN] Add ssh to the driver stack installation
- [CI] Fix android build by constraining numpy version
- [CI] NNPACK build issue workaround
- [CI] Update GPU image for CUDA 11.7
- [CI] Update CUDA to 11.7
- [CI] Update cpu and gpu image
- [CI] Enable USE_MICRO in minimal cross ISA build
- [CI][microTVM]Update ci_cortexm image
- [CI][Docker][Cortex-M]Update scripts to update ci_cortexm to Ubuntu 20.04
- [CI] Fix MLF input and output name map
- [CI] Pin sccache version to 0.3.3
- [CI] Add llvm-15 and mlir-15 to Docker setup
- [CI] Add onnx dependency to test_auto_tensorize.py::test_vnni_bert_int8
- [CI] Fix test skipping pytest attribute
- [skip ci][ci][docker] Add cross compilation libs
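As an illustration of the "Make branch names valid before using them as tags" item above, here is a minimal sketch of turning a Git branch name into a valid Docker tag. It is not the script used in TVM's CI; the helper name `to_docker_tag` and the choice of `-` as the replacement character are assumptions. The rules it encodes are Docker's documented tag constraints: only ASCII letters, digits, underscores, periods and hyphens, no leading period or hyphen, and at most 128 characters.

```python
import re


def to_docker_tag(branch: str) -> str:
    """Map a branch name to a string that satisfies Docker's tag rules (hypothetical helper)."""
    # Replace every character outside the allowed set with a hyphen.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # A tag may not start with a period or a hyphen.
    tag = tag.lstrip(".-")
    # A tag may contain at most 128 characters.
    tag = tag[:128]
    if not tag:
        raise ValueError(f"cannot derive a Docker tag from {branch!r}")
    return tag


print(to_docker_tag("feature/foo#1"))
```

Slashes are the most common offender, since they are routine in branch names but not allowed in tags.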
Tests
- [Tests] Replace pytest.main with tvm.testing.main
- [TESTING] Enable execution of test_packed_8x8x32_resnet50
- [testing] Use tuples for numpy indexing
- [testing][py_converter] Enhance py_converter to better support entire modules
- [Unittest] merge test_cp_async_in_if_then_else into test_tir_transform_inject_ptx_async_copy
- [UnitTest] Parametrized test_arith_iter_affine_map::test_padding
Docker
- [Docker] Update ci-cpu and ci-arm to tag 20230223-070143-a3b51f11b
- [docker][microTVM]Fix Zephyr 0.15.2 SDK installation and separate Zephyr python environment
- [docker][microTVM]Update zephyr version to 3.2 and Zephyr SDK to 0.15.2
- [Docker]Add dialout group by default on login
- [Docker] Add script to build llvm from source
- [DOCKER] Configurable NDK version support
- [Docker update] Update ci_cpu tag to the latest from tlcpackstaging
Docs
- [Doc] fix doc for tvm.te.const()
- Add v0.11.0 docs link to site
- [docs] Remove empty code blocks
- [docs] Add details about patch releases
- [Docs] Update listed tvmc python dependencies
- [docs] Add "Open with Colab" button to documentation
- [Docs] Add `typing-extensions` dependency guide
- [Docs] Fix MetaSchedule Docs
- [FIX] Fix Typos in Docs and Comments
- [HotFix][docs] Use correct Colab button URL
Frontend
- TensorFlow & TFLite
- Pytorch
- ONNX
- [Frontend] Add ONNX importer for QLinearSoftmax
- [ONNX] QGemm support
- [ONNX][TOPI] Add `DFT` operator
- [Frontend] [ONNX] Support sequence_lens of GRU
- [ONNX] Extend converter for Attention from Microsoft onnxruntime contrib opset
- [ONNX] Add converter for QAttention from Microsoft onnxruntime contrib opset
- [ONNX][TORCH] Replace scatter op by scatter_elements
- [ONNX] Support ScatterElements with reduction
- [ONNX] Support Bitwise operations
- [ONNX] Support Bernoulli op on ONNX front-end
- [ONNX] Extend reduction types supported by ScatterND
- [ONNX] Support SequenceEmpty op
- [ONNX] Support SequenceErase op
- [ONNX] Support SequenceLength op
- Keras
- OneFlow
- Paddle
- [PaddlePaddle Hackathon 4][Frontend][Paddle]add conv3d for paddle frontend
- [Frontend][PaddlePaddle] Fix bug in tests for upgrading paddlepaddle to 2.4.2
- [Frontend][Paddle]add take_along_axis and topk converter for paddle frontend
- [Frontend][Paddle] Add where_index op and add vm for paddle frontend's unittest
- [Frontend][Paddle] Add norm and one_hot_v2 op
- [Frontend][PaddlePaddle] Add topk op and Fix bug
- [PaddlePaddle Hackathon 4][Frontend][Paddle]Add tile/mish/stack/unstack/silu/softshrink/where op for paddle frontend
- [Frontend][Paddle]fix eye and dist
- [PaddlePaddle Hackathon 4][Frontend][Paddle]add grid-sample/gaussian_random/flip/fill_zeros_like/unique for paddle frontend
- [PaddlePaddle Hackathon 4][Frontend][Paddle]add thresholded_relu/index_select/eye/linspace/take_along_axis/dist for paddle frontend
microTVM
- [microTVM] Clean-up test_crt.py and add to pylint
- [microTVM] Build standalone_crt with cmake instead of makefile
- [microTVM] additional refactoring for enabling USE_MICRO in more builds
- [microTVM] Fix host-driven AOT memory workspaces
- [microTVM] Fix MacOS build with USE_MICRO=ON
- [microTVM] Use QNN schedules to give SOTA performance
- [microTVM]Fix more security issues with pyproject
- [microTVM] Update poetry to fix security issues
- [microTVM]Enable TVMC micro with AoT Executor
- [microTVM]Add test for MLPerfTiny models
- [microTVM][CRT]Move Makefile to CMake to be cross-platform compatible
- [microTVM]Refactor crt_config.h header file generation
- [microTVM] Refactor required external functions in CRT to platform-template.c
- [microTVM] Update Zephyr version and Zephyr SDK version
- [microTVM]Refactor test and add skip to current failing tests/boards
- [microTVM] Update tutorials
- [microTVM] Add tutorial on how to generate MLPerfTiny submissions
- [microTVM][Zephyr]Add project files for mlperftiny submission
- [microTVM]Add default value to unspecified project options in project API
- [microTVM]Add MLPerfTiny test harness
- [microTVM] Fix tvmc tutorial
- [microTVM][Zephyr] Remove unnecessary use of generate_c_interface_header
- [microTVM][CRT]Separate CRT template project from standalone CRT build
- [microTVM][Zephyr] Fix flash command for nrfjprog
- [microTVM][Zephyr] Fix TVMC test on hardware
- [microTVM] Custom IDE Tutorial
- [microTVM] tuning on micro targets with meta-schedule
- [microTVM] Allow multiple runners in tuning micro models with meta-schedule
- [microTVM] Replace arm_nnsupportfunctions.h with arm_acle.h
LLVM
- [LLVM] Use DataLayout::getABITypeAlign instead of getABITypeAlignment
- [LLVM] Add missing `override` to GetFormat and GetPropertyMask
- [LLVM] Add guard for #include <llvm/Transforms/IPO/PassManagerBuilder.h>
- [LLVM] Remove call to EmitDebugLocation from AddAliasInfo
- [LLVM] Use std::nullopt instead of llvm::None
- [LLVM] Fix registerCallbacks API after recent change
- [LLVM] Add support to generate llvm.assume
- [LLVM] Add support for DeclBufferNode
- [LLVM][BugFix] Fix include Triplet.h bug when LLVM version>= 17
- [TEST] Fix division by 0 in llvm codegen test
- [SVE] Adding codegen tests for SVE
MetaSchedule
- [MetaSchedule] Introducing MemHammer
- [MetaSchedule] Introduce Async Pipeline in MultiLevelTiling
- [MetaSchedule][ARM] Enable ARM CPU intrinsic for MetaSchedule
- [MetaSchedule] Use `shared.dyn` for Tensor Core Schedule Rules
- [MetaSchedule] add fp16-16-32 TensorCores rule to default settings
- [MetaSchedule][Hexagon] Improve vectorization for standalone elementwise op
- [MetaSchedule] Add "disabled_pass" option in tuning API
- [MetaSchedule] Fix anchor-block flow with empty design space generator
- [Metaschedule] get_top_k should not return not built records
- [Metaschedule] Aligning get_top_k logic in MemoryDatabase and JSONDatabase
- [MetaSchedule] preserve global_symbol attached to function after applying MS
- [MetaSchedule] Fix a typo in MemoryDatabase
- [MetaSchedule] Fix for RewriteLayout + AllocateConst when the rank of the rewritten weight doesn't change
- [MetaSchedule] Fix tensorcore winograd task extraction
- [HotFix][MetaSchedule] Turn off database shash check
- [MetaSchedule] MutateTileSize skip single-candidate SampleCategorical
- [Metaschedule] EvolutionarySearchNode::State constructor typo fix
- [Fix][MetaSchedule] Fix redundant stages in async pipeline for mlt
- [Fix][MetaSchedule] RPCRunner timeout when queueing up
- [MetaSchedule] Add pass instrument to MetaSchedule api
- [MetaSchedule] Tile and pack intermediate output for CUDA TensorCore
- [MetaSchedule] Bugfix: Add checks for nullable `run_secs`
Misc
- [UX] Make T.prim_func typecheck as staticmethod
- [VM][DMLC] Lower memory usage when loading and dumping weights
- [APP] Update android_rpc build tools version
- [apps][bundle_deploy]Fix bundle build issue
- [Diagnostic] Support constructing Diagnostic Error through ObjectRef
- [skip ci] Replace magic_wand model with micro_speech
- [IR] Enhance IRModule SEqual/SHash to support cross function calls
- [Fix]Fix function ObjectPath in IRModule SEqual
- Update to v0.12.dev0
- Enable C++17 for cmake modules
- Remove temporary VTCM workspace APIs
- [IR] Platform-independent SHash
- Fix numpy version constraint
- [Utils] Allow classmethod and staticmethod in TVMDerivedObject
- [Git] Ignore python/requirements directory
- Enhance the --help message of composite target
- Add support for named outputs in MLF archive
- Add Name Transforms for Rust style
- Refactor test to make it easier for user to understand how tensor_intrin works
- Remove tutorials CMSIS dependency when not needed
- Add DisallowAsyncStridedMemCopy post processor to rem
- Add check for non-contiguous memory access when lowering to async dma
- Relay transform for rolling a known pattern into batch_matmul
- [Typo] Fix name of iter var type 4
- Extend the USE_LIBBACKTRACE option
- [Refactor] Move `VarUseDefAnalysis` to header file
- Add header files for GraphExecutorDebug
- [pytest] Don't return values from test_* functions
- [Analysis] Improve error message in VerifyWellFormed
- Revert the changes for NNPACK build issue
- [Node] Utility methods for ObjectPathPair handling
- [Minor] Change file mode 755 -> 644; EOL CRLF -> LF
- [FIX] Minor Compilation Warning Fixes
- [Contrib][Sort] Faster Top-K Implementation
- [COLLAGE] Add more customization to support more targets
- [CONTAINER] Struct Hash/Equal and JSON support for ShapeTuple
- [VTA] Provide zero-initialization for VTAGenericInsn
- [Fix,Roofline] Fix roofline handling of multiple peak flops
- [RPC] Add fail-guard for termination time exception
- [TOPHUB] use keys as a keyword for searching of existing statistics
- [Transform] Use callable() instead of isinstance() for type checking
- [TRANSFORM] Fix virtual device annotation issue with BYOC subgraphs
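The "Use callable() instead of isinstance() for type checking" item above reflects a general Python point: an `isinstance` check against a function type rejects objects that can still be called. The sketch below is plain Python, not TVM code; the names `scale` and `Doubler` are hypothetical, and `types.FunctionType` is an assumed stand-in for whatever type the original check used.

```python
import functools
import types


def scale(x, factor):
    """An ordinary function."""
    return x * factor


class Doubler:
    """An object that is callable through __call__."""

    def __call__(self, x):
        return 2 * x


candidates = {
    "function": scale,
    "partial": functools.partial(scale, factor=3),
    "callable object": Doubler(),
}

for name, obj in candidates.items():
    # isinstance(..., FunctionType) accepts only the plain function,
    # while callable() accepts all three.
    print(name, isinstance(obj, types.FunctionType), callable(obj))
```

Only the plain function passes the `isinstance` test, while all three pass `callable()`.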
Relay
- [Fix][Relay] Fix axis transformation in squeeze shape function
- [QNN][Relay][Topi] Add qnn.dense with weight layout
- [fix][relay][qnn] Bug fix for 8-bit quantized mul
- [Relay][Op] Connect existing arm_cpu schedule to relay strategy for concat
- [Relay] Convert negative axes to positive when importing ONNX Unsqueeze
- [Relay][Frontend] Span Filling PyTorch
- [Relay][Frontend] Span Filling ONNX
- [Relay][Frontend] Span Filling TensorFlow 1
- [Relay][Frontend] Span Filling TFLite
- [Relay][Frontend] Span filling common API
- [Relay][Pass] Separate out the graph partitioning code from fuse_ops.cc
- [Relay] Remove overwriting of matmul shapes when they are static
- [Relay][Frontend][Onnx] SequenceAt and SplitToSequence Operators
- [Relay] Move pad value extraction past null pointer check
- [relay][frontend][pytorch]Fix a bug in the _get_pytorch_value_type function
- [Relay] Enhance EliminateCommonSubexpr to support Tuple argument
- [Relay][TIR] Add utility to lower Relay func to TIR prim func
- [Relay] Check if the attribute "name" exists before accessing it
- [Relay][Docs] Fixed examples in relay/transform.py documentation
- [Relay][Runtime] Add `set_input/output_zero_copy` in python
- [Relay][Testing][Bugfix] `py_converter` should use correct AST for versions above 3.8 too
- [relay] preserve the order of input_info of pytorch
- [QNN] Change in Pass Context for lookup table calculation
- [QNN] Convert fake quantized take to quantized op
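For the "Convert negative axes to positive when importing ONNX Unsqueeze" item above, the following sketch shows the axis convention involved. It follows the ONNX definition, where axes refer to the output tensor, whose rank is the input rank plus the number of inserted axes. It is an illustration only, not the Relay importer's code, and the helper name `normalize_unsqueeze_axes` is hypothetical.

```python
def normalize_unsqueeze_axes(axes, input_rank):
    """Map possibly negative Unsqueeze axes to sorted non-negative ones (hypothetical helper)."""
    # Axes index into the output tensor, which has one extra dim per axis.
    out_rank = input_rank + len(axes)
    normalized = []
    for axis in axes:
        if not -out_rank <= axis < out_rank:
            raise ValueError(f"axis {axis} out of range for output rank {out_rank}")
        # Negative axes count from the back of the output shape.
        normalized.append(axis + out_rank if axis < 0 else axis)
    if len(set(normalized)) != len(normalized):
        raise ValueError("duplicate axes after normalization")
    return sorted(normalized)


print(normalize_unsqueeze_axes([-1], input_rank=3))
print(normalize_unsqueeze_axes([0, -1], input_rank=2))
```

Normalizing against the output rank rather than the input rank is the step that is easy to get wrong.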
Schedule
- [Schedule][Bugfix] Fix decompose padding wrt the single child subtree
- [Schedule] Add an optional argument `disable_checks` for Schedule
Target
- [Target] Make `key=arm_cpu` --> `key=arm_cpu
- [Target] Add target tags for Apple Silicon GPU
- [Target] Fix Jetson AGX Xavier CPU core count
- [Target] Add A10G gpu cuda tag
TE
- [TE] Record primitives of Schedule for visualization
- [TE][PrimFunc] Fix create primfunc from te extern with explicit buffer load
Tensorize
- [Tensorize][runtime] Add support for AMX(Advanced Matrix Extensions) through Tensor intrinsics
- [Tensorize][TOPI] Add AMX Tensorizing for int8 batch matmul
TIR
- [TensorIR] Support for L2 prefetch async copy and pred_guard enabled async in vectorized if_then_else
- [TensorIR][Schedule] New primitive `reorder_block_itervar`
- [TensorIR] New schedule primitive `set_dtype`
- [Fix][TIR] LowerCrossThreadReduction with write-back predicate
- [TIR] Introduce Pass InjectPTXLDG32
- [Fix][TIR] Fix tvm::arith::UnionLowerBound
- [TIR][Schedule] Add unittest for read_write_at
- [TIR] Add cp.async support for tir.if_then_else
- [tir] fix buffer_decl buffer allocation
- [tir] Add line level debug info
- [TIR][FIX] check args size when creating prim_func by runtime::Registry
- [TIR] not estimating the flops when there is a default estimated flops as attr
- [TIR][Hexagon] Enhancement of NarrowDataType pass for binary ops
- [TIR] Handle nullptr returned by FindEntryFunc
- [TIR]Fix the crash of the pass RemoveNoOp
- [TIR] Update SplitHostDevice to post-process with ConvertSSA
- [TIR][Utility] More flexible tir::Substitute arguments
- [TIR][Analysis] Implement IdentifyMemCpy analysis function
- [TIR] Merged kDeviceThreadAxis and kUseDynamicSharedMemoryTag
- [TIR] Improved SeqStmt::Flatten utility
- [TIR] Use IRModuleNode::Remove to remove None in PrimFuncPass
- [TIR] Use same DataType of builtin::tvm_struct_set in C++ and Python
- [TIR] Update LowerTVMBuiltin to use Optional
- [TIR] Improved MakePackedAPI error message
- [TIR] Legalize dtype of constants in IndexMap
- [TIR] Improved error message in InjectSoftwarePipeline
- [TIR][Schedule] Allow buffer name argument to Schedule.set_scope
- [TIR] Fix dtype mismatch error due to LetStmt
- [Fix][TIR] SampleCategorical apply-to-schedule
- [TIR][Fix] IndexDataTypeNormalizer not unwrapping float casting
- [TIR][Fix] Buffer slicing using index dtype as extent
- [TIR] Create Layout with specified axis dtype
- [TIR][Schedule] Improve cache_index to cache common subexpressions
- [TIR][Arith] Add common sub expr analyzer
- [TIR] [Schedule] Add get_output_blocks primitive
- [TIR] [Analysis] Expose IsOutputBlock to python
- [TIR] [Bugfix] Pass the correct block_sref_reuse to Replace
- [TIR] Fix cache_write bug with allocate const node
- [TIR][Schedule] Fix reverse_compute_inline
- [TIR] Remove special-casing of T.address_of in the storage rewrite pass
- [TIR] Refactor BF16Legalize
- [TIR] Enhance loop unroll with unroll local access
- [TIR] Remove LoadNode and StoreNode
- [TIR] Allow TransformLayout index_map to contain RVs
- [TIR] Allow TransformLayout with non-inversible index map
- [TIR] Fix typo in doc
- [TIR] Update block flags and simplify predicate in Reverse-Compute-Inline
- [TIR][TOPI][x86][CI] Support skylake avx512
- [TIR][TOPI][CI] Fix number of arguments in calls of llvm_pure_intrin
- [TIR][Compute-at] Utilize InverseAffineIterMap for dom estimation
- [TIR] Expose bitwise ops to python
- [TIR] Add merge primitive for TIR schedule
- [TensorIR][Primitive] New schedule primitive `reindex_cache_read/write`
- [TIR] Fix Datatype in Lower TVM Builtin
- [TIR] Enable Host Func Attribute for PrimFunc
TOPI
- [FIX][TOPI] Clip with IntImm/FloatImm
- [Fix,TOPI] Consolidate generic and x86 scatter nd
- [Test][Topi] Avoid depending on f32 rounding behavior for crop_and_divide tests
- [TOPI] Expose mem_scope from generic conv2d variants to be more reusable
- [TOPI][bugfix] Fix a bug in arm_cpu int8 dotprod schedule and modernize tests
- [TOPI] Bugfix arm_cpu schedule_conv2d_spatial_pack_nhwc schedule
- [TOPI][OP] Support grouped conv2d_NCHWc
- [TOPI] Fix batch_matmul tensorcore legalize for transpose_b = False case
- [TOPI] Group normalization
- [TOPI] dynamic externsion
- [TOPI] Fix tuple unpack in conv2d NCHWc int8
- [TOPI] Making test_strided_set require a GPU for testing
- [Fix][Relay][TOPI] Bug fix in relay.sum and topi.sum functions
- [TOPI][Fix] Pool must return error if layout is tiled on H
- [TOPI] Batch Norm Training Mode
- [topi] remove comment redundancy in resize.py
- [TOPI][Hexagon] Implement global_avg_pool2d for hexagon
- [TOPI] Support non-batch cases for topi.nll_loss
- [TOPI] Add instance_norm operator
- [TOPI] Support symbolic shape in einsum
- [TOPI][Relay][ONNX] Replace scatter_add by scatter_elements(reduction="add")
- [TOPI] Fix data race of batch multibox detection
- [TOPI] Fix index dtype in topi strided_slice
- [TORCH][TOPI] Support mean reduction for scatter_reduce
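The replacement of scatter_add by scatter_elements(reduction="add") listed above relies on the two being equivalent when the reduction is addition. A minimal one-dimensional sketch in plain Python, following the ONNX ScatterElements semantics, is shown below. It is not TOPI's implementation, the function name `scatter_elements_1d` is hypothetical, and only the "none" and "add" modes are modelled.

```python
def scatter_elements_1d(data, indices, updates, reduction="none"):
    """Scatter updates into a copy of data along the single axis (illustrative only)."""
    out = list(data)
    for idx, upd in zip(indices, updates):
        if reduction == "none":
            # Plain scatter: the update overwrites the existing value.
            out[idx] = upd
        elif reduction == "add":
            # Reduction: updates that hit the same index accumulate.
            out[idx] += upd
        else:
            raise ValueError(f"unsupported reduction: {reduction}")
    return out


data = [1, 2, 3, 4]
print(scatter_elements_1d(data, [0, 0, 2], [10, 20, 30], reduction="add"))
print(scatter_elements_1d(data, [0, 2], [10, 30]))
```

With `reduction="add"`, the two updates aimed at index 0 are both added to the original value, which is the behaviour scatter_add provided.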
TVMC
- [TVMC] Fix logging in TVMC
- [TVMC] Stop printing a wall of warnings with tvmc tune
- [TVMC] Add option to dump TIR code to file
- [TVMC] Allow selecting a subset of tasks to be used in `tvmc tune`
- [TVMC] Improve --desired-layouts functionality
- [TVMC][microNPU] tvmc option for printing which operators are offloaded to Ethos-U
- [TVMC][TRANSFORMS] ToMixedPrecision transform support with custom options enabled
TVMScript
- [Fix][TVMScript]TVMScript BinOP printing refactor
- [TVMScript] Schedule error reporting with new TVMScript printer
- [TVMScript] Connect `assert_structural_equal` with new TVMScript printer
- [TVMScript] Comments and docstrings printing
- [TVMScript] `T.allocate` with `T.decl_buffer` syntax sugar for TVMScript printer
- [TVMScript] `T.match_buffer` syntax sugar in arguments for TVMScript printer
- [TVMScript] Linter-friendly function definitions
- [TVMScript][Fix] Fix `bool` printing for roundtrip
- [Fix][TVMScript] Fix `LetStmt` printing logic
- [TVMScript] More concise `T.allocate` syntax printing
- [TVMScript] Implicit root block syntax sugar for TVMScript printer
- [TVMScript] `T.axis.remap` syntax sugar for TVMScript printer
- [TVMScript] Robustify the Highlight Printer
- [TVMScript] Sugar Var Definition in TIR Buffer
- [TVMScript] Distinguish LetStmt and Let expression
- [TVMScript] Simplify TIR Var Definition
- [TVMScript][UX] Introduce decorator for deprecation
- [TVMScript] Support `show_meta`
- [TVMScript] Consolidate folder structure
- [TVMScript] Default to T.Buffer rather than T.buffer_decl
- [TVMScript] Introduce `PrinterConfig`
- [TVMScript] Add ObjectPath to LiteralDoc
- [TVMScript] Use TVMScript for all TIR Printing
- [TVMScript] Migrate More to TVMScript Printer
- [TVMScript] IR Fragment Printing
- [TVMScript] Refactor IRDocsifier
- [TVMScript] Remove obsolete modules
- [TVMScript] Support SizeVar Roundtripping
- [TVMScript] Sugar T.env_thread + T.launch_thread
- [TVMScript] Encourage using T.Buffer directly
- [TVMScript] Unify `T.handle` and `T.Ptr`
- [TVMScript] Enable Safe Autocasting in BufferStore
- [TVMScript] Deterministic function ordering
- [TVMScript][Fix] Print Multi-line String as Metadata
- [TVMScript] Use op attribute to control whether to print dtype in TVMScript
- [TVMScript] Upstream IRModule parser from unity
- [TVMScript] Upstream IRModule parser from unity
- [TVMScript] Upstream IRModule parser from unity
- [TVMScript] Improved error message for unexpected top frame
- [TVMScript] Use new variable frame in If/Then/Else
- [Bugfix][TVMScript] Preserve variable names in LetStmt
- [TVMScript] More accurate hints for ImportError
- [TVMScript,Fix] Fix findsource when classes are indented
- [TVMScript][Printer] Remove relax prefix for now
- [Fix][TVMScript] Fix index of metadata in printed script
- [TVMScript] Fix print round-tripable multi thread env binding
- [TVMScript][Parser] Add more warp-level builtins and `Range`