Release Highlights:
New Features:
- Full-Range Symmetric Quantization: We’ve introduced full-range symmetric quantization, which often matches or even exceeds the performance of asymmetric quantization, especially at low bit widths such as 2 bits (see the sketch after this list).
- Command-Line Support: You can now quantize models from the command line with `auto-round --model xxx --format xxx`.
- Default Export Format Change: The default export format is now `auto_round` instead of `auto_gptq` (a quantize-and-export example follows the highlights).
- Multi-Threaded Packing: Up to 2x speedup in the packing phase.
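
For context, the snippet below contrasts restricted-range symmetric quantization with one common full-range formulation, in which the scale is derived from 2^(bits-1) so the lowest signed level stays usable. This is an illustrative NumPy sketch of the general technique, not the library's exact implementation.

```python
import numpy as np

def quant_dequant_sym(w, bits=2, full_range=True):
    # Signed integer grid for the given bit width, e.g. [-2, 1] at 2 bits.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    # Restricted-range symmetric divides by qmax, leaving the level qmin unused;
    # the full-range variant divides by 2**(bits-1), so qmin becomes reachable.
    scale = max_abs / (2 ** (bits - 1)) if full_range else max_abs / qmax
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale  # dequantized weights

w = np.random.randn(8, 16).astype(np.float32)
for fr in (False, True):
    err = np.mean(np.abs(w - quant_dequant_sym(w, bits=2, full_range=fr)))
    print(f"full_range={fr}: mean abs reconstruction error = {err:.4f}")
```

The difference is most pronounced at very low bit widths, where the restricted grid gives up a large fraction of the available levels.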
Bug Fixes:
- Missing Cached Position Embeddings: Fixed an issue with missing cached position embeddings in Transformers 4.45.2.
- Mutable Default Values: Fixed issues caused by mutable default values.
- 3-Bit Packing: Fixed a 3-bit packing bug in the AutoGPTQ export format.
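
Putting the highlights together, a typical quantize-and-export flow might look like the sketch below. It assumes the Python API pattern shown in the project README (`AutoRound`, `quantize()`, `save_quantized()`); the model name is only a small placeholder, and argument names may differ between versions.

```python
# Minimal sketch assuming the API pattern from the project README;
# "facebook/opt-125m" is just a small placeholder model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# sym=True now applies full-range symmetric quantization by default.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# "auto_round" is the new default export format (previously auto_gptq).
autoround.save_quantized("./opt-125m-int4", format="auto_round")
```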
What's Changed
- Add setseed in autoround by @WeiweiZhang1 in #201
- support autoawq format by @yintong-lu in #115
- Remove UT coverage check by @XuehaoSun in #202
- set autoround format as default to unify CPU/HPU/CUDA by @wenhuach21 in #205
- add local file of pile-10k by @WeiweiZhang1 in #198
- modify setup.py by @n1ck-guo in #206
- limit the scale minimum value not to 0 by @WeiweiZhang1 in #211
- fix example dataset regression by @WeiweiZhang1 in #212
- remove local pile file by @WeiweiZhang1 in #213
- update xpu format exporting by @WeiweiZhang1 in #214
- fix a bug in autoround format inference by @wenhuach21 in #215
- avoid underflow and overflow for exllamav2 by @wenhuach21 in #218
- add qwen int4 model, refine example by @WeiweiZhang1 in #217
- [Experimental Feature]fast tuning norm/bias at 2 bits by @wenhuach21 in #208
- update readme by @wenhuach21 in #220
- refine eval_042 to enable parallelize evaluation by @WeiweiZhang1 in #221
- Enable phi3v tuning by @WeiweiZhang1 in #197
- Bump setuptools from 69.5.1 to 70.0.0 in /examples/multimodal-modeling/Phi-3-vision by @dependabot in #223
- refine example by @WeiweiZhang1 in #224
- change the scale thresh generally by @WeiweiZhang1 in #229
- add quantized models by 3rd party by @WeiweiZhang1 in #230
- add meta3.1-70B-instruct model, refine docs by @WeiweiZhang1 in #231
- fix model link by @WeiweiZhang1 in #232
- refine docs, add accuracy data, add receip and eval scripts by @WeiweiZhang1 in #226
- add brief formats introduction by @wenhuach21 in #236
- update readme and add itrex in the requirements.txt by @wenhuach21 in #238
- add tritonv2, improve packing and pbar by @wenhuach21 in #239
- refine the code and the speedup is notable by @wenhuach21 in #240
- move some settings from example to main by @wenhuach21 in #241
- add runable script for autoround by @n1ck-guo in #225
- update readme by @n1ck-guo in #242
- Add MANIFEST.in file to include requirements.txt by @XuehaoSun in #243
- fix example bug by @n1ck-guo in #245
- enable llava int4 inference with autoround format by @WeiweiZhang1 in #237
- remove autoawq requirement at packing stage by @n1ck-guo in #249
- remove unused log by @n1ck-guo in #252
- support INC API by @WeiweiZhang1 in #255
- avoid potential bug for auto-gptq 0.8 by @wenhuach21 in #250
- fix example by @n1ck-guo in #256
- fix preci by @n1ck-guo in #258
- enable_qwen2-vl_quantization by @WeiweiZhang1 in #248
- update eval and fix example by @n1ck-guo in #260
- refine autoawq exporting code by @wenhuach21 in #261
- better support quant_lm_head for larger models by @wenhuach21 in #263
- Fix 3bit packing for auto-gptq format by @wenhuach21 in #264
- Add a warning for improper export formats. by @wenhuach21 in #265
- Update readme for VLM support and integration by @wenhuach21 in #266
- remove g_idx in gptq format by @wenhuach21 in #267
- keep the dtype after qdq by @wenhuach21 in #268
- enable llama3.2-vision model quantization by @WeiweiZhang1 in #269
- fix mutable default value by @wenhuach21 in #272
- change to even rounding for mantissa of mx_fp by @wenhuach21 in #277
- adamround bugfix, refine import by @WeiweiZhang1 in #275
- [Important Change]set full range sym as the default by @wenhuach21 in #278
- refine eval by @wenhuach21 in #282
- qwen2_bugfix, add adamround vision UT by @WeiweiZhang1 in #281
New Contributors
- @dependabot made their first contribution in #223
Full Changelog: v0.3...v0.3.1