Issues: NVIDIA/TensorRT-Model-Optimizer
#117 [ONNX][PTQ] Quantization failed with --dq_only flag in ConvTranspose
opened Dec 19, 2024 by ry3s
#114 What is the difference between torch.quantization and onnx.quantization in speed and accuracy?
opened Dec 11, 2024 by demuxin
#113 Is there a plan to support more recent PTQ methods for INT8 ViT?
opened Dec 10, 2024 by dedoogong
#106 FP16 and FP32 show 30% lower accuracy compared to INT8 for the ViT example in ONNX_PTQ
opened Nov 13, 2024 by chjej202
#101 In the cache_diffusion example, can we use dynamic image shape & batch size?
opened Nov 4, 2024 by wxsms