Update on the development branch #1837
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on June 25, 2024.
This update includes:
- [...] `--weight_only_precision` argument from `trtllm-build` command.
- [...] `shared_embedding_table` is not being set when loading Gemma ([GEMMA] `from_hugging_face` not setting `share_embedding_table` to True leading to incapacity to load Gemma #1799), thanks to the contribution from @mfuntowicz.
- [...] `gptManagerBenchmark`.
- [...] `nvcr.io/nvidia/pytorch:24.05-py3`.
- [...] `nvcr.io/nvidia/tritonserver:24.05-py3`.
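For readers unfamiliar with the Gemma item above, here is a toy sketch of why an unset embedding-sharing flag could prevent a checkpoint from loading. It assumes a tied-weight checkpoint that stores only the vocabulary embedding. The function `resolve_lm_head`, the dictionary keys, and the checkpoint layout are hypothetical illustrations, not TensorRT-LLM APIs, and the announcement does not confirm that this is the exact failure mode behind #1799.

```python
# Toy illustration only: every name below is hypothetical, not a TensorRT-LLM API.

def resolve_lm_head(checkpoint, share_embedding_table):
    """Return the output-projection weights from a toy checkpoint dict."""
    if share_embedding_table:
        # Tied weights: reuse the vocabulary embedding as the output projection.
        return checkpoint["embed_tokens.weight"]
    # Untied weights: a separate tensor must exist in the checkpoint.
    return checkpoint["lm_head.weight"]


# A tied-weight checkpoint stores only the embedding table.
tied_checkpoint = {"embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]]}

# With the flag set, the embedding table doubles as the output projection.
shared = resolve_lm_head(tied_checkpoint, share_embedding_table=True)

# With the flag unset, the loader looks for a tensor that is not there.
try:
    resolve_lm_head(tied_checkpoint, share_embedding_table=False)
    load_failed = False
except KeyError:
    load_failed = True
```

In this sketch the only difference between a successful and a failed load is the value of the flag, which is why a loader that forgets to set it can break on an otherwise valid checkpoint.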
Thanks,
The TensorRT-LLM Engineering Team