Update on the development branch #2009
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on July 23, 2024.
This update includes:
- Added a `chunk_length` parameter to Whisper, thanks to the contribution from @MahmoudAshraf97 in "add `chunk_length` parameter to Whisper" #1909.
- Removed the `use_custom_all_reduce` argument from `trtllm-build`.
- Moved the `multi_block_mode` argument from the build stage (`trtllm-build` and the builder API) to the runtime.
- Fixed a typo in `cluster_infos`, defined in `tensorrt_llm/auto_parallel/cluster_info.py`, thanks to the contribution from @saeyoonoh in "fix auto parallel cluster info typo" #1987.
- Removed a duplicate flag in `docs/source/reference/troubleshooting.md`, thanks to the contribution from @hattizai in "chore: remove duplicate flag" #1937.

We are working on an update to the Llama FP8 code, expected today or tomorrow. The current code works, but the checkpoint converter still needs to be updated.
Thanks,
The TensorRT-LLM Engineering Team