Update on the development branch #1456
kaiyux announced in Announcements
Replies: 1 comment 3 replies
-
Hi, I saw that the warning "FP8 Context Paged KV FMHA hasn't been implemented yet." in gptAttentionCommon.cpp has been removed. Does "paged_context_fmha" support FP8 as of this update?
-
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on April 16th, 2024.
This update includes:
gather_all_token_logits
Segmentation fault with pipeline parallelism and gather_all_token_logits #1284
applyBiasRopeUpdateKVCache kernel by avoiding re-computation

Thanks,
The TensorRT-LLM Engineering Team