Update on the development branch #1456

kaiyux · 2024-04-16T11:48:38Z

kaiyux
Apr 16, 2024
Maintainer

Hi,

The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on April 16th, 2024.

This update includes:

Features
- Support pipeline parallelism for GPT
API
- [BREAKING CHANGE] Migrate enc-dec models to the unified workflow
Bug fixes
- Fix segmentation fault with pipeline parallelism and gather_all_token_logits (#1284)
- Remove the unnecessary check in XQA to fix Code Llama 70b triton crashes (#1256)
- Fix an unsupported ScalarType issue for BF16 LoRA (triton-inference-server/tensorrtllm_backend#403)
- Eliminate the load and save of prompt table in multimodal (#1436)
Performance
- Optimize applyBiasRopeUpdateKVCache kernel by avoiding re-computation
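For readers new to the pipeline parallelism feature: the general idea is that each pipeline rank owns a contiguous slice of the model's transformer layers and passes activations on to the next rank. The sketch below is a hypothetical illustration only. The helper name `pipeline_stage_layers` is made up, it assumes the layer count divides evenly across ranks, and it is not TensorRT-LLM's implementation.

```python
def pipeline_stage_layers(num_layers: int, pp_size: int, pp_rank: int) -> range:
    """Return the contiguous block of layer indices owned by one pipeline rank.

    Hypothetical helper: assumes num_layers is divisible by pp_size.
    """
    if pp_size <= 0:
        raise ValueError("pp_size must be positive")
    if not 0 <= pp_rank < pp_size:
        raise ValueError("pp_rank must be in [0, pp_size)")
    if num_layers % pp_size != 0:
        raise ValueError("num_layers must be divisible by pp_size")
    per_stage = num_layers // pp_size
    start = pp_rank * per_stage
    return range(start, start + per_stage)


# Example: a 24-layer GPT split across 4 pipeline ranks.
for rank in range(4):
    layers = pipeline_stage_layers(24, 4, rank)
    print(f"rank {rank}: layers {layers.start}..{layers.stop - 1}")
```

With 24 layers and 4 ranks, each rank gets 6 consecutive layers (rank 0 holds 0..5, rank 3 holds 18..23).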

Thanks,
The TensorRT-LLM Engineering Team

kimbaol · 2024-04-16T12:01:21Z

kimbaol
Apr 16, 2024

Hi, I noticed that the warning "FP8 Context Paged KV FMHA hasn't been implemented yet." has been removed from gptAttentionCommon.cpp. Does "paged_context_fmha" support FP8 in this update?

3 replies

PerkzZheng Apr 24, 2024
Collaborator

Yes, it is supported (it requires the FP8 quantization workflow). See the doc here (https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/advanced/gpt-attention.md#fp8-context-fmha).

kimbaol Apr 24, 2024

Thanks for your reply.
I tried it on an L40S. Currently, paged_context_fmha requires fp8-context-fmha, and fp8-context-fmha is only supported on Hopper. Is there any plan to support fp8-context-fmha on Ada?

PerkzZheng Apr 24, 2024
Collaborator

We are working on that, but there is no concrete date yet for when we will support it. I will let you know once it is scheduled.
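To summarize the constraint discussed in this thread as of April 2024: running paged context FMHA with FP8 depends on FP8 context FMHA, which was then available only on Hopper GPUs (compute capability 9.0), so Ada GPUs such as the L40S (compute capability 8.9) could not use it. The helper below is hypothetical and purely illustrative; it is not part of TensorRT-LLM, and later releases may have changed the support matrix.

```python
# Hypothetical helper, not a TensorRT-LLM API. Encodes only the support status
# described in this thread (April 2024): FP8 context FMHA, which paged context
# FMHA needs when running in FP8, was available only on Hopper (SM 9.0).
HOPPER = (9, 0)


def fp8_paged_context_fmha_supported(compute_capability: tuple[int, int]) -> bool:
    """Return True if the GPU's (major, minor) compute capability is Hopper."""
    return tuple(compute_capability) == HOPPER


print(fp8_paged_context_fmha_supported((9, 0)))  # H100 (Hopper): True
print(fp8_paged_context_fmha_supported((8, 9)))  # L40S (Ada): False
```

On a machine with PyTorch and a CUDA GPU, the `(major, minor)` tuple can be read with `torch.cuda.get_device_capability()`.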
