[QUESTION] Does TP overlap support variable sequence length? #1303
Hi, thank you for the great work. I'd like to ask: does TP overlap support variable sequence lengths?
TP overlap currently requires sequence parallelism and does not have any attention layout/format restrictions, except that the sequence length has to be constant and evenly divisible by the TP size. Since you asked about the format, I want to clarify that we currently do not support comm+GEMM overlap in the attention mechanism; TP overlap is restricted to the linear (GEMM) layers.
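For intuition, here is a minimal sketch (not Transformer Engine's actual kernels) of why the divisibility constraint exists: sequence parallelism hands each TP rank an equal contiguous slice of the sequence dimension, so the sequence length must split evenly across ranks.

```python
import torch

def scatter_sequence(x: torch.Tensor, tp_size: int, rank: int) -> torch.Tensor:
    """Toy version of the sequence-parallel scatter: each TP rank keeps an
    equal contiguous slice of the sequence dimension of a
    [seq_len, batch, hidden] activation tensor."""
    seq_len = x.size(0)
    assert seq_len % tp_size == 0, (
        f"seq_len={seq_len} is not divisible by tp_size={tp_size}"
    )
    chunk = seq_len // tp_size
    return x[rank * chunk : (rank + 1) * chunk]
```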
Thank you very much.
Unfortunately the current implementation does not support variable sequence lengths, so you would have to pad your sequences up to a static maximum. Theoretically there is no reason why it couldn't be done, but the custom communication kernels we use for TP overlap have far too many hard-coded assumptions about buffer and work chunk sizes to strip out easily in practice. We do plan to support this in the near future, after we migrate the TP overlap functionality to the latest cuBlasMp v0.3.0 release that introduced support for collective GEMM with overlapped communication (these are NVSHMEM-based re-implementations of the same TP overlap algorithms in Transformer Engine).
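As a hedged illustration of the padding workaround (the helper name and tensor layout are assumptions for the sketch, not part of TE's API): pad every batch up to a fixed maximum length that is itself divisible by the TP size.

```python
import torch
import torch.nn.functional as F

def pad_to_static_max(x: torch.Tensor, max_seq_len: int, tp_size: int) -> torch.Tensor:
    """Pad a [seq_len, batch, hidden] tensor up to a fixed maximum length.

    TP overlap sizes its communication buffers once at initialization,
    so every step must present the same (padded) sequence length, and
    that length must divide evenly by the TP size.
    """
    assert max_seq_len % tp_size == 0, "static max length must divide by TP size"
    pad = max_seq_len - x.size(0)
    assert pad >= 0, "sequence exceeds the static maximum"
    # F.pad pads from the last dimension backwards:
    # (hidden_lo, hidden_hi, batch_lo, batch_hi, seq_lo, seq_hi)
    return F.pad(x, (0, 0, 0, 0, 0, pad))
```

The padded positions would still need to be excluded via the usual attention and loss masks; padding to a fixed maximum trades some wasted FLOPs for the static shapes the overlap kernels require.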
Thanks again for your great work.
I hope to integrate cuBlasMp into TE by mid-December at the latest. There's a chance this might support variable sequence lengths out of the box, but otherwise it would have to wait until at least January, if not later, depending on where this feature lands on our list of priorities.