
[QUESTION] Does TP overlap support variable sequence length? #1303

Closed
wplf opened this issue Nov 1, 2024 · 5 comments
wplf (Contributor) commented Nov 1, 2024

Hi, thank you for the great work.
I'd like to ask: does TP overlap support variable sequence lengths?

denera (Collaborator) commented Nov 5, 2024

TP overlap currently requires sequence parallelism and has no attention layout/format restrictions, except that the sequence length must be constant and evenly divisible by the TP size.

Since you asked about the format, I want to clarify that we currently do not support comm+GEMM overlap in the attention mechanism itself. TP overlap is restricted to the te.Linear, te.LayerNormLinear, and te.LayerNormMLP modules.
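The divisibility requirement above can be sketched as a simple check. This is an illustrative helper, not part of the Transformer Engine API; the function name is an assumption:

```python
# Sketch of the TP-overlap constraint described above: the (constant)
# sequence length must divide evenly by the tensor-parallel group size.
# check_tp_overlap_seq_len is a hypothetical helper, not a TE function.

def check_tp_overlap_seq_len(seq_len: int, tp_size: int) -> None:
    """Raise if seq_len violates the TP-overlap divisibility requirement."""
    if seq_len % tp_size != 0:
        raise ValueError(
            f"Sequence length {seq_len} is not evenly divisible by "
            f"TP size {tp_size}; pad to a multiple of the TP size."
        )

check_tp_overlap_seq_len(4096, 8)  # ok: 4096 is a multiple of 8
```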

wplf (Contributor, Author) commented Nov 6, 2024

Thank you very much.
I'd like to ask another question: is there any way to bypass the constraint that the sequence length must be constant and evenly divisible by the TP size?
I'd like to overlap TP/SP for the thd format, in which sequence lengths are variable.

denera (Collaborator) commented Nov 6, 2024

Unfortunately the current implementation does not support variable sequence lengths, so you would have to pad your sequences up to a static maximum. Theoretically there is no reason why it couldn't be done, but the custom communication kernels we use for TP overlap have far too many hard-coded assumptions about buffer and work chunk sizes to strip out easily in practice.
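The padding workaround mentioned above can be sketched as follows. This is a pure-Python illustration on token-id lists, under the assumption of a hypothetical `pad_id` padding token; it is not Transformer Engine code:

```python
# Sketch: pad every sequence in a batch up to a shared static length that
# is a multiple of the TP size, as the workaround above suggests.
# pad_batch and pad_id are illustrative names, not TE API.

def pad_batch(batch, tp_size, pad_id=0):
    """Pad all sequences to the smallest multiple of tp_size >= the longest."""
    max_len = max(len(seq) for seq in batch)
    target = ((max_len + tp_size - 1) // tp_size) * tp_size  # round up
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]

batch = [[5, 9, 3], [7, 1, 4, 8, 2]]
padded = pad_batch(batch, tp_size=4)
# every sequence now has length 8, the smallest multiple of 4 >= 5
```

In practice you would pad to a static maximum chosen ahead of time (rather than per batch) so the sequence length stays constant across iterations, as the constraint requires.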

We do plan to support this in the near future, after we migrate the TP overlap functionality to the latest cuBlasMp v0.3.0 release that introduced support for collective GEMM with overlapped communication (these are NVSHMEM-based re-implementations of the same TP overlap algorithms in Transformer Engine).

wplf (Contributor, Author) commented Nov 6, 2024

Thanks again for your great work.
Could you share the ETA for this feature?

@wplf wplf closed this as completed Nov 6, 2024
@wplf wplf reopened this Nov 6, 2024
denera (Collaborator) commented Nov 6, 2024

I hope to integrate cuBlasMp into TE by mid-December at the latest. There's a chance this might support variable sequence lengths out of the box, but otherwise it would have to wait until at least January if not later, depending on where this feature lands on our list of priorities.

@wplf wplf changed the title [QUESTION] Does TP overlap support SP and thd format? [QUESTION] Does TP overlap support variable sequence length? Nov 6, 2024
@wplf wplf closed this as completed Nov 6, 2024
@wplf wplf reopened this Nov 11, 2024
@wplf wplf closed this as completed Nov 11, 2024