ring-attention

Ring Attention leverages blockwise computation of self-attention across multiple GPUs, enabling training and inference on sequences that would be too long for a single device.
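Here is a minimal single-process NumPy sketch of the ring schedule. It assumes non-causal, single-head attention and simulates each GPU as a list entry. `ring_attention` and `full_attention` are hypothetical helper names, not code from this repository. In a real multi-GPU implementation, the KV rotation would be a send/recv between neighbouring ranks, overlapped with the blockwise compute.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference softmax(q k^T / sqrt(d)) v computed in one shot."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k, v, n_devices):
    """Simulate (non-causal) Ring Attention on one machine.

    Simulated device i keeps query block i and starts with KV block i.
    At every step it attends to the KV block it currently holds, folds the
    result into running softmax statistics, then the KV blocks move one
    position around the ring.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    q_blocks = np.array_split(q, n_devices)
    held = list(zip(np.array_split(k, n_devices), np.array_split(v, n_devices)))
    # Per-device running state: row max m, softmax denominator l, unnormalized output acc.
    m = [np.full(qb.shape[0], -np.inf) for qb in q_blocks]
    l = [np.zeros(qb.shape[0]) for qb in q_blocks]
    acc = [np.zeros((qb.shape[0], v.shape[-1])) for qb in q_blocks]
    for _ in range(n_devices):
        for i in range(n_devices):
            kb, vb = held[i]
            s = q_blocks[i] @ kb.T * scale
            m_new = np.maximum(m[i], s.max(axis=-1))
            p = np.exp(s - m_new[:, None])
            correction = np.exp(m[i] - m_new)  # rescale old stats to the new max
            l[i] = l[i] * correction + p.sum(axis=-1)
            acc[i] = acc[i] * correction[:, None] + p @ vb
            m[i] = m_new
        # Ring step: device i receives the KV block previously held by device i - 1.
        held = [held[(i - 1) % n_devices] for i in range(n_devices)]
    return np.concatenate([acc[i] / l[i][:, None] for i in range(n_devices)])

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(np.allclose(ring_attention(q, k, v, n_devices=4), full_attention(q, k, v)))  # True
```

Each simulated device only ever holds one query block and one KV block, so per-device memory scales with the block size rather than the full sequence length. The online-softmax rescaling is what lets the blockwise result match full attention exactly (up to floating-point error).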

This repository contains notebooks, experiments and a collection of links to papers and other material related to Ring Attention.

Research / Material

Paper: Ring Attention with Blockwise Transformers for Near-Infinite Context
- code: lhao499/ring-attention
Paper: World Model on Million-Length Video And Language With RingAttention
- code: LargeWorldModel/LWM
- project site: largeworldmodel.github.io
- models: HF/LargeWorldModel
Paper: Striped Attention: Faster Ring Attention for Causal Transformers
- code: exists-forall/striped_attention
Paper (2022): 4D parallelism: Sequence Parallelism: Long Sequence Training from System Perspective
related: Flash-Decoding for long-context inference (together.ai blog)
Paper: Online normalizer calculation for softmax (NVIDIA, 2018)
ELI5: FlashAttention by Aleksa Gordić
LWM model in ollama: https://ollama.com/ifioravanti/lwm
Phil Wang's (lucidrains) PyTorch implementation: lucidrains/ring-attention-pytorch
Zilin Zhu's nice ring-flash-attention implementation: zhuzilin/ring-flash-attention
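Striped Attention targets a load imbalance that appears when a causal mask is combined with the ring schedule. The toy cost model below is our own simplification, not the paper's code: it counts unmasked (query, key) pairs per device at each ring step, ignores communication, and assumes each step lasts as long as its busiest device. All function names are hypothetical.

```python
def contiguous_partition(seq_len, n_devices):
    """Device i owns one contiguous block of positions (plain Ring Attention)."""
    block = seq_len // n_devices
    return [list(range(i * block, (i + 1) * block)) for i in range(n_devices)]

def striped_partition(seq_len, n_devices):
    """Device i owns positions i, i + n, i + 2n, ... (striped layout)."""
    return [list(range(i, seq_len, n_devices)) for i in range(n_devices)]

def causal_work(parts):
    """Unmasked (query, key) pairs per device at each ring step.

    At step s, device d holds the KV positions that started on device (d - s) % n.
    Under a causal mask, query position q may only attend to key positions k <= q.
    """
    n = len(parts)
    return [
        [sum(1 for q in parts[d] for k in parts[(d - s) % n] if k <= q) for d in range(n)]
        for s in range(n)
    ]

def ring_time(work):
    """Each step lasts as long as its busiest device (communication ignored)."""
    return sum(max(step) for step in work)

print(ring_time(causal_work(contiguous_partition(16, 4))))  # 58
print(ring_time(causal_work(striped_partition(16, 4))))     # 40
```

Both layouts perform the same 136 unmasked pairs in total. With contiguous blocks, devices holding "future" KV blocks sit idle while others process full blocks; striping tokens round-robin gives every device a similar share at every step.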

Notebooks

Incremental Softmax (to understand the algorithm in 'high-level' PyTorch)
Naive flash-attn (to understand the algorithm in 'high-level' PyTorch)
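The core trick behind both notebooks can be sketched in plain Python: keep a running maximum and a running normalizer, and rescale the normalizer whenever the maximum grows (the idea from the NVIDIA online-normalizer paper listed above). This is not the notebook code and the helper names are hypothetical. The chunks stand in for KV blocks that arrive one at a time.

```python
import math

def online_normalizer(xs):
    """Single pass over a non-empty chunk: returns (max, sum(exp(x - max)))."""
    m, d = -math.inf, 0.0
    for x in xs:
        m_new = max(m, x)
        # Rescale the old sum to the new max before adding the new term.
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return m, d

def merge(a, b):
    """Combine the (max, normalizer) statistics of two chunks."""
    (ma, da), (mb, db) = a, b
    m = max(ma, mb)
    return m, da * math.exp(ma - m) + db * math.exp(mb - m)

def incremental_softmax(xs, chunk_size):
    """Softmax over xs, accumulating statistics chunk by chunk."""
    stats = (-math.inf, 0.0)  # no chunks seen yet
    for start in range(0, len(xs), chunk_size):
        stats = merge(stats, online_normalizer(xs[start:start + chunk_size]))
    m, d = stats
    return [math.exp(x - m) / d for x in xs]

probs = incremental_softmax([1.0, 2.0, 3.0, 4.0, 1000.0], chunk_size=2)
print(round(sum(probs), 6))  # 1.0
```

Because every exponent is taken relative to the running maximum, the input 1000.0 does not overflow, whereas a naive `math.exp(1000.0)` raises `OverflowError`.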

Development References

NVIDIA Collective Communication Library (NCCL) Documentation
PyTorch Distributed Overview
Distributed communication package - torch.distributed (send(), recv(), broadcast(), etc.)

How to contribute

Contact us on the GPU MODE Discord server: https://discord.gg/gpumode. PRs are welcome (please create an issue first).
