Skip to content

Commit

Permalink
📦 Packing documentation (huggingface#2503)
Browse files Browse the repository at this point in the history
  • Loading branch information
qgallouedec authored Dec 22, 2024
1 parent 99451b4 commit aed5da5
Showing 1 changed file with 34 additions and 1 deletion.
35 changes: 34 additions & 1 deletion docs/source/reducing_memory_usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,4 +51,37 @@ training_args = SFTConfig(..., max_length=...)
```

</hfoption>
</hfoptions>
</hfoptions>

## Packing

<Tip>

This technique applies only to SFT.

</Tip>


[Truncation](#truncation) has several drawbacks:
1. **Loss of information**: Key data at the end of a sequence may be discarded.
2. **Choosing truncation length**: Too short loses data; too long undermines efficiency.

Packing, introduced in [Raffel et al., 2020](https://huggingface.co/papers/1910.10683), addresses these issues by grouping sequences instead of truncating. It concatenates and splits dataset sequences into the desired lengths.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/packing.png" alt="Packing" width="600"/>
</div>

Packing eliminates padding, preserves all sequence information, and allows for flexible sequence lengths, making it a more efficient alternative to truncation. To enable packing, use `packing=True` in the [`SFTConfig`]:

```python
from trl import SFTConfig

training_args = SFTConfig(..., packing=True, max_seq_length=512)
```

<Tip warning={true}>

Packing may cause batch contamination, where adjacent sequences influence one another. This can be problematic for some applications. For more details, see [#1230](https://github.com/huggingface/trl/issues/1230).

</Tip>

0 comments on commit aed5da5

Please sign in to comment.