Add `use_cache` option #39

wiseodd · 2024-07-02T18:38:20Z

Hugging Face Transformers sets `use_cache` to `True` by default in its generation config:

https://github.com/huggingface/transformers/blob/82486e5995ed0a65520b10ce1ea938214a199231/src/transformers/generation/configuration_utils.py#L131-L133

It caches the "key" and "value" attention activations of previous tokens, so at each step of autoregressive text generation, only the newest token's "query", "key", and "value" need to be computed; the keys and values of earlier tokens are read back from the cache.
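To make the mechanism concrete, here is a toy single-head attention sketch in plain Python. It is not the Transformers implementation; the weights are random and every name in it (`last_token_output`, `run_demo`, the `cache` dict layout) is made up for illustration. It only checks that reusing cached keys and values gives the same output as recomputing them for the whole prefix at every step.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def last_token_output(tokens, Wq, Wk, Wv, cache=None):
    """Attention output for the last token in `tokens`.

    cache=None: keys and values are recomputed for every token in the prefix.
    cache=dict: only the newest token is projected; earlier keys and values
    are reused. Assumes the cache already holds len(tokens) - 1 entries.
    """
    if cache is None:
        keys = [matvec(Wk, t) for t in tokens]
        values = [matvec(Wv, t) for t in tokens]
    else:
        cache["k"].append(matvec(Wk, tokens[-1]))
        cache["v"].append(matvec(Wv, tokens[-1]))
        keys, values = cache["k"], cache["v"]
    q = matvec(Wq, tokens[-1])
    return attend(q, keys, values)


def run_demo(steps=5, dim=4, seed=0):
    """Compare cached and uncached outputs over a growing prefix."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

    Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
    tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(steps)]

    cache = {"k": [], "v": []}
    max_diff = 0.0
    for t in range(1, steps + 1):
        prefix = tokens[:t]
        full = last_token_output(prefix, Wq, Wk, Wv)
        cached = last_token_output(prefix, Wq, Wk, Wv, cache=cache)
        step_diff = max(abs(a - b) for a, b in zip(full, cached))
        max_diff = max(max_diff, step_diff)
    return max_diff, len(cache["k"])


if __name__ == "__main__":
    diff, entries = run_demo()
    print("max difference:", diff, "cached entries:", entries)
```

Without the cache, step `t` projects `t` keys and `t` values; with it, each step projects one of each. In Transformers itself the flag can be passed per call, e.g. `model.generate(..., use_cache=True)`.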

wiseodd added the enhancement label Jul 2, 2024

wiseodd self-assigned this Jul 2, 2024
