b3651

github-actions released this 02 Sep 04:52
8f1d81a
llama : support RWKV v6 models (#8980)

* convert_hf_to_gguf: Add support for RWKV v6

Signed-off-by: Molly Sophia <[email protected]>

* Add RWKV tokenization

* Fix build

Signed-off-by: Molly Sophia <[email protected]>

* Do not use special tokens when matching in RWKV tokenizer

* Fix model loading

* Add (broken) placeholder graph builder for RWKV

* Add workaround for kv cache

* Add logits conversion to rwkv5

* Add rwkv5 layer norms

* Add time mix KVRG & correct merge mistake

* Add remaining time mix parameters

* Add time mix output loading

* Add placeholder llm_build_time_mix
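
  For context on the time-mix parameters being added here (and the KVRG mixes above): a rough sketch of the RWKV-6 data-dependent token shift, assuming the formulation from the RWKV-6 (Finch) paper rather than this PR's exact code. The mapping onto the tensor names used in this PR (``time_mix_w1``/``time_mix_w2`` as the stacked low-rank matrices, the KVRG mixes as the per-channel μ/λ terms) is my reading, not taken from the diff:

  $$\mathrm{lora}_\square(x) = \lambda_\square + \tanh(x\,A_\square)\,B_\square$$

  $$x_\square = a + (b-a)\odot\mathrm{lora}_\square\!\bigl(a + (b-a)\odot\mu_x\bigr), \qquad a = x_t,\; b = x_{t-1},\; \square \in \{w,k,v,r,g\}$$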

* Fix build

Signed-off-by: Molly Sophia <[email protected]>

* Load more tensors for rwkv v6

Signed-off-by: Molly Sophia <[email protected]>

* Fix rwkv tokenizer

Signed-off-by: Molly Sophia <[email protected]>

* ggml: Add unary operator Exp

Signed-off-by: Molly Sophia <[email protected]>
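
  A minimal sketch of how the new unary operator might be exercised through the public ggml graph-building API; this is an illustration, not code from the PR, and exact signatures can differ between ggml versions:

  ```cpp
  #include <cstdio>
  #include "ggml.h"

  int main() {
      // small scratch context for tensors + graph (assumed to be large enough here)
      struct ggml_init_params params = { /*mem_size=*/16 * 1024 * 1024, /*mem_buffer=*/NULL, /*no_alloc=*/false };
      struct ggml_context * ctx = ggml_init(params);

      // x = [0, 1, 2, 3]
      struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
      for (int i = 0; i < 4; ++i) {
          ggml_set_f32_1d(x, i, (float) i);
      }

      // y = exp(x) -- the unary op added in this commit
      struct ggml_tensor * y = ggml_exp(ctx, x);

      struct ggml_cgraph * gf = ggml_new_graph(ctx);
      ggml_build_forward_expand(gf, y);
      ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

      for (int i = 0; i < 4; ++i) {
          printf("exp(%d) = %f\n", i, ggml_get_f32_1d(y, i));
      }

      ggml_free(ctx);
      return 0;
  }
  ```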

* RWKV v6 graph building

Signed-off-by: Molly Sophia <[email protected]>

* Add ``rescale_every_n_layers`` parameter

Signed-off-by: Molly Sophia <[email protected]>
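
  This parameter mirrors the ``rescale_every`` trick used by existing RWKV implementations: periodically halving the hidden state so FP16 inference does not overflow. A hedged sketch of how it could enter the graph builder, not the PR's actual code:

  ```cpp
  // inside the per-layer loop of the graph builder (illustrative only):
  // halve the hidden state every `rescale_every_n_layers` layers, if enabled
  if (rescale_every_n_layers > 0 && (il + 1) % rescale_every_n_layers == 0) {
      cur = ggml_scale(ctx, cur, 0.5f);
  }
  ```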

* Add ``wkv.head_size`` key for RWKV

so that it doesn't reuse the Mamba SSM parameters

Signed-off-by: Molly Sophia <[email protected]>

* Fix offloading layers to CUDA

Signed-off-by: Molly Sophia <[email protected]>

* Fix parallel inferencing for RWKV

Signed-off-by: Molly Sophia <[email protected]>

* Remove trailing whitespaces

Signed-off-by: Molly Sophia <[email protected]>

* build_rwkv: Avoid using inplace operations

Signed-off-by: Molly Sophia <[email protected]>

* convert_hf_to_gguf: rwkv: Avoid using ``eval``

Signed-off-by: Molly Sophia <[email protected]>

* convert_hf_to_gguf: rwkv tokenizer: Don't escape sequences manually

Signed-off-by: Molly Sophia <[email protected]>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* ggml: Add backward computation for unary op ``exp``

Signed-off-by: Molly Sophia <[email protected]>
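
  For reference, the backward rule added here is the standard one: since $\frac{d}{dx}e^x = e^x$, the input gradient is the upstream gradient scaled elementwise by the exponential (so the backward pass can in principle reuse the forward output; whether ggml does so is an implementation detail not stated here):

  $$y = e^{x} \quad\Rightarrow\quad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \odot e^{x} = \frac{\partial L}{\partial y} \odot y$$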

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* Use MODEL_ARCH.RWKV6 instead of MODEL_ARCH.RWKV

Signed-off-by: Molly Sophia <[email protected]>

* build_rwkv6: Simplify graph

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Detect model.type

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Fix tensor loading for 7B/14B models

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Fix group_norm assertion failure with Metal

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Clean up

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Add quantization tensor exclusion

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Use the new advanced batch splits

Signed-off-by: Molly Sophia <[email protected]>

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* llama: rwkv6: Use ``ggml_norm`` instead of ``ggml_group_norm``

Co-authored-by: compilade <[email protected]>
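
  The swap makes sense if the per-head normalization of the time-mix output is expressed as a plain layer norm over each head's channels. A hedged sketch of the equivalent shape juggling, illustrative rather than the PR's code (the tensor layout is an assumption, not taken from the diff):

  ```cpp
  // normalize each head independently: treat every head_size-sized slice as one row
  // cur is assumed to be [head_size * n_head, n_tokens] here
  cur = ggml_reshape_2d(ctx, cur, head_size, n_head * n_tokens);
  cur = ggml_norm(ctx, cur, 1e-5f);          // layer norm over the innermost dimension
  cur = ggml_reshape_2d(ctx, cur, head_size * n_head, n_tokens);
  ```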

* llama: rwkv6: Apply code style and misc changes

Signed-off-by: Molly Sophia <[email protected]>

* converter: Use class name ``Rwkv6Model``

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Make use of key ``feed_forward_length``

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Add kv ``time_mix_extra_dim`` and ``time_decay_extra_dim``

Signed-off-by: Molly Sophia <[email protected]>

* converter: Match ``new_name`` instead of ``name`` for float32 explicit tensors

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Keep ``time_mix_w1/w2`` as F32

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Remove unused nodes

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Apply code format changes

Signed-off-by: Molly Sophia <[email protected]>

* llama: rwkv6: Add lora for some supported tensors

Currently covers att.key/receptance/value/gate/output, ffn.receptance/key/value, as well as head.weight

Signed-off-by: Molly Sophia <[email protected]>

* rwkv : speed-up tokenization using trie
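
  The speed-up presumably comes from doing greedy longest-match lookups against a byte trie instead of scanning the vocabulary at every position. A self-contained sketch of that technique (not the actual llama.cpp tokenizer; the byte-fallback scheme at the end is an assumption):

  ```cpp
  #include <cstdint>
  #include <map>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // One trie node per byte; token_id >= 0 marks the end of a vocabulary entry.
  struct TrieNode {
      int token_id = -1;
      std::map<uint8_t, std::unique_ptr<TrieNode>> next;
  };

  struct Trie {
      TrieNode root;

      void insert(const std::string & word, int id) {
          TrieNode * node = &root;
          for (unsigned char c : word) {
              auto & child = node->next[c];
              if (!child) child = std::make_unique<TrieNode>();
              node = child.get();
          }
          node->token_id = id;
      }

      // Greedy longest match starting at `pos`; returns {token_id, match_len} or {-1, 0}.
      std::pair<int, size_t> longest_match(const std::string & text, size_t pos) const {
          const TrieNode * node = &root;
          int    best_id  = -1;
          size_t best_len = 0;
          for (size_t i = pos; i < text.size(); ++i) {
              auto it = node->next.find((unsigned char) text[i]);
              if (it == node->next.end()) break;
              node = it->second.get();
              if (node->token_id >= 0) {
                  best_id  = node->token_id;
                  best_len = i - pos + 1;
              }
          }
          return {best_id, best_len};
      }
  };

  // Tokenize by repeatedly taking the longest vocabulary entry at the current position.
  std::vector<int> tokenize(const Trie & trie, const std::string & text, int byte_fallback_base) {
      std::vector<int> out;
      size_t pos = 0;
      while (pos < text.size()) {
          auto [id, len] = trie.longest_match(text, pos);
          if (len == 0) {
              // no vocab entry matches: fall back to a single-byte token (scheme assumed here)
              out.push_back(byte_fallback_base + (unsigned char) text[pos]);
              pos += 1;
          } else {
              out.push_back(id);
              pos += len;
          }
      }
      return out;
  }
  ```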

* minor : style + indentation

* llama: rwkv6: Avoid division by zero

Co-authored-by: compilade <[email protected]>

* ggml: rwkv_wkv: Avoid copying the state

Signed-off-by: Molly Sophia <[email protected]>
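
  For context, the state kept in place here is the per-head WKV matrix. Assuming the standard RWKV-6 recurrence (my summary, not taken from the diff), with per-head row vectors $r_t, k_t, v_t \in \mathbb{R}^{d}$, bonus $u$, and data-dependent decay $w_t$:

  $$y_t = r_t\left(\operatorname{diag}(u)\,k_t^{\top} v_t + S_{t-1}\right), \qquad S_t = \operatorname{diag}(w_t)\,S_{t-1} + k_t^{\top} v_t$$

  Since $S_t$ is just an elementwise-scaled $S_{t-1}$ plus a rank-one update, the state can be updated in place rather than duplicated each step, which appears to be what this commit targets.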

---------

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Layl Bongers <[email protected]>
Co-authored-by: compilade <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>