
cohere2 architecture issue #1893

Open
MertcanTekin opened this issue Jan 8, 2025 · 0 comments
MertcanTekin commented Jan 8, 2025

Description

I'm encountering an error when trying to use the CohereForAI/c4ai-command-r7b-12-2024 GGUF model with llama-cpp-python (via LlamaIndex). Model loading fails because llama.cpp does not recognize the 'cohere2' architecture declared in the GGUF metadata.

Environment

  • Platform: Google Colab
  • Package: llama-cpp-python (via llama-index-llms-llama-cpp)
  • Model: c4ai-command-r7b-12-2024 GGUF version
  • CUDA Version: 12.2
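
Whether the loader recognizes a given architecture depends on which llama.cpp commit the installed wheel was built from, so for reference, here is a small standard-library snippet to record the installed package version:

```python
# Print the installed llama-cpp-python version; whether the bundled llama.cpp
# recognizes the "cohere2" architecture depends on this build.
from importlib.metadata import version, PackageNotFoundError

try:
    print("llama-cpp-python:", version("llama-cpp-python"))
except PackageNotFoundError:
    print("llama-cpp-python is not installed")
```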

Code

from llama_index.llms.llama_cpp import LlamaCPP
# messages_to_prompt / completion_to_prompt come from the helper module
# shipped with the LlamaIndex llama-cpp integration.
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

def initialize_llm(model_url, temperature=0.9, max_new_tokens=1000, context_window=8192):
    llm = LlamaCPP(
        model_url=model_url,  # remote GGUF; downloaded to a local cache on first use
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        context_window=context_window,
        messages_to_prompt=messages_to_prompt,
        completion_to_prompt=completion_to_prompt,
        generate_kwargs={
            "min_p": 0.2,
        },
        model_kwargs={
            "n_gpu_layers": -1,  # offload all layers to the GPU
            "n_batch": 512,
            "n_threads": 96,
            "n_ctx": context_window,
        },
        verbose=True,
    )
    return llm

llm = initialize_llm("https://huggingface.co/dranger003/c4ai-command-r7b-12-2024-GGUF/resolve/main/ggml-c4ai-command-r7b-12-2024-q4_k.gguf")

Output

Downloading url https://huggingface.co/dranger003/c4ai-command-r7b-12-2024-GGUF/resolve/main/ggml-c4ai-command-r7b-12-2024-q4_k.gguf to path /tmp/llama_index/models/ggml-c4ai-command-r7b-12-2024-q4_k.gguf
total size (MB): 5057.01
4823it [03:30, 22.89it/s]                          
llama_model_loader: loaded meta data with 38 key-value pairs and 258 tensors from /tmp/llama_index/models/ggml-c4ai-command-r7b-12-2024-q4_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = cohere2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = C4AI Command R7B
llama_model_loader: - kv   3:                         general.size_label str              = 8.0B
llama_model_loader: - kv   4:                            general.license str              = cc-by-nc-4.0
llama_model_loader: - kv   5:                          general.languages arr[str,23]      = ["en", "fr", "de", "es", "it", "pt", ...
llama_model_loader: - kv   6:                        cohere2.block_count u32              = 32
llama_model_loader: - kv   7:                     cohere2.context_length u32              = 8192
llama_model_loader: - kv   8:                   cohere2.embedding_length u32              = 4096
llama_model_loader: - kv   9:                cohere2.feed_forward_length u32              = 14336
llama_model_loader: - kv  10:               cohere2.attention.head_count u32              = 32
llama_model_loader: - kv  11:            cohere2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  12:                     cohere2.rope.freq_base f32              = 50000.000000
llama_model_loader: - kv  13:       cohere2.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  14:               cohere2.attention.key_length u32              = 128
llama_model_loader: - kv  15:             cohere2.attention.value_length u32              = 128
llama_model_loader: - kv  16:                          general.file_type u32              = 15
llama_model_loader: - kv  17:                        cohere2.logit_scale f32              = 0.250000
llama_model_loader: - kv  18:           cohere2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  19:                         cohere2.vocab_size u32              = 256000
llama_model_loader: - kv  20:               cohere2.rope.dimension_count u32              = 128
llama_model_loader: - kv  21:                  cohere2.rope.scaling.type str              = none
llama_model_loader: - kv  22:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  23:                         tokenizer.ggml.pre str              = command-r
llama_model_loader: - kv  24:                      tokenizer.ggml.tokens arr[str,256000]  = ["<PAD>", "<UNK>", "<CLS>", "<SEP>", ...
llama_model_loader: - kv  25:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, ...
llama_model_loader: - kv  26:                      tokenizer.ggml.merges arr[str,253333]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ a...
llama_model_loader: - kv  27:                tokenizer.ggml.bos_token_id u32              = 5
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 255001
llama_model_loader: - kv  29:            tokenizer.ggml.unknown_token_id u32              = 1
llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  32:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  33:           tokenizer.chat_template.tool_use str              = {%- macro document_turn(documents) -%...
llama_model_loader: - kv  34:                tokenizer.chat_template.rag str              = {% set tools = [] %}\n{%- macro docume...
llama_model_loader: - kv  35:                   tokenizer.chat_templates arr[str,2]       = ["tool_use", "rag"]
llama_model_loader: - kv  36:                    tokenizer.chat_template str              = {% if documents %}\n{% set tools = [] ...
llama_model_loader: - kv  37:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   33 tensors
llama_model_loader: - type q4_K:  192 tensors
llama_model_loader: - type q6_K:   33 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'cohere2'
llama_load_model_from_file: failed to load model

Error Message
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'cohere2'
llama_load_model_from_file: failed to load model

ValueError: Failed to load model from file: /tmp/llama_index/models/ggml-c4ai-command-r7b-12-2024-q4_k.gguf
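
For what it's worth, I suspect the prebuilt wheel bundles a llama.cpp commit that predates cohere2 support, which landed upstream only recently. Reinstalling llama-cpp-python from source so it builds against a current llama.cpp might resolve this (untested on my side; the CMake flag is an assumption based on the project's install instructions):

```shell
# Rebuild llama-cpp-python from source so it links against a current llama.cpp
# that knows the cohere2 architecture. GGML_CUDA=on enables CUDA support for
# the Colab GPU runtime.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```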