
cohere2 architecture issue #1893

Open
MertcanTekin opened this issue Jan 8, 2025 · 0 comments
MertcanTekin commented Jan 8, 2025

Description

I'm encountering an error when trying to use the CohereForAI/c4ai-command-r7b-12-2024 GGUF model with llama-cpp-python (via LlamaIndex). Model loading fails because llama.cpp does not recognize the 'cohere2' architecture declared in the GGUF metadata.

Environment

  • Platform: Google Colab
  • Package: llama-cpp-python (via llama-index-llms-llama-cpp)
  • Model: c4ai-command-r7b-12-2024 GGUF version
  • CUDA Version: 12.2
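
Whether the loader recognizes a given architecture depends on which llama.cpp commit the installed wheel was built from, so for reference, here is a small standard-library snippet to record the installed package version:

```python
# Print the installed llama-cpp-python version; whether the bundled llama.cpp
# recognizes the "cohere2" architecture depends on this build.
from importlib.metadata import version, PackageNotFoundError

try:
    print("llama-cpp-python:", version("llama-cpp-python"))
except PackageNotFoundError:
    print("llama-cpp-python is not installed")
```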

Code

from llama_index.llms.llama_cpp import LlamaCPP
# messages_to_prompt / completion_to_prompt come from the helper module
# shipped with the LlamaIndex llama-cpp integration.
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

def initialize_llm(model_url, temperature=0.9, max_new_tokens=1000, context_window=8192):
    llm = LlamaCPP(
        model_url=model_url,  # remote GGUF; downloaded to a local cache on first use
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        context_window=context_window,
        messages_to_prompt=messages_to_prompt,
        completion_to_prompt=completion_to_prompt,
        generate_kwargs={
            "min_p": 0.2,
        },
        model_kwargs={
            "n_gpu_layers": -1,  # offload all layers to the GPU
            "n_batch": 512,
            "n_threads": 96,
            "n_ctx": context_window,
        },
        verbose=True,
    )
    return llm

llm = initialize_llm("https://huggingface.co/dranger003/c4ai-command-r7b-12-2024-GGUF/resolve/main/ggml-c4ai-command-r7b-12-2024-q4_k.gguf")

Output

Downloading url https://huggingface.co/dranger003/c4ai-command-r7b-12-2024-GGUF/resolve/main/ggml-c4ai-command-r7b-12-2024-q4_k.gguf to path /tmp/llama_index/models/ggml-c4ai-command-r7b-12-2024-q4_k.gguf
total size (MB): 5057.01
4823it [03:30, 22.89it/s]                          
llama_model_loader: loaded meta data with 38 key-value pairs and 258 tensors from /tmp/llama_index/models/ggml-c4ai-command-r7b-12-2024-q4_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = cohere2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = C4AI Command R7B
llama_model_loader: - kv   3:                         general.size_label str              = 8.0B
llama_model_loader: - kv   4:                            general.license str              = cc-by-nc-4.0
llama_model_loader: - kv   5:                          general.languages arr[str,23]      = ["en", "fr", "de", "es", "it", "pt", ...
llama_model_loader: - kv   6:                        cohere2.block_count u32              = 32
llama_model_loader: - kv   7:                     cohere2.context_length u32              = 8192
llama_model_loader: - kv   8:                   cohere2.embedding_length u32              = 4096
llama_model_loader: - kv   9:                cohere2.feed_forward_length u32              = 14336
llama_model_loader: - kv  10:               cohere2.attention.head_count u32              = 32
llama_model_loader: - kv  11:            cohere2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  12:                     cohere2.rope.freq_base f32              = 50000.000000
llama_model_loader: - kv  13:       cohere2.attention.layer_norm_epsilon f32              = 0.000010
llama_model_loader: - kv  14:               cohere2.attention.key_length u32              = 128
llama_model_loader: - kv  15:             cohere2.attention.value_length u32              = 128
llama_model_loader: - kv  16:                          general.file_type u32              = 15
llama_model_loader: - kv  17:                        cohere2.logit_scale f32              = 0.250000
llama_model_loader: - kv  18:           cohere2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  19:                         cohere2.vocab_size u32              = 256000
llama_model_loader: - kv  20:               cohere2.rope.dimension_count u32              = 128
llama_model_loader: - kv  21:                  cohere2.rope.scaling.type str              = none
llama_model_loader: - kv  22:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  23:                         tokenizer.ggml.pre str              = command-r
llama_model_loader: - kv  24:                      tokenizer.ggml.tokens arr[str,256000]  = ["<PAD>", "<UNK>", "<CLS>", "<SEP>", ...
llama_model_loader: - kv  25:                  tokenizer.ggml.token_type arr[i32,256000]  = [3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, ...
llama_model_loader: - kv  26:                      tokenizer.ggml.merges arr[str,253333]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ a...
llama_model_loader: - kv  27:                tokenizer.ggml.bos_token_id u32              = 5
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 255001
llama_model_loader: - kv  29:            tokenizer.ggml.unknown_token_id u32              = 1
llama_model_loader: - kv  30:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  32:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  33:           tokenizer.chat_template.tool_use str              = {%- macro document_turn(documents) -%...
llama_model_loader: - kv  34:                tokenizer.chat_template.rag str              = {% set tools = [] %}\n{%- macro docume...
llama_model_loader: - kv  35:                   tokenizer.chat_templates arr[str,2]       = ["tool_use", "rag"]
llama_model_loader: - kv  36:                    tokenizer.chat_template str              = {% if documents %}\n{% set tools = [] ...
llama_model_loader: - kv  37:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   33 tensors
llama_model_loader: - type q4_K:  192 tensors
llama_model_loader: - type q6_K:   33 tensors
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'cohere2'
llama_load_model_from_file: failed to load model

Error Message
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'cohere2'
llama_load_model_from_file: failed to load model

ValueError: Failed to load model from file: /tmp/llama_index/models/ggml-c4ai-command-r7b-12-2024-q4_k.gguf
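
For what it's worth, I suspect the prebuilt wheel bundles a llama.cpp commit that predates cohere2 support, which landed upstream only recently. Reinstalling llama-cpp-python from source so it builds against a current llama.cpp might resolve this (untested on my side; the CMake flag is an assumption based on the project's install instructions):

```shell
# Rebuild llama-cpp-python from source so it links against a current llama.cpp
# that knows the cohere2 architecture. GGML_CUDA=on enables CUDA support for
# the Colab GPU runtime.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```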