
Add support for Google Gemma Model #5562

Closed
shreyanshsaha opened this issue Feb 22, 2024 · 13 comments

@shreyanshsaha

Description

Google has released a new text-generation LLM called Gemma, built from the same research as Gemini.
https://ai.google.dev/gemma

The models are available on Hugging Face: https://huggingface.co/google/gemma-7b-it/tree/main

It would be nice if the tool could be updated to support this new model.
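
For context, here is a minimal sketch of loading the instruction-tuned checkpoint with plain Hugging Face Transformers, assuming transformers >= 4.38 (the first release with the Gemma architecture), accelerate installed for device_map="auto", and that the model license has been accepted on the Hub:

```python
# Minimal sketch (not the webui's own loader): load google/gemma-7b-it with
# Transformers. Assumes transformers >= 4.38 and accelerate for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The -it checkpoints expect the Gemma chat template.
messages = [{"role": "user", "content": "Write a haiku about GPUs."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Supporting Gemma in the webui would presumably mean wrapping the same calls in its existing loaders.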

shreyanshsaha added the enhancement label Feb 22, 2024
@rombodawg

rombodawg commented Feb 22, 2024

I second this. We now have EXL2 quants and GGUF quants, so we should have support in both the llama-cpp-python and ExLlama loaders.

https://huggingface.co/models?sort=trending&search=LoneStriker+%2F+gemma

@mclassen

I'm still getting errors with GGUF quants of Gemma

@DerRehberg

Same

@AndreyRGW

AndreyRGW commented Feb 23, 2024

╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ F:\WBC\textwebui\server.py:241 in <module>                                                                           │
│                                                                                                                      │
│   240         # Load the model                                                                                       │
│ ❱ 241         shared.model, shared.tokenizer = load_model(model_name)                                                │
│   242         if shared.args.lora:                                                                                   │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\models.py:87 in load_model                                                                  │
│                                                                                                                      │
│    86     shared.args.loader = loader                                                                                │
│ ❱  87     output = load_func_map[loader](model_name)                                                                 │
│    88     if type(output) is tuple:                                                                                  │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\models.py:250 in llamacpp_loader                                                            │
│                                                                                                                      │
│   249     logger.info(f"llama.cpp weights detected: \"{model_file}\"")                                               │
│ ❱ 250     model, tokenizer = LlamaCppModel.from_pretrained(model_file)                                               │
│   251     return model, tokenizer                                                                                    │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\llamacpp_model.py:102 in from_pretrained                                                    │
│                                                                                                                      │
│   101                                                                                                                │
│ ❱ 102         result.model = Llama(**params)                                                                         │
│   103         if cache_capacity > 0:                                                                                 │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py:300 in __init__                       │
│                                                                                                                      │
│    299                                                                                                               │
│ ❱  300         self._model = _LlamaModel(                                                                            │
│    301             path_model=self.model_path, params=self.model_params, verbose=self.verbose                        │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py:50 in __init__                   │
│                                                                                                                      │
│    49                                                                                                                │
│ ❱  50         self.model = llama_cpp.llama_load_model_from_file(                                                     │
│    51             self.path_model.encode("utf-8"), self.params                                                       │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama_cpp.py:728 in llama_load_model_from_file │
│                                                                                                                      │
│    727 ) -> llama_model_p:                                                                                           │
│ ❱  728     return _lib.llama_load_model_from_file(path_model, params)                                                │
│    729                                                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: exception: access violation reading 0x0000000000000000
Exception ignored in: <function LlamaCppModel.__del__ at 0x0000000032D5E7A0>
Traceback (most recent call last):
  File "F:\WBC\textwebui\modules\llamacpp_model.py", line 58, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

Model: LoneStriker/gemma-2b-GGUF

Update: working on the dev branch.
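
For anyone else hitting this: the access violation appears to come from a bundled llama-cpp-python wheel built before llama.cpp gained the Gemma architecture, which is why the dev branch (with a newer wheel) works. A quick way to sanity-check the installed wheel outside the webui might look like this sketch (the model path is a placeholder):

```python
# Sketch: check the installed llama-cpp-python build and try the GGUF directly,
# outside the webui. A wheel that predates Gemma support in llama.cpp fails
# with the same access violation as in the traceback above.
import llama_cpp
from llama_cpp import Llama

print("llama-cpp-python version:", llama_cpp.__version__)

llm = Llama(model_path="models/gemma-2b.Q8_0.gguf", n_ctx=2048)  # placeholder path
out = llm("Question: what is Gemma?\nAnswer:", max_tokens=32)
print(out["choices"][0]["text"])
```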

@rombodawg

Now we just need to add support for fine-tuned/merged Gemma models, which aren't working. Follow the threads linked below, and check out my model for debugging.

Thread links:
lmstudio-ai/configs#21
ggerganov/llama.cpp#5706
arcee-ai/mergekit#181

Model:
https://huggingface.co/rombodawg/Gemme-Merge-Test-7b

@safadfadf

Same

@wangfeng35

We also need support for Qwen1.5 models.

@shaktisd

shaktisd commented Mar 2, 2024

Any updates on this?

@TheOneTrueNiz

Wondering the same for Gemma-7B

@ruizcrp

ruizcrp commented Apr 12, 2024

CodeGemma is just out. Has anyone tried it already?

github-actions bot added the stale label Jun 11, 2024

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@Mark-Tomlinson

Gemma2-2b-IT is out, and I'd love to try it. Any support for Gemma yet?

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
llama_load_model_from_file: failed to load model
14:34:21-436521 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\llamacpp_model.py", line 85, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 323, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x000001C898492340>
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\llamacpp_model.py", line 33, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
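
The `unknown model architecture: 'gemma2'` line means the GGUF file itself converted fine; the bundled llama.cpp is simply too old to recognize the gemma2 architecture, so the webui's llama-cpp-python wheels need updating. If you want to confirm what the file declares without loading it, a rough header peek works, assuming the standard GGUF v2/v3 layout and that general.architecture is the first metadata key (which is how llama.cpp's convert scripts write it):

```python
# Sketch: peek at the GGUF header to read the declared architecture without
# loading the model. Assumes GGUF v2/v3 layout (little-endian magic, u32
# version, u64 tensor count, u64 KV count, then key/value pairs) and that
# "general.architecture" is the first metadata key.
import struct

def read_gguf_string(f):
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")

path = r"models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf"
with open(path, "rb") as f:
    assert f.read(4) == b"GGUF", "not a GGUF file"
    version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    key = read_gguf_string(f)                      # expect "general.architecture"
    (value_type,) = struct.unpack("<I", f.read(4))
    value = read_gguf_string(f) if value_type == 8 else None  # 8 = string type
    print(f"GGUF v{version}, {key} = {value}")     # e.g. gemma2
```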

@TheOneTrueNiz

TheOneTrueNiz commented Aug 12, 2024

Use the Transformers model loader. Gemma 2 27B loads and generates, just slowly. I'm running dual 4090s; it takes roughly 90 seconds to generate and output a response. Also, previous Gemma models load with the ExLlamaV2_HF loader, if anyone was curious.
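
For anyone following along, a rough sketch of what that Transformers path amounts to, assuming transformers >= 4.42 (the first release with the Gemma 2 architecture) and accelerate so that device_map="auto" can split the 27B weights across both GPUs:

```python
# Sketch: load Gemma 2 27B IT with plain Transformers, the backend the webui's
# Transformers loader wraps. Assumes transformers >= 4.42 and accelerate;
# device_map="auto" spreads the weights across both 4090s.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-2-27b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The -it checkpoint expects a chat-style prompt; the pipeline applies the
# chat template when given a list of messages and appends the assistant reply.
messages = [{"role": "user", "content": "Explain GGUF in one sentence."}]
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```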
