
Add support for Google Gemma Model #5562

Closed
shreyanshsaha opened this issue Feb 22, 2024 · 13 comments

@shreyanshsaha

Description

Google has released a new text-generation LLM called Gemma, built from the same research as Gemini.
https://ai.google.dev/gemma

The models are available on Hugging Face: https://huggingface.co/google/gemma-7b-it/tree/main

It would be nice if the tool could be updated to support this new model.
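
For context, here is a minimal sketch of loading the instruction-tuned checkpoint with plain Hugging Face Transformers, assuming transformers >= 4.38 (the first release with the Gemma architecture), accelerate installed for device_map="auto", and that the model license has been accepted on the Hub:

```python
# Minimal sketch (not the webui's own loader): load google/gemma-7b-it with
# Transformers. Assumes transformers >= 4.38 and accelerate for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The -it checkpoints expect the Gemma chat template.
messages = [{"role": "user", "content": "Write a haiku about GPUs."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Supporting Gemma in the webui would presumably mean wrapping the same calls in its existing loaders.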

shreyanshsaha added the enhancement label Feb 22, 2024
@rombodawg

rombodawg commented Feb 22, 2024

I second this. We now have EXL2 quants and GGUF quants, so we should have support in both the llama-cpp-python and ExLlama loaders.

https://huggingface.co/models?sort=trending&search=LoneStriker+%2F+gemma

@mclassen

I'm still getting errors with GGUF quants of Gemma

@DerRehberg

Same

@AndreyRGW

AndreyRGW commented Feb 23, 2024

╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ F:\WBC\textwebui\server.py:241 in <module>                                                                           │
│                                                                                                                      │
│   240         # Load the model                                                                                       │
│ ❱ 241         shared.model, shared.tokenizer = load_model(model_name)                                                │
│   242         if shared.args.lora:                                                                                   │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\models.py:87 in load_model                                                                  │
│                                                                                                                      │
│    86     shared.args.loader = loader                                                                                │
│ ❱  87     output = load_func_map[loader](model_name)                                                                 │
│    88     if type(output) is tuple:                                                                                  │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\models.py:250 in llamacpp_loader                                                            │
│                                                                                                                      │
│   249     logger.info(f"llama.cpp weights detected: \"{model_file}\"")                                               │
│ ❱ 250     model, tokenizer = LlamaCppModel.from_pretrained(model_file)                                               │
│   251     return model, tokenizer                                                                                    │
│                                                                                                                      │
│ F:\WBC\textwebui\modules\llamacpp_model.py:102 in from_pretrained                                                    │
│                                                                                                                      │
│   101                                                                                                                │
│ ❱ 102         result.model = Llama(**params)                                                                         │
│   103         if cache_capacity > 0:                                                                                 │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama.py:300 in __init__                       │
│                                                                                                                      │
│    299                                                                                                               │
│ ❱  300         self._model = _LlamaModel(                                                                            │
│    301             path_model=self.model_path, params=self.model_params, verbose=self.verbose                        │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\_internals.py:50 in __init__                   │
│                                                                                                                      │
│    49                                                                                                                │
│ ❱  50         self.model = llama_cpp.llama_load_model_from_file(                                                     │
│    51             self.path_model.encode("utf-8"), self.params                                                       │
│                                                                                                                      │
│ F:\WBC\textwebui\installer_files\env\Lib\site-packages\llama_cpp_cuda\llama_cpp.py:728 in llama_load_model_from_file │
│                                                                                                                      │
│    727 ) -> llama_model_p:                                                                                           │
│ ❱  728     return _lib.llama_load_model_from_file(path_model, params)                                                │
│    729                                                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
OSError: exception: access violation reading 0x0000000000000000
Exception ignored in: <function LlamaCppModel.__del__ at 0x0000000032D5E7A0>
Traceback (most recent call last):
  File "F:\WBC\textwebui\modules\llamacpp_model.py", line 58, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'

Model: LoneStriker/gemma-2b-GGUF

Update: working on the dev branch.
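
For anyone else hitting this: the access violation appears to come from a bundled llama-cpp-python wheel built before llama.cpp gained the Gemma architecture, which is why the dev branch (with a newer wheel) works. A quick way to sanity-check the installed wheel outside the webui might look like this sketch (the model path is a placeholder):

```python
# Sketch: check the installed llama-cpp-python build and try the GGUF directly,
# outside the webui. A wheel that predates Gemma support in llama.cpp fails
# with the same access violation as in the traceback above.
import llama_cpp
from llama_cpp import Llama

print("llama-cpp-python version:", llama_cpp.__version__)

llm = Llama(model_path="models/gemma-2b.Q8_0.gguf", n_ctx=2048)  # placeholder path
out = llm("Question: what is Gemma?\nAnswer:", max_tokens=32)
print(out["choices"][0]["text"])
```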

@rombodawg

Now we just need to add support for fine-tuned/merged Gemma models, which aren't working. Follow the threads linked below, and check out my model for debugging.

Thread links:
lmstudio-ai/configs#21
ggerganov/llama.cpp#5706
arcee-ai/mergekit#181

Model:
https://huggingface.co/rombodawg/Gemme-Merge-Test-7b

@safadfadf

Same

@wangfeng35

We also need support for Qwen1.5 models.

@shaktisd

shaktisd commented Mar 2, 2024

Any updates on this?

@TheOneTrueNiz

Wondering the same for Gemma-7B

@ruizcrp

ruizcrp commented Apr 12, 2024

CodeGemma is just out. Has anyone tried it already?

github-actions bot added the stale label Jun 11, 2024

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@Mark-Tomlinson

Gemma2-2b-IT is out, and I'd love to try it. Any support for Gemma yet?

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma2'
llama_load_model_from_file: failed to load model
14:34:21-436521 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\llamacpp_model.py", line 85, in from_pretrained
    result.model = Llama(**params)
                   ^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 323, in __init__
    self._model = _LlamaModel(
                  ^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 55, in __init__
    raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf

Exception ignored in: <function LlamaCppModel.__del__ at 0x000001C898492340>
Traceback (most recent call last):
  File "D:\text-generation-webui\modules\llamacpp_model.py", line 33, in __del__
    del self.model
        ^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
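
The `unknown model architecture: 'gemma2'` line means the GGUF file itself converted fine; the bundled llama.cpp is simply too old to recognize the gemma2 architecture, so the webui's llama-cpp-python wheels need updating. If you want to confirm what the file declares without loading it, a rough header peek works, assuming the standard GGUF v2/v3 layout and that general.architecture is the first metadata key (which is how llama.cpp's convert scripts write it):

```python
# Sketch: peek at the GGUF header to read the declared architecture without
# loading the model. Assumes GGUF v2/v3 layout (little-endian magic, u32
# version, u64 tensor count, u64 KV count, then key/value pairs) and that
# "general.architecture" is the first metadata key.
import struct

def read_gguf_string(f):
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8")

path = r"models\gemma-2-2b-it-GGUF\gemma-2-2b-it-Q8_0.gguf"
with open(path, "rb") as f:
    assert f.read(4) == b"GGUF", "not a GGUF file"
    version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    key = read_gguf_string(f)                      # expect "general.architecture"
    (value_type,) = struct.unpack("<I", f.read(4))
    value = read_gguf_string(f) if value_type == 8 else None  # 8 = string type
    print(f"GGUF v{version}, {key} = {value}")     # e.g. gemma2
```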

@TheOneTrueNiz

TheOneTrueNiz commented Aug 12, 2024

Use the Transformers model loader. Gemma 2 27B loads and generates, just slowly. I'm running dual 4090s; it takes roughly 90 seconds to generate and output a response. Also, previous Gemma models load with the ExLlamaV2_HF loader, if anyone was curious.
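
For anyone following along, a rough sketch of what that Transformers path amounts to, assuming transformers >= 4.42 (the first release with the Gemma 2 architecture) and accelerate so that device_map="auto" can split the 27B weights across both GPUs:

```python
# Sketch: load Gemma 2 27B IT with plain Transformers, the backend the webui's
# Transformers loader wraps. Assumes transformers >= 4.42 and accelerate;
# device_map="auto" spreads the weights across both 4090s.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-2-27b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The -it checkpoint expects a chat-style prompt; the pipeline applies the
# chat template when given a list of messages and appends the assistant reply.
messages = [{"role": "user", "content": "Explain GGUF in one sentence."}]
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```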
