I have converted the llama-2-13b-chat-hf model into an ONNX model using this script: https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/python/tools/transformers/models/llama/convert_to_onnx.py. I just had to make one change in the script to upgrade the ONNX opset version from 13 to 14, but now I'm seeing this error:
"[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running RotaryEmbedding node. Name:'RotaryEmbedding_0' Status Message: Input 'x' is expected to have 3 dimensions, got 4
Exception in thread Thread-5 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/kainat/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/kainat/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
return self.sample(
File "/home/kainat/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2861, in sample
outputs = self(
File "/home/kainat/.local/lib/python3.10/site-packages/optimum/modeling_base.py", line 90, in call
return self.forward(*args, **kwargs)
File "/home/kainat/.local/lib/python3.10/site-packages/optimum/onnxruntime/modeling_decoder.py", line 255, in forward
self.model.run_with_iobinding(io_binding)
File "/home/kainat/.local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 331, in run_with_iobinding
self._sess.run_with_iobinding(iobinding._iobinding, run_options)
RuntimeError: Error in execution: Non-zero status code returned while running RotaryEmbedding node. Name:'RotaryEmbedding_0' Status Message: Input 'x' is expected to have 3 dimensions, got 4"
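For reference, one way to see which shape actually reaches the RotaryEmbedding node is to inspect the exported graph directly with the onnx package. This is only a minimal sketch: the model.onnx file name is an assumption (adjust it to the actual decoder file that convert_to_onnx.py produced), and for very large exports onnx.shape_inference.infer_shapes_path may be needed instead of loading the model into memory.

import onnx
from onnx import shape_inference

# Hypothetical path: point this at the exported decoder .onnx file
model_path = "llama2-13b-merged-int4-gpu/model.onnx"

# Run shape inference so intermediate tensors get a known shape where possible
m = shape_inference.infer_shapes(onnx.load(model_path))

# Collect shape info for graph inputs and inferred intermediate values
value_info = {v.name: v for v in list(m.graph.input) + list(m.graph.value_info)}

# Print each RotaryEmbedding node and the rank/shape of its first input ('x')
for node in m.graph.node:
    if node.op_type == "RotaryEmbedding":
        x = node.input[0]
        vi = value_info.get(x)
        dims = [d.dim_param or d.dim_value for d in vi.type.tensor_type.shape.dim] if vi else "not inferred"
        print(node.name, x, dims)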
The code I am running:
"from transformers import LlamaConfig, LlamaTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch
# User settings
model_name = "riazk/llama2-13b-merged-peft_kk"
onnx_model_dir = "./onnxruntime/onnxruntime/python/tools/transformers/llama2-13b-merged-int4-gpu/"
cache_dir = "./onnxruntime/onnxruntime/python/tools/transformers/model_cache"
device_id = 0
device = torch.device(f"cuda:{device_id}") # Change to torch.device("cpu") if running on CPU
ep = "CUDAExecutionProvider" # change to CPUExecutionProvider if running on CPU
ep_options = {"device_id": device_id}
prompt = ["ONNX Runtime is ", "I want to book a vacation to Hawaii. First, I need to ", "A good workout routine is ", "How are astronauts launched into space? "]
max_length = 64 # max(prompt length + generation length)
config = LlamaConfig.from_pretrained(model_name, use_auth_token=True, cache_dir=cache_dir)
config.save_pretrained(onnx_model_dir) # Save config file in ONNX model directory
tokenizer = LlamaTokenizer.from_pretrained(model_name, use_auth_token=True, cache_dir=cache_dir)
tokenizer.pad_token = "[PAD]"
model = ORTModelForCausalLM.from_pretrained(
onnx_model_dir,
use_auth_token=True,
use_io_binding=True,
provider=ep,
provider_options={"device_id": device_id} # comment out if running on CPU
)
# inputs = tokenizer(prompt, return_tensors="pt", padding=True).to(device)
print("-------------")
generate_ids = model.generate(**inputs, do_sample=False, max_length=max_length)
transcription = tokenizer.batch_decode(generate_ids, skip_special_tokens=True)
print(transcription)
print("-------------")"
I have tried this conversion and inference script with the llama-2-13b-hf model and it worked perfectly, so I was wondering: is this error happening because the conversion script is only optimized for the base Llama model and not the Llama "chat" model? Or is there a solution for the error I'm facing?
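To help narrow down whether the mismatch is in the exported model itself or in how the optimum wrapper feeds it, the decoder's declared inputs can be listed directly with onnxruntime. This is a rough sketch; the file name is an assumption and should be adjusted to the actual model file inside onnx_model_dir.

import onnxruntime as ort

# Hypothetical file name: adjust to the exported decoder model inside onnx_model_dir
session = ort.InferenceSession(
    "llama2-13b-merged-int4-gpu/model.onnx",
    providers=[("CUDAExecutionProvider", {"device_id": 0}), "CPUExecutionProvider"],
)

# List the model's declared inputs with their shapes and element types,
# which can then be compared against what the ORTModelForCausalLM wrapper passes in
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)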