Support for sending images into OpenAI chat API #4827
Could you create a simple curl command with an example for testing purposes?
Of course. I'm planning to test this on Friday; Thursday is a busy day for me.
Works! (Note: my API port is 5001 for compatibility.)

```shell
curl http://127.0.0.1:5001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "image_url": "https://avatars.githubusercontent.com/u/112222186?v=4"
      },
      {
        "role": "user",
        "content": "What is unusual about this image?"
      }
    ],
    "mode": "chat",
    "character": "Example"
  }'
```

Raw response:

```json
{
  "id": "chatcmpl-1702041177702761984",
  "object": "chat.completions",
  "created": 1702041177,
  "model": "TheBloke_llava-v1.5-13B-GPTQ",
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "message": {
        "role": "assistant",
        "content": "Well, for one thing there is a computer engineer wearing a blue shirt with a frog on one should. And then, there's an actual frog strapped to the other shoulder, which makes this sight look unusual."
      }
    }
  ],
  "usage": {"prompt_tokens": 30216, "completion_tokens": 49, "total_tokens": 30265}
}
```

Extracted message:
Base64 works too:

```python
import base64

import requests

# Read the local image and encode it as base64 text (the file is closed automatically)
with open('image.jpg', 'rb') as img:
    img_base64 = base64.b64encode(img.read()).decode('utf-8')

data = {
    "messages": [
        {
            "role": "user",
            # Pass the image inline as a data URL instead of a remote link
            "image_url": f"data:image/jpeg;base64,{img_base64}"
        },
        {
            "role": "user",
            "content": "what is unusual about this image?"
        }
    ],
    "mode": "chat",
    "character": "Example"
}

response = requests.post('http://127.0.0.1:5001/v1/chat/completions', json=data)
print(response.text)
```

Raw response:

Extracted message:
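Going from the raw response to the extracted message only requires reading `choices[0].message.content` from the JSON body. A minimal sketch, assuming the response has the same shape as the raw response in the curl example above; `extract_message` is a hypothetical helper name, not part of the API:

```python
import json


def extract_message(raw_response: str) -> str:
    """Return the assistant's reply text from a chat.completions JSON body."""
    body = json.loads(raw_response)
    # Assumes the shape seen above: the reply sits in the first choice's message
    return body["choices"][0]["message"]["content"]


raw = (
    '{"id": "chatcmpl-1", "object": "chat.completions", '
    '"choices": [{"index": 0, "finish_reason": "stop", '
    '"message": {"role": "assistant", "content": "A frog on a shoulder."}}]}'
)
print(extract_message(raw))
```

With `requests`, the same lookup works on `response.json()` directly, or by passing `response.text` to the helper.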
@oobabooga are you ready to test/merge this PR?
This is looking great! Any reason to wait for this to get merged?
Sorry for taking so long to review! The PR is perfect and I'm impressed that you managed to add multimodal functionality to the API with so few added lines. Well done. I used these commands for testing:
@kabachuha a related open problem is that multimodal doesn't work with the llama.cpp loader at the moment. For transformers, the extension gets the embeddings and does the whole process manually. For llama.cpp, it may be doable by somehow adapting the following code in llama-cpp-python: https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#multi-modal-models. For llamacpp_HF, it would be necessary to get the correct embeddings method from https://llama-cpp-python.readthedocs.io/en/latest/api-reference/ and attach it to the llamacpp_HF class. Both of these are very difficult, but if you are interested in digging into them, that would be welcome.
I am trying to test this, and the models I've tried seem to ignore the images. I have tried FuYu-8B, for example. So I tried the ones you mentioned, but I can't get them running. How do you do it?
Edit: Solved it with PR #5038. The problem can be worked around by modifying modules/models.py, and it will probably be fixed soon.
This PR aims to handle `image_url` entries (base64 data or remote files) supplied in the message history, so that users can use GPT-Vision-like features. It does this by converting each image into a base64 HTML tag supported by the multimodal extension.
closes #4603
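The conversion described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's code: `image_url_to_img_tag` and `fold_image_messages` are hypothetical names, and the sketch assumes the multimodal extension picks up images embedded in the prompt as `<img src="data:image/jpeg;base64,...">` tags. Data URLs are passed through after stripping their header; remote files are downloaded with the standard library's `urllib.request` and base64-encoded.

```python
import base64
import urllib.request
from typing import Callable


def _download(url: str) -> bytes:
    # Fetch a remote image; assumes a plain HTTP(S) URL with no authentication
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def image_url_to_img_tag(image_url: str,
                         fetch: Callable[[str], bytes] = _download) -> str:
    """Turn an 'image_url' value (data URL or remote URL) into an <img> tag."""
    if image_url.startswith("data:"):
        # Data URL, e.g. "data:image/jpeg;base64,<payload>": keep only the payload
        _, _, payload = image_url.partition(",")
    else:
        # Remote file: download the raw bytes and base64-encode them
        payload = base64.b64encode(fetch(image_url)).decode("utf-8")
    # Assumption: the multimodal extension recognizes this exact tag shape
    return f'<img src="data:image/jpeg;base64,{payload}">'


def fold_image_messages(messages: list[dict],
                        fetch: Callable[[str], bytes] = _download) -> list[dict]:
    """Replace each image_url message with one whose content is the <img> tag."""
    folded = []
    for msg in messages:
        if "image_url" in msg:
            tag = image_url_to_img_tag(msg["image_url"], fetch)
            folded.append({"role": msg.get("role", "user"), "content": tag})
        else:
            # Text messages pass through unchanged
            folded.append(msg)
    return folded


msgs = [
    {"role": "user", "image_url": "data:image/jpeg;base64,AAAA"},
    {"role": "user", "content": "What is unusual about this image?"},
]
folded = fold_image_messages(msgs)
print(folded[0]["content"])
```

The `fetch` parameter is there so the download step can be swapped out (for example in tests). This sketch labels every payload as JPEG and keeps each image as its own message; a real implementation might instead convert the image format or merge the tag into the neighboring text message.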
Checklist: