Merge pull request #1082 from zigabrencic/docs/open-llm-suport
Support for Open LLMs
Showing 5 changed files with 198 additions and 6 deletions.
@@ -0,0 +1,56 @@
# Test that the Open LLM is running

First, start the server using only the CPU:

```bash
export model_path="TheBloke/CodeLlama-13B-GGUF/codellama-13b.Q8_0.gguf"
python -m llama_cpp.server --model $model_path
```
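
If you don't yet have the model weights on disk, one way to fetch the `GGUF` file is with the `huggingface-cli` tool from `huggingface_hub` (a minimal sketch; the local directory layout here is just one choice and must match the path passed to `--model`):

```bash
# Download the single GGUF file from the TheBloke/CodeLlama-13B-GGUF repo
# into a local directory of the same name.
huggingface-cli download TheBloke/CodeLlama-13B-GGUF codellama-13b.Q8_0.gguf --local-dir TheBloke/CodeLlama-13B-GGUF
```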

Or with GPU support (recommended):

```bash
python -m llama_cpp.server --model TheBloke/CodeLlama-13B-GGUF/codellama-13b.Q8_0.gguf --n_gpu_layers 1
```

If you have more GPU layers available, set `--n_gpu_layers` to a higher number.

To find the number of available layers, run the command above and look for `llm_load_tensors: offloaded 1/41 layers to GPU` in the output; the second number (here 41) is the total number of layers that can be offloaded.
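
For example, to offload every layer (llama-cpp-python accepts `-1` to mean "all layers"):

```bash
# Offload all model layers to the GPU; reuses $model_path from above.
python -m llama_cpp.server --model $model_path --n_gpu_layers -1
```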

## Test API call

Set the environment variables:

```bash
export OPENAI_API_BASE="http://localhost:8000/v1"
export OPENAI_API_KEY="sk-xxx"
export MODEL_NAME="CodeLlama"
```
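
As a quick connectivity check, you can list the models the server exposes (`llama_cpp.server` implements the standard `/v1/models` endpoint; this assumes the server started above is still running):

```bash
curl "$OPENAI_API_BASE/models"
```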

Then ping the model via `python` using the `OpenAI` API:

```bash
python examples/open_llms/openai_api_interface.py
```

If you're not using `CodeLlama`, make sure to change the `MODEL_NAME` parameter.

Or using `curl` (note that the `/v1/chat/completions` endpoint expects a `messages` array, not a bare `prompt`):

```bash
curl --request POST \
  --url http://localhost:8000/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{ "model": "CodeLlama", "messages": [{"role": "user", "content": "Who are you?"}], "max_tokens": 60}'
```

If this works, also make sure that the `langchain` interface works, since that's how `gpte` interacts with LLMs.

## Langchain test

```bash
export MODEL_NAME="CodeLlama"
python examples/open_llms/langchain_interface.py
```
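
For a slightly closer approximation of how a tool like `gpte` drives the model, here is a minimal, hypothetical chat-style sketch with explicit system and user messages (assumes `langchain_core` and `langchain_openai` are installed and the environment variables above are set):

```python
import os

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

# Reads OPENAI_API_BASE / OPENAI_API_KEY from the environment, so this
# talks to the local llama_cpp.server instance started above.
model = ChatOpenAI(model=os.getenv("MODEL_NAME"), temperature=0.1)

messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Write a python function that sums two numbers."),
]

# invoke() returns an AIMessage; .content holds the model's reply.
print(model.invoke(messages).content)
```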

That's it 🤓 time to go back to [the docs](/docs/open_models.md#running-the-example) and give `gpte` a try.

`examples/open_llms/langchain_interface.py`

@@ -0,0 +1,17 @@
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI

# ChatOpenAI picks up OPENAI_API_BASE and OPENAI_API_KEY from the
# environment, so this targets the local llama_cpp.server instance.
model = ChatOpenAI(
    model=os.getenv("MODEL_NAME"),
    temperature=0.1,
    callbacks=[StreamingStdOutCallbackHandler()],
    streaming=True,
)

prompt = (
    "Provide me with only the code for a simple python function that sums two numbers."
)

# With streaming enabled, the callback prints tokens to stdout as they arrive.
model.invoke(prompt)

`examples/open_llms/openai_api_interface.py`

@@ -0,0 +1,21 @@
import os

from openai import OpenAI

# Point the OpenAI client at the local server using the same
# environment variables as the rest of the examples.
client = OpenAI(
    base_url=os.getenv("OPENAI_API_BASE"), api_key=os.getenv("OPENAI_API_KEY")
)

response = client.chat.completions.create(
    model=os.getenv("MODEL_NAME"),
    messages=[
        {
            "role": "user",
            "content": "Provide me with only the code for a simple python function that sums two numbers.",
        },
    ],
    temperature=0.7,
    max_tokens=200,
)

print(response.choices[0].message.content)