
Local Models integration #27

Open
TheWhiteWord opened this issue Sep 8, 2023 · 58 comments
Labels: good first issue (Good for newcomers)

@TheWhiteWord

Hey Devs,
let me start by saying that this programme is great. Well done on your work, and thanks for sharing it.

My question is: is there any plan to allow for the integration of local models?
Even just a section in the documentation would be great.

Have a good day
theWW

@andraz

andraz commented Sep 8, 2023

+1 to this question

It makes no sense to shovel money into a closed-source API while we have a powerful GPU that can run a 13B Llama model with no problem using some of the other open-source projects.

@thedualspace

I'd also be very eager to use local models with ChatDev; Llama-based models show great promise.

@j-loquat

j-loquat commented Sep 8, 2023

Local model use, and perhaps as a more advanced feature, the ability to assign different models to different agents in the company - so you could use a local Python-optimized model for an engineer and a Llama 2 model for the CEO, etc.

@TheWhiteWord
Author

@j-loquat I love that idea. That is something I was considering more and more: AI becoming more and more like Greek gods, each with its own character and function, completing each other. It was the original vision of Altman too, kind of, but they lost their way.

@andraz

andraz commented Sep 8, 2023

No need to have one "god AGI" (which cannot be run locally as it demands crazy hardware) if we can have 20 agents with 20 different local narrow AI models that can be loaded one after another.

@TheWhiteWord
Author

TheWhiteWord commented Sep 8, 2023

Oh god, sorry Devs, but this conversation is too interesting. You may need to turn notifications off XD

I was trained as an artist, and the first thing to know is that limitations are the generator of creativity. A big AI with all the knowledge of the world may just become the most boring thing to touch the planet. And this may be controversial, but I think that bad qualities are needed too... everything has its meaning and use in order to create balance. Just my opinion.

@hemangjoshi37a

This has been referenced in #33

@starkdmi

  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
  3. Start LocalAI server locally and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"
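As a quick sanity check before launching ChatDev, you can ask the local endpoint which models it serves; the name reported should match the one ChatDev requests (gpt-3.5-turbo-16k / gpt-3.5-turbo-16k-0613). This is only a sketch and assumes the legacy openai 0.x Python client that ChatDev used at the time, plus the port from the command above:

import openai  # legacy 0.x client, same one ChatDev depends on

openai.api_key = "dummy"                      # LocalAI does not validate the key
openai.api_base = "http://127.0.0.1:8000/v1"  # same base URL passed to run.py above

# List the models the server hosts; "gpt-3.5-turbo-16k-0613" should appear here.
models = openai.Model.list()
print([m["id"] for m in models["data"]])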

@andraz

andraz commented Sep 13, 2023

The command above did not work in Anaconda Prompt, but this version did:

(chatdev_conda_env) C:\chatdev>set OPENAI_API_BASE=http://127.0.0.1:5001/v1

(chatdev_conda_env) C:\chatdev>set OPENAI_API_KEY=123456

(chatdev_conda_env) C:\chatdev>python run.py --task "Hello world in python" --name "HelloWorld"
**[Preprocessing]**

**ChatDev Starts** (20230913191808)

**Timestamp**: 20230913191808

**config_path**: C:\chatdev\CompanyConfig\Default\ChatChainConfig.json

**config_phase_path**: C:\chatdev\CompanyConfig\Default\PhaseConfig.json

**config_role_path**: C:\chatdev\CompanyConfig\Default\RoleConfig.json

**task_prompt**: Hello world in python

**project_name**: HelloWorld

**Log File**: C:\chatdev\WareHouse\HelloWorld_DefaultOrganization_20230913191808.log

**ChatDevConfig**:
 ChatEnvConfig.clear_structure: True
ChatEnvConfig.brainstorming: False


**ChatGPTConfig**:
 ChatGPTConfig(temperature=0.2, top_p=1.0, n=1, stream=False, stop=None, max_tokens=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias={}, user='')

I am having a problem using it with the local API:

It looks like all that the API returns is 1 token:

Text-generation-webui side:

llm_load_print_meta: model size     = 13.02 B
llm_load_print_meta: general.name   = openassistant_llama2-13b-orca-8k-3319
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  128.35 MB (+ 1600.00 MB per state)
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 11656 MB
...................................................................................................
llama_new_context_with_model: kv self size  = 1600.00 MB
llama_new_context_with_model: compute buffer total size =  191.47 MB
llama_new_context_with_model: VRAM scratch buffer: 190.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
2023-09-13 19:11:27 INFO:Loaded the model in 7.52 seconds.

Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 498 tokens and max_tokens is 15937.

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.23 ms /     1 runs   (    0.23 ms per token,  4424.78 tokens per second)
llama_print_timings: prompt eval time =   955.31 ms /   498 tokens (    1.92 ms per token,   521.30 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   957.88 ms
Output generated in 1.41 seconds (0.00 tokens/s, 0 tokens, context 498, seed 1828391196)
127.0.0.1 - - [13/Sep/2023 19:11:42] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 551 tokens and max_tokens is 15885.
Llama.generate: prefix-match hit

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.16 ms /     1 runs   (    0.16 ms per token,  6410.26 tokens per second)
llama_print_timings: prompt eval time =   835.88 ms /   489 tokens (    1.71 ms per token,   585.01 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   836.73 ms
Output generated in 1.24 seconds (0.00 tokens/s, 0 tokens, context 551, seed 192786861)
127.0.0.1 - - [13/Sep/2023 19:11:46] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.
Llama.generate: prefix-match hit

llama_print_timings:        load time =   955.37 ms
llama_print_timings:      sample time =     0.13 ms /     1 runs   (    0.13 ms per token,  7633.59 tokens per second)
llama_print_timings: prompt eval time =   884.39 ms /   459 tokens (    1.93 ms per token,   519.00 tokens per second)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   885.63 ms
Output generated in 1.26 seconds (0.00 tokens/s, 0 tokens, context 521, seed 1288396660)
127.0.0.1 - - [13/Sep/2023 19:11:53] "POST /v1/chat/completions HTTP/1.1" 200 -
Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 574 tokens and max_tokens is 15854.
Llama.generate: prefix-match hit

ChatDev side



Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 0**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]



**[OpenAI_Usage_Info Receive]**
prompt_tokens: 521
completion_tokens: 1
total_tokens: 522


**[OpenAI_Usage_Info Receive]**
prompt_tokens: 574
completion_tokens: 1
total_tokens: 575


Chief Product Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Executive Officer. Now, we are both working at ChatDev and we share a common interest in collaborating to successfully complete a task assigned by a new customer.
Your main responsibilities include being an active decision-maker on users' demands and other key policy issues, leader, manager, and executor. Your decision-making role involves high-level decisions about policy and strategy; and your communicator role can involve speaking to the organization's management and employees.
Here is a new customer's task: Hello world in python.
To complete the task, I will give you one or more instructions, and you must help me to write a specific solution that appropriately solves the requested instruction based on your expertise and my needs.]



Chief Executive Officer: **Chief Product Officer<->Chief Executive Officer on : DemandAnalysis, turn 1**

[ChatDev is a software company powered by multiple intelligent agents, such as chief executive officer, chief human resources officer, chief product officer, chief technology officer, etc, with a multi-agent organizational structure and the mission of "changing the digital world through programming".
You are Chief Product Officer. we are both working at ChatDev. We share a common interest in collaborating to successfully complete a task assigned by a new customer.
You are responsible for all product-related matters in ChatDev. Usually includes product design, product strategy, product vision, product innovation, project management and product marketing.
Here is a new customer's task: Hello world in python.
To complete the task, you must write a response that appropriately solves the requested instruction based on your expertise and customer's needs.]



**[OpenAI_Usage_Info Receive]**
prompt_tokens: 544
completion_tokens: 1
total_tokens: 545

@starkdmi

Yeah, the command above was for macOS; no trouble with the conda environment here.

@andraz, why don't you increase the context to 4K or 8K tokens? Based on your model name, it supports a context of up to 8K tokens.

Warning: $This model maximum context length is 2048 tokens. However, your messages resulted in over 521 tokens and max_tokens is 15907.

As for the one-token response, I guess it's the streaming feature, so you don't need to wait for a full response.

@xkaraman

xkaraman commented Sep 20, 2023

Hello there,
I am trying to use the llama-2-7B version as described above.

I created a new yaml file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613. For the model itself I downloaded and used one of the Hugging Face `llama-2*.bin` models.

I can successfully run it and receive answers to my questions as part of the returned object via curl, but it also says

"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

When I then try to run ChatDev on a simple task, i.e. python run.py --task "Hello world in python" --name "HelloWorld", ChatDev prints the start-up prompts and then receives no objects from the local LLM, only continuous empty usage logs

...
Note that we must ONLY discuss the product modality and do not discuss anything else! Once we all have expressed our opinion(s) and agree with the results of the discussion unanimously, any of us must actively terminate the discussion by replying with only one line, which starts with a single word <INFO>, followed by our final product modality without any other words, e.g., "<INFO> PowerPoint".

**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0


**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0


**[OpenAI_Usage_Info Receive]**
prompt_tokens: 0
completion_tokens: 0
total_tokens: 0

After 3 retries it crashes with the following KeyError.

Traceback (most recent call last):
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/camel/agents/chat_agent.py", line 200, in step
    response["id"],
KeyError: 'id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/**/4TB_DATA/git/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "/media/**/4TB_DATA/git/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
  File "/media/**/4TB_DATA/git/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/**/miniconda3/envs/chatdev/lib/python3.9/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7ff680181ac0 state=finished raised KeyError>]

I have already exported OPENAI_API_BASE and OPENAI_API_KEY pointing at the local host; otherwise it crashed.

What can I do to successfully use the local LLM?

Thanks for any help and sorry if this is the wrong place to ask it!
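For context on the KeyError above: the traceback shows ChatDev reading response["id"] (and the usage fields) straight out of the chat-completion JSON, so a local server that omits those fields will crash the run once the retries are exhausted. A minimal sketch of the payload shape an OpenAI-compatible server is expected to return is below; the values are illustrative, not taken from this thread:

# Sketch of the minimal chat-completion payload the client code indexes into.
# The important part is that "id", "choices" and "usage" are all present.
expected_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "gpt-3.5-turbo-16k-0613",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<INFO> Application"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 498, "completion_tokens": 42, "total_tokens": 540},
}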

@GitSimply

GitSimply commented Sep 20, 2023

@starkdmi

  1. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.

What model are you using?

@jacktang

  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.

Hello @starkdmi , can you share the file gpt-3.5-turbo-16k.yaml?

@starkdmi

@jacktang, it depends on the model, but it looks, for example, like gpt-3.5-turbo-16k.txt (rename to .yaml) for Vicuna 1.5.

@GitSimply, those are working with many of the GPT tools on my setup: WizardLM, WizardCoder, WizardCoderPy, Wizard-Vicuna, Vicuna, CodeLLaMa.

@Egalitaristen

Egalitaristen commented Sep 26, 2023

I'm just going to add a bit about how I got ChatDev running locally with the LM Studio server, for anyone searching. It would have been really easy if there had been clear instructions, but I had to read through all of the issues and try to find things in the code, to no avail.

Anyway. The basics:

  • Windows 10
  • Follow the installation instructions from the README for steps 1-3 (git clone, conda, cd, install requirements)

On step 4 do this instead:

set OPENAI_API_BASE=http://localhost:1234/v1

And that's it (you'll need to start the LM Studio server and load a model); now you can just run ChatDev like you normally would, but locally.
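If a run still fails to reach the model, it can help to confirm the LM Studio server is actually up before starting ChatDev. A minimal sketch, assuming the requests package and LM Studio's default port 1234:

import requests

# LM Studio exposes an OpenAI-compatible API; /v1/models lists whatever model is loaded.
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
print(resp.json())  # should mention the model you loaded in the LM Studio UI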

@sankalp-25

sankalp-25 commented Sep 28, 2023

Hey @starkdmi, while using LocalAI

git clone https://github.com/go-skynet/LocalAI
cd LocalAI
git checkout -b build
cp your-model.bin models/
docker compose up -d --pull always
curl http://localhost:8080/v1/models

After doing this in LocalAI, I am directly executing this in ChatDev
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

and I am getting the following error:

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/utils.py", line 145, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/agents/chat_agent.py", line 191, in step
    response = self.model_backend.run(messages=openai_messages)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/model_backend.py", line 69, in run
    response = openai.ChatCompletion.create(*args, **kwargs,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = inference failed {"error":{"code":500,"message":"rpc error: code = Unknown desc = inference failed","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = inference failed', 'type': ''}} {'Date': 'Tue, 26 Sep 2023 06:20:10 GMT', 'Content-Type': 'application/json', 'Content-Length': '94'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/ChatDev/run.py", line 111, in <module>
    chat_chain.execute_chain()
  File "/root/ChatDev/chatdev/chat_chain.py", line 160, in execute_chain
    self.execute_step(phase_item)
  File "/root/ChatDev/chatdev/chat_chain.py", line 130, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/chatdev/phase.py", line 292, in execute
    self.chatting(chat_env=chat_env,
  File "/root/ChatDev/chatdev/utils.py", line 77, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/chatdev/phase.py", line 131, in chatting
    assistant_response, user_response = role_play_session.step(input_user_msg, chat_turn_limit == 1)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/ChatDev/camel/agents/role_playing.py", line 242, in step
    assistant_response = self.assistant_agent.step(user_msg_rst)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f272c7ca710 state=finished raised APIError>]

how can I fix this?

@starkdmi

@sankalp-25, the problem is the local OpenAI-compatible server, which is responding incorrectly. Do you have a config file for your model in the models/ directory next to the .bin file?

It should look like that one, so that it simulates the gpt-3.5 model instead of hosting your-model.

LocalAI on startup will list the models hosted and you should see the correct name (gpt-3.5/4).

@sankalp-25

sankalp-25 commented Sep 28, 2023

Hey @starkdmi, I have renamed the .yaml file to gpt-3.5-turbo-16k.yaml and the model file to gpt-3.5-turbo-16k-0613, after which I am doing the following. If I am not wrong, the config file is the .yaml, which I have renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml. If I am wrong, please let me know what the mistake is.

Please check the below log

$docker compose -f gpt-3.5-turbo-16k.yaml up -d --pull always
[+] Running 1/1
✔ api Pulled 2.9s
[+] Running 1/0
✔ Container localai-api-1 Running

$ curl http://localhost:8000/v1/models
{"object":"list","data":[{"id":"gpt-3.5-turbo-16k-0613","object":"model"}]}

after this I am trying to run the following in chatdev

$OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

The error I am getting was given in the previous comment.

Thank you

@starkdmi

@sankalp-25, we can test that the model is working using this Python code:

import openai # https://github.com/openai/openai-python#installation

openai.api_key = "sk-dummy"
openai.api_base = "http://127.0.0.1:8000/v1"

chat_completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo-16k-0613",
  messages=[{"role": "user", "content": "Calculate 20 minus 5."}]
)

completion = chat_completion.choices[0].message.content
print(completion) # The result of 20 minus 5 is 15. 

@sankalp-25

@starkdmi, what do you mean when you say config file?
If I am not wrong, the config file is the .yaml, which I have renamed from docker-compose.yaml to gpt-3.5-turbo-16k.yaml,
and I only have gpt-3.5-turbo-16k-0613 and gpt-3.5-turbo-16k-0613.tmpl in /models.
When I run the code for checking the model, the following is the error:

Traceback (most recent call last):
  File "/root/FGPT/LocalAI/models/infer.py", line 6, in <module>
    chat_completion = openai.ChatCompletion.create(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 298, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 700, in _interpret_response
    self._interpret_response_line(
  File "/root/anaconda3/lib/python3.11/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
    raise self.handle_error_response(
openai.error.APIError: rpc error: code = Unknown desc = unimplemented {"error":{"code":500,"message":"rpc error: code = Unknown desc = unimplemented","type":""}} 500 {'error': {'code': 500, 'message': 'rpc error: code = Unknown desc = unimplemented', 'type': ''}} {'Date': 'Thu, 28 Sep 2023 11:08:00 GMT', 'Content-Type': 'application/json', 'Content-Length': '91'}

@starkdmi

@sankalp-25, wow, the docker-compose.yaml is a completely different thing. The docs are here.

The correct content of the file named gpt-3.5-turbo-16k.yaml may look like:

name: gpt-3.5-turbo-16k # or gpt-3.5-turbo-16k-0613

parameters:
  model: vicuna-13b-v1.5-16k.Q5_K_M.gguf
  temperature: 0.2
  top_k: 80
  top_p: 0.7
  max_tokens: 2048
  f16: true

context_size: 16384

template:
  chat: vicuna

f16: true
gpu_layers: 32
mmap: true
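A small sketch for double-checking such a config before starting LocalAI. It assumes PyYAML is installed, that the file sits in LocalAI's models/ directory, and that parameters.model must name a weights file in that same directory:

from pathlib import Path
import yaml  # pip install pyyaml

models_dir = Path("models")  # LocalAI's models/ directory
config = yaml.safe_load((models_dir / "gpt-3.5-turbo-16k.yaml").read_text())

print("served as:", config["name"])                  # should be gpt-3.5-turbo-16k(-0613)
weights = models_dir / config["parameters"]["model"]
print("weights present:", weights.exists())          # the .gguf/.bin file must be there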

@asfandsaleem

(Quoting Egalitaristen's LM Studio instructions above.)

Correct. You also need one more step.
set OPENAI_API_KEY="xyz"

@favouriter

Using the Langchain-Chatchat project, calling local port 2000:

# update model config
LLM_MODELS = ["gpt-3.5-turbo-16k-0613"]
MODEL_PATH['llm_model'].update({"gpt-3.5-turbo-16k-0613": MODEL_PATH['llm_model']['chatglm3-6b-32k']})
OPENAI_API_BASE=http://127.0.0.1:2000/v1 OPENAI_API_KEY="dummy" python run.py --task "2048 game" --name "2048"

@travelhawk

I tried to use LM Studio as a local OpenAI substitute. It works well, using the environment-variable setup suggested here.

OPENAI_API_BASE=http://127.0.0.1:1234/v1 OPENAI_API_KEY="xyz" python run.py --task "A drawing app" --name "Draw App"

However, it doesn't run to completion and terminates with an error that the max tokens are exceeded:

Traceback (most recent call last):
  File "C:\Users\falk\repos\AI\ChatDev\run.py", line 114, in <module>
    chat_chain.execute_chain()
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 163, in execute_chain
    self.execute_step(phase_item)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\chat_chain.py", line 133, in execute_step
    self.chat_env = self.phases[phase].execute(self.chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 291, in execute
    self.chatting(chat_env=chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 165, in chatting
    seminar_conclusion = "<INFO> " + self.self_reflection(task_prompt, role_play_session, phase_name,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 219, in self_reflection
    self.chatting(chat_env=chat_env,
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\utils.py", line 77, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\falk\repos\AI\ChatDev\chatdev\phase.py", line 136, in chatting
    if isinstance(assistant_response.msg, ChatMessage):
  File "C:\Users\falk\repos\AI\ChatDev\camel\agents\chat_agent.py", line 53, in msg
    raise RuntimeError("error in ChatAgentResponse, info:{}".format(str(self.info)))
RuntimeError: error in ChatAgentResponse, info:{'id': None, 'usage': None, 'termination_reasons': ['max_tokens_exceeded_by_camel'], 'num_tokens': 17171}

For inference I'm using the zephyr-7B-beta. Does anyone know how to fix this or what to do?

@jamiemoller

(Quoting travelhawk's LM Studio setup and max-tokens error above.)

My first thought is that this is a max-token problem.

@travelhawk

Obviously it is. Is it because of the model? How do I raise the max tokens?
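For what it's worth, the num_tokens value in the error (17171) appears to be ChatDev's own count of the accumulated conversation, checked against the context window it assumes for the configured model (16,384 tokens for gpt-3.5-turbo-16k), so once the dialogue grows past that the step is aborted regardless of what the local server could handle. A rough way to gauge how large a prompt has grown is to count the tokens yourself; a sketch, assuming tiktoken and its cl100k_base encoding (local models tokenize differently, so treat the number as an estimate):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # OpenAI's chat tokenizer; only approximate for local models

def rough_token_count(messages):
    # Rough estimate: counts only the message text, ignoring per-message overhead.
    return sum(len(enc.encode(m["content"])) for m in messages)

history = [{"role": "user", "content": "A drawing app"}]  # hypothetical conversation history
print(rough_token_count(history))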

@sammcj

sammcj commented Nov 26, 2023

Looks like there's an open PR to add this - #53

@acbp

acbp commented Nov 28, 2023

Is it possible to use ollama?

Yes, using the LiteLLM OpenAI proxy, like this:

litellm --api_base http://localhost:11434 --add_key OPENAI_API_KEY=dummy --drop_params --model ollama/orca2:7b

An OpenAI-compatible proxy server will run and redirect to Ollama;
then run any OpenAI-compatible app, like ChatDev:

OPENAI_API_BASE=http://localhost:8000/v1 OPENAI_API_KEY=dummy python3 run.py --task "<task>" --name "<title>"

Docs: LiteLLM proxy

@BackMountainDevil

@xkaraman

tenacity.RetryError: RetryError

Have you fixed the error? I hit the same error when using chatglm3-6b as the LLM server. The server also showed some red error logs at "POST /send_message HTTP/1.1" 404 Not Found. So I think the code errors out because the LLM server does not respond to /send_message correctly, and the code retries until max_time.

@davidxll

davidxll commented Dec 3, 2023

  1. Install LocalAI - OpenAI compatible server.
  2. Create new model config file named gpt-3.5-turbo-16k.yaml and set the model name to gpt-3.5-turbo-16k-0613.
  3. Start LocalAI server locally and run:
OPENAI_API_BASE=http://127.0.0.1:8000/v1 OPENAI_API_KEY="dummy" python run.py --task "Snake game in pure html" --name "WebSnake"

This should be added to the wiki or documented somewhere

@godshades

Can someone guide me on how to run this as a full Docker stack, e.g. one container for the local models and one container for ChatDev?

@tecno14

tecno14 commented Dec 16, 2023

To save the base and/or key in the conda environment, use this before activating it (or deactivate and then reactivate it):

conda env config vars set OPENAI_API_BASE=http://localhost:1234/v1 --name ChatDev_conda_env
conda env config vars set OPENAI_API_KEY=any --name ChatDev_conda_env

@BackMountainDevil

BackMountainDevil commented Dec 16, 2023

Using the Langchain-Chatchat project, calling local port 2000

I tried your suggestion. The port is most likely 20000 rather than 2000 (probably a typo). I also use chatglm3-6b-32k, with BAAI/bge-large-zh as the knowledge base. It runs, but strangely the responses are very slow; it is not that there is no response, just that it takes quite a while before responding. The GPU's 80 GB of memory is sufficient. In the end it took 93 minutes to finish a "Snake game in pure html" that cannot run.

@mroxso

mroxso commented Dec 26, 2023

For me it wasn't OPENAI_API_BASE, but BASE_URL.
After setting this, everything works fine with LiteLLM + Ollama

@evmond1

evmond1 commented Feb 13, 2024

FYI - it's not OPEN_API_BASE. If using Anaconda on Windows, you do SET BASE_URL="http://localhost:1234/v1" and then SET OPEN_API_KEY="not needed". This is if you're using LM Studio. All working on my end using Mistral Instruct 7B.


@hemangjoshi37a

If anyone has Ollama integrated with this, please let me know. Thanks a lot. Happy coding.

@akhil3417

How do I go about using other services, like Together.ai, that offer an OpenAI-compatible API? How do I set the host?

@opencoca

If the API is OpenAI-compatible, you can point at the API endpoint using --api_base, as with local models.
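For a hosted OpenAI-compatible service the pattern is the same as for the local servers above: point the client (or the OPENAI_API_BASE / OPENAI_API_KEY environment variables used elsewhere in this thread) at the provider's endpoint and use the provider's key. A sketch with the legacy openai 0.x client; the base URL and model name are placeholders to be checked against Together.ai's documentation:

import openai  # legacy 0.x client

openai.api_key = "YOUR_TOGETHER_API_KEY"           # the provider's key, not an OpenAI key
openai.api_base = "https://api.together.xyz/v1"    # assumed base URL; verify in the provider docs

reply = openai.ChatCompletion.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example model id hosted by the provider
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)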

@resdig3

resdig3 commented Mar 1, 2024

(Quoting Egalitaristen's LM Studio instructions above.)

Trying to get this running on a Win10 machine.
I keep getting this error, like it needs a working API key of some sort:

.conda\envs\ChatDev_conda_env\lib\site-packages\openai\_base_client.py", line 877, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: abc123xyz. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

@vishusinghal

(Quoting evmond1's LM Studio note above.)

I tried with BASE_URL as well as OPEN_AI_BASE, but I am getting an APIConnectionError:

tenacity.RetryError: RetryError[<Future at 0x1cdcf0bce80 state=finished raised APIConnectionError>]

Can you help?

@maramowicz

It's not BASE_URL, it's OPENAI_BASE_URL! But I also saw earlier that OPENAI_API_BASE is used as well (maybe I'm wrong here, but it works for me), so the correct command is:

OPENAI_API_BASE=http://localhost OPENAI_BASE_URL=http://localhost OPENAI_API_KEY="anything" python run.py --task "Snake game in pure html" --name "WebSnake"

@hemangjoshi37a

Hi all contributors,
I have found a GitHub app that helps solve issues using AI, and the approach is very interesting, so I am sharing it. Please, anyone, add this to this repo: https://github.com/apps/sweep-ai
Not sponsored or anything, but I found it helpful.

@BradKML

BradKML commented Jul 24, 2024

@acbp has anyone written documentation specifically for getting things like Ollama (or LocalAI, Xorbit, or OpenLLM) to work with ChatDev? LiteLLM maybe?

@hemangjoshi37a

If anyone has achieved this or can suggest an approach, please let me know. The idea is to continuously index the updated code base in a HippoRAG index, query the updated index, make code changes, and keep repeating the cycle. Consider that we want to do this offline with Ollama-type models only and don't want to use OpenAI or Claude.

Can anyone suggest how I can do this?

@hgftrdw45ud67is8o89

hgftrdw45ud67is8o89 commented Jul 26, 2024

anyone had success with llama.cpp?

I got tenacity.RetryError: RetryError[<Future at 0x2595e8f1e10 state=finished raised NotFoundError>]

and

raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'code': 404, 'message': 'File Not Found', 'type': 'not_found_error'}}

@syedshahzebhasnain

syedshahzebhasnain commented Jul 30, 2024

On Mac (M3) I used this with a local install (Anaconda):

BASE_URL=http://localhost:1234/v1 OPENAI_API_KEY=dummy python3 run.py --task "< task >" --name "<title>"
