Releases: LlamaEdge/rag-api-server
LlamaEdge-RAG 0.12.1
Major changes:
- (NEW) Add the `--ubatch-size` CLI option
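As a sketch of how the new option is passed (the value `512` and the model filenames are illustrative, not defaults from this release):

```
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  rag-api-server.wasm \
  --ubatch-size 512 \
  ...
```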
LlamaEdge-RAG 0.12.0
Major changes:
- (New) Add the `--split-mode` CLI option
- (BREAKING) Update the `--n-predict` CLI option:
  - Update the type to `i32`
  - Update the default value to `-1`, keeping it consistent with the `--n-predict` CLI option of `llama.cpp`
- Upgrade deps:
  - `llama-core` v0.26.0
  - `chat-prompts` v0.19.0
  - `endpoints` v0.24.0
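A minimal sketch of the updated option in a server invocation (other flags elided; in `llama.cpp`, `-1` denotes no generation limit):

```
wasmedge --dir .:. \
  ... \
  rag-api-server.wasm \
  --n-predict -1 \
  ...
```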
LlamaEdge-RAG 0.11.2
Major changes:
- Upgrade deps:
  - `llama-core` v0.25.3
  - `chat-prompts` v0.18.6
  - `endpoints` v0.23.2
LlamaEdge-RAG 0.11.1
Major changes:
- (New) Support API key
  - Use the `API_KEY` environment variable to set the api-key when starting the API server, for example:

    ```
    export LLAMA_API_KEY=12345-6789-abcdef
    wasmedge --dir .:. --env API_KEY=$LLAMA_API_KEY \
      --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
      --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
      rag-api-server.wasm \
      ...
    ```

  - Send each request with the corresponding api-key, for example:

    ```
    curl --location 'http://localhost:8080/v1/chat/completions' \
      --header 'Authorization: Bearer 12345-6789-abcdef' \
      --header 'Content-Type: application/json' \
      --data '...'
    ```

- (New) Add the `--context-window` CLI option for specifying the maximum number of user messages used in the context retrieval. Note that if the `context_window` field appears in a chat completion request, the setting of the CLI option is ignored.

  ```
  --context-window <CONTEXT_WINDOW>
      Maximum number of user messages used in the retrieval [default: 1]
  ```
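As a sketch of the per-request override (the payload besides the `context_window` field is illustrative), a chat completion request can carry its own `context_window`, which takes precedence over the CLI setting:

```
curl --location 'http://localhost:8080/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [ ... ],
    "context_window": 2
  }'
```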
LlamaEdge-RAG 0.11.0
Major changes:
- (BREAKING) Rename the VectorDB-related fields in the requests:
  - Rename `url_vdb_server` to `vdb_server_url`
  - Rename `collection_name` to `vdb_collection_name`
- (NEW) Add the `vdb_api_key` field to the requests to the `/v1/create/rag`, `/v1/chat/completions`, and `/v1/retrieve` endpoints. The field allows users to access a VectorDB server that requires an API key. See vectordb.md for details.
- (NEW) Support setting the VectorDB API key via the environment variable `VDB_API_KEY`. See vectordb.md for details.
- Add vectordb.md introducing how to interact with VectorDB
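Putting the renamed fields and the new `vdb_api_key` field together, a retrieval request might look like the following sketch (the values and the exact payload shape are illustrative; see vectordb.md for the authoritative format):

```
curl --location 'http://localhost:8080/v1/retrieve' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [ ... ],
    "vdb_server_url": "http://127.0.0.1:6333",
    "vdb_collection_name": ["paris"],
    "vdb_api_key": "your-vdb-api-key",
    ...
  }'
```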
LlamaEdge-RAG 0.10.0
Major changes:
- Support multiple collections (Fixes #28)
  - Improve the `--qdrant-collection-name`, `--qdrant-limit`, and `--qdrant-score-threshold` CLI options to support both a single value and multiple comma-separated values, for example:

    ```
    wasmedge --dir .:. \
      --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
      --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
      rag-api-server.wasm \
      ... \
      --qdrant-url http://127.0.0.1:6333 \
      --qdrant-collection-name paris,paris2 \
      --qdrant-limit 2,3 \
      --qdrant-score-threshold 0.5,0.6 \
      ...
    ```

  - For requests to both the `/v1/chat/completions` and `/v1/retrieve` endpoints, the `url_vdb_server`, `collection_name`, `limit`, and `score_threshold` fields support both single and multiple values. For example:
    - Multiple values:

      ```
      curl --location 'http://localhost:8080/v1/retrieve' \
        --header 'Content-Type: application/json' \
        --data '{
          "messages": [ ... ],
          ...,
          "url_vdb_server": "http://127.0.0.1:6333",
          "collection_name": ["paris","paris2"],
          "limit": [3,3],
          "score_threshold": [0.7,0.7],
          ...
        }'
      ```

    - Single value:

      ```
      curl --location 'http://localhost:8080/v1/retrieve' \
        --header 'Content-Type: application/json' \
        --data '{
          "messages": [ ... ],
          ...,
          "url_vdb_server": "http://127.0.0.1:6333",
          "collection_name": ["paris"],
          "limit": [3],
          "score_threshold": [0.7],
          ...
        }'
      ```

- Remove duplicated RAG search results (Fixes #27)
- Upgrade dependencies:
  - `llama-core` v0.23.4
  - `chat-prompts` v0.18.1
  - `endpoints` v0.20.0
LlamaEdge-RAG 0.9.17
Major changes:
- Upgrade dependencies:
  - `llama-core` v0.23.3
  - `chat-prompts` v0.18.0
  - `endpoints` v0.19.0
LlamaEdge-RAG 0.9.16
Major change:
- Upgrade to `llama-core` v0.23.0, `chat-prompts` v0.17.5, and `endpoints` v0.18.0
- (NEW) Allow updating the qdrant settings in each chat completion and embedding request:
  - `url_vdb_server`: The URL of the VectorDB server.
  - `collection_name`: The name of the collection in VectorDB.
  - `limit`: Max number of retrieved results.
  - `score_threshold`: The score threshold for the retrieved results.
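For instance, a chat completion request could override the server-side qdrant settings with these fields (a sketch; the values, and whether the fields take single values at this release, are illustrative):

```
curl --location 'http://localhost:8080/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [ ... ],
    "url_vdb_server": "http://127.0.0.1:6333",
    "collection_name": "paris",
    "limit": 3,
    "score_threshold": 0.7
  }'
```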
LlamaEdge-RAG 0.9.15
Major changes:
- New endpoints:
  - `GET /v1/files/{file_id}`: Retrieve information of a specific file by id
  - `GET /v1/files/{file_id}/content`: Retrieve the content of a specific file by id
  - `GET /v1/files/download/{file_id}`: Download a specific file by id
- Upgrade to `llama-core` v0.22.0
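The new endpoints can be exercised with plain GET requests; the file id below (`file_abc123`) and the output filename are placeholders:

```
curl http://localhost:8080/v1/files/file_abc123
curl http://localhost:8080/v1/files/file_abc123/content
curl --output downloaded.bin http://localhost:8080/v1/files/download/file_abc123
```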
LlamaEdge-RAG 0.9.14
Major change:
- Support a dynamic number of latest user messages used in the context retrieval. The number is decided by the `context_window` field of chat requests. (Fixed #25)