
Releases: LlamaEdge/rag-api-server

LlamaEdge-RAG 0.12.1

13 Jan 06:51

Major changes:

  • (NEW) Add the --ubatch-size CLI option
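
A minimal launch sketch showing where the new flag goes. The model files follow the examples elsewhere in these notes, and the value 512 is illustrative rather than the documented default:

```shell
# Build the launch command as a string so the flag placement is visible;
# --ubatch-size is the new flag added in this release, and 512 is a placeholder value.
CMD="wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
  rag-api-server.wasm \
  --ubatch-size 512"
echo "$CMD"
```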

LlamaEdge-RAG 0.12.0

09 Jan 14:30

Major changes:

  • (New) Add the --split-mode CLI option
  • (BREAKING) Update the --n-predict CLI option
    • Change the type to i32
    • Change the default value to -1, keeping it consistent with the --n-predict CLI option of llama.cpp
  • Upgrade deps:
    • llama-core v0.26.0
    • chat-prompts v0.19.0
    • endpoints v0.24.0
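
A sketch combining the two option changes. The value `layer` for --split-mode mirrors llama.cpp's split-mode choices and is an assumption here; -1 for --n-predict is the new default:

```shell
# Placeholder launch command; only the two changed options are shown explicitly.
CMD="wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
  rag-api-server.wasm \
  --split-mode layer \
  --n-predict -1"
echo "$CMD"
```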

LlamaEdge-RAG 0.11.2

06 Jan 18:15

Major changes:

  • Upgrade deps:
    • llama-core v0.25.3
    • chat-prompts v0.18.6
    • endpoints v0.23.2

LlamaEdge-RAG 0.11.1

21 Dec 15:58

Major changes:

  • (New) Support API key

    • Set the API key via the API_KEY environment variable when starting the API server, for example
      export LLAMA_API_KEY=12345-6789-abcdef
      wasmedge --dir .:. --env API_KEY=$LLAMA_API_KEY \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
        rag-api-server.wasm \
        ...
    • Send each request with the corresponding API key in the Authorization header, for example
      curl --location 'http://localhost:8080/v1/chat/completions' \
      --header 'Authorization: Bearer 12345-6789-abcdef' \
      --header 'Content-Type: application/json' \
      --data '...'
  • (New) Add the --context-window CLI option for specifying the maximum number of user messages used in context retrieval. Note that if the context_window field appears in a chat completion request, it overrides this CLI option.

    --context-window <CONTEXT_WINDOW>
              Maximum number of user messages used in the retrieval [default: 1]
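
A launch sketch using the new option; the value 2 is illustrative (the default is 1, per the help text above):

```shell
# Retrieval will consider the last 2 user messages instead of the default 1.
CMD="wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  rag-api-server.wasm \
  --context-window 2"
echo "$CMD"
```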

LlamaEdge-RAG 0.11.0

11 Dec 07:19

Major changes:

  • (BREAKING) Rename the VectorDB related fields in the requests

    • Rename url_vdb_server to vdb_server_url
    • Rename collection_name to vdb_collection_name
  • (NEW) Add the vdb_api_key field to the requests to the /v1/create/rag, /v1/chat/completions, and /v1/retrieve endpoints. The field allows users to access a VectorDB server that requires an API key. See vectordb.md for details.

  • (NEW) Provide the support for setting VectorDB API key via the environment variable VDB_API_KEY. See vectordb.md for details.

  • Add vectordb.md for introducing how to interact with VectorDB
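
A request sketch tying the changes together: the renamed fields plus the new vdb_api_key field. The server URL, collection name, and key are placeholders, and the array form of vdb_collection_name is an assumption carried over from the multi-collection support in 0.10.0:

```shell
# JSON body for /v1/retrieve using the renamed vdb_* fields.
PAYLOAD='{
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "vdb_server_url": "http://127.0.0.1:6333",
    "vdb_collection_name": ["paris"],
    "vdb_api_key": "my-qdrant-key"
}'
echo "$PAYLOAD"
# Send it with:
#   curl --location 'http://localhost:8080/v1/retrieve' \
#     --header 'Content-Type: application/json' \
#     --data "$PAYLOAD"
# Alternatively, omit vdb_api_key and run: export VDB_API_KEY=my-qdrant-key
```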

LlamaEdge-RAG 0.10.0

08 Dec 14:09

Major changes:

  • Support multiple collections (Fixes #28)

    • Improve the --qdrant-collection-name, --qdrant-limit, and --qdrant-score-threshold CLI options to accept either a single value or multiple comma-separated values, for example

      wasmedge --dir .:. \
      --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
      --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
      rag-api-server.wasm \
      ...
      --qdrant-url http://127.0.0.1:6333 \
      --qdrant-collection-name paris,paris2 \
      --qdrant-limit 2,3 \
      --qdrant-score-threshold 0.5,0.6 \
      ...
    • For requests to both the /v1/chat/completions and /v1/retrieve endpoints, the url_vdb_server, collection_name, limit, and score_threshold fields accept both single and multiple values. For example,

      • Multiple values

        curl --location 'http://localhost:8080/v1/retrieve' \
        --header 'Content-Type: application/json' \
        --data '{
            "messages": [
                ...
            ],
            ...,
            "url_vdb_server": "http://127.0.0.1:6333",
            "collection_name": ["paris","paris2"],
            "limit": [3,3],
            "score_threshold": [0.7,0.7],
            ...
        }'
      • Single value

        curl --location 'http://localhost:8080/v1/retrieve' \
        --header 'Content-Type: application/json' \
        --data '{
            "messages": [
                ...
            ],
            ...,
            "url_vdb_server": "http://127.0.0.1:6333",
            "collection_name": ["paris"],
            "limit": [3],
            "score_threshold": [0.7],
            ...
        }'
  • Remove duplicated RAG search results (Fixes #27)

  • Upgrade dependencies:

    • llama-core v0.23.4
    • chat-prompts v0.18.1
    • endpoints v0.20.0

LlamaEdge-RAG 0.9.17

29 Nov 07:47

Major changes:

  • Upgrade dependencies:
    • llama-core v0.23.3
    • chat-prompts v0.18.0
    • endpoints v0.19.0

LlamaEdge-RAG 0.9.16

22 Nov 14:05

Major changes:

  • Upgrade to llama-core v0.23.0, chat-prompts v0.17.5, and endpoints v0.18.0
  • (NEW) Allow updating Qdrant settings in each chat completion and embedding request:
    • url_vdb_server: The URL of the VectorDB server.
    • collection_name: The name of the collection in the VectorDB.
    • limit: The maximum number of retrieved results.
    • score_threshold: The score threshold for the retrieved results.
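
A request sketch overriding the four settings in a single chat completion request; all values are placeholders (note that 0.11.0 later renamed url_vdb_server and collection_name):

```shell
# JSON body for /v1/chat/completions with per-request Qdrant settings,
# using the field names listed above.
PAYLOAD='{
    "messages": [
        {"role": "user", "content": "What is Paris known for?"}
    ],
    "url_vdb_server": "http://127.0.0.1:6333",
    "collection_name": "paris",
    "limit": 3,
    "score_threshold": 0.5
}'
echo "$PAYLOAD"
# curl --location 'http://localhost:8080/v1/chat/completions' \
#   --header 'Content-Type: application/json' \
#   --data "$PAYLOAD"
```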

LlamaEdge-RAG 0.9.15

12 Nov 07:24

Major changes:

  • New endpoints

    • GET /v1/files/{file_id}: Retrieve information of a specific file by id
    • GET /v1/files/{file_id}/content: Retrieve the content of a specific file by id
    • GET /v1/files/download/{file_id}: Download a specific file by id
  • Upgrade to llama-core v0.22.0
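
Request sketches for the three new endpoints; the base URL and file id are placeholders:

```shell
BASE="http://localhost:8080"
FILE_ID="file_abc123"  # placeholder id, e.g. one returned by a prior file upload
echo "curl $BASE/v1/files/$FILE_ID"              # file metadata by id
echo "curl $BASE/v1/files/$FILE_ID/content"      # file content by id
echo "curl -O $BASE/v1/files/download/$FILE_ID"  # download the file to disk
```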

LlamaEdge-RAG 0.9.14

06 Nov 08:00

Major change:

  • Support a dynamic number of latest user messages used in context retrieval. The number is determined by the context_window field of chat requests. (Fixes #25)
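
A request sketch exercising the field; the value 2 and the messages are illustrative:

```shell
# JSON body for /v1/chat/completions; context_window asks the server to use
# the last 2 user messages for context retrieval.
PAYLOAD='{
    "messages": [
        {"role": "user", "content": "Tell me about Paris."},
        {"role": "user", "content": "Which museums should I visit?"}
    ],
    "context_window": 2
}'
echo "$PAYLOAD"
# curl --location 'http://localhost:8080/v1/chat/completions' \
#   --header 'Content-Type: application/json' \
#   --data "$PAYLOAD"
```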