
Releases: LlamaEdge/rag-api-server

LlamaEdge-RAG 0.12.1

13 Jan 06:51

Major changes:

  • (NEW) Add the --ubatch-size CLI option
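
A minimal launch sketch showing where the new flag goes. The model files follow the examples elsewhere in these notes, and the value 512 is illustrative rather than the documented default:

```shell
# Build the launch command as a string so the flag placement is visible;
# --ubatch-size is the new flag added in this release, and 512 is a placeholder value.
CMD="wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
  rag-api-server.wasm \
  --ubatch-size 512"
echo "$CMD"
```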

LlamaEdge-RAG 0.12.0

09 Jan 14:30

Major changes:

  • (New) Add the --split-mode CLI option
  • (BREAKING) Update the --n-predict CLI option
    • Change the type to i32
    • Change the default value to -1, keeping it consistent with the --n-predict CLI option of llama.cpp
  • Upgrade deps:
    • llama-core v0.26.0
    • chat-prompts v0.19.0
    • endpoints v0.24.0
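
A sketch combining the two option changes. The value `layer` for --split-mode mirrors llama.cpp's split-mode choices and is an assumption here; -1 for --n-predict is the new default:

```shell
# Placeholder launch command; only the two changed options are shown explicitly.
CMD="wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
  rag-api-server.wasm \
  --split-mode layer \
  --n-predict -1"
echo "$CMD"
```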

LlamaEdge-RAG 0.11.2

06 Jan 18:15

Major changes:

  • Upgrade deps:
    • llama-core v0.25.3
    • chat-prompts v0.18.6
    • endpoints v0.23.2

LlamaEdge-RAG 0.11.1

21 Dec 15:58

Major changes:

  • (New) Support API key

    • Set the API key via the API_KEY environment variable when starting the API server, for example
      export LLAMA_API_KEY=12345-6789-abcdef
      wasmedge --dir .:. --env API_KEY=$LLAMA_API_KEY \
        --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
        --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
        rag-api-server.wasm \
        ...
    • Send each request with the corresponding API key in the Authorization header, for example
      curl --location 'http://localhost:8080/v1/chat/completions' \
      --header 'Authorization: Bearer 12345-6789-abcdef' \
      --header 'Content-Type: application/json' \
      --data '...'
  • (New) Add the --context-window CLI option for specifying the maximum number of user messages used in context retrieval. Note that if the context_window field appears in a chat completion request, it overrides this CLI option.

    --context-window <CONTEXT_WINDOW>
              Maximum number of user messages used in the retrieval [default: 1]
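
A launch sketch using the new option; the value 2 is illustrative (the default is 1, per the help text above):

```shell
# Retrieval will consider the last 2 user messages instead of the default 1.
CMD="wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  rag-api-server.wasm \
  --context-window 2"
echo "$CMD"
```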

LlamaEdge-RAG 0.11.0

11 Dec 07:19

Major changes:

  • (BREAKING) Rename the VectorDB related fields in the requests

    • Rename url_vdb_server to vdb_server_url
    • Rename collection_name to vdb_collection_name
  • (NEW) Add the vdb_api_key field to the requests to the /v1/create/rag, /v1/chat/completions, and /v1/retrieve endpoints. The field allows users to access a VectorDB server that requires an API key. See vectordb.md for details.

  • (NEW) Provide the support for setting VectorDB API key via the environment variable VDB_API_KEY. See vectordb.md for details.

  • Add vectordb.md for introducing how to interact with VectorDB
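
A request sketch tying the changes together: the renamed fields plus the new vdb_api_key field. The server URL, collection name, and key are placeholders, and the array form of vdb_collection_name is an assumption carried over from the multi-collection support in 0.10.0:

```shell
# JSON body for /v1/retrieve using the renamed vdb_* fields.
PAYLOAD='{
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "vdb_server_url": "http://127.0.0.1:6333",
    "vdb_collection_name": ["paris"],
    "vdb_api_key": "my-qdrant-key"
}'
echo "$PAYLOAD"
# Send it with:
#   curl --location 'http://localhost:8080/v1/retrieve' \
#     --header 'Content-Type: application/json' \
#     --data "$PAYLOAD"
# Alternatively, omit vdb_api_key and run: export VDB_API_KEY=my-qdrant-key
```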

LlamaEdge-RAG 0.10.0

08 Dec 14:09

Major changes:

  • Support multiple collections (Fixes #28)

    • Improve the --qdrant-collection-name, --qdrant-limit, and --qdrant-score-threshold CLI options to accept either a single value or multiple comma-separated values, for example

      wasmedge --dir .:. \
      --nn-preload default:GGML:AUTO:Llama-3.2-3B-Instruct-Q5_K_M.gguf \
      --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5-f16.gguf \
      rag-api-server.wasm \
      ...
      --qdrant-url http://127.0.0.1:6333 \
      --qdrant-collection-name paris,paris2 \
      --qdrant-limit 2,3 \
      --qdrant-score-threshold 0.5,0.6 \
      ...
    • For requests to both the /v1/chat/completions and /v1/retrieve endpoints, the url_vdb_server, collection_name, limit, and score_threshold fields accept both single and multiple values. For example,

      • Multiple values

        curl --location 'http://localhost:8080/v1/retrieve' \
        --header 'Content-Type: application/json' \
        --data '{
            "messages": [
                ...
            ],
            ...,
            "url_vdb_server": "http://127.0.0.1:6333",
            "collection_name": ["paris","paris2"],
            "limit": [3,3],
            "score_threshold": [0.7,0.7],
            ...
        }'
      • Single value

        curl --location 'http://localhost:8080/v1/retrieve' \
        --header 'Content-Type: application/json' \
        --data '{
            "messages": [
                ...
            ],
            ...,
            "url_vdb_server": "http://127.0.0.1:6333",
            "collection_name": ["paris"],
            "limit": [3],
            "score_threshold": [0.7],
            ...
        }'
  • Remove duplicated RAG search results (Fixes #27)

  • Upgrade dependencies:

    • llama-core v0.23.4
    • chat-prompts v0.18.1
    • endpoints v0.20.0

LlamaEdge-RAG 0.9.17

29 Nov 07:47

Major changes:

  • Upgrade dependencies:
    • llama-core v0.23.3
    • chat-prompts v0.18.0
    • endpoints v0.19.0

LlamaEdge-RAG 0.9.16

22 Nov 14:05

Major changes:

  • Upgrade to llama-core v0.23.0, chat-prompts v0.17.5, and endpoints v0.18.0
  • (NEW) Allow updating Qdrant settings in each chat completion and embedding request:
    • url_vdb_server: The URL of the VectorDB server.
    • collection_name: The name of the collection in the VectorDB.
    • limit: The maximum number of retrieved results.
    • score_threshold: The score threshold for the retrieved results.
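
A request sketch overriding the four settings in a single chat completion request; all values are placeholders (note that 0.11.0 later renamed url_vdb_server and collection_name):

```shell
# JSON body for /v1/chat/completions with per-request Qdrant settings,
# using the field names listed above.
PAYLOAD='{
    "messages": [
        {"role": "user", "content": "What is Paris known for?"}
    ],
    "url_vdb_server": "http://127.0.0.1:6333",
    "collection_name": "paris",
    "limit": 3,
    "score_threshold": 0.5
}'
echo "$PAYLOAD"
# curl --location 'http://localhost:8080/v1/chat/completions' \
#   --header 'Content-Type: application/json' \
#   --data "$PAYLOAD"
```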

LlamaEdge-RAG 0.9.15

12 Nov 07:24

Major changes:

  • New endpoints

    • GET /v1/files/{file_id}: Retrieve information of a specific file by id
    • GET /v1/files/{file_id}/content: Retrieve the content of a specific file by id
    • GET /v1/files/download/{file_id}: Download a specific file by id
  • Upgrade to llama-core v0.22.0
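
Request sketches for the three new endpoints; the base URL and file id are placeholders:

```shell
BASE="http://localhost:8080"
FILE_ID="file_abc123"  # placeholder id, e.g. one returned by a prior file upload
echo "curl $BASE/v1/files/$FILE_ID"              # file metadata by id
echo "curl $BASE/v1/files/$FILE_ID/content"      # file content by id
echo "curl -O $BASE/v1/files/download/$FILE_ID"  # download the file to disk
```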

LlamaEdge-RAG 0.9.14

06 Nov 08:00

Major change:

  • Support a dynamic number of latest user messages used in context retrieval. The number is determined by the context_window field of chat requests. (Fixes #25)
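
A request sketch exercising the field; the value 2 and the messages are illustrative:

```shell
# JSON body for /v1/chat/completions; context_window asks the server to use
# the last 2 user messages for context retrieval.
PAYLOAD='{
    "messages": [
        {"role": "user", "content": "Tell me about Paris."},
        {"role": "user", "content": "Which museums should I visit?"}
    ],
    "context_window": 2
}'
echo "$PAYLOAD"
# curl --location 'http://localhost:8080/v1/chat/completions' \
#   --header 'Content-Type: application/json' \
#   --data "$PAYLOAD"
```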