[Frontend] Rerank API (Jina- and Cohere-compatible API) #12376
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Is there an issue with Docker or dependencies being downloaded? CI is failing for reasons that seem unrelated to my code. Something about invalid …
"default": (ScoreRequest, create_score), | ||
"default": (RerankRequest, do_rerank) |
Why can we just remove `(ScoreRequest, create_score)`? I imagine we want to keep that interface around and just offer rerank as a new option.
This was suggested by @DarkLight1337
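For illustration, a minimal sketch of what keeping both interfaces side by side might look like, assuming the surrounding code is a task-to-(request class, handler) registry as the diff suggests. The request classes are imported from vLLM's OpenAI-compatible protocol module; the handler functions and the `"score"`/`"rerank"` keys are placeholders standing in for the names in the diff, not the actual vLLM code:

```python
# Hypothetical sketch, not the actual vLLM code: keep both interfaces
# registered instead of replacing one with the other. The request classes
# come from vLLM's OpenAI-compatible protocol module; the handlers below
# are placeholders standing in for the functions named in the diff.
from vllm.entrypoints.openai.protocol import RerankRequest, ScoreRequest


def create_score(request): ...  # stand-in for the existing /score handler


def do_rerank(request): ...  # stand-in for the new /rerank handler


TASK_HANDLERS = {
    "score": (ScoreRequest, create_score),
    "rerank": (RerankRequest, do_rerank),
}
```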
```python
state.jinaai_serving_reranking = JinaAIServingRerank(
    engine_client,
    model_config,
    state.openai_serving_models,
    request_logger=request_logger)
```
I think we should only construct it in the rerank task case, like above:
Suggested change:

```diff
-state.jinaai_serving_reranking = JinaAIServingRerank(
-    engine_client,
-    model_config,
-    state.openai_serving_models,
-    request_logger=request_logger)
+state.jinaai_serving_reranking = JinaAIServingRerank(
+    engine_client,
+    model_config,
+    state.openai_serving_models,
+    request_logger=request_logger
+) if model_config.task == "rerank" else None
```
This seems duplicated from `examples/online_serving/jinaai_rerank_client.py`?
```
Example of using the OpenAI entrypoint's rerank API which is compatible with
Jina and Cohere https://jina.ai/reranker
run: vllm serve --model BAAI/bge-reranker-base
```
Suggested change:

```diff
-run: vllm serve --model BAAI/bge-reranker-base
+run: vllm serve BAAI/bge-reranker-base
```
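For reference, a minimal sketch of the kind of call the example client makes, assuming a server started with `vllm serve BAAI/bge-reranker-base` on the default port. The request fields follow the Jina-style schema this PR implements; see `examples/online_serving/jinaai_rerank_client.py` for the actual example script:

```python
# Minimal sketch of a Jina-style rerank request against a local vLLM server.
# Assumes `vllm serve BAAI/bge-reranker-base` is listening on localhost:8000.
import requests

resp = requests.post(
    "http://localhost:8000/v1/rerank",
    json={
        "model": "BAAI/bge-reranker-base",
        "query": "What is the capital of France?",
        "documents": [
            "The capital of Brazil is Brasilia.",
            "The capital of France is Paris.",
        ],
    },
)
resp.raise_for_status()
print(resp.json())
```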
```diff
@@ -50,6 +50,11 @@ In addition, we have the following custom APIs:
   - Applicable to all [pooling models](../models/pooling_models.md).
 - [Score API](#score-api) (`/score`)
   - Only applicable to [cross-encoder models](../models/pooling_models.md) (`--task score`).
+- [Re-rank API](#rerank-api) (`/rerank`, `/v1/rerank`)
```
Suggested change:

```diff
-- [Re-rank API](#rerank-api) (`/rerank`, `/v1/rerank`)
+- [Re-rank API](#rerank-api) (`/rerank`, `/v1/rerank`, `/v2/rerank`)
```
This PR introduces an explicit re-rank API at `/rerank` and `/v1/rerank`. The same fundamental relevance-scoring functionality with cross-encoder models was previously implemented in `/score`, but that API is unique to vLLM, does not follow any standardized format, and therefore requires custom code to consume. The endpoints introduced in this PR use the same cross-encoder scoring functionality under the hood, but they reproduce the API behavior of Jina AI's re-ranker, and therefore also of Cohere's re-ranker (which is a subset of Jina's API), in terms of request and response schemas and behavior.
Why Jina and Cohere? OpenAI does not have a re-ranking API. Jina's and Cohere's API standards are very common for re-rank APIs, similar to how OpenAI's chat completions API has become near-universally adopted. The Jina/Cohere re-rank API format has notably been followed by other labs and projects, including:
The code in `serving_rerank.py` is 90% the same as `serving_score.py`, but implements the rerank API as described above. This improves vLLM's compatibility with open-source tools and frameworks that consume existing, well-defined APIs, allowing vLLM to serve as the re-ranker in self-hosted RAG setups with off-the-shelf tooling.

FIX infiniflow/ragflow#4316