Based on the provided code snippets and the behavior you've observed, it appears that the This two-tiered application of The rerank stage then applies these parameters again, but this time to refine the results further, ensuring that the final output is not only relevant but also of high quality. The As for your concern about the redundancy when using very small Regarding the configuration of In conclusion, the observed behavior of applying For more details, you can refer to the following code sources:
I hope this clarifies your question. If you have further questions or need more clarification, feel free to ask.
-
@JohnJyong Hi, could you take a look and give some explanation to see whether it resolves my doubt? Thanks.
-
Self Checks
Dify version
0.5.10
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Enable the rerank model, set the threshold and top-k values, and debug the recall behavior.
✔️ Expected Behavior
The top-k and threshold values should only affect the rerank stage, as the doc describes.
(Sorry, I couldn't find this explanation in the English doc...)
The last line says that the top-k and threshold values only affect the rerank stage.
(I have another doubt: I assume Dify has some mechanism that recalls a limited set of items from the vector DB and sends them to the rerank stage.)
So the RAG with rerank steps would be:
sim search to get a bunch of doc chunks ==> somehow get n (n > k) chunks ==> rerank stage ==> filter by the threshold
==> get top-k from the reordered and filtered chunk list ==> final reference
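The expected flow above can be sketched roughly as follows. This is only an illustration under assumptions, not Dify's actual code: `expected_pipeline`, `recall_n`, the `rerank` callable, and the toy scores are all hypothetical names and made-up data.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[str, float]  # (chunk text, score)


def expected_pipeline(
    candidates: List[Chunk],
    rerank: Callable[[str], float],
    recall_n: int,
    top_k: int,
    score_threshold: float,
) -> List[Chunk]:
    """Hypothetical flow: top_k and score_threshold apply only after reranking."""
    # Stage 1: recall the n most similar chunks (n > k); no threshold here.
    recalled = sorted(candidates, key=lambda c: c[1], reverse=True)[:recall_n]
    # Stage 2: rescore every recalled chunk with the rerank model.
    reranked = [(text, rerank(text)) for text, _ in recalled]
    # Stage 3: drop chunks below the threshold, then keep the best top_k.
    kept = [c for c in reranked if c[1] >= score_threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_k]


# Toy data (made up): similarity scores and rerank scores disagree on purpose.
candidates = [("a", 0.90), ("b", 0.80), ("c", 0.70), ("d", 0.60)]
rerank_scores = {"a": 0.30, "b": 0.55, "c": 0.95, "d": 0.85}

result = expected_pipeline(
    candidates,
    rerank=lambda text: rerank_scores[text],
    recall_n=4,
    top_k=2,
    score_threshold=0.5,
)
print(result)  # [('c', 0.95), ('d', 0.85)]
```

With a wide first-stage recall, chunks "c" and "d" reach the reranker even though their similarity scores are the lowest, so the reranker can promote them.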
❌ Actual Behavior
From the code, it seems the recall logic applies top-k and the threshold in both the first retrieval stage and the rerank stage.
So the current RAG with rerank steps are:
sim search ==> filter by threshold and get top-k chunks ==> rerank stage to get a reordered chunk list ==> again, filter by threshold and get top-k chunks ==> final reference
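The observed flow can be sketched the same way. Again, this is a simplified illustration of the steps listed above, not a copy of Dify's code: `actual_pipeline`, the `rerank` callable, and the toy scores are hypothetical.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[str, float]  # (chunk text, score)


def actual_pipeline(
    candidates: List[Chunk],
    rerank: Callable[[str], float],
    top_k: int,
    score_threshold: float,
) -> List[Chunk]:
    """Hypothetical flow: the same top_k and threshold are applied twice."""
    # Stage 1: threshold and top_k are applied to the similarity scores.
    first = [c for c in candidates if c[1] >= score_threshold]
    first = sorted(first, key=lambda c: c[1], reverse=True)[:top_k]
    # Stage 2: only the surviving top_k chunks are rescored by the reranker.
    reranked = [(text, rerank(text)) for text, _ in first]
    # Stage 3: the same threshold and top_k are applied to the rerank scores.
    kept = [c for c in reranked if c[1] >= score_threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_k]


# Toy data (made up): the reranker would prefer "c" and "d".
candidates = [("a", 0.90), ("b", 0.80), ("c", 0.70), ("d", 0.60)]
rerank_scores = {"a": 0.30, "b": 0.55, "c": 0.95, "d": 0.85}

result = actual_pipeline(
    candidates,
    rerank=lambda text: rerank_scores[text],
    top_k=2,
    score_threshold=0.5,
)
print(result)  # [('b', 0.55)]
```

In this toy run only "a" and "b" ever reach the reranker, so the chunks it would have scored highest ("c" and "d") are already gone, and the second threshold pass leaves a single chunk.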
I think reranking only the top-k chunks from the first retrieval stage is kind of meaningless. If users tend to use a very small top-k value like 2 or 3, there is no point in reranking within such a small chunk set.
What do you think?