Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #520 from deeppavlov/dev
Release v1.8.2
Showing 10 changed files with 149 additions and 24 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Document Retriever | ||
|
||
## Description | ||
|
||
Document Retriever is an annotator with two endpoints, used to retrieve the `PARAGRAPHS_NUM` document parts most relevant to the user request. | ||
|
||
1. **train_and_upload_model** endpoint converts the documents provided by the user to txt format (if necessary) and splits them into chunks of ~100 words. Chunks are then transformed into a TF-IDF matrix; the resulting vectors and the vectorizer are saved for future use. This step is performed only once, at the beginning of the dialog. | ||
Documents (txt format), matrix, and vectorizer are uploaded to the file server for use by the **return_candidates** endpoint and the **dff_document_qa_llm** skill. | ||
2. **return_candidates** endpoint downloads the TF-IDF matrix and vectorizer from the file server. It then converts the user’s utterance into a TF-IDF vector and finds the `PARAGRAPHS_NUM` candidates with the highest cosine similarity among the TF-IDF vectors of the text chunks. | ||
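
The two steps above can be illustrated with a dependency-free sketch. This is not the annotator's actual code: the function names, the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for illustration, and `paragraphs_num` stands in for `PARAGRAPHS_NUM`. Uploading to and downloading from the file server is left out.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=100):
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def fit_idf(chunks):
    """Compute a smoothed inverse document frequency for every term in the chunks."""
    n = len(chunks)
    df = Counter()
    for chunk in chunks:
        df.update(set(chunk.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Map text to a sparse TF-IDF vector (a dict); terms unseen at fit time are ignored."""
    tf = Counter(t for t in text.lower().split() if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def return_candidates(query, chunks, idf, matrix, paragraphs_num=2):
    """Return the `paragraphs_num` chunks most similar to the query."""
    q = tfidf_vector(query, idf)
    scores = [(cosine(q, row), i) for i, row in enumerate(matrix)]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))  # best score first, ties by position
    return [chunks[i] for _, i in ranked[:paragraphs_num]]


# "Training" step: performed once per dialog (tiny chunk size for the demo).
document = (
    "cats purr and sleep all day "
    "dogs bark and fetch sticks daily "
    "fish swim in the quiet pond"
)
chunks = split_into_chunks(document, chunk_size=6)
idf = fit_idf(chunks)
matrix = [tfidf_vector(c, idf) for c in chunks]

# Retrieval step: performed for each user utterance.
top = return_candidates("do dogs bark", chunks, idf, matrix, paragraphs_num=1)
print(top)
```

Keeping the fitted IDF table and the chunk vectors separate from the query step mirrors the split between the two endpoints: the expensive part runs once, and each later request only vectorizes one utterance.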
|
||
## Parameters | ||
|
||
``` | ||
CONFIG_PATH: configuration file with parameters for doc_retriever model | ||
FILE_SERVER_TIMEOUT: timeout for requests to the file server where files are stored | ||
PARAGRAPHS_NUM: number of most relevant chunks to retrieve. Don't make this number too large or the chunks won't fit into the LLM context! | ||
DOC_PATH_OR_LINK: paths or links to the files to be used for Question Answering. If paths, those are paths to files in the `documents` folder in dream. If links, those must point to a file, not an Internet page. NB: file paths/links must be separated by commas with no whitespace. | ||
``` | ||
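
The `DOC_PATH_OR_LINK` format described above can be sketched as a small parser. `parse_doc_sources` is a hypothetical helper, not a function from the repository, and treating only `http`/`https` values as links is an assumption.

```python
from urllib.parse import urlparse


def parse_doc_sources(value):
    """Split a DOC_PATH_OR_LINK-style value into (kind, item) pairs.

    Hypothetical helper: entries are comma-separated with no whitespace,
    and anything with an http(s) scheme is treated as a link.
    """
    sources = []
    for item in value.split(","):
        if not item or item != item.strip():
            raise ValueError(f"empty entry or stray whitespace in {value!r}")
        scheme = urlparse(item).scheme
        kind = "link" if scheme in ("http", "https") else "path"
        sources.append((kind, item))
    return sources


print(parse_doc_sources("report.txt,https://example.com/files/manual.pdf"))
```

Rejecting whitespace up front makes a value such as `a.txt, b.txt` fail loudly instead of silently looking for a file named ` b.txt`.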
|
||
## Dependencies | ||
|
||
- **return_candidates** endpoint depends on **train_and_upload_model** endpoint |
2 changes: 1 addition & 1 deletion
annotators/prompt_selector/service_configs/dream_persona_openai_prompted/environment.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
SERVICE_PORT: 8135 | ||
SERVICE_NAME: prompt_selector | ||
N_SENTENCES_TO_RETURN: 3 | ||
- PROMPTS_TO_CONSIDER: dream_persona | ||
+ PROMPTS_TO_CONSIDER: dream_persona,dream_faq | ||
FLASK_APP: server |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# LLM-based Q&A on Documents Skill | ||
|
||
## Description | ||
|
||
LLM-based Q&A on Documents Skill answers questions about long documents provided by the user. It passes the document chunks most relevant to the user's question, along with an instruction and the dialog context, as a prompt to ChatGPT. | ||
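
A minimal sketch of how such a prompt might be assembled is shown here. The layout, the section labels, and the `build_prompt` name are assumptions for illustration only; the skill's real prompt format is defined by the file named in `DOCUMENT_PROMPT_FILE` and by the skill's code.

```python
def build_prompt(instruction, chunks, utterances, n_utterances_context=3):
    """Combine an instruction, retrieved chunks, and recent dialog into one prompt.

    Hypothetical layout; `n_utterances_context` plays the role of N_UTTERANCES_CONTEXT.
    """
    # Guard against n == 0: utterances[-0:] would return the whole list.
    context = utterances[-n_utterances_context:] if n_utterances_context > 0 else []
    parts = [instruction, "", "Document excerpts:"]
    parts += [f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)]
    parts += ["", "Dialog:"]
    parts += context
    return "\n".join(parts)


prompt = build_prompt(
    "Answer the question using only the excerpts below.",
    ["Chunk about installation.", "Chunk about configuration."],
    ["Hi!", "Hello! How can I help?", "How do I configure the service?"],
    n_utterances_context=2,
)
print(prompt)
```

Because both the number of chunks and the number of utterances are capped, the prompt length stays roughly bounded, which matters for fitting into the LLM context.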
|
||
## Parameters | ||
|
||
``` | ||
GENERATIVE_SERVICE_URL: LLM to utilize | ||
GENERATIVE_SERVICE_CONFIG: configuration file with generative parameters to utilize | ||
GENERATIVE_TIMEOUT: timeout for request to LLM | ||
N_UTTERANCES_CONTEXT: number of last utterances to consider as a dialog context | ||
ENVVARS_TO_SEND: comma-separated API keys to read from environment variables | ||
FILE_SERVER_TIMEOUT: timeout for requests to the file server where files are stored | ||
DOCUMENT_PROMPT_FILE: file to get the instruction from (to insert into prompt guiding the Question Answering model) | ||
``` | ||
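
As a rough illustration of `ENVVARS_TO_SEND`, the sketch below assumes the value is a comma-separated list of environment variable names whose values are collected for the LLM request. The `collect_envvars` helper and the variable names in the example are hypothetical, not taken from the repository.

```python
import os


def collect_envvars(envvars_to_send, env=os.environ):
    """Return {name: value} for every variable name listed in `envvars_to_send`.

    Assumption: the setting holds variable names, not the secret values themselves.
    """
    names = [name for name in envvars_to_send.split(",") if name]
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an explicit mapping instead of the real process environment.
fake_env = {"OPENAI_API_KEY": "sk-placeholder", "UNRELATED": "x"}
print(collect_envvars("OPENAI_API_KEY", env=fake_env))
```

Failing early on a missing key gives a clearer error than a rejected request to the LLM service later on.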
|
||
## Dependencies | ||
|
||
- LLM service provided in `GENERATIVE_SERVICE_URL` | ||
- annotator Document Retriever (both endpoints) | ||
- API keys in environment variables for key-required LLMs (OpenAI API, Anthropic API) | ||
|
||
|
||
## How to get OpenAI API key | ||
|
||
Go to OpenAI and find your Secret API key in your [user settings](https://platform.openai.com/account/api-keys). |