-
Notifications
You must be signed in to change notification settings - Fork 2
/
.env
56 lines (51 loc) · 2.21 KB
/
.env
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
## Papers Directory Path ##
## Defaults
## PAPERS_INPUT_DIR=./papers_to_be_shorten
## PAPERS_OUTPUT_DIR=./papers_shorten_result
## OUTPUT_PREFIX="Shortened_"
PAPERS_INPUT_DIR=./papers_to_be_shorten
PAPERS_OUTPUT_DIR=./papers_shorten_result
OUTPUT_PREFIX="Shortened_"
## LANGUAGE MODEL SETTINGS ##
## Defaults
## OPENAI_API_KEY=your_api_key # https://platform.openai.com/account/api-keys
##
## LANG_MODEL_NAME=gpt-3.5-turbo # (str) https://platform.openai.com/docs/models/gpt-3-5
## TEXT_TOKEN_LEN=3000 # (int) https://platform.openai.com/docs/models/gpt-3-5
##
## MODEL_TEMPERATURE=0.1 # (float) [0, 2]
## MODEL_TOP_P=0.2 # (float) [0, 1]
## MODEL_FREQUENCY_PENALTY=1 # (float) [-2, 2]
## MODEL_PRESENCE_PENALTY=1 # (float) [-2, 2]
OPENAI_API_KEY=your_api_key
LANG_MODEL_NAME=gpt-3.5-turbo
TEXT_TOKEN_LEN=3000
MODEL_TEMPERATURE=0.1
MODEL_TOP_P=0.2
MODEL_PRESENCE_PENALTY=1
MODEL_FREQUENCY_PENALTY=1
## TEXT SETTINGS ##
## Defaults
## Shorten provided document in `SHORTEN_RATIO` ratio of token length and repeat it `SHORTEN_REPEAT` times.
## In conclusion, the document will be shortened in (`SHORTEN_RATIO` ** `SHORTEN_REPEAT`) ratio of token length.
## SHORTEN_REPEAT=3 # (int) bigger than 0
## SHORTEN_RATIO=0.4 # (float) (0, 1]
##
## Sum of below two TOKEN_RATIOs should be under 1.0.
## Model references previous/next text truncated by each TOKEN_RATIO.
## Low TOKEN_RATIOs can result in efficient token usage, but it may also lead to inconsitancy.
## PREVIOUS_TEXT_TOKEN_RATIO=0.4 # (float) [0, 1)
## NEXT_TEXT_TOKEN_RATIO=0.2 # (float) [0, 1)
##
## Description of how the token count be calculated.
## CURRENT_TEXT_TOKEN_CNT = TEXT_TOKEN_LEN * (1 + SHORTEN_RATIO + PREVIOUS_TEXT_TOKEN_RATIO + NEXT_TEXT_TOKEN_RATIO)
## PREVIOUS_TEXT_TOKEN_CNT = CURRENT_TEXT_TOKEN_CNT * PREVIOUS_TEXT_TOKEN_RATIO
## NEXT_TEXT_TOKEN_CNT = CURRENT_TEXT_TOKEN_CNT * NEXT_TEXT_TOKEN_RATIO
## SHORTENED_TEXT_TOKEN_CNT = CURRENT_TEXT_TOKEN_CNT * SHORTEN_RATIO
SHORTEN_REPEAT=3
SHORTEN_RATIO=0.4
PREVIOUS_TEXT_TOKEN_RATIO=0.4
NEXT_TEXT_TOKEN_RATIO=0.2
## SUPPORTED FILE EXTENSIONS FOR PARSING
## ".txt", ".csv", ".pdf", ".doc", ".docx", ".json", ".xml", ".yaml", ".html", ".md", ".tex"
## Implement the extension you want, in 'file_operation_utils.py'.