Replies: 6 comments
-
I found a similar unsolved discussion regarding URLs being removed from responses during segmentation. You can view it here: #11856 [1].
To address the issue of URLs being removed during segmentation, you can adjust the configuration settings in Dify. Specifically, you can modify the process_rule configuration to retain URLs by setting enabled to false on the remove_urls_emails rule. Here is an example configuration:
    "process_rule": {
      "mode": "custom",
      "rules": {
        "pre_processing_rules": [
          {
            "id": "remove_extra_spaces",
            "enabled": true
          },
          {
            "id": "remove_urls_emails",
            "enabled": false
          }
        ],
        "segmentation": {
          "separator": "\n",
          "max_tokens": 1000
        }
      }
    }
This configuration will prevent URLs from being removed during the segmentation process [2] (https://github.com/langgenius/dify/blob/main/web/app/(commonLayout)/datasets/template/template.en.mdx).
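The source cited as [2] appears to be the dataset API documentation template, which suggests process_rule may be supplied in the body of a document-creation request rather than in a static file. The following is a minimal sketch under that assumption. The helper name build_process_rule and the example values for "name", "text" and "indexing_technique" are illustrative and should be checked against the API reference for your Dify version.

```python
import json


def build_process_rule(keep_urls=True, separator="\n", max_tokens=1000):
    """Return a custom process_rule dict shaped like the example in this reply."""
    return {
        "mode": "custom",
        "rules": {
            "pre_processing_rules": [
                {"id": "remove_extra_spaces", "enabled": True},
                # Disabling this rule is what keeps URLs (and emails) in the text.
                {"id": "remove_urls_emails", "enabled": not keep_urls},
            ],
            "segmentation": {"separator": separator, "max_tokens": max_tokens},
        },
    }


# Hypothetical request body: every field other than process_rule is an assumption.
payload = {
    "name": "example-doc",
    "text": "See https://example.com/guide for details.",
    "indexing_technique": "high_quality",
    "process_rule": build_process_rule(keep_urls=True),
}

# Print the JSON that would be sent; no network call is made here.
print(json.dumps(payload, indent=2))
```

Building the rule in one place makes it easy to confirm that remove_urls_emails is disabled before any document is indexed.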
-
How do we change the process_rule configuration? Could you be more specific?
--
松永 力
-
To change the process_rule configuration in Dify, you need to modify the configuration file where these settings are defined. Here's a step-by-step guide:
1. Locate the configuration file: The process_rule settings are typically part of a configuration file in your Dify setup. If you are using a self-hosted Docker deployment, these settings might be in a JSON or YAML file within your project directory.
2. Edit the configuration: Open the configuration file and locate the process_rule section. Modify the remove_urls_emails rule by setting its enabled field to false. Here's an example configuration:
    {
      "process_rule": {
        "mode": "custom",
        "rules": {
          "pre_processing_rules": [
            {
              "id": "remove_extra_spaces",
              "enabled": true
            },
            {
              "id": "remove_urls_emails",
              "enabled": false
            }
          ],
          "segmentation": {
            "separator": "\n",
            "max_tokens": 1000
          }
        }
      }
    }
3. Apply the changes: If you are using Docker, you will need to restart your Docker containers to apply the changes. Use the following commands:
    docker compose down
    docker compose up -d
This process will ensure that URLs are retained during the segmentation process [1] (https://github.com/langgenius/dify/blob/main/web/app/(commonLayout)/datasets/template/template.en.mdx).
-
The process_rule section seems to be in JSON format. If I were to write it in docker-compose.yaml, what would it look like?
-
The process_rule section seems to be in JSON format. If I were to write it in docker-compose.yaml, what would it look like?
--
松永 力
-
To convert the process_rule configuration to docker-compose.yaml format, you could write it as follows:
    version: '3.8'
    services:
      your_service_name:
        image: your_image_name
        environment:
          PROCESS_RULE_MODE: "custom"
          PRE_PROCESSING_RULES: |
            - id: remove_extra_spaces
              enabled: true
            - id: remove_urls_emails
              enabled: false
          SEGMENTATION_SEPARATOR: "\n"
          SEGMENTATION_MAX_TOKENS: 1000
Replace your_service_name and your_image_name with the actual service and image names from your setup.
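For the narrower question of what the same structure looks like in YAML syntax, a purely structural translation can be sketched as below. The to_yaml helper is a hypothetical, minimal emitter written for this example (it handles only non-empty dicts, lists and JSON scalars). It shows the notation only. Whether Dify reads process_rule from docker-compose.yaml at all is not confirmed anywhere in this thread.

```python
import json

process_rule = {
    "mode": "custom",
    "rules": {
        "pre_processing_rules": [
            {"id": "remove_extra_spaces", "enabled": True},
            {"id": "remove_urls_emails", "enabled": False},
        ],
        "segmentation": {"separator": "\n", "max_tokens": 1000},
    },
}


def to_yaml(value, indent=0):
    """Render nested non-empty dicts/lists of JSON scalars as block-style YAML lines."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(item, indent + 1))
            else:
                # JSON scalars are also valid YAML scalars, so json.dumps is reused.
                lines.append(f"{pad}{key}: {json.dumps(item)}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, dict):
                rendered = to_yaml(item, indent + 1)
                # Put the "- " marker on the first key; later keys stay aligned with it.
                lines.append(f"{pad}- {rendered[0].lstrip()}")
                lines.extend(rendered[1:])
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
    return lines


print("\n".join(to_yaml({"process_rule": process_rule})))
```

The output keeps the nesting of the JSON example one-to-one, which makes it easier to compare against whatever format your deployment actually expects.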
-
Self Checks
Dify version
"server_version": "0.11.0"
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
URLs are being removed from responses during segmentation. What is the recommended format for retaining URLs? Is Markdown formatting or special-character escaping required?
✔️ Expected Behavior
URLs remain in the responses even after segmentation.
❌ Actual Behavior
No response
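To make the reported behavior concrete, here is a small sketch of how a pre-processing rule such as remove_urls_emails could strip URLs before segmentation, and how disabling it leaves the text untouched. The preprocess function and both regular expressions are assumptions written for illustration and are not taken from Dify's source. The patterns Dify actually applies may differ.

```python
import re

# Illustrative patterns only; not copied from Dify.
EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+")
URL_RE = re.compile(r"https?://\S+")


def preprocess(text, rules):
    """Apply the enabled pre-processing rules to text (hypothetical helper)."""
    enabled = {rule["id"] for rule in rules if rule["enabled"]}
    if "remove_extra_spaces" in enabled:
        # Collapse runs of spaces or tabs into a single space.
        text = re.sub(r"[ \t]{2,}", " ", text)
    if "remove_urls_emails" in enabled:
        # Emails are removed first, then anything that looks like an http(s) URL.
        text = EMAIL_RE.sub("", text)
        text = URL_RE.sub("", text)
    return text


sample = "Docs: https://example.com/guide or admin@example.com"
stripped = preprocess(sample, [{"id": "remove_urls_emails", "enabled": True}])
kept = preprocess(sample, [{"id": "remove_urls_emails", "enabled": False}])
print(repr(stripped))
print(repr(kept))
```

If the real rule works along these lines, Markdown formatting or escaping would not protect a URL, because the match is on the URL text itself. Disabling the rule would be the relevant switch.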