Actual behavior
This PDF contains various section titles followed by tables. When these structures are parsed, the titles appear to be joined to the text before and after them, which distorts the meaning (or spelling) of these important titles. As a result, if you ask during a chat about this PDF whether the document contains a section with this title, the reply is "NO", which is outright misleading.
Look for the title "MANAGEMENT DISCUSSION AND ANALYSIS" in the following screenshot:
Expected behavior
Section and table titles should keep their semantic meaning and spelling intact during chunking.
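To make the expectation concrete, here is a minimal sketch of the suspected failure mode. It is only an illustration and not RAGFlow's parser code: the sample blocks and the helper names `merge_naive`, `merge_with_boundaries`, and `has_title_line` are hypothetical, and it assumes the parser emits the title as its own text block before merging.

```python
# Hypothetical illustration only; this is not RAGFlow's parser code.
TITLE = "MANAGEMENT DISCUSSION AND ANALYSIS"

# Made-up text blocks, in reading order, as a layout parser might emit them.
blocks = [
    "for the year ended",
    TITLE,
    "Revenue Cost Profit",
]


def merge_naive(blocks):
    """Join blocks with no separator, fusing words at block boundaries."""
    return "".join(blocks)


def merge_with_boundaries(blocks):
    """Join blocks with a newline so each title stays on its own line."""
    return "\n".join(block.strip() for block in blocks)


def has_title_line(chunk, title):
    """Return True if the title appears intact as a line of its own."""
    return any(line.strip() == title for line in chunk.splitlines())


naive = merge_naive(blocks)
kept = merge_with_boundaries(blocks)

print(has_title_line(naive, TITLE))  # title fused with its neighbours
print(has_title_line(kept, TITLE))   # title preserved as a separate line
```

In the naive merge the title is glued to the neighbouring words, so it no longer exists as a standalone title. Keeping a boundary around each title block is the behavior this issue asks for.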
Steps to reproduce
Upload the file, and carry out parsing using the "General" template.
Additional information
I suppose the choice of embedding model is irrelevant to this issue, but FYI, the embedding model used was Gemini.
Is there an existing issue for the same bug?
RAGFlow workspace code commit ID
1160b58
RAGFlow image version
demo.ragflow.io
Other environment information
No response