LlamaParse is missing chunks of text when parsing PDFs / How Do You Ensure Your Parser Fully Parses a Document Without Missing Content (Text/Tables/Information)?
#566
Open
MuhammedTech opened this issue
Dec 25, 2024
· 2 comments
I’ve been testing LlamaParse for PDF parsing, and I was surprised to find that when I manually checked the output, some text seemed to be missing. I’m wondering how others ensure that the parser truly processes the entire document and doesn't leave out or miss any important pieces of information (text, tables, etc.).
How do you guys test your parsers to make sure they parse the whole document without any omissions? Do you use any specific validation techniques or post-processing checks to ensure completeness?
I’d love to hear your experiences and recommendations for improving document parsing accuracy.
I am interested in understanding this as well. From my testing, I found that long and complicated parsing instructions tend to degrade the quality of the output (e.g. a table is parsed, but its contents are reduced, as though summarized to a minimum).
I had the same experience with a document-only / default-params call to the parsing/upload REST API endpoint, as described here, but there are quite a few parameters (documented here) that might help optimize the completeness of the output. If I find out anything helpful, I'll add another comment here with details.
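As a rough illustration of the kind of post-processing check asked about above, here is a minimal sketch that compares the parser's output against a reference text for the same document and reports how many of the reference words survived. It assumes you already have a reference string from an independent source (for example, a plain-text extractor run on the same PDF); the function names `tokenize` and `coverage_report` are hypothetical, and nothing here calls the LlamaParse API. It only counts words, so it cannot tell you whether table structure or reading order was preserved.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens only; markdown or table punctuation is ignored.
    return re.findall(r"\w+", text.lower())


def coverage_report(reference_text, parsed_text, top_n=10):
    """Compare word counts of a reference text against parser output.

    reference_text: text from an independent extraction (assumption).
    parsed_text: text returned by the parser under test.
    """
    ref = Counter(tokenize(reference_text))
    parsed = Counter(tokenize(parsed_text))
    # Multiset difference: words that occur more often in the reference
    # than in the parsed output.
    missing = ref - parsed
    total = sum(ref.values())
    covered = total - sum(missing.values())
    return {
        "coverage": covered / total if total else 1.0,
        "missing_tokens": missing.most_common(top_n),
    }


if __name__ == "__main__":
    reference = "Revenue grew in Q3. Table 1: costs 40 50 60"
    parsed = "Revenue grew in Q3."
    report = coverage_report(reference, parsed)
    print(f"coverage: {report['coverage']:.2f}")
    print("most frequent missing tokens:", report["missing_tokens"])
```

Running this per page rather than per document makes it easier to see where content was dropped; a page with low coverage is a candidate for manual review. A low score can also come from noise in the reference extraction itself, so treat it as a flag, not a verdict.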