LLamaparse is missing chunks of text when parsing PDF / How Do You Ensure Your Parser Fully Parses a Document Without Missing Content (Text/Tables/Information)? #566

MuhammedTech · 2024-12-25T03:47:07Z

I’ve been testing LlamaParse for PDF parsing, and when I manually checked the output I was surprised to find that some text was missing. I’m wondering how others ensure that the parser processes the entire document and doesn't leave out any important information (text, tables, etc.).

How do you guys test your parsers to make sure they parse the whole document without any omissions? Do you use any specific validation techniques or post-processing checks to ensure completeness?

I’d love to hear your experiences and recommendations for improving document parsing accuracy.

galvangoh · 2024-12-30T01:11:38Z

I am interested in understanding this as well. From my testing, I noticed that long and complicated parsing instructions tend to degrade the quality of the output (e.g. a table is parsed, but its contents are reduced, as though summarized to a minimum).

rthomas67 · 2025-01-03T15:59:07Z

I had the same experience with a document-only / default-params call to the parsing/upload REST API endpoint, as described here, but there are quite a few parameters (documented here) that might help optimize the completeness of the output. If I find out anything helpful, I'll add another comment here with details.
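
On the validation question raised above, one possible post-processing check is to compare the parser's output against a second, independent text extraction of the same PDF and flag whatever the parser dropped. The sketch below is only an illustration under stated assumptions: `reference_text` is assumed to come from some other extractor (for example a plain text-layer dump), `parsed_text` is the parser's markdown or text output, and the helper names `tokenize` and `coverage_report` are hypothetical, not part of LlamaParse. It does not call any LlamaParse API.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep only alphanumeric runs, so markdown pipes,
    bullets and whitespace differences do not affect the comparison."""
    return re.findall(r"[a-z0-9]+", text.lower())


def coverage_report(reference_text, parsed_text, ngram=3):
    """Estimate how much of reference_text survives in parsed_text.

    Returns the share of reference tokens found in the parsed output
    (multiset recall) and the reference n-grams absent from it, which
    point at the passages that were dropped or summarized.
    """
    ref_tokens = tokenize(reference_text)
    parsed_tokens = tokenize(parsed_text)

    # Multiset intersection: a word repeated 5 times in the reference
    # must appear 5 times in the parsed output to count fully.
    ref_counts = Counter(ref_tokens)
    parsed_counts = Counter(parsed_tokens)
    matched = sum((ref_counts & parsed_counts).values())
    recall = matched / len(ref_tokens) if ref_tokens else 1.0

    # Order-sensitive check: n-grams in the reference but not in the output.
    parsed_ngrams = set(zip(*(parsed_tokens[i:] for i in range(ngram))))
    missing = [
        " ".join(gram)
        for gram in zip(*(ref_tokens[i:] for i in range(ngram)))
        if gram not in parsed_ngrams
    ]
    return {"token_recall": recall, "missing_ngrams": missing}


if __name__ == "__main__":
    reference = "Revenue grew 12 percent in 2023. Costs fell 3 percent."
    parsed = "Revenue grew 12 percent in 2023."
    report = coverage_report(reference, parsed)
    print(report["token_recall"])
    print(report["missing_ngrams"])
```

Running this per page rather than per document makes it easier to locate where content went missing. A low token recall on a page that contains a table would also be consistent with the "summarized" table output described earlier in the thread. The reference extractor can itself miss text (scanned pages, for instance), so a low score is a prompt for manual review, not proof of a parser fault.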
