
How to get the words in the right order from the json result file? #511

Answered by fg-mindee
piegu asked this question in Q&A

For anyone looking for a solution: as @charlesmindee mentioned earlier, we integrated line aggregation in #537. It should make its way into a release this week; until then, you will need to install the development version to get it through the high-level API.

It is enabled by default, so the basic usage snippet will work:

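from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load the pretrained OCR predictor (line aggregation is enabled by default)
model = ocr_predictor(pretrained=True)
# Read the PDF and convert its pages to images
doc = DocumentFile.from_pdf("path/to/your.pdf").as_images()
# Run OCR, then export the result as a nested dictionary
result = model(doc)
json_result = result.export()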
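
To read the words back in order from the exported result, one option is to walk the nested dictionary. The sketch below assumes the export is laid out as pages → blocks → lines → words, with each word's text stored under a "value" key; check this against the output of your installed version. The helper name `lines_from_export` and the sample dictionary are hypothetical, written only for illustration.

```python
from typing import Any, Dict, List


def lines_from_export(export: Dict[str, Any]) -> List[List[str]]:
    """Return, for each page, the text of every line in exported order.

    Assumes the layout pages -> blocks -> lines -> words, where each word
    is a dict holding its text under the "value" key.
    """
    pages_text: List[List[str]] = []
    for page in export.get("pages", []):
        page_lines: List[str] = []
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                # Words are kept in the order they appear within the line
                words = [word["value"] for word in line.get("words", [])]
                if words:
                    page_lines.append(" ".join(words))
        pages_text.append(page_lines)
    return pages_text


# Hand-written stand-in for `result.export()`; a real export also carries
# extra fields (geometry, confidence, ...) that this helper ignores.
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"value": "Hello"}, {"value": "world"}]},
                        {"words": [{"value": "second"}, {"value": "line"}]},
                    ]
                }
            ]
        }
    ]
}

print(lines_from_export(sample_export))
```

With a real prediction you would pass `json_result` in place of `sample_export`. The helper only preserves the order already present in the export; it does not re-sort words by their coordinates.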

Feel free to ask if you have any questions :)

Answer selected by fg-mindee
Category: Q&A
Labels: module: io (Related to doctr.io)