How to get rectangular co-ordinates of detected text of an image #533

PoornaSaiNagendra · 2021-10-20T18:14:29Z

PoornaSaiNagendra
Oct 20, 2021

I am using the default docTR code for ocr_predictor. The values in the geometry field of result.export() are clipped. How can I reconstruct the actual coordinates of the detected text?

Since the geometry type is tuple(tuple(float, float), tuple(float, float)), I get decimal values when I run the code snippet below. They are also not the correct coordinates.

json_output = result.export()

page = json_output['pages'][0]
words_dic = page['blocks'][0]['lines'][0]['words']

# Page dimensions are stored as (height, width)
height, width = page['dimensions']

bboxes = []

for word in words_dic:
  # Geometries with 5 values are skipped; the rest are ((xmin, ymin), (xmax, ymax))
  if len(word['geometry']) != 5:
    (xmin, ymin), (xmax, ymax) = word['geometry']
    # Switch to absolute coords
    xmin, w = xmin * width, (xmax - xmin) * width
    ymin, h = ymin * height, (ymax - ymin) * height
    bboxes.append((xmin, ymin, w, h))

Is there any way to get the actual coordinates while using the default docTR code? I want this for 2 reasons:

To sort the boxes in the desired way,
To crop the detected text part in the image.

Thanks for any help you can provide in resolving this.

Answered by PoornaSaiNagendra

Oct 22, 2021

fg-mindee · 2021-10-21T08:13:06Z

fg-mindee
Oct 21, 2021

Hi @PoornaSaiNagendra 👋

If you print the export directly, you will see that it provides the page dimensions and since coordinates are relative, this should help you reconstruct the document!

So a few questions:

What do you mean by actual coordinates? If you mean absolute coords, then yes: multiply the relative coords by the page dimensions.
Could you provide an example of values for a bbox, and what you expected them to be, please?

You can check https://github.com/mindee/doctr/blob/main/doctr/utils/visualization.py to see how we use the export to plot things. Let me know if you still have some troubles!
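
As a rough sketch of that reconstruction, the snippet below walks an export-style dictionary and scales every word's relative geometry to pixels. The `export` dict is hand-built for illustration (it reuses two geometries and the page dimensions reported further down in this thread), `words_to_absolute` is a hypothetical helper name rather than a docTR function, and the code assumes `dimensions` is ordered (height, width) and that every box is a straight box given as two corner points.

```python
def words_to_absolute(export):
    """Yield (page_idx, (xmin, ymin, xmax, ymax)) in pixels for every word."""
    for page_idx, page in enumerate(export["pages"]):
        # Assumption: dimensions are stored as (height, width)
        height, width = page["dimensions"]
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    (xmin, ymin), (xmax, ymax) = word["geometry"]
                    # x scales with the width, y scales with the height
                    yield page_idx, (xmin * width, ymin * height,
                                     xmax * width, ymax * height)


# Hand-built stand-in for result.export(), for illustration only
export = {
    "pages": [{
        "dimensions": (903, 638),
        "blocks": [{
            "lines": [{
                "words": [
                    {"geometry": ((0.7265625, 0.0341796875), (0.7890625, 0.044921875))},
                    {"geometry": ((0.171875, 0.0908203125), (0.20703125, 0.10546875))},
                ],
            }],
        }],
    }],
}

boxes = list(words_to_absolute(export))
for page_idx, box in boxes:
    print(page_idx, box)
```

The values are still floats at this point; converting them to integers is a separate step.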

0 replies

PoornaSaiNagendra · 2021-10-22T05:35:20Z

PoornaSaiNagendra
Oct 22, 2021
Author

By multiplying the relative coords by the page dimensions I got decimal values. Here I am providing you the first 5 values in format (xmin, w, ymin, h):

(656.0859375, 30.8642578125, 712.5234375, 40.564453125)
(600.5302734375, 59.96484375, 685.1865234375, 69.6650390625)
(155.203125, 82.0107421875, 186.94921875, 95.23828125)
(128.748046875, 82.0107421875, 156.0849609375, 96.1201171875)
(198.4130859375, 80.2470703125, 300.7060546875, 98.765625)

Page dimension is (903, 638)

Geometry of first 5 are:

(0.7265625, 0.0341796875) (0.7890625, 0.044921875)
(0.6650390625, 0.06640625) (0.7587890625, 0.0771484375)
(0.171875, 0.0908203125) (0.20703125, 0.10546875)
(0.142578125, 0.0908203125) (0.1728515625, 0.1064453125)
(0.2197265625, 0.0888671875) (0.3330078125, 0.109375)

When I tried using the below code for cropping

cv2.imwrite(img_name, img[bboxes[i][0] : bboxes[i][2], bboxes[i][1] : bboxes[i][3]])

It is showing an error: slice indices must be integers or None or have an __index__ method

Since the reconstructed values are decimals, how can I convert them to integers without losing part of the detected text region?
Also, when I tried math.floor() to convert them to integer values, the cropped images didn't contain any text.

So in my case, I need absolute coords as integers.

2 replies

fg-mindee Oct 22, 2021

Hi again 👋

First, since relative coords do not have infinite precision, it's expected that you get values with decimals (feel free to round and cast them to integers for your own use).
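
One way to do that rounding without shrinking the box is to floor the top-left corner and ceil the bottom-right corner, then clamp to the page. A minimal sketch, where `to_int_box` is a hypothetical helper (not part of docTR), the input is the first geometry quoted above, and the (903, 638) page is assumed to be (height, width):

```python
import math


def to_int_box(rel_box, height, width):
    """Convert a relative ((xmin, ymin), (xmax, ymax)) box to integer pixels."""
    (xmin, ymin), (xmax, ymax) = rel_box
    # Floor the top-left corner and ceil the bottom-right one so that the
    # integer box always covers the float box, then clamp to the page
    x0 = max(0, math.floor(xmin * width))
    y0 = max(0, math.floor(ymin * height))
    x1 = min(width, math.ceil(xmax * width))
    y1 = min(height, math.ceil(ymax * height))
    return x0, y0, x1, y1


box = to_int_box(
    ((0.7265625, 0.0341796875), (0.7890625, 0.044921875)),
    height=903,
    width=638,
)
print(box)  # (463, 30, 504, 41)
```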

> here I am providing you the first 5 values in format (xmin, w, ymin, h)

What do you mean? Is this the Ground truth?

> When I tried using the below code for cropping
>
> cv2.imwrite(img_name, img[bboxes[i][0] : bboxes[i][2], bboxes[i][1] : bboxes[i][3]])

I'd need the code snippet to help you here 😅
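
For what it's worth, a likely culprit in that line is the index order: images loaded with OpenCV are NumPy arrays indexed as [row, column], i.e. [y, x], and slice bounds must be integers. A minimal sketch on a synthetic array (no docTR or OpenCV needed), assuming the box is already in integer (xmin, ymin, xmax, ymax) pixel format; `crop_box` is a hypothetical helper:

```python
import numpy as np


def crop_box(img, box):
    """Crop an integer (xmin, ymin, xmax, ymax) pixel box from an H x W (x C) array."""
    xmin, ymin, xmax, ymax = box
    # Rows are y, columns are x
    return img[ymin:ymax, xmin:xmax]


# Synthetic 903 x 638 grayscale "page" with a white patch where a word would be
img = np.zeros((903, 638), dtype=np.uint8)
img[30:41, 463:504] = 255

crop = crop_box(img, (463, 30, 504, 41))
print(crop.shape)  # (11, 41)
```

The resulting array can then be passed to cv2.imwrite in place of the slice in the quoted line.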

More generally speaking, docTR includes everything you need for result plotting and even page synthesis (without you having to understand the result format). If you're looking for result interpretation with end-to-end OCR, I suggest checking https://github.com/mindee/doctr#putting-it-together (all examples for visualization)

Now if you mean text detection only, the output of the model & predictors is by default in the format (xmin, ymin, xmax, ymax, score), with the score being the objectness.
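
To illustrate that format, and the sorting goal from the original question, here is a sketch that scales such boxes to pixels and orders them roughly top-to-bottom, then left-to-right. The sample boxes reuse geometries quoted earlier in this thread, the scores are made up, the page is assumed to be (height, width) = (903, 638), and `sort_reading_order` with its `row_tol` parameter is a hypothetical helper, not a docTR function:

```python
def sort_reading_order(boxes, row_tol=10):
    """Sort (xmin, ymin, xmax, ymax) pixel boxes top-to-bottom, then left-to-right.

    Boxes whose ymin falls in the same row_tol-pixel band count as one row.
    This is a crude heuristic: boxes near a band boundary may be split.
    """
    return sorted(boxes, key=lambda b: (b[1] // row_tol, b[0]))


# Relative boxes in (xmin, ymin, xmax, ymax, score) format; scores are invented
rel_boxes = [
    (0.171875, 0.0908203125, 0.20703125, 0.10546875, 0.8),
    (0.7265625, 0.0341796875, 0.7890625, 0.044921875, 0.9),
    (0.142578125, 0.0908203125, 0.1728515625, 0.1064453125, 0.7),
]
height, width = 903, 638

# Scale x by the width and y by the height, then round to the nearest pixel
abs_boxes = [
    (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))
    for x0, y0, x1, y1, _score in rel_boxes
]

ordered = sort_reading_order(abs_boxes)
for box in ordered:
    print(box)
```

The band size `row_tol` would need tuning to the typical line height of the document.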

Let me know if you're still having troubles :)

fg-mindee Oct 29, 2021

Hi @PoornaSaiNagendra 👋

Would you mind marking the relevant message as an answer for this discussion please? It will help potential future visitors to quickly identify the ins & outs of the topic :)
