How to get rectangular co-ordinates of detected text of an image #533
-
I am using the default docTR code for ocr_predictor. The values in the geometry of result.export() are clipped (relative) values. How can I reconstruct the actual coordinates of the detected text? Since the geometry type is tuple(tuple(float, float), tuple(float, float)), I get decimal values when I try the code snippet below. Also, they are not the correct coordinates.

```python
json_output = result.export()
words_dic = json_output['pages'][0]['blocks'][0]['lines'][0]['words']
bboxes = []
for word in words_dic:
    if len(word['geometry']) != 5:
        # Page dimensions are stored as (height, width)
        height, width = json_output['pages'][0]['dimensions']
        (xmin, ymin), (xmax, ymax) = word['geometry']
        # Switch to absolute coords
        xmin, w = xmin * width, (xmax - xmin) * width
        ymin, h = ymin * height, (ymax - ymin) * height
        bboxes.append((xmin, ymin, w, h))
```

Is there any way to get the actual coordinates while using the default docTR code? Actually, I want this for two reasons:

Thanks for any help you can provide in resolving this.
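One way to get integer pixel coordinates without shrinking the detected region is to scale the relative geometry by the page size, then floor the top-left corner and ceil the bottom-right corner. This is a minimal sketch, assuming the export's dimensions are (height, width) and the geometry is a straight box ((xmin, ymin), (xmax, ymax)); to_pixel_box is a hypothetical helper, not a docTR function.

```python
import math


def to_pixel_box(geometry, dimensions):
    """Convert a relative ((xmin, ymin), (xmax, ymax)) geometry into an
    absolute integer box (x0, y0, x1, y1).

    Hypothetical helper; assumes `dimensions` is (height, width).
    """
    (xmin, ymin), (xmax, ymax) = geometry
    height, width = dimensions
    # Floor the top-left corner and ceil the bottom-right corner so that
    # rounding never cuts off part of the detected text; clamp to the page.
    x0 = max(0, math.floor(xmin * width))
    y0 = max(0, math.floor(ymin * height))
    x1 = min(width, math.ceil(xmax * width))
    y1 = min(height, math.ceil(ymax * height))
    return x0, y0, x1, y1
```

For example, with the first geometry ((0.7265625, 0.0341796875), (0.7890625, 0.044921875)) and the page dimensions (903, 638) posted in this thread, it returns (463, 30, 504, 41). Use (x1 - x0, y1 - y0) if you need width and height instead of the end corner.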
-
Hi @PoornaSaiNagendra 👋 If you print the export directly, you will see that it provides the page dimensions, and since the coordinates are relative, this should help you reconstruct the document! So a few questions:
You can check https://github.com/mindee/doctr/blob/main/doctr/utils/visualization.py to see how we use the export to plot things. Let me know if you still have any trouble!
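Building on that, the sketch below walks every page, block, line and word of a dict shaped like result.export() and scales each relative box by its own page's dimensions. The snippet in the question only reads pages[0]['blocks'][0]['lines'][0], so it sees just the first line. The key names (pages, dimensions, blocks, lines, words, value, geometry) follow docTR's export; iter_word_boxes is a hypothetical helper, and dimensions are assumed to be (height, width).

```python
def iter_word_boxes(export):
    """Yield (page_index, word_text, (xmin, ymin, xmax, ymax)) in absolute
    float pixels for every word in an export-shaped dict.

    Hypothetical helper; assumes page["dimensions"] is (height, width).
    """
    for page_idx, page in enumerate(export["pages"]):
        height, width = page["dimensions"]
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    geom = word["geometry"]
                    # Skip anything that is not a ((xmin, ymin), (xmax, ymax))
                    # pair, e.g. rotated boxes or polygons.
                    if len(geom) != 2:
                        continue
                    (xmin, ymin), (xmax, ymax) = geom
                    yield page_idx, word["value"], (
                        xmin * width,
                        ymin * height,
                        xmax * width,
                        ymax * height,
                    )
```

Keeping the boxes as floats here and rounding only where integer indices are needed (for example when cropping) avoids accumulating rounding errors.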
-
By multiplying the relative coords and page dimensions I got decimal values. Here are the first 5 values in the format (xmin, w, ymin, h):

(656.0859375, 30.8642578125, 712.5234375, 40.564453125)

Page dimensions are (903, 638).

Geometries of the first 5 are:

(0.7265625, 0.0341796875) (0.7890625, 0.044921875)

When I tried using the code below for cropping:

```python
cv2.imwrite(img_name, img[bboxes[i][0] : bboxes[i][2], bboxes[i][1] : bboxes[i][3]])
```

it shows an error:

```
slice indices must be integers or None or have an __index__ method
```

Since the reconstructed values are decimals, how can I get them in integer form without losing the detected text region? In my case, I need absolute coordinates as integers.
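A quick arithmetic check on the first posted values: 0.7265625 × 903 = 656.0859375, 0.0341796875 × 903 = 30.8642578125, 0.7890625 × 903 = 712.5234375 and 0.044921875 × 903 = 40.564453125. So that tuple appears to be (xmin, ymin, xmax, ymax), with every coordinate scaled by 903. If the page dimensions (903, 638) are (height, width), the x values should be scaled by 638 instead. For the crop itself, NumPy/OpenCV images are indexed [y, x] with start:end coordinates, and slice bounds must be ints. The sketch below, a hypothetical crop_xywh helper, crops one (xmin, ymin, w, h) float box like those appended to bboxes in the question. It assumes the image array has the same size as the page that was passed to the predictor.

```python
import math

import numpy as np


def crop_xywh(img, box):
    """Crop a float (xmin, ymin, w, h) pixel box from an image array of
    shape (H, W) or (H, W, C), such as the array returned by cv2.imread.

    Hypothetical helper, not part of docTR or OpenCV.
    """
    xmin, ymin, w, h = box
    img_h, img_w = img.shape[:2]
    # Slice indices must be ints: floor the start and ceil the end
    # (start + size) so the crop still covers the whole detected region,
    # then clamp to the image bounds.
    x0 = max(0, math.floor(xmin))
    y0 = max(0, math.floor(ymin))
    x1 = min(img_w, math.ceil(xmin + w))
    y1 = min(img_h, math.ceil(ymin + h))
    # Rows come first: index as [y0:y1, x0:x1], using end coordinates, not sizes.
    return img[y0:y1, x0:x1]
```

The result can then be saved as before, e.g. cv2.imwrite(img_name, crop_xywh(img, bboxes[i])).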
By multiplying the relative coords and page dimensions I got decimal values. Here are the first 5 values in the format (xmin, w, ymin, h):
(656.0859375, 30.8642578125, 712.5234375, 40.564453125)
(600.5302734375, 59.96484375, 685.1865234375, 69.6650390625)
(155.203125, 82.0107421875, 186.94921875, 95.23828125)
(128.748046875, 82.0107421875, 156.0849609375, 96.1201171875)
(198.4130859375, 80.2470703125, 300.7060546875, 98.765625)
Page dimensions are (903, 638).
Geometries of the first 5 are:
(0.7265625, 0.0341796875) (0.7890625, 0.044921875)
(0.6650390625, 0.06640625) (0.7587890625, 0.0771484375)
(0.171875, 0.0908203125) (0.20703125, 0.10546875)
(0.142578125, 0.0908203125) (0.172851562…