Tesseract.js (v5.1.1) Fails on Angled Text Recognition #963

rajkumardongre · 2024-09-27T03:09:31Z

I am using Tesseract.js version 5.1.1 to build a simple OCR solution. It works well on images with horizontally or vertically (top-to-bottom) aligned text, but I run into issues when the text is at an angle or written bottom to top.

Observations:

Text orientation issues:
- The OCR works fine with horizontal and top-to-bottom vertical text.
- It fails or performs inconsistently with text written at certain angles, especially:
  - Angled text: between 0° and 90° in the clockwise direction.
  - Bottom-to-top vertical text: text written from bottom to top, i.e., between 90° and 270° clockwise, is not detected.

Settings applied:
- I am setting rotateAuto: true, which works in some cases but fails to detect text properly at certain angles.
- Orientation detection (PSM.AUTO_OSD) is enabled but does not seem to improve recognition accuracy at the problematic angles.

import { createWorker, PSM } from 'tesseract.js';

const worker = await createWorker(storedLangCodeList, 1); // 1 = OEM.LSTM_ONLY
await worker.setParameters({
    tessedit_pageseg_mode: PSM.AUTO_OSD,
});
const ret = await worker.recognize(imgURL, { rotateAuto: true });
console.log(ret.data.text);
await worker.terminate();

Expected Behavior:

OCR should be able to correctly detect and process text, regardless of its orientation or the angle at which it appears in the image.

Actual Behavior:

Text at an angle or written from bottom to top is either not recognized or inaccurately detected.

Steps to Reproduce:

1. Use Tesseract.js v5.1.1 with an image containing angled or bottom-to-top vertical text.
2. Set rotateAuto: true and enable orientation detection.
3. Run OCR on the image and observe the inconsistent results.

Additional Information:

Browser: Chrome
Tesseract.js Version: 5.1.1
The issue occurs in both local development and production environments.

Images for Reference:

images of the text where recognition fails


jcaron23 · 2024-09-29T14:21:52Z

There’s another issue which explains that rotateAuto only works up to ±10 degrees, so indeed it won’t work with text angled at 45 degrees.

I’m not sure if this is something that could be changed or is somehow inherent to the method.

It’s probably a good idea to use another method to detect the text (including its position and angle) and straighten it before feeding it into Tesseract.
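As a minimal sketch of the straightening step suggested above, assuming the skew angle is already known from some other method (estimating it is not shown), the browser Canvas API can rotate the image onto a larger canvas so the corners are not clipped. `rotatedCanvasSize` and `rotateImageToCanvas` are hypothetical helper names, not part of Tesseract.js.

```javascript
// Hypothetical helpers: straighten an image by a KNOWN angle before OCR.
// The angle must come from some other method; this code only applies it.

// Size of the canvas needed to hold a w x h image rotated by `degrees`
// without clipping its corners.
function rotatedCanvasSize(w, h, degrees) {
  const rad = (degrees * Math.PI) / 180;
  const cos = Math.abs(Math.cos(rad));
  const sin = Math.abs(Math.sin(rad));
  // Subtract a tiny epsilon so floating-point noise (e.g. cos(90°) ≈ 6e-17)
  // does not round an exact size up by one pixel.
  const fit = (x) => Math.ceil(x - 1e-9);
  return {
    width: fit(w * cos + h * sin),
    height: fit(w * sin + h * cos),
  };
}

// Browser-only: draw `img` rotated by `degrees` (clockwise on screen,
// since the canvas y-axis points down) onto a new canvas.
function rotateImageToCanvas(img, degrees) {
  const w = img.naturalWidth || img.width;
  const h = img.naturalHeight || img.height;
  const { width, height } = rotatedCanvasSize(w, h, degrees);
  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  ctx.fillStyle = '#fff'; // white background for the uncovered corners
  ctx.fillRect(0, 0, width, height);
  ctx.translate(width / 2, height / 2); // rotate around the canvas center
  ctx.rotate((degrees * Math.PI) / 180);
  ctx.drawImage(img, -w / 2, -h / 2); // draw the image centered on the origin
  return canvas;
}
```

To undo text that is rotated θ degrees clockwise, one would call `rotateImageToCanvas(img, -θ)`. In the browser the resulting canvas can then be passed to `worker.recognize()`, since Tesseract.js accepts canvas elements as image input.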

apexkid · 2024-09-29T16:53:36Z

+1 to the issue.

Hey Jacques,

How would you recommend we "detect the text (including position and angle) before feeding it"? Is there a built-in function in Tesseract.js for this, or are you suggesting using some other library like OpenCV to accomplish this?

Balearica · 2024-09-30T02:37:04Z

@rajkumardongre I think there are two separate issues here. First is the case of images that are scanned in a particular orientation: one of 0, 90, 180, or 270 degrees. This should be something Tesseract is capable of handling, but it appears to be buggy in certain cases. This is already being discussed in #940.

The second issue brought up above is images rotated by some amount that is not a standard orientation, nor an amount close enough to be handled by rotateAuto. For example, rotation by 45 degrees. Even if all the bugs were patched, this is not something Tesseract supports, so this would require a third-party tool.
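To make the distinction above concrete, a small hypothetical helper can split an arbitrary clockwise rotation into the nearest standard orientation (0, 90, 180 or 270 degrees) and the leftover skew. The ±10° threshold is an assumption taken from the rotateAuto limit mentioned earlier in this thread, not a documented Tesseract.js constant.

```javascript
// ASSUMPTION: the ±10° figure comes from this thread, not from the docs.
const ASSUMED_ROTATE_AUTO_LIMIT_DEG = 10;

// Hypothetical helper: split a clockwise rotation into
// (a) the nearest standard orientation (0, 90, 180 or 270), the case
//     orientation detection is meant to handle, and
// (b) the remaining signed skew in [-45, 45], which only a deskew step
//     (rotateAuto for small angles, or an external tool) could fix.
function splitRotation(degrees) {
  // Normalize to [0, 360).
  const angle = ((degrees % 360) + 360) % 360;
  // Nearest multiple of 90; 360 wraps back to 0.
  const orientation = (Math.round(angle / 90) * 90) % 360;
  // Signed residual; angles just below 360 become small negative skews.
  let skew = angle - orientation;
  if (skew > 180) skew -= 360;
  return {
    orientation,
    skew,
    withinAssumedRotateAutoRange: Math.abs(skew) <= ASSUMED_ROTATE_AUTO_LIMIT_DEG,
  };
}
```

For example, 93° splits into orientation 90 with a 3° skew, while 45° leaves a 45° residual that, under the assumed limit, rotateAuto would not correct.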

EffDuBois · 2024-10-17T17:11:41Z

> +1 to the issue.
>
> Hey Jacques,
>
> How would you recommend we "detect the text (including position and angle) before feeding it"? Is there a built-in function in Tesseract.js for this, or are you suggesting using some other library like OpenCV to accomplish this?

So after testing with Tesseract, the detect feature doesn't seem to work on text slanted at any angle other than 0, 90, 180, or 270 degrees. It doesn't work at 45 degrees, for example.
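Since detection only reports multiples of 90 degrees, one brute-force workaround (a swapped-in technique, not something Tesseract.js provides) is to try a set of candidate rotations and keep whichever gives the highest OCR confidence. The sketch below keeps the search logic independent of Tesseract.js by taking the scoring function as a parameter; `findBestRotation` and `candidateAngles` are hypothetical names.

```javascript
// Hypothetical brute-force search (not a Tesseract.js feature): score each
// candidate rotation with a caller-supplied async function and keep the best.
async function findBestRotation(candidateDegrees, scoreAt) {
  let best = { degrees: null, score: -Infinity };
  for (const degrees of candidateDegrees) {
    // Score one angle at a time, in order.
    const score = await scoreAt(degrees);
    // Strict ">" means the first angle wins on ties.
    if (score > best.score) best = { degrees, score };
  }
  return best; // degrees stays null if no candidates were given
}

// Candidate angles every `step` degrees over a full turn.
function candidateAngles(step) {
  const angles = [];
  for (let d = 0; d < 360; d += step) angles.push(d);
  return angles;
}
```

In the browser, `scoreAt` might rotate the image onto a canvas, run `worker.recognize()` on it and return the result's `data.confidence`. Each candidate costs a full OCR pass, so a 15° step over a full turn means 24 recognitions per image; a coarse search followed by a finer one around the best angle would be cheaper.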
