1. Summary
I can't get Cyrillic characters in the OCR layer of PDF files generated by k2pdfopt.
The issue does not reproduce for PDF documents with Latin characters.
2. Data
KiraIdeal.jpg: an image containing the two Russian words Кира Идеал!
KiraIdeal.pdf: a PDF without an OCR layer, converted from the .jpg above
3. Steps to reproduce
I downloaded and installed Tesseract (see section 4 of this issue) → downloaded k2pdfopt v2.51a for 64-bit Windows from here → added the folder containing k2pdfopt.exe to the user PATH environment variable → set the TESSDATA_PREFIX environment variable, as described here.
I ran the command:
k2pdfopt -mode copy -ocr -ocrlang rus KiraIdeal.pdf
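For anyone trying to reproduce this, the setup above can be sanity-checked before running k2pdfopt. The following is only a minimal sketch, not part of k2pdfopt or Tesseract: the helper names `find_traineddata` and `build_command` are hypothetical, and the two folder layouts it checks (language file directly under `TESSDATA_PREFIX`, or under a `tessdata` subfolder) are an assumption, since the expected layout depends on the Tesseract version.

```python
import os
from pathlib import Path


def find_traineddata(prefix: str, lang: str) -> list:
    """Return the <lang>.traineddata files that exist under prefix.

    Assumption: the file is either directly in the prefix folder or in
    a 'tessdata' subfolder of it; both layouts are checked.
    """
    root = Path(prefix)
    candidates = [
        root / f"{lang}.traineddata",
        root / "tessdata" / f"{lang}.traineddata",
    ]
    return [p for p in candidates if p.is_file()]


def build_command(pdf: str, lang: str) -> list:
    """Build the exact command from this issue as an argument list."""
    return ["k2pdfopt", "-mode", "copy", "-ocr", "-ocrlang", lang, pdf]


if __name__ == "__main__":
    prefix = os.environ.get("TESSDATA_PREFIX", "")
    found = find_traineddata(prefix, "rus") if prefix else []
    print("TESSDATA_PREFIX =", prefix or "(not set)")
    print("rus.traineddata:", [str(p) for p in found] or "not found")
    print("command:", " ".join(build_command("KiraIdeal.pdf", "rus")))
```

If `rus.traineddata` is reported as not found, the missing language data is a likely suspect before anything in k2pdfopt itself.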
4. Actual behavior
KiraIdeal_k2_opt.pdf:
Copy and paste text from KiraIdeal_k2_opt.pdf:
5. Expected behavior
With a tesseract command:
or with a tesseract command that generates a PDF:
I copy and paste text from KiraIdealTesseract.pdf:
or with a k2pdfopt command:
KiraIdeal.txt:
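To compare the actual and expected results without judging by eye, the pasted text can be checked for Cyrillic letters. This is a rough sketch under my own assumptions: the function name `cyrillic_share` is hypothetical, and it only counts letters in the basic Cyrillic Unicode block (U+0400 to U+04FF), which covers Russian.

```python
def cyrillic_share(text: str) -> float:
    """Return the fraction of alphabetic characters that are Cyrillic.

    Only the basic Cyrillic block (U+0400..U+04FF) is counted.
    Returns 0.0 when the text contains no letters at all.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = [ch for ch in letters if "\u0400" <= ch <= "\u04ff"]
    return len(cyrillic) / len(letters)


if __name__ == "__main__":
    # Paste the text copied from the PDF here.
    pasted = "Кира Идеал!"
    print(cyrillic_share(pasted))
```

A correct OCR layer for KiraIdeal.pdf should give a share close to 1.0; text with no Cyrillic letters gives 0.0.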
:6. Not helped
I couldn't find a solution to my problem on these pages of the official site:
7. Environment
Thanks.