[BUG] UniPDF garbles Unicode text when merging searchable PDF documents and applying PDF/A-1a #550
Comments
Also, in case it helps: the two documents attached to this ticket claim to be PDF/A-1a compliant, but when I validate either of them with UniPDF's PDF/A-1a validation function, the validation fails with several errors. And if I enforce the PDF/A-1a standard on the merged document, the Unicode text gets garbled and I cannot search for any text.
Please find attached the two PDF files I used for the sample program, including OCRed image0069.pdf.
Hello @sagar-kalburgi-ripcord, we have fixed your issue. The fix has been merged into the development branch of the UniPDF source repository (I believe you have access to it and can test it). It will also be included in the next release of UniPDF; we will let you know when it is out.
Hi @anovik, thanks! Sure, I will test it. Any idea when the next release of UniPDF will be out?
@sagar-kalburgi-ripcord It is planned for the end of April.
@anovik I tested the fix from your development branch and it looks good!
@sagar-kalburgi-ripcord We completely understand the urgency of your situation and are prioritizing the release of a hotfix to address this issue as quickly as possible. We'll keep you updated on the progress and notify you as soon as the release is completed.
OK, thanks!
@sagar-kalburgi-ripcord The new release of UniPDF is available at https://github.com/unidoc/unipdf/releases/tag/v3.57.0 and includes the fix for this issue. Closing this ticket; feel free to reopen it in case of any problems.
Description
When I merge two searchable PDF documents (produced by an OCR engine) using UniPDF, I also specify that the PDF/A-1a standard should be applied before the merged result is written to an output PDF file. But when I open the output file, the Unicode text is garbled: I cannot search for any word that is on the page, and when I copy some of the text and paste it into Notepad, I get this unrecognizable text:
ıˇ
ˇ
ı ˇı
ı
Expected Behavior
When I open the output PDF file and search for a word that is on the page, it should be found. And when I copy the text of the output PDF file and paste it into Notepad, the exact text that is present should be pasted.
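As a hedged illustration, the expected behavior could be checked automatically once text has been extracted from the output PDF. The extraction step itself (for example with UniPDF's text extractor) is not shown and is assumed to have already produced a string. The sketch below uses a hypothetical helper, missingTerms, that reports which expected words cannot be found in that string; the sample inputs are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// missingTerms is a hypothetical helper (not part of UniPDF). It reports which
// expected search terms cannot be found in text extracted from a PDF.
// Matching is case-insensitive, and any run of whitespace is collapsed to a
// single space so that line breaks introduced by extraction do not cause
// false failures.
func missingTerms(extracted string, terms []string) []string {
	normalized := strings.ToLower(strings.Join(strings.Fields(extracted), " "))
	var missing []string
	for _, term := range terms {
		t := strings.ToLower(strings.Join(strings.Fields(term), " "))
		if !strings.Contains(normalized, t) {
			missing = append(missing, term)
		}
	}
	return missing
}

func main() {
	// Hypothetical text from a correctly encoded document: every term is found.
	good := "Invoice No. 42\nTotal   due: 100"
	fmt.Println(missingTerms(good, []string{"invoice no. 42", "Total due"})) // prints "[]"

	// Garbled text like the output reported in this issue: nothing matches.
	garbled := "ıˇ\nˇ\nı ˇı\nı"
	fmt.Println(missingTerms(garbled, []string{"Invoice", "Total"})) // prints "[Invoice Total]"
}
```

Running such a check on the merged output before and after applying PDF/A-1a would show whether the standard-application step is what breaks searchability.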
Actual Behavior
Steps to reproduce the behavior:
Please run the sample program below with a valid UniDoc license and the attached PDF files.
Attachments
Two PDF files are attached.