Unicode transformation from txt to pdf not supported #152

Open
justemu opened this issue Oct 31, 2019 · 5 comments

Comments

@justemu

justemu commented Oct 31, 2019

T-Engine: transform-misc
Version: 2.1.0
Task: transformation of a txt file (Unicode) to pdf failed
Error Information:

31 Oct 2019 11:16:56 510 txt pdf ERROR 本文档库子文件夹权限设置规范 688 bytes 274 ms textToPdf Failed 09311139 textToPdf returned a 400 status Miscellaneous Transformers - U+76EE ('.notdef') is not available in this font Helvetica encoding: WinAnsiEncoding http://transform-misc:8090/transform targetExtension=pdf sourceMimetype=text/plain sourceExtension=txt targetMimetype=application/pdf

Analysis:
The text file contains Unicode characters, which lead to the error.
I have reproduced this with other Unicode characters, and they result in the same error.

How can the Unicode compatibility of the transform services be improved? By switching the font file, or by changing the WinAnsiEncoding?
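
To illustrate the analysis above, here is a minimal sketch that finds the first character in a string that a WinAnsi-style encoding cannot represent. It assumes that Java's `windows-1252` charset is a close enough stand-in for the PDF WinAnsiEncoding named in the error (the two are similar but not identical). The class and method names are hypothetical and are not part of the transform service.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {
    // Assumption: windows-1252 approximates the PDF WinAnsiEncoding.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /**
     * Returns the index of the first char that windows-1252 cannot encode,
     * or -1 if every char in the text is encodable.
     */
    public static int firstUnencodable(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // U+76EE is the character reported in the error message above.
        String sample = "abc\u76EE";
        int idx = firstUnencodable(sample);
        if (idx >= 0) {
            System.out.printf("U+%04X at index %d is not encodable%n",
                    (int) sample.charAt(idx), idx);
        }
    }
}
```

A check like this could be used to decide up front whether a text file needs a font with wider coverage than Helvetica, rather than failing during the transform.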

@justemu
Author

justemu commented Nov 2, 2019

The problem appears to be that the font "Helvetica" does not contain Asian Unicode characters.
The solution would be to replace the font "Helvetica" with "FZLTXHJW.TTF".
The font "FZLTXHJW.TTF" is a superset of "Helvetica": it covers both Western alphabets and Asian characters.

I have uploaded the font file as an attachment. Could somebody help fix the source code?

FZLTXHJW.TTF.zip

@ariksidney

It's probably linked to this issue I discovered yesterday: Alfresco/alfresco-docker-base-java#60

@montgolfiere
Contributor

montgolfiere commented Jun 17, 2021

Thanks @ariksidney

I'm not sure whether OP is using the Docker image or not. JFYI: T-Core 2.1.0 was based on CentOS 7 (rather than CentOS 8).

https://github.com/Alfresco/alfresco-transform-core/blob/2.1.0/alfresco-docker-transform-misc/Dockerfile

In any case, we just released T-Core 2.5.1 which should have the new updated Java Base Image (11.0.11 / CentOS 8 - including the UTF-8 fix on CentOS 8).

@hi-ko

hi-ko commented Apr 5, 2022

This all seems to be related somehow: MNT-22398 Transform Services AIO Engine Not Handling CSV Files with umlauts

This time WinAnsiEncoding is the issue.

I tried a workaround that forces LibreOffice to be picked, by overriding the textToPdf transformer in shared/classes/alfresco/extension/transform/pipelines so that pdf is no longer a targetMediaType of the textToPdf transformer. Now I get the same exception as described in MNT-22398, this time from the libreoffice transformer:

Caused by: org.alfresco.error.AlfrescoRuntimeException: 03050020 libreoffice returned a 400 status All in One Transformer - U+FEFF ('zerowidthjoiner') is not available in the font Helvetica, encoding: WinAnsiEncoding http://localhost:8090/transform targetExtension=pdf sourceEncoding=UTF-8 sourceMimetype=text/csv sourceExtension=csv targetMimetype=application/pdf
Apr 05 09:37:09 alf-test-72 alfresco[44731]:         at org.alfresco.repo.content.transform.RemoteTransformerClient.request(RemoteTransformerClient.java:193)
Apr 05 09:37:09 alf-test-72 alfresco[44731]:         at org.alfresco.repo.content.transform.RemoteTransformerClient.request(RemoteTransformerClient.java:99)
Apr 05 09:37:09 alf-test-72 alfresco[44731]:         at org.alfresco.repo.content.transform.LocalTransformImpl.transformImpl(LocalTransformImpl.java:193)
Apr 05 09:37:09 alf-test-72 alfresco[44731]:         at org.alfresco.repo.content.transform.AbstractLocalTransform.transformWithDebug(AbstractLocalTransform.java:160)
Apr 05 09:37:09 alf-test-72 alfresco[44731]:         ... 11 more
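
The U+FEFF in the message above is the code point used as a byte order mark (BOM). As a minimal sketch, assuming the offending character is a BOM at the very start of the UTF-8 source file (the log does not confirm where it sits), the text could be cleaned before it reaches the PDF step. The class and method names are hypothetical and are not part of the transform service.

```java
public class BomStripper {
    // U+FEFF is the byte order mark when it appears at the start of a text.
    private static final char BOM = '\uFEFF';

    /** Returns the text without a leading U+FEFF; other content is untouched. */
    public static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String csv = "\uFEFFname;city";
        String cleaned = stripLeadingBom(csv);
        // The cleaned string is one char shorter than the input.
        System.out.println(csv.length() - cleaned.length());
    }
}
```

This only removes a leading BOM. It would not help with characters such as U+76EE, which are real content and need a font that contains them.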

@hi-ko

hi-ko commented Apr 5, 2022

I suggest renaming the ticket to something like: transformation from txt to pdf does not support common encodings

I tested without Docker on Ubuntu 20.04 with locale en_US.UTF-8
