ALTO - PAGE XML: Object mapping and possible transformation generation #48
Document with list of features here:
I made a start here: Some issues that need discussing:
The idea is to extend the JPageConverter to accept ALTO as a target format. Already added but not tested:
@chris1010010 This is great for a head start, many thanks! I will also circulate this within the @OCR-D community for comments and contributions.
@cneud
I made some progress in the Java converter. Open issues: SP, HYP, margins.
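To illustrate one of the open issues (SP), here is a hedged Java sketch of how a converter might emit an ALTO SP element between neighbouring String elements, taking the space's HPOS and WIDTH from the gap between two word boxes. The class, the WordBox record and the method names are hypothetical and not part of JPageConverter; coordinates are assumed to be in pixels, and HYP and margins are not handled here.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not part of JPageConverter: serialize one text line's
// words as ALTO String elements with an SP element between neighbours.
public class AltoLineSketch {

    // Illustrative axis-aligned word box in pixels (e.g. derived from a PAGE Word's Coords).
    public record WordBox(String text, int hpos, int vpos, int width, int height) {}

    // Minimal escaping for a value placed inside a double-quoted XML attribute.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds the children of an ALTO TextLine: String, SP, String, ...
    public static String lineChildren(List<WordBox> words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.size(); i++) {
            WordBox w = words.get(i);
            if (i > 0) {
                WordBox prev = words.get(i - 1);
                int spStart = prev.hpos() + prev.width();
                // The space spans from the previous word's right edge to this word's
                // left edge; clamped to 0 if the two boxes overlap.
                int gap = Math.max(0, w.hpos() - spStart);
                sb.append(String.format(Locale.ROOT, "<SP HPOS=\"%d\" WIDTH=\"%d\"/>", spStart, gap));
            }
            sb.append(String.format(Locale.ROOT,
                    "<String CONTENT=\"%s\" HPOS=\"%d\" VPOS=\"%d\" WIDTH=\"%d\" HEIGHT=\"%d\"/>",
                    escape(w.text()), w.hpos(), w.vpos(), w.width(), w.height()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<WordBox> words = List.of(
                new WordBox("Hello", 100, 50, 60, 20),
                new WordBox("world", 175, 50, 55, 20));
        // Prints a String, an SP at HPOS 160 with WIDTH 15, then a second String.
        System.out.println(lineChildren(words));
    }
}
```

Deriving SP from geometry rather than from literal space characters keeps the output consistent with word boxes that OCR engines typically already provide; whether SP should carry a WIDTH at all is one of the points still open for discussion.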
FYI there is also ongoing work in the German OCR SIG to complete what Christian started, cf. https://github.com/maxnth/page-alto-ressources and https://github.com/maxnth/prima-core-libs/branches
As per the 2021-04-29 Board Meeting, I am linking the ocrd-page-to-alto TODO list here, which gives a nice summary of missing equivalencies. Kudos to everyone who has worked on this.
At the face-to-face conference in Vienna, the idea came up to create a conversion between PAGE and ALTO as a best-practice mapping between the objects of the two standards.
If feasible, the transformation could be provided as XSLT.
The idea is to create a mapping from the latest ALTO version (4) to the upcoming PAGE version expected in June, and from there to work backwards as far as this makes sense.
The goal is a common mapping solution, especially for objects where no exact match is possible and workarounds or compromises need to be defined.
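A typical case where no exact match exists: PAGE describes regions, lines and words with a Coords polygon (points="x1,y1 x2,y2 ..."), while ALTO's core position attributes (HPOS, VPOS, WIDTH, HEIGHT) describe an axis-aligned rectangle. The following is a minimal Java sketch of one possible compromise, reducing the polygon to its bounding box. Class and method names are hypothetical, pixel units are assumed, and computing WIDTH/HEIGHT as max - min (rather than max - min + 1) is an assumed convention.

```java
import java.util.Locale;

// Hypothetical sketch: reduce a PAGE Coords polygon to ALTO-style box attributes.
public class CoordsToBox {

    // Axis-aligned box, assuming pixel units on both sides.
    public record Box(int hpos, int vpos, int width, int height) {}

    // Parses a PAGE points string such as "10,20 110,15 115,60" into its bounding box.
    public static Box boundingBox(String points) {
        if (points == null || points.isBlank()) {
            throw new IllegalArgumentException("empty points attribute");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (String pair : points.trim().split("\\s+")) {
            String[] xy = pair.split(",");
            if (xy.length != 2) {
                throw new IllegalArgumentException("bad point: " + pair);
            }
            int x = Integer.parseInt(xy[0]);
            int y = Integer.parseInt(xy[1]);
            minX = Math.min(minX, x);
            minY = Math.min(minY, y);
            maxX = Math.max(maxX, x);
            maxY = Math.max(maxY, y);
        }
        // WIDTH/HEIGHT as max - min; some tools add 1 (assumed convention).
        return new Box(minX, minY, maxX - minX, maxY - minY);
    }

    // Formats the box as ALTO position attributes.
    public static String toAltoAttributes(Box b) {
        return String.format(Locale.ROOT, "HPOS=\"%d\" VPOS=\"%d\" WIDTH=\"%d\" HEIGHT=\"%d\"",
                b.hpos(), b.vpos(), b.width(), b.height());
    }

    public static void main(String[] args) {
        Box b = boundingBox("10,20 110,15 115,60 12,65");
        // Prints: HPOS="10" VPOS="15" WIDTH="105" HEIGHT="50"
        System.out.println(toAltoAttributes(b));
    }
}
```

The rectangle discards the polygon's shape; where that matters, ALTO's optional Shape/Polygon element could carry the original outline alongside the box attributes.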