ocr4all-backend

Master repository containing all required submodules to get the new OCR4all backend (still WIP) up and running.

Repository: https://github.com/OCR4all/ocr4all-backend.git

Contained submodules

Getting started

Requirements

git
Java 17
mvn
docker compose
bash (optional)

Download

Clone this repository recursively.

git clone --recurse-submodules --remote-submodules https://github.com/OCR4all/ocr4all-backend.git

An SSH public key linked to your GitHub account is required.

Build

Two steps are required to build the application:

Compile the libraries and package the jars: run the bash script ocr4all-build.sh with the argument build.
Build the Docker images: run docker compose with the argument build. The file docker-env-dev gives an example of a common development setup, which stores the application data in the user's home directory under ${user.home}/ocr4all/dev.

Application

To start the application, run docker compose with the argument up. The server HTTP port is set to 9090. As with the build, the file docker-env-dev gives an example of a common development setup.

Defaults

The defaults for the application are defined in the file src/main/resources/application.yml of the projects ocr4all-app, ocr4all-app-calamari-msa and ocr4all-app-ocrd-msa. Several profiles are defined that can be used to control the behaviour of the application.

Security

Authentication/authorisation is activated in the server profile and deactivated in the desktop profile.

Authentication/authorisation is configured in the following files in the ocr4all/workspace/.ocr4all folder (see below for an example setup): users, passwords and groups. After authentication in the application with administrative rights, the API can be used to manage users, passwords and groups.

A default administrator user is created if no administrator user exists and the application has the server and development profiles enabled and/or the application property ocr4all.application.security.administrator.create is set to true. The login credentials are:

username: admin
password: ocr4all

Example: rights management setup

File user:
admin:active::Administrator user

File password (password ocr4all):
admin:{bcrypt}$2a$10$rqYn8YjNLzegNMYZVFtvAuwAZBWFgZQ9bprHhjhHnk3oGUPdEPkYq

File group:
admin:active:admin:Administrator group

Using ocr-d processors

Install the models in ocr4all/opt/ocr-d/resources (see the ocr-d resource list):

Calamari recognize: download the desired models into the subfolder ocrd-calamari-recognize
Tesserocr recognize: download the desired models into the subfolder ocrd-tesserocr-recognize

API

API documentation

The Swagger UI for the API documentation is available at http://localhost:9090/api/doc/swagger-ui/index.html.

Example

An example of using the API.

instance
Method: GET
URL: http://localhost:9090/api/v1.0/instance

Login (only if authentication/authorization is activated). For all further communication, use the bearer token from the Authorization key of the response header or the token from the response body.
Method: POST
URL: http://localhost:9090/api/v1.0/login
Body:
{
    "username": "admin",
    "password": "ocr4all"
}
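The login step can be sketched in Python with the standard library only. This is a minimal sketch, not the project's client: the helper names `build_login_request`, `bearer_header` and `login` are hypothetical, and it is an assumption that the Authorization value of the response may or may not already carry the `Bearer ` prefix, so the code accepts both.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9090/api/v1.0"  # server address from the example above


def build_login_request(username, password):
    """Build (but do not send) the POST /login request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_header(authorization_value):
    """Turn the Authorization value of the login response into a request header.

    Assumption: the value may or may not already start with "Bearer ".
    """
    token = authorization_value.strip()
    if not token.lower().startswith("bearer "):
        token = f"Bearer {token}"
    return {"Authorization": token}


def login(username, password):
    """Send the login request; requires a running server on port 9090."""
    with urllib.request.urlopen(build_login_request(username, password)) as response:
        value = response.headers.get("Authorization")
        if value is None:
            raise RuntimeError("login response carries no Authorization header")
        return bearer_header(value)
```

Calling `login("admin", "ocr4all")` against a running development server would return a header dictionary that can be passed to later requests.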

Create a project
Method: GET
URL: http://localhost:9090/api/v1.0/project/create?id=project_01

Add the images to the exchange folder
Folder: ocr4all/exchange/project_01/images

See running/done jobs
Method: GET
URL: http://localhost:9090/api/v1.0/job/scheduler/snapshot/administration

Import the images from the exchange folder into the project
Method: POST
URL: http://localhost:9090/api/v1.0/spi/import/schedule/project_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.core.spi.imp.provider.ImageImport",
    "strings": [
        {"argument": "source-folder", "value": "images"}
    ],
    "selects": [
        {"argument": "image-formats", "values": ["tif"]}
    ]
}
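The import body above follows a simple pattern: a provider id plus lists of single-valued ("strings") and multi-valued ("selects") arguments. A minimal sketch of assembling such a body in Python is shown here; the helper name `spi_body` is hypothetical and only the argument shapes visible in the example are covered.

```python
import json


def spi_body(provider_id, strings=None, selects=None):
    """Assemble the JSON body for an SPI schedule request.

    `strings` maps an argument name to one value, `selects` maps an argument
    name to a list of values, mirroring the body shown above.
    """
    body = {"id": provider_id}
    if strings:
        body["strings"] = [{"argument": k, "value": v} for k, v in strings.items()]
    if selects:
        body["selects"] = [{"argument": k, "values": list(v)} for k, v in selects.items()]
    return body


IMAGE_IMPORT = "de.uniwuerzburg.zpd.ocr4all.application.core.spi.imp.provider.ImageImport"

import_body = spi_body(
    IMAGE_IMPORT,
    strings={"source-folder": "images"},
    selects={"image-formats": ["tif"]},
)
print(json.dumps(import_body, indent=4))
```

Building the body from dictionaries avoids hand-written JSON and keeps the argument names in one place.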

Create a sandbox
Method: GET
URL: http://localhost:9090/api/v1.0/sandbox/create/project_01?id=sandbox_01
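The project and sandbox creation calls above pass the new identifier as the `id` query parameter. As a small sketch, the URLs can be built with `urllib.parse` so that identifiers are escaped correctly; the function names are hypothetical and the URL patterns are taken only from the two examples above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9090/api/v1.0"  # server address used in the examples


def create_project_url(project_id):
    """URL of the GET request that creates a project."""
    return f"{BASE_URL}/project/create?{urlencode({'id': project_id})}"


def create_sandbox_url(project_id, sandbox_id):
    """URL of the GET request that creates a sandbox inside a project."""
    project = quote(project_id, safe="")
    return f"{BASE_URL}/sandbox/create/{project}?{urlencode({'id': sandbox_id})}"


print(create_project_url("project_01"))
print(create_sandbox_url("project_01", "sandbox_01"))
```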

Launch the sandbox
Method: POST
URL: http://localhost:9090/api/v1.0/spi/launcher/schedule/project_01/sandbox_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.core.spi.launcher.provider.SandboxLauncher",
    "images": [
        {"argument": "images", "values": [1, 2, 3, 4, 5, 6]}
    ],
    "label": "launcher default with images",
    "description": "description launcher default with images"
}
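The schedule endpoints shown so far share one URL shape: `/spi/<kind>/schedule/<project>` for the import and `/spi/<kind>/schedule/<project>/<sandbox>` for the launcher. The following sketch builds such a POST request in Python without sending it. The helper names are hypothetical, and passing the token as an `Authorization: Bearer ...` request header is an assumption based on the login note above.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:9090/api/v1.0"  # server address used in the examples


def schedule_url(kind, project_id, sandbox_id=None):
    """Return the schedule URL for an SPI kind such as "import" or "launcher"."""
    parts = ["spi", kind, "schedule", project_id]
    if sandbox_id is not None:
        parts.append(sandbox_id)
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def schedule_request(kind, body, project_id, sandbox_id=None, token=None):
    """Build (but do not send) the POST request that schedules a job."""
    headers = {"Content-Type": "application/json"}
    if token:  # assumption: the bearer token is sent in the Authorization header
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        schedule_url(kind, project_id, sandbox_id),
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


launcher_body = {
    "id": "de.uniwuerzburg.zpd.ocr4all.application.core.spi.launcher.provider.SandboxLauncher",
    "images": [{"argument": "images", "values": [1, 2, 3, 4, 5, 6]}],
    "label": "launcher default with images",
    "description": "description launcher default with images",
}
launcher_request = schedule_request("launcher", launcher_body, "project_01", "sandbox_01")
print(launcher_request.get_method(), launcher_request.full_url)
```

With a running server, the request could be sent with `urllib.request.urlopen(launcher_request)`.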

Using ocr-d processors

preprocessing: Binarize
Method: POST
URL: http://localhost:9090/api/v1.0/spi/preprocessing/schedule/project_01/sandbox_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.ocrd.spi.msa.preprocessing.MsaCISOcropyBinarize",
    "parent-snapshot": {"track": []},
    "label": "cis binarize default",
    "description": "ocr-d cis ocropy binarize default"
}

olr: Segment region
Method: POST
URL: http://localhost:9090/api/v1.0/spi/olr/schedule/project_01/sandbox_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.ocrd.spi.msa.olr.MsaTesserocrSegmentRegion",
    "parent-snapshot": {"track": [1]},
    "label": "tesserocr segment region default",
    "description": "ocr-d tesserocr segment region default"
}

olr: Segment line
Method: POST
URL: http://localhost:9090/api/v1.0/spi/olr/schedule/project_01/sandbox_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.ocrd.spi.msa.olr.MsaTesserocrSegmentLine",
    "parent-snapshot": {"track": [1, 1]},
    "label": "tesserocr segment line default",
    "description": "ocr-d tesserocr segment line default"
}

ocr: Calamari recognize
Method: POST
URL: http://localhost:9090/api/v1.0/spi/ocr/schedule/project_01/sandbox_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.ocrd.spi.msa.ocr.MsaCalamariRecognize",
    "selects": [
        {"argument": "checkpoint_dir", "values": ["fraktur_historical"]}
    ],
    "parent-snapshot": {"track": [1, 1, 1]},
    "label": "Calamari model",
    "description": "ocr-d Calamari model fraktur_historical"
}

ocr: Tesserocr recognize
Method: POST
URL: http://localhost:9090/api/v1.0/spi/ocr/schedule/project_01/sandbox_01
Body:
{
    "id": "de.uniwuerzburg.zpd.ocr4all.application.ocrd.spi.msa.ocr.MsaTesserocrRecognize",
    "selects": [
        {"argument": "model", "values": ["deu", "frk"]}
    ],
    "parent-snapshot": {"track": [1, 1, 1]},
    "label": "Tesserocr models",
    "description": "ocr-d Tesserocr models deu + frk"
}
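In the requests above, each "parent-snapshot" track names the snapshot a step builds on: binarization starts from the root (`[]`), region segmentation from `[1]`, line segmentation from `[1,1]`, and both recognizers from `[1,1,1]`. The sketch below illustrates how the resulting snapshot tracks would line up. It rests on an assumption inferred from this example rather than from the API documentation: the children of a parent are numbered sequentially from 1 in scheduling order. The function name `assign_tracks` is hypothetical.

```python
from collections import defaultdict


def assign_tracks(steps):
    """Map each step label to the track of the snapshot it would create.

    `steps` is a list of (label, parent_track) pairs in scheduling order.
    Assumption: children of one parent are numbered 1, 2, 3, ... in that order.
    """
    child_count = defaultdict(int)
    tracks = {}
    for label, parent_track in steps:
        parent = tuple(parent_track)
        child_count[parent] += 1
        tracks[label] = list(parent) + [child_count[parent]]
    return tracks


pipeline = [
    ("binarize", []),
    ("segment region", [1]),
    ("segment line", [1, 1]),
    ("calamari recognize", [1, 1, 1]),
    ("tesserocr recognize", [1, 1, 1]),
]
for label, track in assign_tracks(pipeline).items():
    print(label, track)
```

Under this assumption the two recognizers become siblings below the line segmentation snapshot, so their outputs can be compared side by side.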

Results will be available in the following directories:

Calamari recognize: ocr4all/workspace/projects/project_01/sandboxes/sandbox_01/snapshots/derived/1/derived/1/derived/1/derived/1/sandbox
Tesserocr recognize: ocr4all/workspace/projects/project_01/sandboxes/sandbox_01/snapshots/derived/1/derived/1/derived/1/derived/2/sandbox
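The two result directories follow a regular pattern: one `derived/<n>` segment per snapshot level, followed by `sandbox`. A minimal sketch of that mapping is given below. The function name is hypothetical, and the rule itself is inferred from the two paths above rather than taken from the API documentation.

```python
from pathlib import PurePosixPath


def snapshot_sandbox_dir(project_id, sandbox_id, track):
    """Return the result directory for the snapshot identified by `track`.

    Assumption: each number in the track adds one "derived/<n>" level.
    """
    path = (
        PurePosixPath("ocr4all/workspace/projects")
        / project_id
        / "sandboxes"
        / sandbox_id
        / "snapshots"
    )
    for number in track:
        path = path / "derived" / str(number)
    return str(path / "sandbox")


print(snapshot_sandbox_dir("project_01", "sandbox_01", [1, 1, 1, 1]))  # Calamari
print(snapshot_sandbox_dir("project_01", "sandbox_01", [1, 1, 1, 2]))  # Tesserocr
```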

Repository contents (154 commits):

ocr4all-app @ 49dd132
ocr4all-app-calamari-communication @ d0ca484
ocr4all-app-calamari-msa @ 623abc9
ocr4all-app-calamari-spi @ 863d787
ocr4all-app-communication @ 55045d7
ocr4all-app-msa @ 5fc72b9
ocr4all-app-ocrd-communication @ a9714ef
ocr4all-app-ocrd-msa @ 9003a35
ocr4all-app-ocrd-spi @ d0a7044
ocr4all-app-persistence @ fd98f7d
ocr4all-app-spi @ 2ba7f0c
.gitignore
.gitmodules
LICENSE
README.md
docker-compose.yml
ocr4all-build.sh
template.env
