Showing 1 changed file with 20 additions and 0 deletions.
@@ -0,0 +1,20 @@
# About

hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and the OCR-related information coexist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.

There is a [Public Specification for the hOCR Format](http://docs.google.com/View?docid=dfxcv4vc_67g844kf).

# Available Programs

Included command-line programs:

* hocr-check -- check the hOCR file for errors
* hocr-combine -- combine pages in multiple hOCR files into a single document
* hocr-eval -- compute number of segmentation and OCR errors
* hocr-eval-geom -- compute over, under, and mis-segmentations
* hocr-eval-lines -- compute OCR errors of hOCR output relative to text ground truth
* hocr-split -- split an hOCR file into individual pages
* hocr-merge-dc -- merge Dublin Core metadata into the hOCR HTML header

See the [CommandLine] Wiki page for more information.
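To make the format description in the About section more concrete, here is a minimal sketch of reading word-level bounding boxes and confidences out of an hOCR file using only the Python standard library. It is not part of the included programs. The sample markup, the `parse_title` and `extract_words` helpers, and the `WordCollector` class are all illustrative assumptions. The sketch also assumes that each `ocrx_word` element holds plain text with no nested tags and that its properties live in the `title` attribute in the usual `bbox x0 y0 x1 y1; x_wconf N` form.

```python
from html.parser import HTMLParser

# Hypothetical sample page; real hOCR output will contain more elements.
SAMPLE = """<html><body>
<div class='ocr_page' title='bbox 0 0 600 800'>
 <span class='ocr_line' title='bbox 10 20 300 45'>
  <span class='ocrx_word' title='bbox 10 20 110 45; x_wconf 93'>Hello</span>
  <span class='ocrx_word' title='bbox 120 20 300 45; x_wconf 88'>world</span>
 </span>
</div>
</body></html>"""


def parse_title(title):
    """Split a title attribute into {property name: [value strings]}."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class WordCollector(HTMLParser):
    """Collect (text, bbox, confidence) for each ocrx_word element.

    Assumes word elements contain only text, so the first closing tag
    seen after a word starts is the one that ends that word.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # properties of the word being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._current = parse_title(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            bbox = tuple(int(v) for v in self._current.get("bbox", []))
            conf = self._current.get("x_wconf")
            self.words.append(
                ("".join(self._text).strip(), bbox, float(conf[0]) if conf else None)
            )
            self._current = None


def extract_words(html):
    """Return a list of (text, bbox, confidence) tuples found in html."""
    collector = WordCollector()
    collector.feed(html)
    collector.close()
    return collector.words


if __name__ == "__main__":
    for text, bbox, conf in extract_words(SAMPLE):
        print(text, bbox, conf)
```

Because the OCR properties sit in ordinary HTML attributes, the recognized text stays readable in a browser while the geometry remains available to tools like this one.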