----------------------------------------------------------------
NHocr - the Japanese OCR
----------------------------------------------------------------
1. Introduction
NHocr is a command-line OCR (Optical Character Recognition)
program for the Japanese language. It is designed to recognize
machine-printed Japanese characters and some ASCII
characters/symbols in an image.
NHocr is probably the first open-source Japanese OCR software,
apart from some experimental, partial code made available to
academic communities.
The "nhocr" command reads one or more PBM/PGM/PPM image files,
recognizes the text in each file, and produces text data in UTF-8.
Each file should contain only ONE horizontal text line image
in line recognition mode, or only ONE text block in block
recognition mode, without any surrounding lines or dirt.
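Because nhocr reads only PBM/PGM/PPM files, it can help to check an
input file before running recognition. The sketch below tests whether
a file starts with one of the Netpbm magic numbers (P1 to P6). The
helper name is_pnm is hypothetical and not part of NHocr, and whether
nhocr accepts both the plain (P1-P3) and raw (P4-P6) variants is not
stated in this README.

```shell
#!/bin/sh
# is_pnm FILE
# Hypothetical helper (not part of NHocr): return 0 if FILE starts with
# a Netpbm magic number (P1..P6), i.e. it looks like a PBM, PGM, or PPM
# file; return 1 otherwise, including when FILE cannot be read.
is_pnm() {
    # Read only the first two bytes of the file.
    magic=$(dd if="$1" bs=2 count=1 2>/dev/null) || return 1
    case "$magic" in
        P[1-6]) return 0 ;;
        *)      return 1 ;;
    esac
}
```

For example, "is_pnm input.pgm && nhocr -line -o output.txt input.pgm"
would skip files that are clearly in another format.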
You can also use NHocr through the WeOCR service at:
http://maggie.ocrgrid.org/nhocr/
The program is highly experimental, and its character
recognition performance is limited. (You will be happier
with a commercial product if you want a high-performance OCR.)
The character feature used in NHocr is based on the Peripheral
Local Moment (P-LM) proposed by Hori et al. in the late 1990s.
NHocr was originally a product of the author's weekend
programming. The development work may be rather slow.
2. Installation and configuration
1) Run the configure script in the top directory.
Then, build and install the programs.
$ ./configure
$ make
(switch to root if necessary)
# make install
Pass the --enable-gramd option to configure if you want to enable
gramd support (UNIX only). See also README-gramd.
Note: Since NHocr 0.22, the part of the image manipulation
library package O2-tools-2.xx that earlier releases required
is included in the source tree. There is no need to build
and install O2-tools separately.
2) If you want to use dictionary files in a non-standard
directory, you need to specify the location by setting the
environment variable NHOCR_DICDIR.
For example, if the dictionary files are in /opt/nhocr/DIC:
$ NHOCR_DICDIR=/opt/nhocr/DIC ; export NHOCR_DICDIR
3) If you want to change the combination of character sets, you
can set the dictionary codes using the environment variable
NHOCR_DICCODES.
For example:
$ NHOCR_DICCODES=ascii+:zh_CN ; export NHOCR_DICCODES
The built-in default is ascii+:jpn for ASCII and Japanese
characters.
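The two environment variables above can also be set for a single
command instead of being exported for the whole session. The sketch
below shows one way to do that with the standard env utility. The
wrapper name run_with_dict is hypothetical and not part of NHocr.

```shell
#!/bin/sh
# run_with_dict DICDIR DICCODES command [args...]
# Hypothetical wrapper (not part of NHocr): run a command with
# NHOCR_DICDIR and NHOCR_DICCODES set for that command only, leaving
# the caller's environment unchanged.
run_with_dict() {
    dicdir=$1
    diccodes=$2
    shift 2
    if [ ! -d "$dicdir" ]; then
        echo "run_with_dict: no such directory: $dicdir" >&2
        return 1
    fi
    # env passes the two variables to the child process only.
    env NHOCR_DICDIR="$dicdir" NHOCR_DICCODES="$diccodes" "$@"
}
```

For example:
$ run_with_dict /opt/nhocr/DIC ascii+:zh_CN nhocr -line -o output.txt input.pgm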
3. Usage
Running nhocr without any arguments prints a usage summary.
A typical invocation is:
$ nhocr -line -o output.txt input.pgm
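Since each input file holds a single text line, a page is usually
processed as many small images. The sketch below applies the typical
invocation above to every .pgm file in a directory. The function name
ocr_line_dir and the NHOCR_CMD override variable are hypothetical,
introduced only for this example. It also assumes that nhocr returns
a nonzero exit status on failure, which this README does not state.

```shell
#!/bin/sh
# ocr_line_dir DIR
# Hypothetical batch helper (not part of NHocr): recognize every .pgm
# line image in DIR, writing NAME.txt next to NAME.pgm.
# The recognizer defaults to "nhocr"; NHOCR_CMD is an override invented
# for this sketch, not an NHocr feature.
ocr_line_dir() {
    dir=$1
    status=0
    for img in "$dir"/*.pgm; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$img" ] || continue
        out="${img%.pgm}.txt"
        # Assumption: a nonzero exit status means recognition failed.
        if ! "${NHOCR_CMD:-nhocr}" -line -o "$out" "$img"; then
            echo "ocr_line_dir: failed on $img" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, "ocr_line_dir ./lines" would write ./lines/NAME.txt for
each ./lines/NAME.pgm.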
4. Using NHocr with OCRopus
NHocr can be used as a line recognizer together with OCRopus,
a document analysis and OCR system.
An NHocr-OCRopus bridge is included in the package. See the Lua
scripts in the ocropus/ directory.
5. License
See the LICENSE file.
For details:
http://code.google.com/p/nhocr/
http://sourceforge.jp/projects/nhocr/
--
Aug. 29, 2014 Hideaki Goto, Tohoku University, Japan