NormanTUD/AutoFeedscanOCR

AutoFeedscanOCR

This script lets you use a sheet-fed scanner (I'm using a Fujitsu ScanSnap S510) in duplex mode and, while scanning is still in progress, turns the scans into searchable PDFs, automatically skipping blank pages.

When scanning is finished, it creates a "gesamt.pdf" that combines all of the scanned pages into one large searchable PDF file.
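
The final merge step could look roughly like the sketch below. It is not copied from scan.sh: the zero-padded page_NNNN.pdf naming, the helper names list_page_pdfs and combine_pages, and the use of pdfunite (from poppler-utils) are all my assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: the zero-padded page_NNNN.pdf naming and the use of pdfunite
# are assumptions, not taken from scan.sh.

# Print the per-page PDFs in directory $1, one per line, in page order.
# Glob results come back sorted, so zero-padded names keep the scan order.
list_page_pdfs() {
    for f in "$1"/page_*.pdf; do
        [ -e "$f" ] || continue     # the glob matched nothing
        printf '%s\n' "$f"
    done
}

# Merge every page PDF in directory $1 into $1/gesamt.pdf.
combine_pages() {
    dir=$1
    wait                            # let background OCR jobs finish first
    old_ifs=$IFS
    # Split the file list on newlines only, so paths with spaces survive.
    IFS='
'
    set -- $(list_page_pdfs "$dir")
    IFS=$old_ifs
    if [ "$#" -eq 0 ]; then
        echo "no page PDFs found in $dir" >&2
        return 1
    fi
    pdfunite "$@" "$dir/gesamt.pdf"
}
```

Calling `combine_pages /path/to/scans` once the last sheet is through would then write /path/to/scans/gesamt.pdf. Because the glob only matches page_*.pdf, an older gesamt.pdf in the same directory is not merged into the new one.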

If you scan documents in a language other than German, change this line in scan.sh:

tesseract -l deu $FILENAME $BASENAME pdf &

and replace "deu" with the Tesseract code for your language (for example "eng" for English).
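
If you switch languages often, one option is to read the code from a variable instead of editing the script each time. The sketch below wraps the same tesseract call in a function. OCR_LANG and ocr_page are hypothetical names that do not exist in the original scan.sh.

```shell
#!/bin/sh
# Sketch: OCR_LANG and ocr_page are hypothetical, not part of the original scan.sh.
# Falls back to German, matching the default in scan.sh.
OCR_LANG=${OCR_LANG:-deu}

# OCR one scanned image in the background, like the scan.sh line does.
# Tesseract appends ".pdf" to the output base name itself.
ocr_page() {
    FILENAME=$1
    BASENAME=${FILENAME%.*}      # strip the extension: page_0001.tif -> page_0001
    tesseract -l "$OCR_LANG" "$FILENAME" "$BASENAME" pdf &
}
```

With this in place, `OCR_LANG=eng ./scan.sh` would scan in English without touching the file. Tesseract also accepts several languages joined with "+", such as "deu+eng".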

Please install scanimage (part of SANE; on Debian and Ubuntu it ships in the sane-utils package) and the latest Tesseract version.
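
A quick way to see whether both tools are already on your PATH is a small check like this one. The function names missing_tools and check_prerequisites are made up for this sketch and are not part of the repository.

```shell
#!/bin/sh
# Sketch: helper names are hypothetical, not part of this repository.

# Print the name of every given command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report the missing tools on stderr; returns non-zero if anything is missing.
check_prerequisites() {
    missing=$(missing_tools scanimage tesseract)
    if [ -n "$missing" ]; then
        echo "Missing tools:" $missing >&2
        return 1
    fi
}
```

Running `check_prerequisites || exit 1` before scanning avoids starting a batch that cannot be OCRed.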

HOW TO INSTALL THE LATEST TESSERACT VERSION:

apt-get -y install g++ autoconf automake libtool pkg-config libpng-dev libtiff5-dev zlib1g-dev ca-certificates git libleptonica-dev make asciidoc libpango1.0-dev

mkdir -p ~/tesseractsource

cd ~/tesseractsource && git clone --depth 1 https://github.com/tesseract-ocr/tesseract.git

cd ~/tesseractsource/tesseract && ./autogen.sh && autoreconf -i && ./configure && make && make install && ldconfig
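
After the build, it is worth checking that the language you want is actually available. As far as I know, a source build does not bring the language data files with it, so "deu" may still be missing. The sketch below assumes that `tesseract --list-langs` prints one header line followed by one language code per line, which is what recent versions do. has_lang is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: has_lang is a hypothetical helper. It assumes the first line of
# `tesseract --list-langs` is a header and each following line is one code.

# Succeeds if language code $1 is listed in the output read from stdin.
has_lang() {
    # Skip the header line, then look for an exact whole-line match.
    tail -n +2 | grep -qxF -- "$1"
}
```

Usage would be something like `tesseract --list-langs 2>&1 | has_lang deu || echo "deu language data is missing"`. The `2>&1` is there because some older Tesseract versions print the list to stderr.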

This code is licensed under the WTFPL.
