A PHP wrapper for the Tesseract OCR binary.
Originally inspired by ddeboer/tesseract, with added features and some improvements.
$ composer require ahmedghanem00/tesseract-ocr
If the tesseract binary is on your PATH, you can simply do:
$tesseract = new \ahmedghanem00\TesseractOCR\Tesseract();
Otherwise, pass the path to the binary explicitly:
$tesseract = new \ahmedghanem00\TesseractOCR\Tesseract("path/to/binary/location");
# OR, if you already have an instance
$tesseract->setBinaryPath("path/to/binary/location");
To specify the tesseract process timeout:
$tesseract = new \ahmedghanem00\TesseractOCR\Tesseract(processTimeout: 3);
# OR
$tesseract->setProcessTimeout(2.5);
To specify a custom tessdata-dir:
$tesseract->setTessDataDirPath("path/to/data/dir");
To reset tessdata-dir to default:
$tesseract->resetTessDataDirPath();
To get the version of the binary:
$version = $tesseract->getVersion();
To get all the supported languages:
$languages = $tesseract->getSupportedLanguages();
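Before passing language codes to recognize, it can be useful to check them against the installed ones. The following is a minimal sketch, assuming getSupportedLanguages() returns a flat array of language codes such as "eng" (the return type is not stated here). missingLanguages is a hypothetical helper and not part of the package:

```php
<?php

/**
 * Hypothetical helper (not part of the package): returns the requested
 * language codes that are absent from the supported list.
 *
 * @param string[] $requested codes you intend to pass as langs
 * @param string[] $supported assumed shape of getSupportedLanguages()
 * @return string[]
 */
function missingLanguages(array $requested, array $supported): array
{
    // array_diff preserves the original keys, so reindex for a clean list.
    return array_values(array_diff($requested, $supported));
}

// Intended use with the wrapper (assumption: the method returns string[]):
// $missing = missingLanguages(["eng", "ara"], $tesseract->getSupportedLanguages());
// if ($missing !== []) {
//     throw new RuntimeException("Unsupported languages: " . implode(", ", $missing));
// }
```

Failing early this way gives a clearer error than letting the binary reject an unknown language at recognition time.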
To OCR an image:
$result = $tesseract->recognize("test.png");
## OR
$result = $tesseract->recognize("https://example.com/test.png");
## etc.
Thanks to the Intervention/image package, the recognize method accepts several kinds of image source:
- Path of the image in filesystem.
- URL of an image (allow_url_fopen must be enabled).
- Binary image data.
- Data-URL encoded image data.
- Base64 encoded image data.
- PHP resource of type gd
- Imagick instance
- Intervention\Image\Image instance
- SplFileInfo instance (To handle Laravel file uploads via Symfony\Component\HttpFoundation\File\UploadedFile)
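The string-based sources in the list above can all be derived from the raw bytes of an image using PHP built-ins. The sketch below shows one way to do that. toDataUrl is a hypothetical helper, and the "image/png" default MIME type is an assumption to adjust for your input:

```php
<?php

/**
 * Hypothetical helper (not part of the package): wraps raw image bytes
 * in a data URL. The MIME type is supplied by the caller.
 */
function toDataUrl(string $binary, string $mime = "image/png"): string
{
    return "data:" . $mime . ";base64," . base64_encode($binary);
}

// Intended use with the wrapper, one call per source type listed above:
// $binary = file_get_contents("test.png");
// $tesseract->recognize($binary);                      // binary image data
// $tesseract->recognize(base64_encode($binary));       // Base64 encoded data
// $tesseract->recognize(toDataUrl($binary));           // data-URL encoded data
// $tesseract->recognize(new SplFileInfo("test.png"));  // SplFileInfo instance
```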
To specify the language(s):
$result = $tesseract->recognize("test.png", langs: ["eng", "ara"]);
To specify the Page Segmentation Mode (PSM):
use ahmedghanem00\TesseractOCR\Enum\PSM;
# using PSM enum
$result = $tesseract->recognize("test.png", psm: PSM::SINGLE_BLOCK);
# OR by using id directly
$result = $tesseract->recognize("test.png", psm: 3);
To specify the OCR Engine Mode (OEM):
use ahmedghanem00\TesseractOCR\Enum\OEM;
# using OEM enum
$result = $tesseract->recognize("test.png", oem: OEM::LEGACY_WITH_LSTM);
# OR by using id directly
$result = $tesseract->recognize("test.png", oem: 3);
To specify the DPI of the input image:
$result = $tesseract->recognize("test.png", dpi: 200);
To make the recognize method output the result as a searchable PDF instead of raw text:
$pdfBinaryData = $tesseract->recognize("test.png", outputAsPDF: true);
file_put_contents("result.pdf", $pdfBinaryData);
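Because file_put_contents returns false on failure, it may be worth checking the write explicitly. The sketch below adds that check plus a sanity test on the "%PDF-" header that PDF files begin with. savePdf is a hypothetical helper, not part of the package, and str_starts_with requires PHP 8.0 or later:

```php
<?php

/**
 * Hypothetical helper (not part of the package): writes PDF bytes to disk
 * and returns the number of bytes written.
 */
function savePdf(string $pdfData, string $targetPath): int
{
    // PDF documents start with a "%PDF-" header; reject anything else early.
    if (!str_starts_with($pdfData, "%PDF-")) {
        throw new InvalidArgumentException("Data does not look like a PDF document.");
    }

    $written = file_put_contents($targetPath, $pdfData);
    if ($written === false) {
        throw new RuntimeException("Could not write PDF to {$targetPath}.");
    }

    return $written;
}

// Intended use with the wrapper:
// savePdf($tesseract->recognize("test.png", outputAsPDF: true), "result.pdf");
```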
To specify a words file or a patterns file:
$result = $tesseract->recognize("test.png", wordsFilePath: "/path/to/file");
# OR
$result = $tesseract->recognize("test.png", patternsFilePath: "/path/to/file");
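If the word list is built at runtime, it can be written to a temporary file first. This is a rough sketch, assuming the usual one-word-per-line layout of Tesseract user-words files. writeWordsFile is a hypothetical helper and not part of the package:

```php
<?php

/**
 * Hypothetical helper (not part of the package): writes a list of words to
 * a temporary file, one word per line, and returns the file path.
 *
 * @param string[] $words
 */
function writeWordsFile(array $words): string
{
    $path = tempnam(sys_get_temp_dir(), "words_");
    if ($path === false) {
        throw new RuntimeException("Could not create a temporary words file.");
    }

    // Trim whitespace, drop empty entries, and remove duplicates.
    $clean = array_unique(array_filter(
        array_map('trim', $words),
        fn (string $word): bool => $word !== ''
    ));

    file_put_contents($path, implode("\n", $clean) . "\n");

    return $path;
}

// Intended use with the wrapper:
// $wordsFile = writeWordsFile(["invoice", "total"]);
// $result = $tesseract->recognize("test.png", wordsFilePath: $wordsFile);
// unlink($wordsFile);
```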
To set config parameters:
use ahmedghanem00\TesseractOCR\ConfigBag;
$config = ConfigBag::new()
->setParameter("tessedit_char_whitelist", "abcrety")
->setParameter("textord_pitch_range", 3);
$result = $tesseract->recognize("test.png", config: $config);
You can also run tesseract --print-parameters
to see the list of available config parameters.
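When the parameters come from an array (for example, application settings), they can be applied in a loop rather than chained by hand. The sketch below relies only on the setParameter(name, value) method shown above. applyParameters is a hypothetical helper and not part of the package:

```php
<?php

/**
 * Hypothetical helper (not part of the package): copies name/value pairs
 * onto any object that exposes setParameter(name, value).
 *
 * @param array<string, mixed> $parameters
 */
function applyParameters(object $bag, array $parameters): object
{
    foreach ($parameters as $name => $value) {
        $bag->setParameter($name, $value);
    }

    return $bag;
}

// Intended use with the wrapper:
// $config = applyParameters(ConfigBag::new(), [
//     "tessedit_char_whitelist" => "abcrety",
//     "textord_pitch_range" => 3,
// ]);
// $result = $tesseract->recognize("test.png", config: $config);
```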
The package is licensed under the MIT License. For more information, see the License file.