Skip to content

machariasam/TextGenerator-Tesseract

Repository files navigation

Image-to-Text Converter

This Python script extracts texts from multiple image files, combines the texts and saves them in txt and MS docx file types.

Prerequisites

The following dependencies are required to run this script

  • Python 3
  • Pillow (PIL) library
  • pytesseract library
  • Tesseract OCR engine
  • docx library
  • On VsCode Terminal

    Install Pillow, an image processing library in python, using this command

      `pip install pillow`
    

    Install tesserct-ocr using this command

      `sudo apt install tesseract-ocr`
    

    Install pytesseract, Python wrapper for tesserct-ocr using this command

      `pip install pytesseract`
    

    Install docx, a library that allows creating and updating MS Word (.docx) files.

      `pip install python-docx`
    

Usage

  • Clone or download this repository to your local machine.

  • Place your image files in the imagetotxt folder.

  • List the names of the image files you want to convert in the ordered.txt file, this is in the order you want the images to be processed.

  • Run the script using the following command:

      `/bin/python3 /your_directory/TextGenerator-Tesseract/image_example.py`
    
  • The output will be saved in output.txt and output.docx files.

Additional Notes

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages