Ever needed to quickly extract text from an image in Discord? Meet Jet, a Discord bot that aims to extract text from any image attachment you feed it!
Powered by Python-tesseract
See blog post here: Jet - OCR Funsies on Discord
The purpose of this project is to create a Discord bot that lets a user type a command along with an image attachment containing text; in return, the bot replies with the text found in the image through optical character recognition (OCR).
In 2022, I became quite interested in Japanese culture through games (e.g. gacha games) and media (variety shows).
This led me to want to pick up Japanese as a language to learn. However, one problem arose during my self-studies:
To learn Japanese, I would need to learn the Japanese characters and invest time in the various kanji that represent the fundamentals of the Japanese language, but I don't have a Japanese keyboard, nor do I know the characters in the first place. How would I be able to grab each individual kanji (or set of kanji)?
Gaki No Tsukai 2012 - Kiki Mizu Yokan episode
During my development journey, I came across the concept of OCR and how it works. This gave me an idea: if I used OCR to grab text from images that I provide, I could easily retrieve various kanji and use them in my self-studies.
Python version: 3.8.10
- Clone this repository
- Using Python, install the necessary packages within requirements.txt
Example:
pip install -r requirements.txt
- Within the root project folder, create a .env file. This will store your Discord token. Your directory should look something like this:
OCR-BOT/
├── tests/
├── image_util.py
├── main.py
├── ocr_reader.py
├── .env <--- your newly created .env file!
└── ...
- Inside your .env file, add your Discord token with the following:
DISCORD_TOKEN='YOUR TOKEN HERE'
- Run main.py and you should be good to go!
Example:
python main.py
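For reference, reading the token from the .env file could look roughly like the sketch below. This is not necessarily how main.py does it: the project may well rely on a package such as python-dotenv, and the helper names load_env_file and get_discord_token are hypothetical. The sketch only uses the standard library and assumes the DISCORD_TOKEN='...' format shown in the steps above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, as in DISCORD_TOKEN='YOUR TOKEN HERE'.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_discord_token(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    token = os.environ.get("DISCORD_TOKEN") or load_env_file(path).get("DISCORD_TOKEN")
    if not token:
        raise RuntimeError("DISCORD_TOKEN is not set; add it to your .env file")
    return token
```

Failing early with a clear error when the token is missing tends to be easier to debug than a login failure from Discord later on.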
While online, the bot listens for the $text command. This command must include an image attachment that is meant to be read. If the attachment is accepted, the bot responds with a message confirming it.
When the attachment is read, Pytesseract's image_to_osd() function is called to collect information about the attachment. This information includes:
- Page Orientation (Degrees)
- Page Rotation
- Script/Language detection
Pytesseract's image_to_string() function is then called utilizing the detected language.
- If an image that contains English is detected, then Latin (English) will be passed to the image_to_string() function as the primary language
- If an image that contains Japanese is detected, then Japanese or Katakana will be passed to the image_to_string() function instead
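The language selection described above can be sketched as follows. This is an illustration rather than the code in ocr_reader.py: parse_osd, choose_language, and SUPPORTED_SCRIPTS are hypothetical names, the sample report only mimics the "Key: value" layout that image_to_osd() returns as a string, and falling back to Latin for any other detected script is an assumption of this sketch.

```python
# Scripts the bot is described as passing on to image_to_string().
SUPPORTED_SCRIPTS = {"Latin", "Japanese", "Katakana"}


def parse_osd(osd_text):
    """Turn the 'Key: value' lines of an OSD report into a dict."""
    info = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


def choose_language(osd_text, default="Latin"):
    """Pick the language to hand to image_to_string() from an OSD report."""
    script = parse_osd(osd_text).get("Script", default)
    # Assumption: anything outside the supported set falls back to the default.
    return script if script in SUPPORTED_SCRIPTS else default


# Illustrative OSD report; the numbers are made up.
sample_osd = (
    "Page number: 0\n"
    "Orientation in degrees: 0\n"
    "Rotate: 0\n"
    "Orientation confidence: 5.10\n"
    "Script: Japanese\n"
    "Script confidence: 2.40\n"
)
print(choose_language(sample_osd))  # prints "Japanese"
```

With Pytesseract installed, the result would be used along the lines of pytesseract.image_to_string(image, lang=choose_language(pytesseract.image_to_osd(image))), provided the matching traineddata files are available to Tesseract.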
Once the text has been returned, the bot edits its confirmation message and replaces it with the acquired text.
Note: Images are parsed as-is without any preprocessing (OpenCV). Expect inaccurate outputs.
All commands are prefixed with the $ symbol.
- $text - Extracts text from an image while reporting how accurately the extraction performed.
- $text jpn - Can be used for better accuracy on images that contain Japanese text if the automatic language detection is inaccurate.
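As a rough illustration of how the two commands above could be told apart, here is a small standard-library sketch. It is not taken from main.py; parse_text_command and the "auto" marker are hypothetical, and a real bot would also need to check that the message carries an image attachment.

```python
PREFIX = "$"  # all commands start with this symbol


def parse_text_command(content):
    """Return None for non-commands, "jpn" for `$text jpn`, else "auto"."""
    parts = content.strip().split()
    if not parts or parts[0] != PREFIX + "text":
        return None
    # An explicit `jpn` argument skips automatic language detection.
    if len(parts) > 1 and parts[1].lower() == "jpn":
        return "jpn"
    return "auto"


print(parse_text_command("$text"))      # prints "auto"
print(parse_text_command("$text jpn"))  # prints "jpn"
print(parse_text_command("hello"))      # prints "None"
```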
At the end of the day, this is simply a hobby project that I made for myself and a few friends in my Discord circle who can make use of it.
Although I am quite interested in OCR and its abilities, I have little knowledge of the subject.
Jet will produce wacky outputs if users feed it images that have busy backgrounds or include symbols/icons alongside the text.
Even though the bot isn't perfect, it suits my use cases completely.
Nothing is concretely planned to be fixed here; however, here are some issues that arose from usage:
- Detected languages from image_to_osd() can be quite inaccurate.
  - Hebrew or Han can be detected from attaching an image with Japanese
  - Cyrillic can sometimes be detected from attaching an image with English
- Inaccurate results despite having little background noise.