The Speech Recognition Assistant is a Python-based tool designed to help individuals with speech difficulties convert their spoken words into text. This tool leverages advanced speech recognition models, audio processing, and natural language processing techniques to provide accurate and contextually appropriate text output. It includes a user-friendly GUI built with Tkinter.
- Deep Learning with Wav2Vec 2.0: Utilizes Facebook's Wav2Vec 2.0 model for robust and adaptable speech recognition.
- Advanced Audio Pre-processing: Includes noise reduction, dynamic range compression, and time stretching for better clarity.
- Contextual Phrase Matching: Implements Natural Language Processing (NLP) to match recognized speech with predefined phrases.
- Continuous Learning: Includes a feedback loop where the system learns from user corrections, improving over time.
- User-Friendly GUI: A simple graphical user interface (GUI) built with Tkinter makes the application easy to use.
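To make the first feature concrete, here is a minimal sketch of transcribing an audio file with Wav2Vec 2.0 through Hugging Face Transformers. It is not taken from `main.py`: the checkpoint name `facebook/wav2vec2-base-960h`, the 16 kHz sample rate, and the function name `transcribe` are assumptions chosen for illustration.

```python
MODEL_NAME = "facebook/wav2vec2-base-960h"  # assumed checkpoint, not confirmed by this project
SAMPLE_RATE = 16_000  # sample rate the assumed checkpoint expects


def transcribe(wav_path, model_name=MODEL_NAME):
    """Return the text Wav2Vec 2.0 recognizes in the audio file at wav_path."""
    # Heavy imports stay inside the function so this module loads quickly.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    # Load the file as mono float audio, resampled to the expected rate.
    audio, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: take the most likely token at each frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

The first call downloads the model weights, so it needs a network connection and may take a while.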
- You only need one of the following Python versions to run this:
- Python 3.11.6
- Python 3.11.9
- Python 3.12.1
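If you want the script to warn about an untested interpreter, a check along these lines could run at startup. This is a sketch, not part of the project: the name `is_tested_version` is hypothetical, and it assumes that any 3.11.x or 3.12.x patch release behaves like the versions listed above.

```python
import sys

# Release lines of the versions listed above (3.11.6, 3.11.9, 3.12.1).
TESTED_MINORS = {(3, 11), (3, 12)}


def is_tested_version(version_info=sys.version_info):
    """Return True if the (major, minor) pair matches a listed release line."""
    return tuple(version_info[:2]) in TESTED_MINORS


if __name__ == "__main__":
    if not is_tested_version():
        current = sys.version.split()[0]
        print(f"Warning: Python {current} has not been tested with this project.")
```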
Before running the application, ensure you have the following dependencies installed:
pip install torch transformers pydub librosa fuzzywuzzy nltk soundfile tk
- Alternatively, you can use the provided requirements.bat script to install these packages separately.
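To confirm that the installation worked before launching the GUI, you could run a quick check like the one below. It is a sketch under assumptions: the function name `missing_modules` is hypothetical, and the list uses import names rather than pip names (Tkinter is imported as `tkinter` and normally ships with Python itself).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
REQUIRED_MODULES = [
    "torch", "transformers", "pydub", "librosa",
    "fuzzywuzzy", "nltk", "soundfile", "tkinter",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```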
- Run the Application:
- Execute the `main.py` script to launch the GUI.
- The application will display a window with a "Start Recognition" button.
- Start Speech Recognition:
- Click the "Start Recognition" button.
- The application will listen to your speech and attempt to convert it to text.
- Feedback and Correction:
- The application will display the recognized text and ask if it's correct.
- If the text is incorrect, you can provide the correct phrase, which the system will learn and remember for future use.
- View Final Output:
- After processing and possible correction, the final recognized text will be displayed in a message box.
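The feedback-and-correction step can be pictured as a small lookup table that persists between runs. The sketch below shows one way to do this and is not the implementation in `main.py`: the class name `CorrectionStore`, the file name `corrections.json`, and the normalization rule are all assumptions.

```python
import json
from pathlib import Path


class CorrectionStore:
    """Remembers user corrections, keyed by the text the recognizer produced."""

    def __init__(self, path="corrections.json"):  # hypothetical file name
        self.path = Path(path)
        self.corrections = {}
        if self.path.exists():
            self.corrections = json.loads(self.path.read_text(encoding="utf-8"))

    @staticmethod
    def _key(text):
        # Normalize case and whitespace so small variations share one entry.
        return " ".join(text.lower().split())

    def learn(self, recognized, corrected):
        """Record that `recognized` should have been `corrected`, then save."""
        self.corrections[self._key(recognized)] = corrected
        self.path.write_text(
            json.dumps(self.corrections, indent=2), encoding="utf-8"
        )

    def apply(self, recognized):
        """Return the stored correction if one exists, else the text unchanged."""
        return self.corrections.get(self._key(recognized), recognized)
```

After `store.learn("i wan water", "I want water")`, a later call to `store.apply("I wan  water")` returns the corrected phrase, while text with no stored correction passes through unchanged.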
- Predefined Phrases: You can edit or add to the `predefined_phrases` list in the script to match the most common phrases the user might say.
- Model Training: While the script uses a pre-trained model, you can replace it with a custom-trained model if necessary.
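As an illustration of how recognized text might be matched against the phrase list, the sketch below uses `difflib` from the standard library as a stand-in for FuzzyWuzzy, so it runs without extra packages. The example phrases, the function name `best_phrase`, and the `0.6` threshold are assumptions, not values from the script.

```python
from difflib import SequenceMatcher

# Example phrases only; replace with the list from your own script.
predefined_phrases = ["I need help", "I am hungry", "Thank you"]


def best_phrase(recognized, phrases=predefined_phrases, threshold=0.6):
    """Return the closest predefined phrase, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for phrase in phrases:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, recognized.lower(), phrase.lower()).ratio()
        if score > best_score:
            best, best_score = phrase, score
    return best if best_score >= threshold else None
```

With these example phrases, `best_phrase("i ned help")` returns `"I need help"`. A higher threshold reduces false matches but also rejects more near misses, so it is worth tuning against real recognition output.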
- Feel free to fork this repository, make improvements, and submit pull requests. Your contributions are welcome!
- Facebook AI: For the Wav2Vec 2.0 model.
- NLTK: For providing NLP tools.
- Librosa: For audio processing.
- Pydub: For simple and easy audio manipulation.
- Tkinter: For the GUI framework.
- FuzzyWuzzy: For string matching and scoring.
- SoundFile: For reading and writing sound files.
- Hugging Face Transformers: For providing state-of-the-art machine learning models.
- Python-Levenshtein: For fast and efficient Levenshtein distance computation.
- SpeechRecognition: For converting speech to text.
- Save the Files:
- Save the Python script as `main.py`.
- Save the batch script as `requirements.bat`.
- Save the README content as `README.md` in your project directory.
- Run the Batch Script:
- Double-click the `requirements.bat` file to install all necessary packages.
- If an error occurs during installation, the script will notify you and stop.
- Run the Main Script:
- After installing the dependencies, run `main.py` to start the application.
This setup should provide everything you need to get the project up and running, with clear instructions and a straightforward workflow.
- 🌐 Visit the Official `ffmpeg` Website:
- Go to the official `ffmpeg` download page, which links to the Windows builds by BtbN.
- 💻 Select the Windows Build:
- Under "Get packages & executable files", look for "Windows builds by BtbN" and click on the link.
- ⬇️ Download the Latest Release:
- On the BtbN page, select the latest release version.
- Choose the build based on your system architecture (`ffmpeg-master-latest-win64-gpl.zip` for 64-bit or `ffmpeg-master-latest-win32-gpl.zip` for 32-bit).
- Click the link to download the zip file.
- 📂 Extract the Downloaded Zip File:
- Locate the downloaded `ffmpeg` zip file in your Downloads folder.
- Right-click the zip file and select "Extract All..." or use a tool like 7-Zip or WinRAR.
- Extract the contents to a folder, for example, `C:\ffmpeg`.
- 🖥️ Open System Properties:
- Right-click on "This PC" or "Computer" on your desktop or in File Explorer, and select "Properties".
- Click on "Advanced system settings" on the left side.
- In the System Properties window, click on the "Environment Variables" button.
- 🔧 Edit the System Path:
- In the Environment Variables window, under the "System variables" section, scroll down and select the `Path` variable, then click "Edit".
- In the Edit Environment Variable window, click "New" and enter the path to the `bin` directory inside your `ffmpeg` folder (e.g., `C:\ffmpeg\bin`).
- Click "OK" to close all windows.
- 💬 Open Command Prompt:
- Press `Win + R`, type `cmd`, and press Enter.
- 🔍 Check `ffmpeg` Version:
- In the Command Prompt, type `ffmpeg -version` and press Enter.
- If installed correctly, you'll see information about the `ffmpeg` version and configuration.
By following these steps, you'll have `ffmpeg` installed and configured on your Windows system, ready for use with `pydub` and other audio processing tasks.
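The same check can be done from Python, which is handy if you want the application to report a missing `ffmpeg` before audio processing fails. This is a minimal sketch; the function name `ffmpeg_version_line` is hypothetical and not part of the project.

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line or "ffmpeg was not found on PATH; recheck the Path entry and open a new terminal.")
```

Changes to the `Path` variable only apply to terminals opened after the change, so rerun the check in a fresh Command Prompt if it reports that `ffmpeg` is missing.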