Feature extraction from the speech signal is the initial stage of any speech recognition system.
A Python library to generate speech datasets from YouTube videos
[T-IFS] RNN-SM: Fast Steganalysis of VoIP Streams Using Recurrent Neural Network
Construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wakeword detection).
Download speech datasets (English and non-English) for Automatic Speech Recognition
A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.
ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
The Deepfake Cross-lingual Evaluation dataset (DECRO) is constructed to evaluate the influence of language differences on deepfake detection.
NumPy-librosa implementation of a speech dataset pipeline
Persian spoken digit recognition
A simple CNN-LSTM deep neural model using TensorFlow to classify emotions from a speech dataset
Voice activity detection and speaker gender segmentation audiovisual corpus
A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
A full-stack webapp for collecting and managing speech datasets.
EmoTa is an open-access Tamil Speech Emotion Recognition dataset with 936 utterances from 22 native speakers, covering five emotions (anger, happiness, sadness, fear, and neutrality). It supports emotion classification tasks and advances Tamil language processing.
A simple script that creates a speech dataset quickly
Top dataset for voice conversion models