StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Updated Aug 10, 2024 - Python
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages the WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. Then, it uses an HMM decoder to infer singing beats and t…
A neural speech codec based on discrete WavLM representations
This repository contains the code for the main part of my master's thesis in Data Science & Engineering at Politecnico di Torino
In this repository, the WavLM model is applied to the speaker verification task on both high-quality and poor-quality data, and the PyCM library is used for evaluation.
A collection of audio codecs with a standardized API
SOTA method for self-supervised speaker verification leveraging a large-scale pretrained ASR model.
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
Acoustic Transformer Models for Audio Classification
CryCeleb2023 experiments