
[multi-modal] scope out video / audio semantic conventions #1081

Open
Tracked by #495
axiomofjoy opened this issue Oct 24, 2024 · 1 comment

Comments

@axiomofjoy (Contributor)

No description provided.

@SiyuanQi commented Nov 3, 2024

I’m voting in support of this feature! It would be incredibly valuable if it could also enable the creation of datasets for speech-to-text models like Whisper. Adding this capability would streamline dataset preparation for transcription tasks and improve workflow efficiency for developers working with audio models. Thank you for considering this addition!
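
To make the request concrete, here is a minimal sketch of how a speech-to-text call might be traced once audio conventions exist, using the standard OpenTelemetry Python API. Every attribute key and the `run_speech_to_text_model` helper below are hypothetical placeholders, not part of any published convention:

```python
# Minimal sketch, not a published convention: the attribute keys are
# placeholders for what audio semantic conventions might standardize.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def run_speech_to_text_model(audio_url: str) -> str:
    # Stand-in for an actual model call (e.g. Whisper); hypothetical helper.
    return "transcribed text"

def transcribe(audio_url: str) -> str:
    with tracer.start_as_current_span("audio.transcription") as span:
        # Hypothetical input attributes describing the audio payload.
        span.set_attribute("input.audio.url", audio_url)          # placeholder key
        span.set_attribute("input.audio.mime_type", "audio/wav")  # placeholder key
        text = run_speech_to_text_model(audio_url)
        # Hypothetical output attribute carrying the transcript.
        span.set_attribute("output.text", text)                   # placeholder key
        return text
```

Having agreed-upon keys like these would let exported spans be collected directly into training or evaluation datasets for audio models, which is the workflow described above.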

Labels: none yet · Project status: 📘 Todo · No branches or pull requests · 2 participants