
[multi-modal] scope out video / audio semantic conventions #1081

Open
Tracked by #495
axiomofjoy opened this issue Oct 24, 2024 · 1 comment

Comments

@axiomofjoy (Contributor)

No description provided.

@SiyuanQi commented Nov 3, 2024

I’m voting in support of this feature! It would be incredibly valuable if it could also enable the creation of datasets for speech-to-text models like Whisper. Adding this capability would streamline dataset preparation for transcription tasks and improve workflow efficiency for developers working with audio models. Thank you for considering this addition!
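
To make the request concrete, here is a minimal sketch of how a speech-to-text call might be traced once audio conventions exist, using the standard OpenTelemetry Python API. Every attribute key and the `run_speech_to_text_model` helper below are hypothetical placeholders, not part of any published convention:

```python
# Minimal sketch, not a published convention: the attribute keys are
# placeholders for what audio semantic conventions might standardize.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def run_speech_to_text_model(audio_url: str) -> str:
    # Stand-in for an actual model call (e.g. Whisper); hypothetical helper.
    return "transcribed text"

def transcribe(audio_url: str) -> str:
    with tracer.start_as_current_span("audio.transcription") as span:
        # Hypothetical input attributes describing the audio payload.
        span.set_attribute("input.audio.url", audio_url)          # placeholder key
        span.set_attribute("input.audio.mime_type", "audio/wav")  # placeholder key
        text = run_speech_to_text_model(audio_url)
        # Hypothetical output attribute carrying the transcript.
        span.set_attribute("output.text", text)                   # placeholder key
        return text
```

Having agreed-upon keys like these would let exported spans be collected directly into training or evaluation datasets for audio models, which is the workflow described above.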

Labels: none yet · Project status: 📘 Todo · No branches or pull requests · 2 participants