This is a crazy idea and we're probably a few years away from being able to do it on our local machines. Treat this request more as a conversation.
When something like this is implemented to generate subtitles from the audio files, we could generate images for each page/minute/chapter/scene/whatever and show them in the player when listening to an audiobook.
When an audiobook is imported and its subtitles are generated, queue image generation using a predefined prompt (something like "an illustration for a book of this scene: %s") that the user could also change (maybe I want a specific style for the illustration).
Since image generation prompts are usually short, I guess we should give all the previous (read) text to an LLM and ask it to "create a short prompt of the current scene/chapter to use in an image generator model, from this text: %s".
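To make the two-step idea concrete, here is a minimal sketch in Python. It only builds the prompts; `summarize` stands in for whatever LLM call would be used, and `build_image_prompt`, `summarize` and `max_chars` are names I made up for illustration, not anything that exists in ABS.

```python
from typing import Callable

# Both templates are the ones quoted above; a user could override either.
SUMMARY_TEMPLATE = (
    "create a short prompt of the current scene/chapter to use in an "
    "image generator model, from this text: %s"
)
IMAGE_TEMPLATE = "an illustration for a book of this scene: %s"


def build_image_prompt(
    read_text: str,
    summarize: Callable[[str], str],
    image_template: str = IMAGE_TEMPLATE,
    max_chars: int = 8000,
) -> str:
    """Turn the text read so far into a short image-generation prompt.

    `summarize` is a hypothetical hook: any function that sends a request
    string to an LLM and returns its reply.
    """
    # Assumption: only the tail of the text is sent so the request fits in
    # the model's context window. The idea above says "all the previous
    # text", which may be too long for a local model.
    context = read_text[-max_chars:]
    scene = summarize(SUMMARY_TEMPLATE % context).strip()
    return image_template % scene
```

Passing a different `image_template` (for example "a watercolor painting of this scene: %s") would cover the "specific style" case without touching the summarization step.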
The file structure would be pretty simple: a folder with images and a text file with timestamps (like an srt). This could spark a community-generated library of "book illustrations". It could also mean that ABS could just be a consumer of this format, not the creator.
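As a rough sketch of what consuming such a file could look like: the layout below (an index line, an SRT-style `start --> end` line, then an image filename, with blocks separated by blank lines) is purely my assumption, as are the names `Cue`, `parse_cues` and `image_at`. Nothing here is an existing format or ABS API.

```python
import bisect
import re
from typing import List, NamedTuple, Optional

# SRT-style timestamp: HH:MM:SS,mmm (a dot before the millis is accepted too)
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


class Cue(NamedTuple):
    start: float  # seconds from the start of the audiobook
    end: float
    image: str    # filename inside the images folder


def _seconds(stamp: str) -> float:
    m = TIME_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000


def parse_cues(text: str) -> List[Cue]:
    """Parse blocks of: index line, 'start --> end' line, image filename."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3:
            continue  # skip empty or incomplete blocks
        start, _, end = lines[1].partition("-->")
        cues.append(Cue(_seconds(start), _seconds(end), lines[2]))
    return sorted(cues)


def image_at(cues: List[Cue], position: float) -> Optional[str]:
    """Return the image to show at a playback position, if any."""
    starts = [c.start for c in cues]
    # Find the last cue that starts at or before the position.
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i].end:
        return cues[i].image
    return None
```

The player would call `image_at(cues, current_position_in_seconds)` whenever the position changes and swap the picture when the returned filename differs from the one on screen.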
Why?
I listen to audiobooks in small chunks of 5-15 minutes, and having some context when I start a session would be great.
I also love cool images. Having illustrations for spaceship battles while listening to Expeditionary Force, changing with every battle and switching to some tacticool Ruhar soldier when boots hit the ground or some crazy plan takes place on an alien planet, would throw me into the book right away.
This discussion was converted from issue #2824 on December 21, 2024 16:52.
The Reddit post that sparked this idea for me