Add Image to Text docs #681
Open: mjh1 wants to merge 1 commit into main from mh/image-to-text
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
--- | ||
openapi: post /image-to-text | ||
--- | ||
|
||
<Info> | ||
The default Gateway used in this guide is the public | ||
[Livepeer.cloud](https://www.livepeer.cloud/) Gateway. It is free to use but | ||
not intended for production-ready applications. For production-ready | ||
applications, consider using the [Livepeer Studio](https://livepeer.studio/) | ||
Gateway, which requires an API token. Alternatively, you can set up your own | ||
Gateway node or partner with one via the `ai-video` channel on | ||
[Discord](https://discord.gg/livepeer). | ||
</Info> | ||
|
||
<Note> | ||
Please note that the exact parameters, default values, and responses may vary | ||
between models. For more information on model-specific parameters, please | ||
refer to the respective model documentation available in the [image-to-text | ||
pipeline](/ai/pipelines/image-to-text). Not all parameters might be available | ||
for a given model. | ||
</Note> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
--- | ||
title: Image-to-Text | ||
--- | ||
|
||
## Overview | ||
|
||
The `image-to-text` pipeline converts images into text captions. This pipeline is powered by the latest models in the HuggingFace [image-to-text](https://huggingface.co/models?pipeline_tag=image-to-text) pipeline. | ||
|
||
<div align="center"> | ||
|
||
</div> | ||
|
||
## Models | ||
|
||
### Warm Models | ||
|
||
The current warm model requested for the `image-to-text` pipeline is: | ||
|
||
- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) | ||
|
||
<Tip> | ||
For faster responses with different | ||
[image-to-text](https://huggingface.co/models?pipeline_tag=image-to-text) | ||
models, ask Orchestrators to load them on their GPUs via the `ai-video` | ||
channel on the [Discord server](https://discord.gg/livepeer). | ||
</Tip> | ||
|
||
### On-Demand Models | ||
|
||
The following models have been tested and verified for the `image-to-text` | ||
pipeline: | ||
|
||
<Note> | ||
If a specific model you wish to use is not listed, please submit a [feature | ||
request](https://github.com/livepeer/ai-worker/issues/new?assignees=&labels=enhancement%2Cmodel&projects=&template=model_request.yml) | ||
on GitHub to get the model verified and added to the list. | ||
</Note> | ||
|
||
{/* prettier-ignore */} | ||
<Accordion title="Tested and Verified Models"> | ||
- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large) | ||
</Accordion> | ||
|
||
## Basic Usage Instructions | ||
|
||
<Tip> | ||
For a detailed understanding of the `image-to-text` endpoint and to experiment | ||
with the API, see the [Livepeer AI API | ||
Reference](/ai/api-reference/image-to-text). | ||
</Tip> | ||
|
||
To create an image caption using the `image-to-text` pipeline, submit a | ||
`POST` request to the Gateway's `image-to-text` API endpoint: | ||
|
||
```bash | ||
curl -X POST "https://<GATEWAY_IP>/image-to-text" \ | ||
-F model_id=Salesforce/blip-image-captioning-large \ | ||
-F image=@<PATH_TO_FILE> | ||
``` | ||
|
||
In this command: | ||
|
||
- `<GATEWAY_IP>` should be replaced with your AI Gateway's IP address. | ||
- `model_id` is the captioning model to use. | ||
- `image` is the path to the image file to be captioned. | ||
|
||
<Note> | ||
Maximum request size: 50 MB | ||
</Note> | ||
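The size limit in the note above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the documented API: `GATEWAY_URL`, `within_limit`, and `caption_image` are hypothetical names, and it assumes that "50 MB" means 50 MiB. It checks only the image file, so a file very close to the limit may still be rejected once the multipart overhead is added.

```shell
#!/usr/bin/env bash
# Sketch only: GATEWAY_URL, within_limit, and caption_image are hypothetical
# names introduced for illustration; they are not part of the Livepeer docs.

# Assumption: "50 MB" is read as 50 MiB. Adjust if the Gateway counts differently.
MAX_BYTES=$((50 * 1024 * 1024))

# Return 0 only if the file exists and is no larger than MAX_BYTES.
within_limit() {
  local file="$1" size
  [ -f "$file" ] || return 1
  size=$(( $(wc -c < "$file") ))  # arithmetic expansion strips padding from wc
  [ "$size" -le "$MAX_BYTES" ]
}

# Send the same request as the curl command above, but only after the check.
caption_image() {
  local file="$1"
  if ! within_limit "$file"; then
    echo "skipping $file: missing or larger than the request limit" >&2
    return 1
  fi
  curl -sS -X POST "${GATEWAY_URL}/image-to-text" \
    -F model_id=Salesforce/blip-image-captioning-large \
    -F "image=@${file}"
}
```

After sourcing the script, a call might look like `GATEWAY_URL="https://<GATEWAY_IP>" caption_image <PATH_TO_FILE>`, with both placeholders replaced as described for the curl command.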
|
||
For additional optional parameters, refer to the | ||
[Livepeer AI API Reference](/ai/api-reference/image-to-text). | ||
|
||
## Orchestrator Configuration | ||
|
||
To configure your Orchestrator to serve the `image-to-text` pipeline, refer to | ||
the [Orchestrator Configuration](/ai/orchestrators/get-started) guide. | ||
|
||
### System Requirements | ||
|
||
The following system requirements are recommended for optimal performance: | ||
|
||
- [NVIDIA GPU](https://developer.nvidia.com/cuda-gpus) with **at least 12GB** of | ||
@rickstaa any idea where to find a realistic suggestion on the VRAM? I don't think the Salesforce model needs much at all but not sure how to find out for certain. |
||
VRAM. | ||
|
||
## API Reference | ||
|
||
<Card | ||
title="API Reference" | ||
icon="rectangle-terminal" | ||
href="/ai/api-reference/image-to-text" | ||
> | ||
Explore the `image-to-text` endpoint and experiment with the API in the | ||
Livepeer AI API Reference. | ||
</Card> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
--- | ||
title: "Image To Text" | ||
openapi: "POST /api/beta/generate/image-to-text" | ||
--- |
I wasn't sure about the price, I copied other image-to pipelines.