Replace Image Query Drivers with Prompt Drivers #1340

Merged 4 commits on Nov 13, 2024

collindutter (Member) commented Nov 12, 2024

Describe your changes

Image Query Drivers were added early on before Prompt Drivers supported image inputs. Now they provide no value other than some syntactic niceties. This PR removes them and improves some syntax with Prompt Drivers.

Note that Image Query Tool has been kept since we don't have a Prompt Tool (though maybe we should; that's a separate discussion).

Added

  • PromptStack.from_artifact factory method for creating a Prompt Stack with a user message from an Artifact.

Changed

  • BREAKING: Removed all ImageQueryDrivers, use PromptDrivers instead.
  • BREAKING: Removed ImageQueryTask, use PromptTask instead.
  • BREAKING: Updated ImageQueryTool.image_query_driver to ImageQueryTool.prompt_driver.
  • BasePromptDriver.run can now accept an Artifact in addition to a Prompt Stack.
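
To illustrate the last item and the new factory method, here is a minimal, self-contained sketch of the pattern: a `run` method that accepts either a prompt stack or a bare artifact and normalizes the latter through `from_artifact`. Everything below is a toy stand-in written for this illustration (the `Artifact`, `Message`, `PromptStack`, and `EchoPromptDriver` classes are assumptions, not Griptape's actual implementations), and the driver echoes its input rather than calling a model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Toy stand-in for an artifact (text, image, ...); not Griptape's class."""

    value: str


@dataclass
class Message:
    """Toy stand-in for a single message in a prompt stack."""

    role: str
    content: Artifact


@dataclass
class PromptStack:
    """Toy stand-in: an ordered list of messages."""

    messages: list[Message] = field(default_factory=list)

    @classmethod
    def from_artifact(cls, artifact: Artifact) -> PromptStack:
        # Wrap a single artifact as one user message.
        return cls(messages=[Message(role="user", content=artifact)])


class EchoPromptDriver:
    """Hypothetical driver that echoes the last message instead of calling a model."""

    def run(self, prompt_input: PromptStack | Artifact) -> Message:
        # Normalize the input: accept either a full stack or a bare artifact.
        if isinstance(prompt_input, Artifact):
            prompt_stack = PromptStack.from_artifact(prompt_input)
        else:
            prompt_stack = prompt_input
        last = prompt_stack.messages[-1]
        return Message(role="assistant", content=Artifact(f"echo: {last.content.value}"))


driver = EchoPromptDriver()

# Both call styles go through the same code path after normalization.
from_artifact = driver.run(Artifact("Describe this image"))
from_stack = driver.run(PromptStack.from_artifact(Artifact("Describe this image")))
```

In this sketch the two calls produce equal results, which is the point of the convenience: callers with a single artifact can skip building the stack by hand.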

Issue ticket number and link

NA


📚 Documentation preview 📚: https://griptape--1340.org.readthedocs.build//1340/

codecov bot commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

collindutter force-pushed the refactor/image-query branch 2 times, most recently from 39eba58 to ab48bb6, on November 12, 2024 23:17
collindutter marked this pull request as ready for review on November 12, 2024 23:20

emjay07 (Contributor) left a comment

I think it'd be good to copy the examples from the migration.md to the PromptDriver or PromptTask docs or a recipe.

  from griptape.structures import Agent
  from griptape.tools import ImageQueryTool

  # Create an Image Query Driver.
- driver = OpenAiImageQueryDriver(model="gpt-4o")
+ driver = OpenAiChatPromptDriver(model="gpt-4o")

  # Create an Image Query Tool configured to use the engine.
  tool = ImageQueryTool(

Contributor

I'm curious about the use case where an ImageQueryTool is still useful. If the prompt is "describe this image", the model should be able to do that now without using a tool.

Member Author

There is still value in having a Tool that can query images from the file system or task memory. Long term, this functionality should maybe be baked into the File Manager Tool, but that would require refactors outside the scope of this PR.

Contributor

Got it. In that case, it may be good to add ImageQueryTool to this list.

@@ -40,7 +40,6 @@ Drivers facilitate interactions with external resources and services:
- 🔢 **Embedding Drivers** generate vector embeddings from textual inputs.
- 💾 **Vector Store Drivers** manage the storage and retrieval of embeddings.
- 🎨 **Image Generation Drivers** create images from text descriptions.
- 🔎 **Image Query Drivers** query images from text queries.

Contributor

Up to you on this, but it may be good to add a line to the Prompt Drivers entry saying that they can now handle multi-modal queries, so it doesn't look like we dropped support for image queries altogether.

Contributor

Maybe this list could focus on functionality (like verbs), then mention which driver gives it to you 🤷 .

collindutter force-pushed the refactor/image-query branch 2 times, most recently from c8cc248 to 2767e54, on November 13, 2024 18:31
collindutter requested a review from emjay07 on November 13, 2024 18:32

dylanholmes (Contributor) left a comment

lgtm

collindutter merged commit ba3a140 into dev on Nov 13, 2024
15 checks passed
collindutter deleted the refactor/image-query branch on November 13, 2024 21:48