Release/v0.33.0 (#1243)
Signed-off-by: dependabot[bot]
Co-authored-by: Matt Vallillo
Co-authored-by: dylanholmes
Co-authored-by: Vasily Vasinov
Co-authored-by: CJ Kindel
Co-authored-by: Emily Danielson
Co-authored-by: hkhajgiwale
Co-authored-by: Harsh Khajgiwale
Co-authored-by: Anush
Co-authored-by: datashaman
Co-authored-by: Zach Giordano
Co-authored-by: Andrew French
Co-authored-by: Stefano Lottini
Co-authored-by: James Clarendon
Co-authored-by: Michal
Co-authored-by: Ikko Eltociear Ashimine
Co-authored-by: torabshaikh
Co-authored-by: Aodhan Roche
Co-authored-by: Kyle Roche
Co-authored-by: William Price
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: billytrend-cohere
1 parent 04fc257 commit 91fd268
Showing 303 changed files with 3,763 additions and 2,763 deletions.
4 changes: 4 additions & 0 deletions .github/dependabot.yml
@@ -4,6 +4,7 @@ updates:
    directory: "/"
    schedule:
      interval: "weekly"
    versioning-strategy: increase-if-necessary
    groups:
      dependencies:
        dependency-type: "production"
@@ -15,6 +16,9 @@ updates:
        update-types:
          - "minor"
          - "patch"
    allow:
      - dependency-type: production
      - dependency-type: development
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
2 changes: 2 additions & 0 deletions .github/workflows/docs-integration-tests.yml
@@ -123,6 +123,8 @@ jobs:
      QDRANT_CLUSTER_API_KEY: ${{ secrets.INTEG_QDRANT_CLUSTER_API_KEY }}
      ASTRA_DB_API_ENDPOINT: ${{ secrets.INTEG_ASTRA_DB_API_ENDPOINT }}
      ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.INTEG_ASTRA_DB_APPLICATION_TOKEN }}
      TAVILY_API_KEY: ${{ secrets.INTEG_TAVILY_API_KEY }}
      EXA_API_KEY: ${{ secrets.INTEG_EXA_API_KEY }}
    services:
      postgres:
        image: ankane/pgvector:v0.5.0
88 changes: 87 additions & 1 deletion CHANGELOG.md
@@ -6,6 +6,85 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

## [0.33.0] - 2024-10-09

### Added
- `Workflow.input_tasks` and `Workflow.output_tasks` to access the input and output tasks of a Workflow.
- Ability to pass nested list of `Tasks` to `Structure.tasks` allowing for more complex declarative Structure definitions.
- `TavilyWebSearchDriver` to integrate Tavily's web search capabilities.
- `ExaWebSearchDriver` to integrate Exa's web search capabilities.
- `Workflow.outputs` to access the outputs of a Workflow.
- `BaseFileLoader` for Loaders that load from a path.
- `BaseLoader.fetch()` method for fetching data from a source.
- `BaseLoader.parse()` method for parsing fetched data.
- `BaseFileManager.encoding` to specify the encoding when loading and saving files.
- `BaseWebScraperDriver.extract_page()` method for extracting data from an already scraped web page.
- `TextLoaderRetrievalRagModule.chunker` for specifying the chunking strategy.
- `file_utils.get_mime_type` utility for getting the MIME type of a file.
- `BaseRulesetDriver` for loading a `Ruleset` from an external source.
- `LocalRulesetDriver` for loading a `Ruleset` from a local `.json` file.
- `GriptapeCloudRulesetDriver` for loading a `Ruleset` resource from Griptape Cloud.
- Parameter `alias` on `GriptapeCloudConversationMemoryDriver` for fetching a Thread by alias.
- Basic support for OpenAi Structured Output via `OpenAiChatPromptDriver.response_format` parameter.
- Ability to pass callable to `activity.schema` for dynamic schema generation.

### Changed
- **BREAKING**: Renamed parameters on several classes to `client`:
  - `bedrock_client` on `AmazonBedrockCohereEmbeddingDriver`.
  - `bedrock_client` on `AmazonBedrockTitanEmbeddingDriver`.
  - `bedrock_client` on `AmazonBedrockImageGenerationDriver`.
  - `bedrock_client` on `AmazonBedrockImageQueryDriver`.
  - `bedrock_client` on `AmazonBedrockPromptDriver`.
  - `sagemaker_client` on `AmazonSageMakerJumpstartEmbeddingDriver`.
  - `sagemaker_client` on `AmazonSageMakerJumpstartPromptDriver`.
  - `sqs_client` on `AmazonSqsEventListenerDriver`.
  - `iotdata_client` on `AwsIotCoreEventListenerDriver`.
  - `s3_client` on `AmazonS3FileManagerDriver`.
  - `s3_client` on `AwsS3Tool`.
  - `iam_client` on `AwsIamTool`.
  - `pusher_client` on `PusherEventListenerDriver`.
  - `mq` on `MarqoVectorStoreDriver`.
  - `model_client` on `GooglePromptDriver`.
  - `model_client` on `GoogleTokenizer`.
- **BREAKING**: Renamed parameter `pipe` on `HuggingFacePipelinePromptDriver` to `pipeline`.
- **BREAKING**: Removed `BaseFileManager.default_loader` and `BaseFileManager.loaders`.
- **BREAKING**: Loaders no longer chunk data. Use a Chunker to chunk the data.
- **BREAKING**: Removed `file_utils.load_file` and `file_utils.load_files`.
- **BREAKING**: Removed `loaders-dataframe` and `loaders-audio` extras as they are no longer needed.
- **BREAKING**: `TextLoader`, `PdfLoader`, `ImageLoader`, and `AudioLoader` now take a `str | PathLike` instead of `bytes`. Passing `bytes` is still supported but deprecated.
- **BREAKING**: Removed `DataframeLoader`.
- **BREAKING**: Update `pypdf` dependency to `^5.0.1`.
- **BREAKING**: Update `redis` dependency to `^5.1.0`.
- **BREAKING**: Remove `torch` extra from `transformers` dependency. This must be installed separately.
- **BREAKING**: Split `BaseExtractionEngine.extract` into `extract_text` and `extract_artifacts` for consistency with `BaseSummaryEngine`.
- **BREAKING**: `BaseExtractionEngine` no longer catches exceptions and returns `ErrorArtifact`s.
- **BREAKING**: `JsonExtractionEngine.template_schema` is now required.
- **BREAKING**: `CsvExtractionEngine.column_names` is now required.
- **BREAKING**: Renamed `RuleMixin.all_rulesets` to `RuleMixin.rulesets`.
- **BREAKING**: Renamed `GriptapeCloudKnowledgeBaseVectorStoreDriver` to `GriptapeCloudVectorStoreDriver`.
- **BREAKING**: `OpenAiChatPromptDriver.response_format` is now a `dict` instead of a `str`.
- `MarkdownifyWebScraperDriver.DEFAULT_EXCLUDE_TAGS` now includes media/blob-like HTML tags.
- `StructureRunTask` now inherits from `PromptTask`.
- Several places where API clients are initialized are now lazy loaded.
- `BaseVectorStoreDriver.upsert_text_artifacts` now returns a list or dictionary of upserted vector ids.
- `LocalFileManagerDriver.workdir` is now optional.
- `filetype` is now a core dependency.
- `FileManagerTool` now uses `filetype` for more accurate file type detection.
- `BaseFileLoader.load_file()` will now either return a `TextArtifact` or a `BlobArtifact` depending on whether `BaseFileManager.encoding` is set.
- `Structure.output`'s type is now `BaseArtifact` and raises an exception if the output is `None`.
- `JsonExtractionEngine.extract_artifacts` now returns a `ListArtifact[JsonArtifact]`.
- `CsvExtractionEngine.extract_artifacts` now returns a `ListArtifact[CsvRowArtifact]`.
- Remove `manifest.yml` requirements for custom tool creation.

### Fixed
- Anthropic native Tool calling.
- Empty `ActionsSubtask.thought` being logged.
- `RuleMixin` no longer prevents setting `rulesets` _and_ `rules` at the same time.
- `PromptTask` will merge in its Structure's Rulesets and Rules.
- `PromptTask` not checking whether Structure was set before building Prompt Stack.
- `BaseTask.full_context` context being empty when not connected to a Structure.

## [0.32.0] - 2024-09-17

### Added
@@ -22,8 +101,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Changed
 - **BREAKING**: Removed `CsvRowArtifact`. Use `TextArtifact` instead.
+- **BREAKING**: Removed `DataframeLoader`.
 - **BREAKING**: Removed `MediaArtifact`, use `ImageArtifact` or `AudioArtifact` instead.
-- **BREAKING**: `CsvLoader`, `DataframeLoader`, and `SqlLoader` now return `list[TextArtifact]`.
+- **BREAKING**: `CsvLoader` and `SqlLoader` now return `ListArtifact[TextArtifact]`.
 - **BREAKING**: Removed `ImageArtifact.media_type`.
 - **BREAKING**: Removed `AudioArtifact.media_type`.
 - **BREAKING**: Removed `BlobArtifact.dir_name`.
@@ -44,6 +124,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- Parameter `meta: dict` on `BaseEvent`.
- `AzureOpenAiTextToSpeechDriver`.
- Ability to use Event Listeners as Context Managers for temporarily setting the Event Bus listeners.
- `JsonSchemaRule` for instructing the LLM to output a JSON object that conforms to a schema.
- Ability to use Drivers Configs as Context Managers for temporarily setting the default Drivers.

### Changed
- **BREAKING**: Drivers, Loaders, and Engines now raise exceptions rather than returning `ErrorArtifacts`.
@@ -52,6 +136,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **BREAKING**: `BaseConversationMemoryDriver.load` now returns `tuple[list[Run], dict]`. This represents the runs and metadata.
- **BREAKING**: `BaseConversationMemoryDriver.store` now takes `runs: list[Run]` and `metadata: dict` as input.
- **BREAKING**: Parameter `file_path` on `LocalConversationMemoryDriver` renamed to `persist_file` and is now type `Optional[str]`.
- **BREAKING**: Removed the `__all__` declaration from the `griptape.mixins` module.
- `Defaults.drivers_config.conversation_memory_driver` now defaults to `LocalConversationMemoryDriver` instead of `None`.
- `CsvRowArtifact.to_text()` now includes the header.

@@ -62,6 +147,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Missing `maxTokens` inference parameter in `AmazonBedrockPromptDriver`.
- Incorrect model in `OpenAiDriverConfig`'s `text_to_speech_driver`.
- Crash when using `CohereRerankDriver` with `CsvRowArtifact`s.
- Crash when passing "empty" Artifacts or no Artifacts to `CohereRerankDriver`.


## [0.30.2] - 2024-08-26
159 changes: 159 additions & 0 deletions MIGRATION.md
@@ -1,6 +1,165 @@
# Migration Guide

This document provides instructions for migrating your codebase to accommodate breaking changes introduced in new versions of Griptape.

## 0.32.X to 0.33.X

### Removed `DataframeLoader`

`DataframeLoader` has been removed. Use `CsvLoader.parse` or build `TextArtifact`s from the dataframe instead.

#### Before

```python
DataframeLoader().load(df)
```

#### After
```python
# Convert the dataframe to csv bytes and parse it
# (pandas >= 1.5 spells this argument `lineterminator`; older versions used `line_terminator`)
CsvLoader().parse(bytes(df.to_csv(lineterminator='\r\n', index=False), encoding='utf-8'))
# Or build TextArtifacts from the dataframe
[TextArtifact(row) for row in df.to_dict(orient="records")]
```
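
The two options above can also be illustrated without pandas or Griptape. The following is a minimal sketch, assuming records shaped like the output of `df.to_dict(orient="records")`. The helper names `records_to_csv_bytes` and `record_to_text` are hypothetical and are not part of Griptape.

```python
import csv
import io

# Assumed sample data, shaped like df.to_dict(orient="records").
records = [{"name": "John", "age": 30}, {"name": "Jane", "age": 25}]


def records_to_csv_bytes(records: list[dict]) -> bytes:
    """Serialize record dicts to UTF-8 CSV bytes with a header row."""
    if not records:
        return b""
    buffer = io.StringIO(newline="")
    # The csv module terminates rows with "\r\n" by default.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8")


def record_to_text(record: dict) -> str:
    """Render one record as "key: value" lines (hypothetical text format)."""
    return "\n".join(f"{key}: {value}" for key, value in record.items())
```

The bytes from `records_to_csv_bytes` are the kind of input the `CsvLoader().parse(...)` call above expects, and each string from `record_to_text` could be wrapped in a `TextArtifact`.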

### `TextLoader`, `PdfLoader`, `ImageLoader`, and `AudioLoader` now take a `str | PathLike` instead of `bytes`.

#### Before
```python
PdfLoader().load(Path("attention.pdf").read_bytes())
PdfLoader().load_collection([Path("attention.pdf").read_bytes(), Path("CoT.pdf").read_bytes()])
```

#### After
```python
PdfLoader().load("attention.pdf")
PdfLoader().load_collection([Path("attention.pdf"), "CoT.pdf"])
```
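
For code that must accept both call styles during the transition, the pattern can be sketched as follows. This is only an illustration of accepting `str | PathLike` while still tolerating deprecated `bytes`. The function `read_source` is hypothetical and is not how Griptape's loaders are implemented.

```python
import os
import warnings
from pathlib import Path
from typing import Union

Source = Union[str, os.PathLike, bytes]


def read_source(source: Source) -> bytes:
    """Return raw bytes for a path-like source, or pass legacy bytes through."""
    if isinstance(source, bytes):
        # Legacy call style: still accepted, but flagged as deprecated.
        warnings.warn(
            "Passing bytes is deprecated; pass a file path instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return source
    # os.fspath accepts both str and PathLike objects.
    return Path(os.fspath(source)).read_bytes()
```

Callers can then pass `"attention.pdf"` or `Path("attention.pdf")` directly, and existing `bytes` callers keep working while seeing a `DeprecationWarning`.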

### Removed `file_utils.load_file` and `file_utils.load_files`

`griptape.utils.file_utils.load_file` and `griptape.utils.file_utils.load_files` have been removed.
You can now pass the file path directly to the Loader.

#### Before

```python
PdfLoader().load(load_file("attention.pdf").read_bytes())
PdfLoader().load_collection(list(load_files(["attention.pdf", "CoT.pdf"]).values()))
```

#### After
```python
PdfLoader().load("attention.pdf")
PdfLoader().load_collection(["attention.pdf", "CoT.pdf"])
```

### Loaders no longer chunk data

Loaders no longer chunk the data after loading it. If you need to chunk the data, use a [Chunker](https://docs.griptape.ai/stable/griptape-framework/data/chunkers/) after loading the data.

#### Before

```python
chunks = PdfLoader().load("attention.pdf")
vector_store.upsert_text_artifacts(
    {
        "griptape": chunks,
    }
)
```

#### After
```python
artifact = PdfLoader().load("attention.pdf")
chunks = TextChunker().chunk(artifact)
vector_store.upsert_text_artifacts(
    {
        "griptape": chunks,
    }
)
```
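
To make the separation of loading and chunking concrete, here is a toy, dependency-free sketch of what a chunking step does. The function `chunk_text` is hypothetical. It counts characters, whereas the Griptape examples in this release configure chunkers with `max_tokens`, so treat it as an illustration of the idea rather than the library's algorithm.

```python
def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            # The paragraph still fits in the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split into fixed-size pieces.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

With this split of responsibilities, the loading step returns the full text once, and the chunk size can be tuned later without reloading the source.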


### Removed `torch` extra from `transformers` dependency

The `torch` extra has been removed from the `transformers` dependency. If you require `torch`, install it separately.

#### Before
```bash
pip install griptape[drivers-prompt-huggingface-hub]
```

#### After
```bash
pip install griptape[drivers-prompt-huggingface-hub]
pip install torch
```
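
If a project relies on `torch` being present, a small startup check can surface the missing package early. This is a minimal sketch using only the standard library. The helper `is_installed` is a hypothetical name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("torch"):
    # After this release, torch is no longer pulled in by the transformers dependency.
    print("torch is not installed; run `pip install torch` if your drivers need it.")
```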

### `CsvLoader` and `SqlLoader` return types

`CsvLoader` and `SqlLoader` now return a `ListArtifact[TextArtifact]` instead of a `list[CsvRowArtifact]`. (`DataframeLoader` has been removed, as described above.)

If you require a dictionary, set a custom `formatter_fn` and then parse the text to a dictionary.

#### Before

```python
results = CsvLoader().load(Path("people.csv").read_text())

print(results[0].value) # {"name": "John", "age": 30}
print(type(results[0].value)) # <class 'dict'>
```

#### After
```python
results = CsvLoader().load(Path("people.csv").read_text())

print(results[0].value) # name: John\nage: 30
print(type(results[0].value)) # <class 'str'>

# Customize formatter_fn
results = CsvLoader(formatter_fn=lambda x: json.dumps(x)).load(Path("people.csv").read_text())
print(results[0].value) # {"name": "John", "age": 30}
print(type(results[0].value)) # <class 'str'>

dict_results = [json.loads(result.value) for result in results]
print(dict_results[0]) # {"name": "John", "age": 30}
print(type(dict_results[0])) # <class 'dict'>
```
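
The round trip through JSON can be tried with the standard library alone. The sketch below uses `csv.DictReader` as a stand-in for the loader and a list comprehension as a stand-in for `formatter_fn`, and the sample data is assumed. Note that the standard library reader yields every value as a string, so `age` comes back as `"30"` unless you convert it yourself.

```python
import csv
import io
import json

# Assumed sample data standing in for people.csv.
csv_text = "name,age\nJohn,30\nJane,25\n"

# Each CSV row becomes a dict of strings keyed by the header.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Stand-in for a formatter_fn: serialize each row dict to a JSON string.
formatted = [json.dumps(row) for row in rows]

# Parse the strings back into dictionaries when dict access is needed.
parsed = [json.loads(text) for text in formatted]
```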

### Renamed `GriptapeCloudKnowledgeBaseVectorStoreDriver` to `GriptapeCloudVectorStoreDriver`.

#### Before
```python
from griptape.drivers.griptape_cloud_knowledge_base_vector_store_driver import GriptapeCloudKnowledgeBaseVectorStoreDriver

driver = GriptapeCloudKnowledgeBaseVectorStoreDriver(...)
```

#### After
```python
from griptape.drivers.griptape_cloud_vector_store_driver import GriptapeCloudVectorStoreDriver

driver = GriptapeCloudVectorStoreDriver(...)
```

### `OpenAiChatPromptDriver.response_format` is now a `dict` instead of a `str`.

`OpenAiChatPromptDriver.response_format` is now structured as the `openai` SDK accepts it.

#### Before
```python
driver = OpenAiChatPromptDriver(
    response_format="json_object"
)
```

#### After
```python
driver = OpenAiChatPromptDriver(
    response_format={"type": "json_object"}
)
```
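
If configuration values written for the old string form are still in circulation, a small adapter can translate them before they reach the driver. This is a sketch only. `normalize_response_format` is a hypothetical helper and is not part of Griptape or the `openai` SDK.

```python
from typing import Union


def normalize_response_format(value: Union[str, dict]) -> dict:
    """Convert the legacy string form into the dict form shown above."""
    if isinstance(value, str):
        # "json_object" becomes {"type": "json_object"}.
        return {"type": value}
    # Dict values are assumed to already be in the new shape.
    return value
```

The result can then be passed as the `response_format` argument, for example `response_format=normalize_response_format("json_object")`.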

## 0.31.X to 0.32.X

### Removed `MediaArtifact`
4 changes: 3 additions & 1 deletion docs/examples/src/load_query_and_chat_marqo_1.py
@@ -1,6 +1,7 @@
 import os

 from griptape import utils
+from griptape.chunkers import TextChunker
 from griptape.drivers import MarqoVectorStoreDriver, OpenAiEmbeddingDriver
 from griptape.loaders import WebLoader
 from griptape.structures import Agent
@@ -25,11 +26,12 @@

 # Load artifacts from the web
 artifacts = WebLoader().load("https://www.griptape.ai")
+chunks = TextChunker().chunk(artifacts)

 # Upsert the artifacts into the vector store
 vector_store.upsert_text_artifacts(
     {
-        namespace: artifacts,
+        namespace: chunks,
     }
 )

7 changes: 4 additions & 3 deletions docs/examples/src/query_webpage_1.py
@@ -1,14 +1,15 @@
 import os

+from griptape.chunkers import TextChunker
 from griptape.drivers import LocalVectorStoreDriver, OpenAiEmbeddingDriver
 from griptape.loaders import WebLoader

 vector_store = LocalVectorStoreDriver(embedding_driver=OpenAiEmbeddingDriver(api_key=os.environ["OPENAI_API_KEY"]))

-artifacts = WebLoader(max_tokens=100).load("https://www.griptape.ai")
+artifacts = WebLoader().load("https://www.griptape.ai")
+chunks = TextChunker().chunk(artifacts)

-for a in artifacts:
-    vector_store.upsert_text_artifact(a, namespace="griptape")
+vector_store.upsert_text_artifacts({"griptape": chunks})

 results = vector_store.query("creativity", count=3, namespace="griptape")

7 changes: 4 additions & 3 deletions docs/examples/src/query_webpage_astra_db_1.py
@@ -1,5 +1,6 @@
 import os

+from griptape.chunkers import TextChunker
 from griptape.drivers import (
     AstraDbVectorStoreDriver,
     OpenAiChatPromptDriver,
@@ -43,9 +44,9 @@
     ),
 )

-artifacts = WebLoader(max_tokens=256).load(input_blogpost)
-
-vector_store_driver.upsert_text_artifacts({namespace: artifacts})
+artifacts = WebLoader().load(input_blogpost)
+chunks = TextChunker(max_tokens=256).chunk(artifacts)
+vector_store_driver.upsert_text_artifacts({namespace: chunks})

 rag_tool = RagTool(
     description="A DataStax blog post",
6 changes: 4 additions & 2 deletions docs/examples/src/talk_to_a_pdf_1.py
@@ -1,5 +1,6 @@
 import requests

+from griptape.chunkers import TextChunker
 from griptape.drivers import LocalVectorStoreDriver, OpenAiChatPromptDriver, OpenAiEmbeddingDriver
 from griptape.engines.rag import RagEngine
 from griptape.engines.rag.modules import PromptResponseRagModule, VectorStoreRetrievalRagModule
@@ -30,9 +31,10 @@
     rag_engine=engine,
 )

-artifacts = PdfLoader().load(response.content)
+artifacts = PdfLoader().parse(response.content)
+chunks = TextChunker().chunk(artifacts)

-vector_store.upsert_text_artifacts({namespace: artifacts})
+vector_store.upsert_text_artifacts({namespace: chunks})

 agent = Agent(tools=[rag_tool])
