Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch data source view in agent configuration #6708

Merged
merged 11 commits into from
Aug 9, 2024

Conversation

flvndvd
Copy link
Contributor

@flvndvd flvndvd commented Aug 8, 2024

Description

This PR refactors the fetch agent configuration. Now, we fetch all data sources and any associated data source views for both retrieval and process action configurations. This change is necessary to pass these data source views to the core when performing any of these actions. Data source views are essential for conditioning access to a data source when reading documents from it. The code that calls core will be modified in a subsequent PR.

Additionally, I started splitting the fetch agent configuration code by action, with each file corresponding to a specific action’s fetch logic. Parallelization is still maintained. To keep the same "latency" we should move all the actions fetching logic to this pattern.

I left several TODOs since the UI is not yet aware of data source views. However, considering that the types are shared for both posting a configuration and fetching, this PR will break the builder UI for users currently on it. Given the current load, this should be manageable.

Risk

Safe to rollback.

Deploy Plan

No SQL migration, only some stuff to please Sequelize.

@flvndvd flvndvd changed the title Fetch data source view in agent configuration 🚧 Fetch data source view in agent configuration Aug 8, 2024
Copy link

github-actions bot commented Aug 8, 2024

Warnings
⚠️

Files in **/lib/models/ have been modified and the PR has the migration-ack label. Don't forget to run the migration from prodbox.

Generated by 🚫 dangerJS against b0c7c8d

@flvndvd flvndvd marked this pull request as ready for review August 9, 2024 07:59
@flvndvd flvndvd changed the title 🚧 Fetch data source view in agent configuration Fetch data source view in agent configuration Aug 9, 2024
Copy link
Contributor

@fontanierh fontanierh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think adding a DB roundtrip here is desirable, we use this function a lot and latency needs to stay as low as possible

agentTablesConfigurationTables,
dustApps,
] = await Promise.all([
retrievalDatasourceConfigurationsPromise,
processDatasourceConfigurationsPromise,
fetchAgentRetrievalConfigurationsActions({ configurationIds, variant }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it ConfigurationActions ? Do you mean ActionConfigurations ?

`Couldn't find topK for retrieval configuration ${retrievalConfig.id}} with 'custom' topK mode`
);
}
const retrievalActions =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not actions, they are action configurations (naming is quite confusing IMO)

import { DataSourceViewResource } from "@app/lib/resources/data_source_view_resource";
import { DataSourceViewModel } from "@app/lib/resources/storage/models/data_source_view";

export async function fetchAgentProcessConfigurationsActions({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add latency. Before, we were getting the action config with the rest of action configs, and had a second step for all the "secondary" resources linked to action configs. Now we have an extra step

@flvndvd
Copy link
Contributor Author

flvndvd commented Aug 9, 2024

I don't think adding a DB roundtrip here is desirable, we use this function a lot and latency needs to stay as low as possible

Seen IRL, I'll just go all the way in and fully migrate all action configurations to this pattern to maintain a consistent latency. Henry's concern is not about the additional database calls but rather the fact that these two functions are slightly slower compared to the others.

Copy link
Contributor

@fontanierh fontanierh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, much nicer like this

let's make sure it's thoroughly tested because it's very easy to break this function by mistake 🙏

@@ -37,6 +37,12 @@ import {
DEFAULT_TABLES_QUERY_ACTION_NAME,
DEFAULT_WEBSEARCH_ACTION_NAME,
} from "@app/lib/api/assistant/actions/names";
import { fetchBrowseActionsConfigurations } from "@app/lib/api/assistant/configuration/browse";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the double plural here feels a little weird to me, I would rather fetchBrowseActionConfigurations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, updated.

configurationIds: ModelId[];
variant: "light" | "full";
}): Promise<Map<ModelId, BrowseConfigurationType[]>> {
if (variant !== "full") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I feel it would be better to not pass variant to these functions, and just handle that in parent (i.e, not call the function at all if variant is not full)

But I can live with this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will give it some more love in a follow up PR.

@flvndvd
Copy link
Contributor Author

flvndvd commented Aug 9, 2024

Thanks, much nicer like this

let's make sure it's thoroughly tested because it's very easy to break this function by mistake 🙏

Tested with all the dust apps locally (except table query).

@flvndvd flvndvd added the migration-ack 📁 Label to acknowledge that a migration is required. label Aug 9, 2024
@flvndvd flvndvd merged commit 362b099 into main Aug 9, 2024
6 checks passed
@flvndvd flvndvd deleted the flav/draft-fetch-data-source-view branch August 9, 2024 12:08
albandum pushed a commit that referenced this pull request Aug 28, 2024
* Fetch data source view in agent configuration

* Setup Sequelize relationship

* Return both data source id and data source view id.

* Refactor process action

* Address naming comments from review

* Refactor dust app fetching logic

* Refactor table query actions fetch

* Refactor websearch and browse fetch logic

* ✨

* ✨
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migration-ack 📁 Label to acknowledge that a migration is required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants