Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upsert documents & tables on upload for JIT. #8698

Merged
merged 12 commits into from
Nov 20, 2024

Conversation

Fraggle
Copy link
Contributor

@Fraggle Fraggle commented Nov 16, 2024

Description

To support JIT actions, we need to upsert the documents and / or tables to a dataSource in the conversations space.

File model changes:

  • Added useCaseMetadata, to hold the conversation sid from which the file was uploaded
    • I hesitated to store the conversation
      id instead but that would mean fetching the conversation in all callers of the processAndStoreFile function based on useCase & metadata, which I feel are not their concerns.
    • I also hesitated between flexible metadata and a dedicated property but I feel like having flexibility will be useful.
  • Added snippet, to hold the LLM summarized description of the content (will fill in an upcoming PR).

Client-side changes:

  • Add useCaseMetada as an optional argument of useFileUploaderService for both front and extension
  • Add useCaseMetadata to the dust-client types

Backend-side changes:

  • Simplify a bit the code from upload.ts.
  • Added a upsert.ts file for JIT processing (for now).
  • Do one then the other (upsert depends on upload)
  • Silently fail upsert.

Next PRs:

  • Add LLM sumarization for file to store in File.snippet

Risk

Since everything is optional and behind a feature flag, it should not break anything.

Deploy Plan

Apply migration on prodbox, then deploy front

@Fraggle Fraggle requested a review from spolu November 16, 2024 05:35
@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch 2 times, most recently from 2bcec63 to 444dae2 Compare November 16, 2024 11:24
Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch 2 times, most recently from 709e334 to c55c108 Compare November 18, 2024 21:17
Copy link

github-actions bot commented Nov 18, 2024

Warnings
⚠️

Files in **/lib/models/ have been modified and the PR has the migration-ack label. Don't forget to run the migration from prodbox.

⚠️

Files in **/api/v1/ have been modified and the PR has the documentation-ack label.
Don't forget to run npm run docs and use the Deploy OpenAPI Docs Github action to update https://docs.dust.tt/reference.

Generated by 🚫 dangerJS against 21054fc

@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch from 238db9e to 330d600 Compare November 18, 2024 22:13
@Fraggle Fraggle changed the title wip Upsert documents & tables on upload for JIT. Nov 18, 2024
@Fraggle Fraggle added the migration-ack 📁 Label to acknowledge that a migration is required. label Nov 19, 2024
@Fraggle Fraggle marked this pull request as ready for review November 19, 2024 07:34
Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments from skimming through it

front/lib/api/data_sources.ts Outdated Show resolved Hide resolved
front/lib/resources/storage/models/files.ts Outdated Show resolved Hide resolved
@Fraggle Fraggle added the documentation-ack Used to acknowledge that documentation is up-to-date with this PR label Nov 19, 2024
@Fraggle Fraggle requested a review from spolu November 19, 2024 07:57
front/lib/resources/storage/models/files.ts Outdated Show resolved Hide resolved
front/migrations/db/migration_115.sql Outdated Show resolved Hide resolved
front/public/swagger.json Outdated Show resolved Hide resolved
front/lib/api/data_sources.ts Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch from fc56fc2 to dfafb75 Compare November 19, 2024 12:26
Copy link
Contributor

@flvndvd flvndvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of introducing the JIT brick in the file API this way. IMHO, it seems more like a post-processing task rather than a pre-processing one. Is there a chance we could move this to a second step after the file has been uploaded?

front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
front/lib/api/files/upload.ts Outdated Show resolved Hide resolved
@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch 3 times, most recently from 6d5a92e to 60e901f Compare November 19, 2024 16:55
Copy link
Contributor

@spolu spolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very reasonable.

We don't handle yet the file upload and then conversation creation yet do we?

front/pages/api/v1/w/[wId]/files/[fileId].ts Outdated Show resolved Hide resolved
front/lib/api/files/upsert.ts Show resolved Hide resolved
@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch 3 times, most recently from a1f5e28 to b8964a7 Compare November 19, 2024 17:30
@Fraggle
Copy link
Contributor Author

Fraggle commented Nov 19, 2024

Looks very reasonable.

🙏🏼

We don't handle yet the file upload and then conversation creation yet do we?

We do now (check last commit)

Copy link
Contributor

@flvndvd flvndvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks way better this way, left a few comments.

front/lib/api/files/text_extraction.ts Show resolved Hide resolved
front/lib/api/files/text_extraction.ts Show resolved Hide resolved
front/lib/api/files/upload.ts Show resolved Hide resolved
front/lib/api/files/upload.ts Show resolved Hide resolved
front/lib/api/files/upsert.ts Outdated Show resolved Hide resolved
front/lib/api/files/upsert.ts Outdated Show resolved Hide resolved
front/lib/api/files/upsert.ts Show resolved Hide resolved
front/lib/api/files/upsert.ts Outdated Show resolved Hide resolved
front/lib/api/files/upsert.ts Outdated Show resolved Hide resolved
@Fraggle Fraggle force-pushed the 1571-upsert-the-conversation-datasource-on-file-upload branch from b8964a7 to 35f5924 Compare November 20, 2024 09:40
@Fraggle Fraggle merged commit e9a2446 into main Nov 20, 2024
4 checks passed
@Fraggle Fraggle deleted the 1571-upsert-the-conversation-datasource-on-file-upload branch November 20, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation-ack Used to acknowledge that documentation is up-to-date with this PR migration-ack 📁 Label to acknowledge that a migration is required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants