enh(notion) change how we process dbs #4173

Merged
fontanierh merged 11 commits into main from notion-change-how-we-process-dbs on Mar 6, 2024

Conversation

fontanierh
Contributor

Description

  • Remove all the inline database logic (which was broken anyway...). This may save quite a few requests to Notion 🤔. We continue to add child databases we don't know about to "resources to check".
  • Insert inline placeholders (Child Database ${DB_TITLE}) for databases that are children of a block
  • Upsert databases to the semantic store as CSV files (using the header as prefix)
  • Upsert pages that don't have a body if one of their properties has over 256 characters
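
The bullets above could be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: the `Row` type and the helpers `renderDatabaseAsCsv`, `childDatabasePlaceholder`, and `shouldUpsertBodylessPage` are hypothetical names, and the exact placeholder format is assumed from the description.

```typescript
// Hypothetical types and helpers for illustration only.
type Row = Record<string, string>;

// Quote a CSV field when it contains a comma, double quote, or line break.
function escapeCsvField(value: string): string {
  return /[",\n\r]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Render a database as CSV. The header line is returned separately so it can
// be used as the document prefix while the body holds only the data rows.
function renderDatabaseAsCsv(
  columns: string[],
  rows: Row[]
): { csvHeader: string; csvBody: string } {
  const csvHeader = columns.map(escapeCsvField).join(",");
  const csvBody = rows
    .map((row) => columns.map((c) => escapeCsvField(row[c] ?? "")).join(","))
    .join("\n");
  return { csvHeader, csvBody };
}

// Inline placeholder for a database that is the child of a block
// (format assumed from the PR description).
function childDatabasePlaceholder(dbTitle: string): string {
  return `Child Database ${dbTitle}`;
}

// Threshold from the description: a page without a body is still upserted
// when one of its properties has over 256 characters.
const MAX_PROPERTY_LENGTH = 256;

function shouldUpsertBodylessPage(properties: Row): boolean {
  return Object.values(properties).some((v) => v.length > MAX_PROPERTY_LENGTH);
}
```

Returning the header separately from the rows keeps it out of the body, so it is not duplicated when it is also passed as the prefix.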

Risk

Notion sync could get stuck, but it is easy to roll back

Deploy Plan

Requires a queue bump + restart of workflows

Contributor

@spolu spolu left a comment

Looks great, but the prefix logic doesn't seem adequate? We want the header here and not in the csvForDocument

dataSourceConfig,
documentId: `notion-database-${databaseId}`,
documentContent: {
  prefix: `${databaseName}\ncsvHeader`,
Contributor

csvHeader?

Contributor Author

So this is the header, right? What you don't want is the DB title?

Happy to remove the CSV header from the body though.

Contributor Author

@fontanierh fontanierh Mar 6, 2024

Ok, I see it! Fixed (also removed the DB name from the prefix; did we want it?)

Contributor Author

Used renderPrefixSection with the defaults. Should I override them to allow more than 64 tokens for the header?

Contributor Author

Ok, ended up doubling it.

@fontanierh fontanierh force-pushed the notion-change-how-we-process-dbs branch from df61068 to de20391 Compare March 6, 2024 15:57
@fontanierh fontanierh force-pushed the notion-change-how-we-process-dbs branch from e8b4748 to b73c4fc Compare March 6, 2024 16:39
Contributor

@spolu spolu left a comment

LGTM

dataSourceConfig,
documentId: `notion-database-${databaseId}`,
documentContent: {
  prefix: csvHeader,
Contributor

I thiiiink we want the DB name in the prefix, no?
Also, if csvHeader is larger than half the chunk size, the upsert will fail.

You can likely use the functions that have been built by @philipperolet that are meant to do that but maybe the numbers are a bit tight for this use case.

In any case we want to tokenize and split at half chunk size aka 256 tokens.
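
A rough sketch of that split, under stated assumptions: the chunk size is taken to be 512 tokens (so half is 256), `tokenize` is a whitespace-based stand-in rather than the real tokenizer, and `splitPrefix` is a hypothetical helper, not one of the existing functions mentioned above.

```typescript
// Assumption: chunks are 512 tokens, so the prefix budget is half of that.
const CHUNK_SIZE_TOKENS = 512;
const MAX_PREFIX_TOKENS = CHUNK_SIZE_TOKENS / 2;

// Stand-in tokenizer: splits on whitespace and keeps the whitespace runs as
// tokens, so joining the tokens restores the original text. A real
// implementation would use the embedding model's tokenizer instead.
function tokenize(text: string): string[] {
  return text.split(/(\s+)/).filter((t) => t.length > 0);
}

// Build the prefix from the DB name and the CSV header, keeping at most
// `maxTokens` tokens in the prefix. Anything beyond the budget is returned
// as overflow so the caller can prepend it to the document body.
function splitPrefix(
  databaseName: string,
  csvHeader: string,
  maxTokens: number = MAX_PREFIX_TOKENS
): { prefix: string; overflow: string } {
  const tokens = tokenize(`${databaseName}\n${csvHeader}`);
  return {
    prefix: tokens.slice(0, maxTokens).join(""),
    overflow: tokens.slice(maxTokens).join(""),
  };
}
```

With this shape, a header that fits the budget yields an empty overflow, and an oversized header never produces a prefix above half the chunk size.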

Contributor Author

This is outdated, right?

@fontanierh fontanierh merged commit 3ee25da into main Mar 6, 2024
5 checks passed
@fontanierh fontanierh deleted the notion-change-how-we-process-dbs branch March 6, 2024 21:08