[Keyword search] Webcrawler folders update & backfill #9515

Merged
merged 5 commits into from
Dec 19, 2024

Conversation

philipperolet
Contributor

Description

Fixes https://github.com/dust-tt/tasks/issues/1790

Risk

N/A. This could mess up folders, but they are currently unused.

Deploy Plan

deploy connectors
run backfill

Contributor

@aubin-tchoi aubin-tchoi left a comment

LGTM

async (folder) => {
logger.info({
folder,
execute,
Contributor

nit: I'd log the result of getParents here to make sure nothing's going seriously wrong
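
A minimal sketch of what this could look like, under stated assumptions: `getParents` here is a hypothetical stand-in that walks up a folder URL, the in-memory `logger` only mimics the `logger.info({ ... })` call shape seen in the diff, and `backfillFolder` and the log message are invented for illustration. None of it is the actual backfill code.

```typescript
// Hypothetical in-memory logger mimicking a structured `logger.info(fields, message)` call.
type LogEntry = { fields: Record<string, unknown>; message: string };
const entries: LogEntry[] = [];
const logger = {
  info(fields: Record<string, unknown>, message: string): void {
    entries.push({ fields, message });
  },
};

// Hypothetical stand-in for getParents: the folder itself, then each ancestor up to the site root.
function getParents(folderUrl: string): string[] {
  const parents: string[] = [];
  const schemeEnd = folderUrl.indexOf("://") + 2; // index of the second slash in "://"
  let current = folderUrl;
  for (;;) {
    parents.push(current);
    const lastSlash = current.lastIndexOf("/");
    if (lastSlash <= schemeEnd) break; // only scheme://host is left
    current = current.slice(0, lastSlash);
  }
  return parents;
}

// Hypothetical backfill step: compute the parents once, log them, then (optionally) write.
function backfillFolder(folder: string, execute: boolean): string[] {
  const parents = getParents(folder);
  // Logging `parents` next to `folder` and `execute` makes a dry run easy to inspect.
  logger.info({ folder, execute, parents }, "Backfilling webcrawler folder");
  if (execute) {
    // The real script would perform its upsert here; omitted in this sketch.
  }
  return parents;
}
```

Running with `execute` set to `false` first would let the logged `parents` be checked before anything is written.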

if (createdFolders.has(folder)) {
continue;
}

const logicalParent = isTopFolder(request.url)
? null
: getFolderForUrl(folder);
await WebCrawlerFolder.upsert({
const [webCrawlerFolder] = await WebCrawlerFolder.upsert({
Contributor

nit: for the internalId passed here, I don't love that we call stableIdForUrl again, since we could get it from getParentsForPage (in the original code)

Contributor Author

Fair point, but the alternative, parentFolderIds[index+1], is not very satisfying either: it is not very readable. It might be marginally better, but it is not very clear.
Also, this is outside the scope of the PR, so I will leave it as is 👍

Contributor

@aubin-tchoi aubin-tchoi Dec 18, 2024

If you extracted the parents into a variable and used parents[0], that would give the best of both worlds, no? (This is now possible because we do the slice.)
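
A rough sketch of that suggestion, with loud assumptions: `stableIdForUrl` and `getParentsForPage` below are simplified, hypothetical stand-ins (the real helpers in the connectors code may have different signatures and id formats), and `buildFolderRows` and `FolderRow` are invented names. The only point is to show the id being read from the sliced `parents` array rather than recomputed.

```typescript
// Hypothetical stand-in: a deterministic id derived from a folder URL.
function stableIdForUrl(url: string): string {
  return `folder:${url}`;
}

// Hypothetical stand-in: ids of every folder above a page, closest folder first.
function getParentsForPage(pageUrl: string): string[] {
  const ids: string[] = [];
  const hostStart = pageUrl.indexOf("://") + 3; // first character of the host
  let current = pageUrl;
  for (;;) {
    const lastSlash = current.lastIndexOf("/");
    if (lastSlash < hostStart) break; // nothing left above the site root
    current = current.slice(0, lastSlash);
    ids.push(stableIdForUrl(current));
  }
  return ids;
}

interface FolderRow {
  internalId: string;
  parents: string[];
}

function buildFolderRows(pageUrl: string): FolderRow[] {
  const parentFolderIds = getParentsForPage(pageUrl);
  return parentFolderIds.map((_, index) => {
    // Extract the slice once: it starts at the folder itself,
    // so parents[0] is the folder's own id and no second id computation is needed.
    const parents = parentFolderIds.slice(index);
    return { internalId: parents[0], parents };
  });
}
```

Compared with `parentFolderIds[index+1]`, reading `parents[0]` keeps the index arithmetic in one place (the `slice`), which is arguably easier to follow.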

@philipperolet philipperolet merged commit 472322e into main Dec 19, 2024
9 checks passed
@philipperolet philipperolet deleted the webcrawler-folders branch December 19, 2024 15:59
2 participants