Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MS Connector] Compute parents array for document upsertion #6233

Merged
merged 6 commits into from
Jul 16, 2024

Conversation

philipperolet
Copy link
Contributor

Description

Fixes https://github.com/dust-tt/tasks/issues/1053

This PR adds computation of parents array when we upsert a document, allowing hierarchy selection in datasources actions for assistants to work. Additionally:

  • it ensures root selected folders are not redundant, i.e. when a folder is selected but a parent is also selected, the folder is not added to the sync (it would break parents computation in some edge cases);
  • it adds comments explaining the design choices for a specific, internal microsoft id, rather than using the external id of the provider as in other connectors.

Risk & deploy

na (gated)

Copy link
Contributor

@JulesBelveze JulesBelveze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👌🏼

},
(connectorId, parentInternalId, startSyncTs) =>
`microsoft-${connectorId}-parent-${parentInternalId}-syncms-${startSyncTs}`,
10 * 60 * 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's store it in a variable?

Copy link
Contributor

@lasryaric lasryaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left 2 comments. I am not super familiar with the context but the code seems reasonable to me. Thanks.

@@ -117,14 +122,84 @@ export async function getSiteNodesToSync(
[] as MicrosoftNode[]
);

// for all folders, check if a parent folder or drive is already in the list,
// in which case remove it this can happen because when a user selects a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a . after "remove it".

// Keeping them both in the sync list can result in various kinds of issues,
// e.g. if a child folder is synced before the parent, then the child folder's
// files' parents array will be incomplete, thus the need to prune the list
const nodesToSync = allNodes.filter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a O(N^2) loop, are we sure it's properly bounded?
Is there not a better way to do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No prob here; max max a hundred of items (it's the user selection of folders), and the costly part is only run O(n) 👍

@philipperolet philipperolet merged commit 6a19401 into main Jul 16, 2024
3 checks passed
@philipperolet philipperolet deleted the ms-parents branch July 16, 2024 12:57
albandum pushed a commit that referenced this pull request Aug 28, 2024
* comments on internal id for microsoft

* exclude folders whose parents are already in sync list

* parents implementation

* clean

* fix filter/async

* review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants