More robust delete_downstream_merge
#806
Conversation
There are still some linter issues but otherwise looks good.
Is this potentially expensive with a large database?

It is, yes. Using […]
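A common way to keep this cheap on a large database — and, per the PR description, the approach taken here — is to compute each chain once and serve repeat lookups from a cache that can be cleared on demand. A minimal sketch of that pattern using stdlib `functools.lru_cache` (the function and table names are hypothetical, not the Spyglass API):

```python
from functools import lru_cache

CALLS = {"count": 0}  # track how often the expensive walk actually runs


@lru_cache(maxsize=None)
def find_chain(source: str, target: str) -> tuple:
    """Pretend-expensive dependency-graph walk, cached per (source, target)."""
    CALLS["count"] += 1
    # Stand-in result for a real graph traversal.
    return (source, "...", target)


find_chain("MyTable", "SomeMerge")
find_chain("MyTable", "SomeMerge")  # served from the cache; no second walk
print(CALLS["count"])  # 1

find_chain.cache_clear()  # analogous to a reload-style flag
find_chain("MyTable", "SomeMerge")
print(CALLS["count"])  # 2
```

The cache trades a small amount of memory for avoiding repeated traversals every time a user instantiates a table and calls a delete helper.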
Tested in the lab database for conditions that were problems before and they looked good. Thanks @CBroz1 !
Cases:
- downstream table doesn't contain original restrictions
- projection of key name between tables
Description
Now leveraging the underlying DataJoint dependency on `networkx` to find the shortest pipeline paths between a given table and the various merge tables, or between `Session` and a given table, for `delete_downstream_merge` and `check_permissions` in `SpyglassMixin`. By running a join on all tables between the two endpoints, each of these functions should be more robust to edge cases. Initial drafts, which built these chains across arbitrary pipeline points by hand, proved to be very slow. Even with the `networkx` method, I opted to cache connections to improve speed when folks assign tables (e.g., `from spy import Table; t = Table(); t.delete_downstream`).

- `common/common_usage.py`: Propose a new usage-tracking function to monitor how long the cautious delete process takes and where it is used from.
- `dj_merge_tables`: Migrate `delete_downstream_merge` into the mixin to better cache the calculated links.
- `dj_mixin.py`: Add a cache of the merge tables found and of the pipeline connections, which I dubbed 'chains', from self to each merge table. In practice, the number of merge tables found depends on whether the user has imported them. The `reload_cache` flag allows a user to see a blocking merge part, import the relevant table, and then reload the cache.
- `black` version 24.0 means a lot of minor changes elsewhere.

Checklist:
- `CITATION.cff`
- `CHANGELOG.md`
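The path-finding idea in the description can be sketched with plain `networkx`, independent of Spyglass; the table names and edges below are hypothetical stand-ins for a real DataJoint dependency graph:

```python
import networkx as nx

# Hypothetical dependency graph: nodes are table names, edges point
# from upstream (parent) tables to downstream (child) tables.
graph = nx.DiGraph()
graph.add_edges_from(
    [
        ("Session", "Raw"),
        ("Raw", "LFP"),
        ("LFP", "LFPBandMerge"),
        ("Session", "PositionMerge"),
    ]
)


def chain(graph, source, target):
    """Return the shortest table-to-table path, or None if none exists."""
    try:
        return nx.shortest_path(graph, source, target)
    except nx.NetworkXNoPath:
        return None


print(chain(graph, "Session", "LFPBandMerge"))
# ['Session', 'Raw', 'LFP', 'LFPBandMerge']
print(chain(graph, "PositionMerge", "LFP"))
# None -- no downstream path between these tables
```

Once a chain like this is known, restricting or joining each table along the path is what lets the delete propagate correctly even when the downstream table no longer carries the original restriction's key names.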