More robust delete_downstream_merge #806

Merged
merged 11 commits into from
Jan 31, 2024


CBroz1
Member

@CBroz1 CBroz1 commented Jan 26, 2024

Description

This PR now leverages networkx, an underlying DataJoint dependency, to find the shortest pipeline paths between a given table and the various merge tables, or between Session and a given table, for delete_downstream_merge and check_permissions in SpyglassMixin. By running a join on all tables between the two points, each of these functions should be more robust to edge cases.
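As a rough illustration of the path-finding idea, the sketch below runs a breadth-first search over a small foreign-key graph. It is not the code in this PR: the table names and edges are hypothetical, and the stdlib search stands in for `networkx.shortest_path(graph, source, target)` so the block has no dependencies.

```python
from collections import deque

# Hypothetical foreign-key edges (parent -> child); not the real Spyglass schema.
EDGES = [
    ("Session", "Recording"),
    ("Session", "Interval"),
    ("Recording", "Analysis"),
    ("Analysis", "MergeOutput.Analysis"),
]


def build_adjacency(edges):
    """Map each table to the set of tables that depend on it directly."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, set()).add(child)
        adjacency.setdefault(child, set())
    return adjacency


def shortest_chain(adjacency, source, target):
    """Return the shortest list of tables from source to target, or None."""
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target is not downstream of source


ADJACENCY = build_adjacency(EDGES)
chain = shortest_chain(ADJACENCY, "Session", "MergeOutput.Analysis")
```

Here `chain` lists every table between the two endpoints, which is the set of tables a later join would need to pass through.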

Initial drafts, which built these chains across arbitrary pipeline points by hand, proved very slow. Even with the networkx method, I opted to cache connections to improve speed when folks assign tables to a variable (e.g., from spy import Table; t=Table(); t.delete_downstream)

  • Fixes Cautious delete restriction #791 : Join on all tables between target and merges.
    • common/common_usage.py : proposed new usage tracking function to monitor how long the cautious delete process takes, and where it is called from
    • dj_merge_tables: Migrate delete_downstream_merge out into the mixin to cache calculated links better
    • dj_mixin.py: add a cache of the merge tables found and of the pipeline connections, which I dubbed 'chains', from self to each merge table. In practice, the number of merge tables I'm able to find depends on whether the user has imported them. The reload_cache flag allows a user to see a blocking merge part, import the relevant table, and then reload the cache
    • Future versions could intercept the datajoint delete error and load these tables.
    • Edited docs and notebooks to reflect new functionality
  • Ran jupytext on notebook edits from previous PRs
  • New black version 24.0 means a lot of minor changes elsewhere.
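The caching behaviour described in the dj_mixin.py item can be sketched roughly as follows. This is a minimal stand-in, not SpyglassMixin itself: the class, the `IMPORTED_MERGE_TABLES` registry, and the method names are hypothetical, and only the `reload_cache` flag name comes from the description above.

```python
# Hypothetical registry standing in for "merge tables the user has imported".
IMPORTED_MERGE_TABLES = {"MergeOutputA"}


class CachedChains:
    """Toy table object that caches its chains to known merge tables."""

    def __init__(self):
        self._merge_chains = None  # filled lazily on first request
        self.search_count = 0  # counts expensive searches, for demonstration

    def _search_merge_chains(self):
        """Placeholder for the expensive graph search to each merge table."""
        self.search_count += 1
        return {
            name: ["ThisTable", name] for name in sorted(IMPORTED_MERGE_TABLES)
        }

    def merge_chains(self, reload_cache=False):
        """Return cached chains, searching again only when asked to."""
        if reload_cache or self._merge_chains is None:
            self._merge_chains = self._search_merge_chains()
        return self._merge_chains


table = CachedChains()
first = set(table.merge_chains())  # triggers the one expensive search
IMPORTED_MERGE_TABLES.add("MergeOutputB")  # user imports another merge table
stale = set(table.merge_chains())  # still the cached result
fresh = set(table.merge_chains(reload_cache=True))  # picks up the new import
```

The trade-off is the one noted in the description: repeated calls on an assigned table are fast, but a table imported after the first search stays invisible until the cache is reloaded.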

Checklist:

  • This PR should be accompanied by a release: Maybe not this one, but soon?
  • (If release) I have updated the CITATION.cff
  • I have updated the CHANGELOG.md
  • I have added/edited docs/notebooks to reflect the changes

@edeno edeno linked an issue Jan 27, 2024 that may be closed by this pull request
@edeno edeno added the infrastructure Unix, MySQL, etc. settings/issues impacting users label Jan 29, 2024

@CBroz1 CBroz1 marked this pull request as ready for review January 29, 2024 22:41
@CBroz1 CBroz1 mentioned this pull request Jan 30, 2024
Collaborator

@edeno edeno left a comment


There are still some linter issues but otherwise looks good.

src/spyglass/utils/dj_mixin.py (two review threads, outdated and resolved)
@edeno edeno requested a review from samuelbray32 January 30, 2024 16:37
@edeno
Collaborator

edeno commented Jan 30, 2024

Join on all tables between target and merges

Is this potentially expensive with a large database?

@CBroz1
Member Author

CBroz1 commented Jan 30, 2024

Join on all tables between target and merges

Is this potentially expensive with a large database?

It is, yes. Using networkx is a huge speed improvement over doing the search myself to find the chain, but the join process could be expensive. With narrow restrictions on the parent, I found the join pretty quick to run, especially with the cached chain. My new usage table is designed to monitor how this gets used and how long it takes. If it is still cumbersome, I can look into replacing the Python join process in TableChain.join with something more SQL-native.
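To make the cost trade-off concrete, here is a minimal sketch of joining along a chain in plain Python. It does not reproduce TableChain.join: the tables are lists of dicts, and all names and rows are invented for illustration. It shows why a narrow restriction on the parent keeps the join small, and how a restriction carries through to a downstream table that lacks the restricted field.

```python
def natural_join(left_rows, right_rows):
    """Join two lists of dict rows on every column name they share."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            shared = left.keys() & right.keys()
            if all(left[key] == right[key] for key in shared):
                joined.append({**left, **right})
    return joined


def join_chain(tables, restriction):
    """Restrict the first table, then join each later table in order."""
    rows = [
        row
        for row in tables[0]
        if all(row.get(key) == value for key, value in restriction.items())
    ]
    for table in tables[1:]:
        rows = natural_join(rows, table)
    return rows


# Invented rows: only the parent table has a "subject" column.
sessions = [
    {"session_id": 1, "subject": "a"},
    {"session_id": 2, "subject": "b"},
]
analyses = [
    {"analysis_id": 10, "session_id": 1},
    {"analysis_id": 20, "session_id": 2},
]
merge_part = [
    {"merge_id": "m1", "analysis_id": 10},
    {"merge_id": "m2", "analysis_id": 20},
]

matches = join_chain([sessions, analyses, merge_part], {"subject": "a"})
merge_ids = [row["merge_id"] for row in matches]
```

Because the restriction is applied before any join, each later step only compares against the rows that survived so far. Without a restriction, the nested loops scale with the product of the table sizes, which is the expense raised in the question.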

Collaborator

@samuelbray32 samuelbray32 left a comment


Tested in the lab database for conditions that were problems before and they looked good. Thanks @CBroz1 !

Cases:

  • downstream table doesn't contain original restrictions
  • projection of key name between tables

@CBroz1 CBroz1 requested a review from edeno January 30, 2024 22:02
@edeno edeno merged commit b42432f into LorenFrankLab:master Jan 31, 2024
7 checks passed
Development

Successfully merging this pull request may close these issues.

Cautious delete restriction
3 participants