[Sync] Reorganize output directories + deduplicate snakefiles + bugfixes #86

keithchev · 2024-05-10T15:48:46Z

This PR introduces the following PRs from the private repo:

Fixed similarity matrix issue, added documentation of caveats, and updated capitalization in plots (#143)
Fix bug when no input PDB files are provided (#138)
Using foldseek to re-calculate TM scores of each input PDB against all PDBs (#136)
Add dev docs (#134)
Minor changes to sync-related Actions workflows (#135)
Deduplicate snakefiles (#97)
Add workflows to check for and sync changes in the public repo (#101)
Add an optional CI step to run tests without mocks (#102)
Reorganize output directories (#92)

* initial reorg and rewrite of snakefile (WIP)
* don't set shell=True (args are evidently not passed to blastp)
* use output references in expand expression
* use all caps for all global variables
* do not use rule references in rule because it does not work
* simplify some logic in the snakefile
* update snakefile_ff to reflect changes made to the main snakefile
* add test for the pipeline in cluster mode
* fix typos/bugs in snakefile_ff
* reorganize test artifacts and fix search-mode test
* make the features file required in cluster mode
* add input PDB files and features file for cluster-mode test
* more renaming and remove params that are just global variables (which can be used directly in shell commands)
* use indexing on all unnamed outputs for clarity
* update readme and rename cluster_similarity.py to plot_cluster_similarity.py
* use named outputs for all internal rules (even if they have only one output)
* move rule all to the end of the snakefile to allow defining its inputs symbolically
* add a comment explaining why rule all is at the end of the snakefile
* improve wildcard constraint and use lowercase filename
* fix mistake in plot_interactive and eliminate need for input function for aggregate_features

---------
Signed-off-by: Keith Cheveralls <[email protected]>
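The note above about `shell=True` is consistent with a general property of Python's `subprocess` module on POSIX systems: when a list of arguments is combined with `shell=True`, only the first element is run as the command. The sketch below illustrates that behaviour with `echo` standing in for blastp. It is an illustration under that assumption, not the pipeline's actual code, and the helper name `run_and_capture` is hypothetical.

```python
import subprocess


def run_and_capture(args, shell):
    """Run a command and return its stdout with the trailing newline removed."""
    completed = subprocess.run(
        args, shell=shell, capture_output=True, text=True, check=True
    )
    return completed.stdout.rstrip("\n")


# `echo` is only a stand-in for blastp (an assumption for illustration).
command = ["echo", "hello"]

# Without a shell, every list element reaches the program as an argument.
without_shell = run_and_capture(command, shell=False)

# With shell=True on POSIX this runs `/bin/sh -c echo hello`: the shell
# executes a bare `echo`, and "hello" becomes a positional parameter of the
# shell itself instead of an argument to echo.
with_shell = run_and_capture(command, shell=True)

print(repr(without_shell))
print(repr(with_shell))
```

If a shell is genuinely needed (for pipes or redirection, say), the usual approach is to pass the whole command as a single string together with `shell=True`.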

* add conditional step to run tests without mocks to existing test workflow
* move conftest.py to repo root and add a --no-mocks pytest CLI option, rename env variable for clarity
* run the tests when the PR is labeled
* update comment about env variable
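A `--no-mocks` option of the kind described above is typically registered through pytest's `pytest_addoption` hook in a root-level `conftest.py`. The following is a minimal sketch of that pattern, not the repository's actual `conftest.py`: the environment variable name `RUN_TESTS_WITHOUT_MOCKS` and the helper `mocks_disabled` are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical environment variable name; the real one is not given in this PR.
NO_MOCKS_ENV_VAR = "RUN_TESTS_WITHOUT_MOCKS"


def pytest_addoption(parser):
    """Register a --no-mocks flag (pytest discovers this hook by name)."""
    parser.addoption(
        "--no-mocks",
        action="store_true",
        default=False,
        help="Run the tests against real services instead of mocked responses.",
    )


def mocks_disabled(config, environ=os.environ):
    """Return True if mocks were turned off via the CLI flag or the env variable."""
    flag_set = bool(config.getoption("--no-mocks"))
    env_set = environ.get(NO_MOCKS_ENV_VAR) == "1"
    return flag_set or env_set
```

A fixture or a `pytest_configure` hook could then call `mocks_disabled(request.config)` to decide whether to patch network calls. Running `pytest --no-mocks` would exercise the unmocked path.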

* first draft of actions to check for and sync updates from the public repo
* rename workflow, simplify, add more comments
* make the open-sync-pr workflow consistent with the prior SOP
* fix indentation and use range syntax in git log commands
* bump actions/checkout to v4
* check repo owner and name in the sync action
* better error message when verify-no-new-commits fails
* create sync branch in a way that won't fail if there are merge conflicts

* start merging snakefile_ff into the main snakefile (WIP) and drop params that are wildcards
* first attempt at merging cluster mode into the main snakefile (WIP)
* fix mistakes in the snakefile and add a demo for cluster mode
* update tests after merging search and cluster modes
* define key_protids in the test config in cluster mode
* reorganize validation logic in snakefile (WIP) and merge configs by adding mode-specific config sections
* move BLAST_OUTFMT to constants.py and use an enum for mode
* move config-related logic into its own module, fix bugs
* don't use config subsections for mode-specific settings, use long form of all CLI args in snakefile, fix too-long lines
* rename configuration.py to config_utils.py
* avoid copying pdb files from input to output in cluster mode
* delete cluster-mode-specific files and update readme
* some variable renaming and changes to the logic in config_utils.py for clarity
* update the rulegraphs and add makefile rules to generate them
* add back missing shell=True
* adjust formatting in the snakefile
* remove unneeded config params from the test configs
* capitalize comments in config.yml
* improve docstring and minor edits for clarity in snakefile
* rename override_file to features_override_file for clarity
* update path to demo config in tests action and the readme
* fix too-long lines in makefile
* clarify that features_file is a TSV file, improve some comments
* enums should be singular and used consistently
* drop redundant len
* make file globbing case insensitive

---------
Signed-off-by: Keith Cheveralls <[email protected]>
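The last item above, case-insensitive file globbing, can be illustrated with a short sketch. This is one possible approach under stated assumptions, not the code from this PR: the function name `find_pdb_files` is hypothetical, and the real pipeline may match files differently.

```python
from pathlib import Path


def find_pdb_files(directory):
    """Return the files in `directory` whose extension is .pdb, ignoring case.

    Comparing the lowercased suffix matches `a.pdb`, `B.PDB`, and `c.Pdb`
    alike, which a plain `glob("*.pdb")` would not do on a case-sensitive
    filesystem.
    """
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdb"
    )
```

Calling `find_pdb_files("input/")` on a directory of mixed-case filenames would then return every PDB file regardless of how its extension is capitalized.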

Sync with the public repo

* initial incomplete draft
* move material from notion to contributing docs
* address review comments
* add testing section
* use markdown for list numbering
  Co-authored-by: Dennis August Sun <[email protected]>
  Signed-off-by: Keith Cheveralls <[email protected]>
* fix repo url and rewrite update-mocks section to use a list with example commands
* ask external contributors to use forks

---------
Signed-off-by: Keith Cheveralls <[email protected]>
Co-authored-by: Dennis August Sun <[email protected]>

… PDBs (#136)

* using foldseek to re-calculate TMscores of each input PDB against all PDBs
* addressing some of Dennis' comments
* addressing most of Keith's comments
* addressing comment on Snakefile rule dependencies
* solving error caused by no key_protid in config file in cluster-mode
* drop duplicated methods by importing them from foldseek_clustering and write an empty tsv when no PDBs are in the query directory
* fix variable naming and delete some redundant comments
* update readme
* fix inputs to the aggregate_features rule and more readme updates
* add a comment explaining why the key_protid PDBs are not copied at the snakemake level
* update the DAG visualizations
* fix formatting in readme

---------
Co-authored-by: Keith Cheveralls <[email protected]>

* fix paths in snakefile so make_pdb is called when there is no input pdb file
* don't overwrite the PDB if it exists in esmfold_apiquery
* revert the last commit and use to prevent make_pdb from overwriting the input PDBs
* fix formatting

…dated capitalization in plots (#143)

mezarque

lgtm! Thanks for shipping this!

braebigge

Looks good to me too, thanks again Keith!

keithchev and others added 13 commits January 17, 2024 14:51

Merge branch 'main' into sync-from-public

91116f0

Merge pull request #104 from Arcadia-Science/sync-from-public

c1dd061

Sync with the public repo

Merge main (and resolve conflicts in snakefile)

f928a28

Merge pull request #107 from Arcadia-Science/sync-from-public

10da4f5

Sync with the public repo

Minor changes to sync-related Actions workflows (#135)

c5027d2

fixed similarity matrix issue, added documentation of caveats, and updated capitalization in plots (#143)

128945c

keithchev requested review from braebigge, mertcelebi and mezarque May 10, 2024 15:49

mezarque approved these changes May 10, 2024

mezarque mentioned this pull request May 10, 2024

Implement TM-score fill-in for missing data relative to inputs #50

Closed

braebigge approved these changes May 10, 2024

keithchev merged commit b6dfc76 into main May 10, 2024
3 checks passed

keithchev deleted the release/0.5.0 branch May 10, 2024 16:07
