[Sync] Reorganize output directories + deduplicate snakefiles + bugfixes #86

Merged
keithchev merged 13 commits into main from release/0.5.0
May 10, 2024

Conversation

keithchev
Member

@keithchev keithchev commented May 10, 2024

This PR introduces the following PRs from the private repo:

  • Fixed similarity matrix issue, added documentation of caveats, and updated capitalization in plots (#143)
  • Fix bug when no input PDB files are provided (#138)
  • Using foldseek to re-calculate TM scores of each input PDB against all PDBs (#136)
  • Add dev docs (#134)
  • Minor changes to sync-related Actions workflows (#135)
  • Deduplicate snakefiles (#97)
  • Add workflows to check for and sync changes in the public repo (#101)
  • Add an optional CI step to run tests without mocks (#102)
  • Reorganize output directories (#92)

keithchev and others added 13 commits January 17, 2024 14:51
* initial reorg and rewrite of snakefile (WIP)

* don't set shell=True (args are evidently not passed to blastp)

* use output references in expand expression

* use all caps for all global variables

* do not use rule references in rule  because it does not work

* simplify some logic in the snakefile

* update snakefile_ff to reflect changes made to the main snakefile

* add test for the pipeline in cluster mode

* fix typos/bugs in snakefile_ff

* reorganize test artifacts and fix search-mode test

* make the features file required in cluster mode

* add input PDB files and features file for cluster-mode test

* more renaming and remove params that are just global variables (which can be used directly in shell commands)

* use indexing on all unnamed outputs for clarity

* update readme and rename cluster_similarity.py to plot_cluster_similarity.py

* use named outputs for all internal rules (even if they have only one output)

* move rule all to the end of the snakefile to allow defining its inputs symbolically

* add a comment explaining why rule all is at the end of the snakefile

* improve wildcard constraint and use lowercase filename

* fix mistake in plot_interactive and eliminate need for input function for aggregate_features

---------

Signed-off-by: Keith Cheveralls <[email protected]>
* add conditional step to run tests without mocks to existing test workflow

* move conftest.py to repo root and add a --no-mocks pytest CLI option, rename env variable for clarity

* run the tests when the PR is labeled

* update comment about env variable
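
For readers unfamiliar with the pattern, the `--no-mocks` option mentioned in the commits above could be wired up roughly as follows. This is a minimal sketch, not the code from this PR: `pytest_addoption` and `pytest_configure` are standard pytest hooks, but the environment variable name `TESTS_NO_MOCKS` and the choice to bridge the flag to the environment are assumptions made for illustration.

```python
# conftest.py (hypothetical sketch; only the --no-mocks flag name comes from the PR)
import os

# Assumed name; the PR renames an env variable but does not say what it is called.
NO_MOCKS_ENV_VAR = "TESTS_NO_MOCKS"


def pytest_addoption(parser):
    """Register a --no-mocks flag on the pytest command line."""
    parser.addoption(
        "--no-mocks",
        action="store_true",
        default=False,
        help="Run the tests against real services instead of mocks.",
    )


def pytest_configure(config):
    """Expose the flag through an env variable so non-test code can check it."""
    if config.getoption("--no-mocks"):
        os.environ[NO_MOCKS_ENV_VAR] = "1"
```

With a `conftest.py` like this at the repo root, `pytest --no-mocks` would set the variable for the whole test session, and a plain `pytest` run would leave it unset.
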
* first draft of actions to check for and sync updates from the public repo

* rename workflow, simplify, add more comments

* make the open-sync-pr workflow consistent with the prior SOP

* fix indentation and use range syntax in git log commands

* bump actions/checkout to v4

* check repo owner and name in the sync action

* better error message when verify-no-new-commits fails

* create sync branch in a way that won't fail if there are merge conflicts
* start merging snakefile_ff into the main snakefile (WIP) and drop params that are wildcards

* first attempt at merging cluster mode into the main snakefile (WIP)

* fix mistakes in the snakefile and add a demo for cluster mode

* update tests after merging search and cluster modes

* define key_protids in the test config in cluster mode

* reorganize validation logic in snakefile (WIP) and merge configs by adding mode-specific config sections

* move BLAST_OUTFMT to constants.py and use an enum for mode

* move config-related logic into its own module, fix bugs

* don't use config subsections for mode-specific settings, use long form of all CLI args in snakefile, fix too-long lines

* rename configuration.py to config_utils.py

* avoid copying pdb files from input to output in cluster mode

* delete cluster-mode-specific files and update readme

* some variable renaming and changes to the logic in config_utils.py for clarity

* update the rulegraphs and add makefile rules to generate them

* add back missing shell=True

* adjust formatting in the snakefile

* remove unneeded config params from the test configs

* capitalize comments in config.yml

* improve docstring and minor edits for clarity in snakefile

* rename override_file to features_override_file for clarity

* update path to demo config in tests action and the readme

* fix too-long lines in makefile

* clarify that features_file is a TSV file, improve some comments

* enums should be singular and used consistently

* drop redundant len

* make file globbing case insensitive

---------

Signed-off-by: Keith Cheveralls <[email protected]>
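
As an illustration of the "make file globbing case insensitive" commit above, one common way to do this in Python is to compare lowercased suffixes instead of relying on a glob pattern. This is a sketch under assumptions: the function name `find_pdb_files` and the `.pdb` default are hypothetical and not taken from the repo.

```python
from pathlib import Path


def find_pdb_files(directory, suffix=".pdb"):
    """Return files in `directory` whose extension matches `suffix`, ignoring case.

    Hypothetical helper: matches both `protein.pdb` and `PROTEIN.PDB`.
    """
    directory = Path(directory)
    wanted = suffix.lower()
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() == wanted
    )
```

Comparing suffixes avoids patterns like `*.[pP][dD][bB]` and behaves the same on case-sensitive and case-insensitive filesystems.
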
* initial incomplete draft

* move material from notion to contributing docs

* address review comments

* add testing section

* use markdown for list numbering

Co-authored-by: Dennis August Sun <[email protected]>
Signed-off-by: Keith Cheveralls <[email protected]>

* fix repo url and rewrite update-mocks section to use a list with example commands

* ask external contributors to use forks

---------

Signed-off-by: Keith Cheveralls <[email protected]>
Co-authored-by: Dennis August Sun <[email protected]>
… PDBs (#136)

* using foldseek to re-calculate TMscores of each input PDB against all PDBs

* addressing some of Dennis' comments

* addressing most of Keith's comments

* addressing comment on Snakefile rule dependencies

* solving error caused by no key_protid in config file in cluster mode

* drop duplicated methods by importing them from foldseek_clustering and write an empty tsv when no PDBs are in the query directory

* fix variable naming and delete some redundant comments

* update readme

* fix inputs to the aggregate_features rule and more readme updates

* add a comment explaining why the key_protid PDBs are not copied at the snakemake level

* update the DAG visualizations

* fix formatting in readme

---------

Co-authored-by: Keith Cheveralls <[email protected]>
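
The "write an empty tsv when no PDBs are in the query directory" commit above reflects a general Snakemake constraint: a rule's declared outputs must exist when the rule finishes, even if there was nothing to process. A minimal sketch of that guard follows, assuming a hypothetical helper name and a zero-byte output file (the PR does not say whether the real file carries a header row).

```python
from pathlib import Path


def write_empty_tsv_if_no_queries(query_dir, output_tsv):
    """Return True if `query_dir` contains PDB files.

    Otherwise create an empty `output_tsv` so that downstream steps still find
    the file they expect, and return False so the caller can skip the search.
    Hypothetical helper, not the function used in this PR.
    """
    query_dir = Path(query_dir)
    output_tsv = Path(output_tsv)
    has_queries = any(
        path.is_file() and path.suffix.lower() == ".pdb"
        for path in query_dir.iterdir()
    )
    if not has_queries:
        output_tsv.parent.mkdir(parents=True, exist_ok=True)
        output_tsv.write_text("")
    return has_queries
```

A caller would run the structural search only when the function returns `True`, and otherwise return early with the empty file already in place.
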
* fix paths in snakefile so make_pdb is called when there is no input pdb file

* don't overwrite the PDB if it exists in esmfold_apiquery

* revert the last commit and use  to prevent make_pdb from overwriting the input PDBs

* fix formatting
Member

@mezarque mezarque left a comment

lgtm! Thanks for shipping this!

Contributor

@braebigge braebigge left a comment

Looks good to me too, thanks again Keith!

@keithchev keithchev merged commit b6dfc76 into main May 10, 2024
3 checks passed
@keithchev keithchev deleted the release/0.5.0 branch May 10, 2024 16:07
4 participants