v0.4.2

@keithchev keithchev released this 18 Jan 00:59
9afd286

ProteinCartography v0.4.2

Overview

This release introduces minor changes that improve the performance of the pipeline. None of these changes are breaking, and none of them modify the nature of the files generated by the pipeline. For example, it should be possible to re-run the pipeline on existing output directories to regenerate the final visualizations and plots.

A note about this release

Unfortunately, we had to make this release by force-pushing to main in order to reconcile diverged commit histories between this public repo and our internal development repo. If you've forked this repo, this will prevent you from updating your fork using git pull upstream. For users of the pipeline, we recommend re-cloning or re-forking the repo. For developers, it will be necessary to hard-reset your fork's main to upstream/main and then rebase your development branches. We're very sorry for the inconvenience that this causes!
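
For developers, the reset-and-rebase steps described above might look roughly like the following sketch. It assumes your clone has a remote named upstream that points at this repo; the function names, the bookmark branch main-before-reset, and the branch name my-feature are hypothetical, chosen only for illustration. Note that the hard reset discards any local commits on main, so the old tip is bookmarked first.

```shell
# A sketch, not an official procedure. Assumes a remote named "upstream".

# Hard-reset the local main branch to match upstream/main.
# The old tip of main is kept on a bookmark branch (hypothetical name)
# so that development branches can be rebased relative to it afterwards.
reset_main_to_upstream() {
    git fetch upstream &&
    git checkout main &&
    git branch -f main-before-reset main &&
    git reset --hard upstream/main
}

# Replay only the commits unique to a development branch ($1),
# i.e. those made after the old main, on top of the new main.
rebase_branch_onto_main() {
    git rebase --onto main main-before-reset "$1"
}
```

A typical session would call reset_main_to_upstream once, then rebase_branch_onto_main my-feature for each development branch. Updating the fork on GitHub afterwards would presumably require a forced push, for example git push --force-with-lease origin main, assuming origin is your fork.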

Changes

User-facing changes

  • Add config parameters to support snakemake parallelization (external contribution from Noah Lebovic)
  • Add plotting of within-cluster distributions of quantitative attributes (see plot_cluster_distribution.py)
  • Support assessing ESMFold proteins in assess_pdbs.py
  • Support .faa files as input files
  • Support using a custom foldseek server

Performance improvements

  • Update default blast parameters (word_size and evalue) to improve runtimes
  • Improve recovery from blast failures
  • Add recovery from partial failures in fetch_uniprot_metadata.py
  • Improve recovery from foldseek API timeouts
  • Enable handling of empty foldseek results
  • Improve efficiency of downloading PDBs from AlphaFold

Engineering improvements

  • Set up CI and create basic integration tests
  • Add ruff and snakefmt for formatting and linting
  • Reorganize the conda envs (add a development env and condense the snakemake conda envs)
  • Use SeqIO for parsing FASTA files in esmfold_apiquery.py
  • Refactor plot_interactive.py to improve organization and readability