Releases: Clinical-Genomics/microSALT
v4.1.0
v4.0.0
What's Changed
- Adding self tests by @henningonsbring in #120
- Attempt to make resistance searches more compatible by @sylvinite in #121
- Updated and fixed bugs in pubmlst and ncbi downloads by @talnor in #123
- Save job_ids.yaml file for trailblazer tracking by @Mropat in #126
- Fixed a bug when running bad sample. by @sylvinite in #127
- Fixed bug in mlst download for some organisms by @talnor in #130
- Updated pubMLST download links by @talnor in #132
- slurmid hotfix by @sylvinite in #134
- Fix dependencies by @talnor in #143
- Add reference genome size to typing report by @henningonsbring in #142
- specify openjdk version by @henningonsbring in #153
- fix(SWEDAC) SWEDAC logo only on MW samples by @Vince-janv in #159
- Update CODEOWNERS by @pbiology in #162
- Fix missing duplication rate in json report by @seallard in #169
- Fixed conda installation by @talnor in #166
- Add deployment by @talnor in #171
- Move CI to GitHub Actions by @samuell in #175
- Use of --isolate flag, and update of SPAdes assembler by @talnor in #165
- 149 fix missing resistances by @samuell in #179
- Update pull_request_template.md by @AnnaLeinfelt in #181
- Fix missing vim genes by @samuell in #182
- Fix #183: Remove potentially confusing "Kmer täckning" columns by @samuell in #184
- Release candidate 4.0.0 by @samuell in #178
- Fix #185: Add back more greedy elimination of similar results by @samuell in #187
New Contributors
- @henningonsbring made their first contribution in #120
- @Mropat made their first contribution in #126
- @Vince-janv made their first contribution in #159
- @pbiology made their first contribution in #162
- @seallard made their first contribution in #169
- @samuell made their first contribution in #175
- @AnnaLeinfelt made their first contribution in #181
Full Changelog: v3.1.0...v4.0.0
Environment agnostability
Merge pull request #116 from Clinical-Genomics/fix_reportfolder Improved report storage architecture
cgMLST pre-release
End-user:
- Style updates to generated reports:
- Zebra tiles
- Threshold subheaders
- Internal Sequence Types renamed from negative numbers to prefix "I"
- Printing information
- Both reports now have a version suffix as opposed to a generation date. Version is automatically incremented if a new report is generated that also has differing information.
- QC report can now be generated despite lacking info for most fields.
- Added both physical and virtual contact information to both reports
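The version-suffix behaviour described above can be pictured with a small sketch. It assumes, purely for illustration, that a report is held as a dictionary and that "differing information" means the serialised content changed. The names `content_digest` and `next_report_version` are hypothetical and are not taken from microSALT.

```python
import hashlib
import json


def content_digest(report: dict) -> str:
    """Hash the report content independently of key order."""
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def next_report_version(previous_version: int, previous_digest: str, report: dict) -> int:
    """Return the version to stamp on a newly generated report.

    The version is only incremented when the new content differs from
    the content behind the previous version.
    """
    if content_digest(report) == previous_digest:
        return previous_version
    return previous_version + 1
```

Regenerating an identical report keeps its version, while any changed field bumps it by one.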
Bioinformatics:
- Analysis subfolders have now moved. Quast results are a subfolder of assembly. All results generated through blast are supplied in the blast_search superfolder.
- Updated analysis software used:
- BLAST 2.5.0 -> 2.9.0
- bwa 0.7.15 -> 0.7.17
- picard 2.18.26 -> 2.20.3
- quast 5.0.1 -> 5.0.2
- samtools 1.6 -> 1.9
- spades 3.12.0 -> 3.13.1
- trimmomatic 0.38 -> 0.39
- Moved always necessary paths (adapters, ExPEC) from the configuration into the source code
- Configuration for each analysis run is now automatically generated and stored in config.log under the analysis results folder
- microSALT now has a clone-free one-line installer that sets everything but the configuration file up, regardless of branch of interest.
- configExample.json more informatively generalised. Additionally the file is generic enough to successfully run microSALT's pytest on any machine.
- Pytesting of commands and configuration-consistency has been implemented
- Genologics configuration is now integrated into microSALT configuration
- Trimmed & reorganized fastq files are automatically removed after pipeline usage (in order to save space)
- Added ExPEC virulence gene dataset to E.coli analysis, and included said set in the repository
- Added ExPEC csv report
- Improved robustness of function that verifies existence of paired fastq files. It now allows 1/2 or forward/reverse for read direction.
- The resync Sequence Type CLI can now forcibly flag samples as resolved despite not having a Sequence Type assigned.
- The resync Sequence Type CLI now also displays ST with error code types (-1, -4)
- Lowered thresholds for "% BP at 10x coverage" due to the current threshold consistently being incorrectly triggered
- Coveralls has been implemented
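As an illustration of the paired-fastq check mentioned in the list above, here is a minimal sketch that accepts either 1/2 or forward/reverse as the read direction. The file-name pattern (direction token directly before the extension) and the function name `find_pairs` are assumptions for this example, not microSALT's actual implementation.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed naming scheme: <stem>_<direction>.<fastq|fq>[.gz]
_DIRECTION = re.compile(
    r"^(?P<stem>.+)[_.](?P<dir>1|2|forward|reverse)(?P<ext>\.(?:fastq|fq)(?:\.gz)?)$"
)
_FORWARD = {"1", "forward"}


def find_pairs(filenames: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split file names into (forward, reverse) pairs and unpaired leftovers.

    Names that do not match the assumed pattern are ignored.
    """
    seen: Dict[Tuple[str, str], Dict[str, str]] = {}
    for name in filenames:
        match = _DIRECTION.match(name)
        if match is None:
            continue
        key = (match["stem"], match["ext"])
        slot = "fwd" if match["dir"] in _FORWARD else "rev"
        seen.setdefault(key, {})[slot] = name

    pairs: List[Tuple[str, str]] = []
    unpaired: List[str] = []
    for key in sorted(seen):
        slots = seen[key]
        if "fwd" in slots and "rev" in slots:
            pairs.append((slots["fwd"], slots["rev"]))
        else:
            unpaired.extend(slots.values())
    return pairs, unpaired
```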
Internals:
- E-mail functionality was re-approached in a different way due to issues with Google Groups spam filtering
- Resolved harmless (but annoying) installation error due to quast dependency
- Travis-CI now utilizes install.sh script as opposed to having a near identical but separate solution
- Added pigz as explicit dependency
- Improved robustness of pubMLST organism lookup function
- Cleaned up logging messages
- Running blast has now been generalized to a single function
- Functions and thresholds for resistances and similar have been generalized to thresholds for motifs
- Lims Fetcher class can now prioritize specific project IDs when resolving external IDs.
- Added Virulence dataset support; kept inactive due to limitations in current report size.
- Added QC report support for NextSeq apptag
- Removed fetching of date_received value
- Additional spaces in user provided reference genome are now ignored
Report revisions
End-user:
- Swedish translation
- Clarifications about the analysis used and limitations
- Complete overhaul of the look and feel of both the Typing and QC reports
- Additional elements added to both report Types (stuff like extra parameters, dates, methods and verified organisms)
- Addition of various accreditation requirements (stuff like logotypes and explicit end of line mention)
Bioinformatics:
- QC Report is now automatically colored based on thresholds in the configuration
- The subcommand resync was added to keep track of Novel ST and their process to being uploaded to pubMLST
- Reports are now given distinct versions (as opposed to implied by date), and are also generated with a distinct timestamp
- Reports are now long-term stored in an additional 'reports' folder
- microSALT now warns if fastq file exceeds 1GB
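The fastq size warning could be sketched roughly as follows. This example assumes that "1GB" means 10^9 bytes and that a warning is emitted through the standard `logging` module; `oversized_fastq_files` and `SIZE_LIMIT_BYTES` are hypothetical names.

```python
import logging
import os
from typing import Iterable, List

logger = logging.getLogger(__name__)

# Assumption: "1GB" is interpreted as 10**9 bytes.
SIZE_LIMIT_BYTES = 10**9


def oversized_fastq_files(paths: Iterable[str], limit: int = SIZE_LIMIT_BYTES) -> List[str]:
    """Log a warning for every file larger than `limit` and return those files."""
    oversized = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            logger.warning("%s is %d bytes, exceeding the %d byte limit", path, size, limit)
            oversized.append(path)
    return oversized
```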
Internals:
- Outside of accredited functions, microSALT can run samples with customer sample names (as opposed to Clinical Genomics ones)
- Outside of accredited functions, microSALT can run multiple projects and generate a huge result file (called a collection)
- The entirety of the genologics configuration is now integrated into the microSALT configuration
- Various improvements to the stability of the e-mailing functionality
QC integration
End-user:
- Quality Control (previously https://github.com/Clinical-Genomics/mwgs) is now integrated into microSALT.
  Quality control primarily uses bwa, samtools and picard to generate a report on the quality of a sample compared to a reference.
- Reports now have the suffix _Typing or _QC for Sequence Typing and Quality Control, respectively
Bioinformatics:
- CLI changed. All commands now stem from either microSALT analyse or microSALT utils.
- Flag --qc_only runs QC analysis exclusively
- Flag --untrimmed skips adapter trimming
- Any pull requests are now automatically verified to compile, and to follow git lint standards
- Reference updater is no longer overly excited about updating MLST references
- README file updated, both graphically and with new features (build check etc.)
- Install script now supports stage environments
Internals:
- microSALT now loads customer ID
- Fresh installations now update MLST & resistance references more reliably
- QC can now reference assemblies, also with Tandem hits (notably NIST)
- False wheel error is no longer displayed during installation
QoL QC-pre release
Bioinformatics:
- Logging is now more explicit and generates more 'task complete' text files, to more easily keep track of how a project is doing.
- Both e-mail and custom configuration can now also be provided directly on the CLI
- README and configExample updated to both contain more useful info; and to be more readable
- Basic installation script install.sh now streamlines creation of microSALT instances a lot
- --no_update flag added so as not to waste time by unnecessarily checking pubMLST for updates
- microSALT now resolves symlinks, so as not to falsely report that a project exists despite “dead links”
- microSALT can now handle external/customer project names quite well, making research projects easier to reanalyze.
- microSALT now also reports in json format, primarily for microSALT<->Vogue interaction
- Functionality for internal ST management added under util -> resync
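The symlink handling noted in the list above can be illustrated with the standard library alone. This is only a sketch of the general technique (resolve the link, then test the target); `project_exists` and `is_dead_link` are hypothetical helper names.

```python
import os


def project_exists(path: str) -> bool:
    """True only if the path, after following symlinks, points at something real."""
    return os.path.exists(os.path.realpath(path))


def is_dead_link(path: str) -> bool:
    """True if the path is a symlink whose target no longer exists."""
    return os.path.islink(path) and not os.path.exists(path)
```

A link to a removed project folder is then reported as missing rather than as an existing project.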
End-user:
- Novel ST are stored under temporary ST (so called [I]nternal ST). microSALT tracks the upload status of these samples.
- Control samples are now a lot more extensively gathered; and now have the threshold '-'.
- microSALT can generate resistance overview report (csv format)
- microSALT logotype displays correctly
- There is no longer a case where ‘hits below threshold’ are displayed
- Unidentified organisms now default to written name
Internal:
- All jobs (including sample) are now based on the project function, with variations.
- Resistance "Type" is now automatically resolved instead of basing it on notes.txt
- Database indexation is a lot more foolproof
- Added very very basic travis CI support
- Added customer id/name tracking
- Replaced BLASTs internal top-filter with manual filter to guarantee best hit.
- microSALT now correctly reassigns externally-added organisms as pubMLST ones, if they are found in the latter at a later stage
- microSALT now handles the data page of pubMLST correctly again
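A manual best-hit filter of the kind mentioned above might look like the sketch below. The ranking order (identity, then span, then lowest e-value) and the `Hit` fields are assumptions made for this example; the release notes do not state which criteria microSALT uses.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    span: float      # fraction of the reference covered
    evalue: float


def _rank(hit: Hit) -> Tuple[float, float, float]:
    # Assumed ordering: higher identity, then higher span, then lower e-value.
    return (hit.identity, hit.span, -hit.evalue)


def best_hit_per_query(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep exactly one hit per query: the one ranked highest by `_rank`."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or _rank(hit) > _rank(current):
            best[hit.query] = hit
    return best
```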
Minor bugfixes
Resolved crucial bugs in both determining resistance overlap, and for starting the start sample command.
Clearer analysis and upscaling
Bioinformatics:
- MLST will automatically add new species when sample requires it
- Workflow reworked. Start command now runs reference find + indexing, then sample specific analysis, followed by batch result upload and report generation
- MLST and resistance cutoffs can now be edited in the config file
- Log file and database name will now be created based on config file name
- Dry run options to try (most) features without posting to slurm
- Command to list all currently added MLST databases
- Safety features to click interface to safeguard against user errors
End-user:
- Top resistance hit no longer applies; instead, resistance hits are now checked for overlap, and the worse hit is automatically removed (within the same group)
- Resistance database has been expanded, and is automatically updated when new changes are found. This includes latest (developmental) resFinder instance, and expansion on what type of resistance some hits correlate to
- microSALT logo added to reports
- Misc. visual improvements to reports
- Near hits are considered 'Novel', both on ST and allele level, and distinguished from 'None'
- None and control results display all hits found
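The overlap check for resistance hits described above can be sketched as a greedy filter. The `ResistanceHit` fields, the use of identity as the measure of which hit is "worse", and the function names are assumptions for illustration only.

```python
from typing import Iterable, List, NamedTuple


class ResistanceHit(NamedTuple):
    gene: str
    group: str     # resistance group the gene belongs to
    contig: str
    start: int
    end: int
    identity: float


def overlaps(a: ResistanceHit, b: ResistanceHit) -> bool:
    """Two hits overlap if they share a contig and their intervals intersect."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def drop_worse_overlaps(hits: Iterable[ResistanceHit]) -> List[ResistanceHit]:
    """Visit hits from best to worst; drop a hit if it overlaps an
    already kept hit from the same group."""
    kept: List[ResistanceHit] = []
    for hit in sorted(hits, key=lambda h: h.identity, reverse=True):
        clash = any(hit.group == k.group and overlaps(hit, k) for k in kept)
        if not clash:
            kept.append(hit)
    return kept
```

Hits from different groups, or on different contigs, are never removed in favour of each other in this sketch.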
Internal:
- Temporary files are now written to scratch explicitly to improve hard drive access
- microSALT now supports running 50+ samples at once
- microSALT now uses SQLite implementation over MySQL
- MLST refreshing/indexing now happens once per project start, to avoid concurrency issues
- Desync bug that possibly caused BLAST runs to prematurely end is now resolved
- Default runtime reduced from 2 hours to 30 minutes (per sample) to reflect faster implementation
- Loading profiles into database is now more stable, loading something like baumannii no longer requires two attempts
- Remnants of old auxiliary gene/ST method have been removed
- Improved stability in fetching report results from BLAST files
Bioinfo quality of life updates
Internal:
- run_complete.out is now generated on analysis completion
- Updated readme
- Updated example config
End-user:
- Sorted resistance genes
- Legacy samples with no span metric show as "undef."
Bioinformatics:
- Flag to set personal e-mail
- Flag to delete prior sample analysis, to support instant reruns
- start and finish now use CG_ID by default. Paths can be used manually via a flag
- Slurm logs now automatically go to analysis folder with name "ID_slurm.log"