All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fix the json.decoder.JSONDecodeError when running with `--amr`.
- The workflow now uses the `fastcat` read length and quality histograms instead of the per-read stats in the report process.
- Output IGV configuration file if the `keep_bam` option is enabled and a custom reference is provided (in minimap2 mode).
- Output reduced reference file if the `keep_bam` option is enabled (in minimap2 mode).
- `abundance_threshold` reduces the number of references to be displayed in IGV.
- Complete previous taxonomic rank when there are no parent nodes.
- Checking the correspondence between the reference and ref2taxid now also works with compressed references.
- `exclude-host` can input a file in the EPI2ME Desktop Application.
- Statistics derived from minimap2 alignment are now in the output if the `keep_bam` option is enabled.
- Reads below percentages of identity (`min_percent_identity`) and the reference covered (`min_ref_coverage`) are considered as unclassified in the minimap2 approach.
- Request less memory if `kraken2_memory_mapping` is used.
- Show the percentage of each species when hovering over the taxonomy bar plot.
- `bam` folder within output has been renamed to `bams`.
- Files that are empty following the fastcat filtering are discarded from downstream analyses.
- "Can only use .dt accessor with datetimelike values" error in makeReport
- "invalid literal for int() with base 10" error in makeReport
- Avoid argument list lengths that may be too long when using glob patterns.
- The Kraken2 pipeline sometimes reporting wrong numbers for unclassified counts.
- Minimum memory requirement for custom databases depends on the database size.
- Workflow now accepts BAM or FASTQ files as input (using the `--bam` or `--fastq` parameters, respectively).
- Run IDs are now included in the output BAM files.
- Add nextflow tags to keep track of the samples along the different processes.
- Output BAM files without host reads.
- Default for `--n_taxa_barplot` increased from 8 to 9.
- Replace the eCharts barplot with an equivalent Bokeh version.
- `--database_set` parameter is now `Standard-8` when the `--classifier` parameter is set to `kraken2`.
- Standard 8GB kraken2 database.
- Update docs.
- Heatmap generated when `--minimap2_by_reference` is enabled: references with a mean coverage of less than 1% of the one with the largest value are omitted.
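The 1% rule for the heatmap can be illustrated with a short sketch. This is not the workflow's own code: the function name `drop_low_coverage`, the `fraction` argument, and the dictionary layout (reference name mapped to mean coverage) are assumptions made for illustration only.

```python
def drop_low_coverage(mean_coverage, fraction=0.01):
    """Keep references whose mean coverage is at least `fraction` of the maximum.

    `mean_coverage` is a hypothetical mapping {reference name: mean coverage}.
    """
    if not mean_coverage:
        return {}
    # Cutoff is relative to the best-covered reference, not an absolute depth.
    cutoff = fraction * max(mean_coverage.values())
    return {ref: cov for ref, cov in mean_coverage.items() if cov >= cutoff}


if __name__ == "__main__":
    coverage = {"refA": 200.0, "refB": 5.0, "refC": 1.0}
    print(sorted(drop_low_coverage(coverage)))
```

Because the cutoff is relative, a reference with low absolute coverage is still shown when no other reference is much better covered.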
- Use store_dir without staging files from the web. Kraken2 can run offline if the databases have been previously stored.
- Fastcat plots showing the stats in the report before removing host sequences when `--exclude_host` is used in the minimap2 pipeline.
- Real time kraken workflow hanging indefinitely when attempting to start kraken server with too many threads.
- Minimap pipeline is also able to use store_dir to store databases and run offline if the databases have been previously stored.
- Kraken2 pipeline accepts a sample sheet if the real time option is disabled.
- Only taxa present in the abundance table above the `--abundance_threshold` will appear in the alignment summary table (which is only generated when `--minimap2_by_reference` is enabled).
- `--bracken_dist`: the bracken additional file for the database must be included in the database folder, as it is in the kraken2 indexes and when the database is generated.
- Default local executor CPU and RAM limits
- `--watch_path` is now called `--real_time` and enables the kraken2 pipeline to classify reads as they are written with watch_path.
- The kraken2 workflow can now be used without `--real_time`; this will use the serverless kraken2 executable.
- Barcode directories must now be named in the format `barcodeNN`, where NN is at least two digits (e.g. `barcode01`).
- Barcode directories must now have the same number of characters (e.g. `barcode01` cannot be provided with `barcode001`).
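The two barcode directory rules above can be checked before launching a run. The following is a minimal sketch of such a check, not the workflow's actual validation: the helper name `check_barcode_dirs` and the exact regular expression are assumptions derived from the wording of the rules.

```python
import re

# Assumed pattern: the literal "barcode" followed by at least two digits.
BARCODE_RE = re.compile(r"barcode\d{2,}")


def check_barcode_dirs(names):
    """Return a list of problems found; an empty list means the names pass."""
    problems = [f"bad name: {name}" for name in names
                if not BARCODE_RE.fullmatch(name)]
    # All directory names must have the same number of characters.
    if len({len(name) for name in names}) > 1:
        problems.append("barcode directory names differ in length")
    return problems


if __name__ == "__main__":
    print(check_barcode_dirs(["barcode01", "barcode02"]))
    print(check_barcode_dirs(["barcode01", "barcode001"]))
```

The first call reports no problems; the second reports a length mismatch, since both names are valid on their own but cannot be mixed.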
- Broken report when the dataframe is filtered using the `--abundance_threshold`.
- Taxonomy abundances barplot was not showing more abundant species.
- Broken plots caused by single quotes in NCBI taxon names.
- Add the abundance_table_rank.tsv in the output for the last analysed rank.
- Optional `--minimap2_by_reference` parameter to output the sequencing depth and coverage of each matched reference in the database.
- `--kraken_confidence` to specify a threshold score.
- `--exclude_host`: Optional parameter can accept a FASTA/MMI file with a host reference to be excluded from the analysis.
- `--include_kraken2_assignments`: Output the classification of each read.
- `--abundance_threshold`: filter taxa based on their abundances.
- `--n_taxa_barplot`: control the number of taxa displayed in the barplot.
- Plot the taxa abundance distribution (e.g. Species abundance distribution plots).
- Remove abricate version if AMR does not run.
- Changelog format.
- Alpha diversity indices: Berger-Parker dominance index, Fisher’s alpha.
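For readers unfamiliar with the two indices named above, here is a small sketch using their textbook definitions: Berger-Parker dominance is the fraction of reads assigned to the most abundant taxon, and Fisher's alpha is the value solving S = alpha * ln(1 + N / alpha) for S observed taxa and N reads. The function names and the bisection solver are illustrative assumptions; the workflow may compute these values differently.

```python
import math


def berger_parker(counts):
    """Fraction of all reads that belong to the most abundant taxon."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads to compute dominance from")
    return max(counts) / total


def fisher_alpha(counts, tol=1e-9):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    observed = [c for c in counts if c > 0]
    s, n = len(observed), sum(observed)
    if s == 0 or s >= n:
        raise ValueError("alpha is only defined when 0 < S < N")

    def f(alpha):
        # Increasing in alpha; negative near 0 and tends to N - S > 0.
        return alpha * math.log1p(n / alpha) - s

    lo, hi = 1e-12, 1.0
    while f(hi) < 0:  # grow the bracket until it contains the root
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    counts = [50, 30, 20]
    print(berger_parker(counts))
    print(fisher_alpha(counts))
```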
- Bumped minimum required Nextflow version to 23.04.2.
- Kraken2 pipeline: all the samples are shown in the report.
- Any sample aliases that contain spaces will be replaced with underscores.
- Antimicrobial resistance gene identification using Abricate.
- A new option `kraken2_memory_mapping` to avoid kraken2 loading the database into process-local RAM.
- `--keep_bam` parameter to write BAM files into the output directory (minimap pipeline).
- Lineages sunburst plot added to the report.
- SILVA.138 database available for both kraken2 and minimap2 pipelines.
- `bracken_level` parameter has been replaced by `taxonomic_rank` to choose the taxonomic rank at which to perform the analysis. It works in both pipelines.
- Updated example command displayed when running `--help`.
- Updated GitHub issue templates to force capture of more information.
- Bumped minimum required Nextflow version to 22.10.8.
- Enum choices are enumerated in the `--help` output.
- Enum choices are enumerated as part of the error message when a user has selected an invalid choice.
- Replaced `--threads` option in fastqingress with hardcoded values to remove warning about undefined `param.threads`.
- Extract reads using `--minimap2filter` and `--minimap2exclude` filters. The extracted reads are in the output/filtered folder.
- A new option `--min_read_qual` to filter by quality score.
- Configuration for running demo data in AWS
- AWS configuration for external kraken2_server for demonstration at LC23
- Fix minimum and maximum length read filter.
- Default region and AWS CLI path for AWS batch profile
- Updated existing databases.
- Docker will use an ARM platform image on appropriate devices.
- A new PFP-8 database option.
- New test_data with Bacteria, Archaea and Fungi.
- Fix file names when exporting tables.
- Include 'kingdom' for Eukarya.
- Add ability to use an external kraken2 server.
- New fastqingress.
- New report with ezcharts.
- Stacked barplot for most abundant taxa.
- Show rank information in abundance tables.
- Export function from tables.
- Fix crash in the report with one sequence in the fastq.
- Use kraken2 with parallelization in single client.
- Remove symbolic links from store_dir.
- Update kraken databases to latest and ensure relevant taxdump is used.
- Plot species richness curves.
- Provide (original and rarefied, i.e. all the samples have the same number of reads) abundance tables listing taxa per sample for a given taxonomic rank.
- Add diversity indices.
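Rarefaction, as described above, brings every sample to the same number of reads. A minimal sketch of one common way to do this (random subsampling without replacement down to the smallest sample) follows. The function name `rarefy`, the nested-dictionary table layout, and the fixed seed are assumptions for illustration and do not describe the workflow's implementation.

```python
import random


def rarefy(table, seed=0):
    """Subsample every sample, without replacement, to the smallest read total.

    `table` is a hypothetical mapping {sample: {taxon: read count}}.
    """
    depth = min(sum(taxa.values()) for taxa in table.values())
    rng = random.Random(seed)  # seeded so repeated runs give the same table
    rarefied = {}
    for sample, taxa in table.items():
        # Expand counts into one entry per read, then draw `depth` of them.
        reads = [taxon for taxon, n in taxa.items() for _ in range(n)]
        kept = rng.sample(reads, depth)
        rarefied[sample] = {taxon: kept.count(taxon) for taxon in taxa}
    return rarefied


if __name__ == "__main__":
    table = {"s1": {"A": 60, "B": 40}, "s2": {"A": 5, "B": 15}}
    for sample, taxa in rarefy(table).items():
        print(sample, taxa, sum(taxa.values()))
```

The sample with the fewest reads keeps all of its reads, so its counts are unchanged; deeper samples lose reads in proportion to chance.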
- Memory requirement help text.
- Example_cmd in config.
- Minimap2 subworkflow fixed for when no alignments.
- Remove quality 10 parameter from Minimap2 subworkflow.
- Issue where processing more than ~28 input files led to excessive memory use.
- Minor typos in docs.
- Updated description in manifest
- The version in the config manifest is up to date.
- Issue where discrepancies between taxonomy and databases led to error.
- `nextflow run epi2me-labs/wf-metagenomics --version` will now print the workflow version number and exit.
- Parameter name for selecting known database is now `--database_set` (was `--source`).
- Add classifier parameter and only allow running of minimap2 or kraken2 workflow.
- Workflow logic in kraken workflow has been reorganised for simpler parallelism.
- `-profile conda` is no longer supported; users should use `-profile standard` (Docker) or `-profile singularity` instead.
- `--run_indefinitely` parameter removed; it is instead implied when `--read_limit` is set to null.
- Add a test and fix for if all files in one directory are unclassified
- Check if fastq input exists
- Use store directory for database.
- Use per file kraken_report instead of cumulative.
- Kraken2-server v0.0.8.
- Add a run indefinitely parameter.
- Batch size breaking fastcat step.
- Consider white space in bracken report.
- Handling for unclassified with Bracken.
- Handling with kraken2 for single input file
- Removed sanitize option
- Output argument in Fastqingress homogenised.
- Bumped base container to v0.2.0
- Kraken workflow now in real time mode with watch_path
- Kraken and Minimap now in subworkflows
- Fastqingress metadata map
- Can only run Kraken or Minimap subworkflow not both
- Better help text on cli
- Fastq ingress and Args update
- Set out_dir option type to ensure output is written to correct directory on Windows
- pluspf8, ncbi_16s_18s_28s_ITS databases
- Add all sample tool combinations to report
- Enable kraken2 by default
- Clarify error messages
- Handle no assignments bracken error
- New docs format.
- Render bokeh.
- Update nextflow_schema.json
- Overriding taxonomy now works correctly
- Added missing threads param to kraken2
- Report now includes dynamic sankey visualisation and table
- Nextflow schema
- Updated to use new fastqingress module, permitting single .fastq input
- Rewired DAG to rely on sample IDs rather than filenames
- Handle bracken failure when there are no classifications
- Handle cyclic dag issue when taxonomy has duplicate names
- First release.