- Nothing changed yet.
- Fix the demultiplexing of runs with mixed barcode lengths across lanes
- Switch from the Clarity REST API to Reporting LIMS endpoints wherever possible
- Split demultiplexing per lane and allow non-pooling lanes to be sequenced in a pooling run.
- Fastq-Filterer applied to intermediate fastq files before alignment.
- Project process now uses GATK4 for the trio check
- Fix the fastqc command to set the temp directory to Lustre.
- New toolset for non-human variant calling with GATK 3.8
- Support for Asana API changes
- Fixed IDT barcode support
- 'gender' API key renamed to 'sex'
- New pipelines:
    - GATK4-based QC and variant calling for human and non-human samples
    - Dragen-based variant calling for human samples, starting from run processing
- New Features:
    - The file program_version.yaml only records the versions of tools used by the pipeline
    - The location of the genome on the filesystem is provided by the REST API
    - The genome used during sample processing is uploaded to the REST API
    - Demultiplexing supports IDT barcodes
    - Sample processing will start when the coverage from run elements is greater than the required coverage (see the sketch below)
    - analysis_driver.log only contains info-level logs; a new log, analysis_driver-debug.log, contains debug-level logs
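As an illustration of the readiness rule above, here is a minimal sketch of the coverage check; the field name and aggregation are assumptions for illustration, not the actual REST API schema.

```python
def ready_to_process(run_elements, required_coverage):
    """Start sample processing only once the summed coverage of the sample's
    run elements exceeds the coverage required for that sample."""
    # 'coverage' is a hypothetical field name used for illustration only
    expected = sum(e.get('coverage', 0) for e in run_elements)
    return expected > required_coverage
```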
- Bug fixes:
    - InterOp metrics parser does not upload NaN values
    - Check that all bcl files are present before starting run processing
    - Java tool versions can now be determined when running under cron
    - Increase the memory available for trio checking in projects with more than 25 samples
- Bugfix: ensure the bcls expected to exist actually exist before returning a completed cycle from InterOp
- Bugfix: make the samplesheet deterministic and only generate it when required
- Alignment during run processing starts 50 cycles into read 2 to speed up alignment metric generation
- Run the Picard GC bias detection tool and calculate a new metric summarising each run element's GC bias
- Update Picard to version 2.18.23
- Update bcl2fastq to version 2.20
- Bugfix: Initialise analysis_driver_procs at the start to avoid missing embedded entities
- Bugfix: Exceptions raised in Luigi are propagated to the main thread
- Updated EGCG-Core to 0.9.1
- Location-independent integration testing with EGCG-Core
- Updated EGCG-Core to v0.9
- Only running trio check on valid projects
- Fixed memory allocation for indel realignment and GenotypeGVCFs
- Eager-loading output config file
- Removed usages of old aggregation
- Overwriting existing fastqs in SampleDataOutput
- Minor fixes: rerunnable stages, pipeline start date, FR insert metrics, PhiX error handling
- Compress and index all generated variant files to avoid using GATK's own index (see the sketch below).
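A minimal sketch of the compress-and-index approach, assuming bgzip and tabix are available on the PATH; the wrapper and exact options used by the pipeline may differ.

```python
import subprocess

def compress_and_index_vcf(vcf_path):
    """Bgzip-compress a VCF and build a tabix index, so downstream tools use
    the .tbi index rather than GATK's own .idx files."""
    # bgzip -f replaces file.vcf with file.vcf.gz
    subprocess.check_call(['bgzip', '-f', vcf_path])
    gz_path = vcf_path + '.gz'
    # tabix -f -p vcf creates file.vcf.gz.tbi
    subprocess.check_call(['tabix', '-f', '-p', 'vcf', gz_path])
    return gz_path
```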
- Upload the source of the processing (run elements for sample, samples for project) to analysis_driver_procs
- Perform variant calling for non-human samples in order to get variant-based QC
- Remove PhiX reads from data during run processing
- New script for removing PhiX reads from already-processed data
- Fixed `--resume` option
- Using new Reporting-App aggregation in `ProjectDataset`
- Fixed the toolset_type in `pipelines.demultiplexing`
- New metrics parsed and uploaded:
    - InterOp metrics
    - non-facing read pairs
- Refactored report crawlers to take a single input dir
- Made `relatedness.GenotypeGVCFs` allocate memory dynamically
- Add default values for samples that do not have any data
- Fix a bug where the sample data threshold was set to required yield Q30 instead of required yield
- Trigger automatic review after sample finishes processing
- Project process duplicated line bugfix
- Update requirements for sample to be ready for processing
- Prevent different insert sizes from causing the pipeline to crash
- Update the field names for required yield/yieldq30/coverage.
- Fix temporary directories used by Picard.
- In the demultiplexing pipeline: align all run elements to their respective default genomes, and calculate duplicate rate and coverage
- Each sample processing is recorded as a step in the LIMS
- Bug fix in run_qc
- Bug fix in project process
- Fix bugs in project process
- Rearrange the output format of the GEL relatedness file
- Add Parser for Peddy and Relatedness
- Improvements to integration_test, including more flexible checking of outputs and retention of data/logs
- Tool versioning: the config can now list multiple tool paths for multiple versions; a toolset config file chooses the appropriate version for a specific pipeline (see the sketch below)
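A hedged sketch of how a multi-version tool config and a per-pipeline toolset selection could fit together; the keys, versions and paths below are illustrative assumptions, not the pipeline's actual schema.

```python
# Hypothetical example: the main config lists one path per tool version,
# and a toolset config picks which version a given pipeline should use.
tool_paths = {            # illustrative stand-in for the tools section of the config
    'bcl2fastq': {'2.17': '/path/to/bcl2fastq-2.17', '2.20': '/path/to/bcl2fastq-2.20'},
    'picard':    {'2.5': '/path/to/picard-2.5.jar', '2.18.23': '/path/to/picard-2.18.23.jar'}
}

toolset = {               # illustrative stand-in for a toolset config file
    'demultiplexing': {'bcl2fastq': '2.20', 'picard': '2.18.23'}
}

def resolve_tool(pipeline, tool):
    """Return the executable path for the tool version this pipeline expects."""
    version = toolset[pipeline][tool]
    return tool_paths[tool][version]

# resolve_tool('demultiplexing', 'bcl2fastq') -> '/path/to/bcl2fastq-2.20'
```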
- Add md5 files for project process output files
- Original fastq files are now kept when fastq filtering is enabled
- Duplicated unmapped reads generated in bcbio are removed
- Small fix to support user-prepared libraries
- Fixed fastq_filterer stats file bug and removed workaround from 0.15.1
- Bug fixes in pipeline stage reporting
- Samples to be passed through variant calling can now be marked `Variant Calling` or `Variant Calling gatk`
- Added genotype relatedness check with Peddy
- Change tools writing to /tmp in Samtools depth and Genotype Validation.
- Temporary fix to make sure the fastq_filterer stats file is present to be parsed by RunCrawler.
- All sample and project processes are now segmented using Luigi
- Allow filtering/trimming of bad quality runs in demultiplexing
- Fix analysis driver --stop and error handling
- Removed need for SampleSheet to exist for a Run to be picked up
- Fix samplesheet generation when one sample is repeated over multiple lanes
- Continuously check failed bcl files to make sure they really have failed
- Fix BCL validation bug in previous version
- Add a function to retrieve run metadata from the LIMS in RunDataset and generate the SampleSheet from it
- Demultiplexing pipeline is now segmented
- Run processing starts as the first files arrive from the sequencer
- Bcl validation runs throughout the sequencing
- Fix bug in species contamination
- Replace bamtools stats with samtools stats
- Fix bcbio alternative genome version
- Support for multiple genome versions per species (configuration file change required; see the sketch below)
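A sketch of what a per-species, multi-version genome configuration might look like; the structure, keys and paths are assumptions for illustration only.

```python
# Hypothetical configuration shape: each species maps to several genome
# versions, with one marked as the default used when no version is requested.
genomes = {
    'Homo sapiens': {
        'default': 'hg38',
        'versions': {
            'hg19': '/reference/Homo_sapiens/hg19/hg19.fa',
            'hg38': '/reference/Homo_sapiens/hg38/hg38.fa'
        }
    }
}

def genome_fasta(species, version=None):
    """Return the reference fasta for a species, falling back to its default version."""
    entry = genomes[species]
    return entry['versions'][version or entry['default']]
```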
- Refactor SampleSheet and RunInfo
- Support for Seqlab2 sample sheets
- Archive the output at the end of the process
- Running pipelines through the Luigi task runner
- BCBio option for variant calling with Freebayes
- Infrastructure for batch processing of entire projects
- Use of the new analysis_driver_stages endpoint in Reporting-App
- Split driver into multiple files, contained in pipelines
- Now using egcg_core for notifications
- Updates to Readme.md
- Refactoring throughout
- Add a try/except statement to catch taxa whose taxids are unavailable from the ETE TAXDB
- Using our own fastq filterer after bcl2fastq instead of Sickle
- Added a genome size calculation
- `quiet=True` has been added to REST API calls in `dataset`, so it should no longer spam the logs in debug mode
- Basic Lims information is now being gathered and pushed to the reporting app early in the pipeline
- Updated EGCG-Core to v0.5.
- Various refactors in contamination_checks, dataset, dataset_scanner, report_crawlers
- Ti/Tv and het/hom ratio calculation (see the sketch below)
- Base coverage at 5, 15 and 30X, plus percentiles
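For illustration of the Ti/Tv metric above, a small sketch of how the ratio can be computed from SNV (ref, alt) pairs; the input format is an assumption.

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions;
# everything else among single-base changes is a transversion.
TRANSITIONS = {('A', 'G'), ('G', 'A'), ('C', 'T'), ('T', 'C')}

def ti_tv_ratio(snvs):
    """Compute transitions/transversions from an iterable of (ref, alt) SNV pairs."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else float('nan')

# ti_tv_ratio([('A', 'G'), ('C', 'A'), ('C', 'T')]) -> 2.0
```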