All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Improved V&V interface
- Plugin support for protocols & checks
- Specification generation via `dpt validation spec`
- Generalized validation runner via `dpt validation run`
- Manual check interface via `dpt validation manual-checks`
- CLI interface implemented using click
- Started to use logging via loguru
- Microarray specific protocols/checks (now implemented as plugins outside dp_tools)
- Ability to inject columns during runsheet generation
- Microarray (Agilent 1 Channel) V&V protocol
- Pandera as dependency for better validation tooling
- BulkRNASeq runsheet validation enhanced
- Upgraded from Schema to Pandera
- Added checks for dataset metadata columns like 'paired_end'
- Added sanity check that the 'read2_path' column is treated as optional
- Runsheet generation for methylSeq ISA archives
- GLDS API usage now considers the 'OSD' accession ID as the study ID instead of 'GLDS'. This is consistent with the recent release of the OSDR
- Fixes incorrect numeric inference for strings ([commit: 3b0d953](https://github.com/J-81/dp_tools/commit/3b0d9537de73363aaa78979b78b3a209c69ccd45))
- Fixes incorrect unit detection for runsheet generation #14
- Stdout logging for scripts, which better explains what is happening while a script runs
- Missing valid technology combination for Microarray, and handling of multiple valid combinations
- Staging runsheets failing to extract unit columns
- V&V crash caused by factor columns being inferred as numeric. They are now correctly inferred as string values.
- Integrity check for gzipped files to bulkRNASeq checks and protocol
- Pinned Pandas version to 1.4.4 (prior: no pin, most recent version installed)
- Pandas 1.5 changes the checksums computed for pandas objects, so every test that includes a checksum would need updating (planned for the future)
- Fixed false V&V halt flagging: the micro sign is now whitelisted (better in sync with the R make.names function)
- Expected location of SampleTable.csv and ERCC_SampleTable.csv in
- Fixed false V&V halt flagging: Greek characters are now whitelisted (better in sync with the R make.names function)
- Incorrect detection of has_ERCC from ISA Archives
- Example Impacted GLDS: 161,162,163,173
- Runsheet generation failing for different variations of raw reads data column names
- Example Impacted GLDS: 105,138
- Versions tagged before 1.0.0 were actually development-style releases
- Going forward, only production releases will have tags without 'rc' (release candidate) in the name
- Various flag messages improved
- Documentation updated
- Check related to multiQC samples inclusion
- Updated GeneLab filename to url mapping to utilize the GeneLab public API
- Addresses the removal of previously used, deprecated endpoints
- Added samtools as needed for certain checks
- check_contrasts_table_rows: message no longer introduces extra newlines into log
- A validation protocol that runs on a BulkRNASeq dataset model
- Includes generation of report files
- A set of multi-stage loaders to create a data model
- Includes: validation system and multiQC powered data extraction
- Tilde characters are not converted to periods in contrasts: https://tower.nf/orgs/GL_Testing_Nextflow/workspaces/Nextflow_RCP_Testing/watch/1t8TfGbpDmCVNK
- This should emulate R make.names behaviour completely
- Data assets tagged with file categories for reporter file export including:
- md5sum table
- curation tables [GeneLab internal use]
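
Several entries above refer to keeping V&V name handling in sync with the R make.names function (the micro sign, Greek characters, tilde-to-period conversion in contrasts). The following is a minimal Python sketch of those rules, not the dp_tools implementation. The function name `make_names_sketch` is hypothetical, R's reserved-word handling is omitted, and Python's Unicode character classes may differ from R's locale-dependent ones in edge cases.

```python
def make_names_sketch(name: str) -> str:
    """Approximate R's make.names for a single string (simplified sketch)."""
    first, second = name[:1], name[1:2]
    # An 'X' prefix is needed when the original name starts with a dot followed
    # by a digit, or with anything that is not a letter or a dot (or is empty).
    if first == ".":
        needs_prefix = second.isdigit()
    else:
        needs_prefix = not first.isalpha()
    # Letters (including Greek letters and the micro sign), digits, '.' and '_'
    # are kept; every other character, such as '~' or a space, becomes '.'.
    cleaned = "".join(ch if (ch.isalnum() or ch in "._") else "." for ch in name)
    return "X" + cleaned if needs_prefix else cleaned


# Illustrative inputs (made-up names, not taken from a real dataset).
examples = ["Ground~Control", "(Space Flight)v(Ground Control)", "10 µg", "α-tubulin"]
converted = [make_names_sketch(name) for name in examples]
```

Under these assumptions, a tilde is translated to a period, and non-ASCII letters pass through unchanged, which is the behaviour the whitelist fixes aim for.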
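
The gzip integrity check listed above for the bulkRNASeq checks can be pictured with a short standard-library sketch. This is an assumption about the general approach, not the project's actual check: the function name `gzip_is_intact` and the file names are hypothetical, and dp_tools may rely on a different mechanism.

```python
import gzip
import tempfile
import zlib
from pathlib import Path


def gzip_is_intact(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so the trailing CRC32 and length fields are verified.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC); EOFError covers truncation.
        return False
    return True


# Demonstration on throwaway files (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    payload = gzip.compress(b"@read1\nACGT\n+\nFFFF\n" * 1000)
    good = Path(tmp) / "sample_R1.fastq.gz"
    good.write_bytes(payload)
    truncated = Path(tmp) / "sample_R2.fastq.gz"
    truncated.write_bytes(payload[:-10])  # drop the trailer and part of the stream

    good_ok = gzip_is_intact(good)
    truncated_ok = gzip_is_intact(truncated)
```

Reading in chunks keeps memory use flat for large read files, at the cost of decompressing each file once in full.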