forked from owenjm/damidseq_pipeline
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathchangelog
79 lines (63 loc) · 4.98 KB
/
changelog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
[v1.4.6]
* New feature: paired-end reads are now handled natively at the fastq level. Use the --paired flag to enable, and your paired reads should be automagically detected and aligned
* Changed: BAM file format is now used natively without SAM intermediates at all levels (also fixes the recent samtools version handling bug)
* Changed: paired-end reads are selected based on the 0x2 bit on the SAM file format bitwise FLAG (reads aligned in proper pair)
* Fixed: paired-end read counts are enumerated correctly
* Removed: --keep_sam and --only_sam flags (as SAM format is no longer used)
* Added: --keep_original_bams flag to retain the non-extended reads BAM file when using single-end sequencing
* Changed: warning on missing chromosomal identities in GATC alignment made more friendly and less confusing
[v1.4.5]
* Added --ps_debug option to explicitly display pseudocounts calculation
[v1.4.4]
* Fixed issue with some externally-generated BAM files
[v1.4.3]
* Better handling of chromosome names
[v1.4.2]
* Small improvements to SAM file handling
[v1.4.1]
* Fixed issues with samtools v1.3 (this version of samtools introduced backwards incompatibilities when using the 'sort' function. damidseq_pipeline now checks for the version number and should support all versions of samtools.)
[v1.4]
* New read-extension method: by default, reads are now only extended as far as the next GATC fragment. Use --extension_method=full to disable this feature and extend every read by the value of --len.
* Output format is now bedgraph by default. Use --output_format=gff to restore the previous default. Changing the default to bedgraph allows users to create TDF tracks directly within the graphical IGV tools, making it easier for end users.
* Minor code cleanups
[v1.3]
* Major bugfix: reads from the minus strand were not being extended correctly during processing. The overall impact is minor (correlation between old and new read extension methods is >0.95) but this new method is technically more accurate.
* added --keep_sam (do not delete the temporary SAM file) option
* added --only_sam (do not generate BAM files) option (both options are intended for debug purposes only)
[v1.2.7]
* New opition --no_file_filters to prevent any filename trimming/filtering (by default input filenames are trimmed to the first underscore)
* Small filename issue fixes
[v1.2.6]
* Now uses File::Basename to handle filenames
* Fixed/cleaned up a number of rare problems with filename handling
[v1.2.5]
* Added --dam and --out_name options. Use these to set the Dam-only control sample and/or a custom output name
* Added more sanity checks
* Minor bugfixes
[v1.2.4]
* Added explicit checks for bowtie2 and samtools executables, and for bowtie2 output
[v1.2.3]
* Fixed a serious error in RPM normalisation calculations (values were inverted) -- please re-run on your samples if you have used this method on them
* Minor code cleanups
[v1.2.1]
* Added ability to process BAM files generated from paired-end sequencing
* Cleaner reporting of missing assembly fragments in GATC files
* Some small bugs fixed
* General code clean-ups
[v1.2]
* Completely re-written normalisation routine based on kernel density estimation
* Genomic coverage is now calculated internally rather than using bedtools (uses much less memory, is slightly faster, and drops the requirement for an external binned windows file)
* Binned window files are no longer required (bins are calculated automatically using the sequence information provided in the BAM headers, and the bin size specified by the --bins command-line option)
* Better handling of GATC fragment files (should prevent hangs/pauses when creating GATC fragment arrays)
* Memory optimisation for large files (greatly reduces usage for processing mouse/human data)
* Added ability to process BAM files directly
* Much better file-handling all round (now takes sample names directly from filenames by default; the option to use an index.txt file remains but is essentially deprecated)
* New option: --norm_method=[kde/rpm] "kde" is the default method using kernel density estimation; "rpm" normalises solely on readcounts/million reads only (the "rpm" method is not recommended except for the very rare cases in which a Dam-fusion protein fails to methylate accessible genomic regions, making kde normalisation is inappropriate)
* Re-written --help output rountines (better formatted and more informative)
* Ability to read gzipped GATC files
* Ability to save sets of defaults to enable quick switching between different genomes (use --save_defaults=[name]; use --load_defaults=[name] to load; use load_defaults=list to list current available options)
* New location for config files (in ~/.config/damid_pipeline/). Existing config file will be migrated automatically
* Various small bugfixes and code clean-ups
** NB: a number of default parameters have changed with this release. It is strongly advised to reset all parameters to the default value with --reset_defaults.
[v1.0]
* Initial release