forked from PASApipeline/PASApipeline
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Changelog.txt
228 lines (137 loc) · 10.5 KB
/
Changelog.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
#################
## PASA release v2.4.1, Nov. 9, 2019
-mysql schema updates for compatibility with latest mysql
-pasaweb updates to leverage user perl @INC environment - finds user installed modules.
-plugin compilation updates for macs
-docker adds pasa binary to /usr/local/bin
-datetime printed for each step of the pipeline execution
-transdecoder updated to current version
-db connection updates for mysql to catch 'lost server'
-including SAM_to_gtf.pl for potential minimap2 integration
#######################################
## PASA release v2.3.3, April 27, 2018
bugfix release. If using seqclean for transcript cleaning and for polyA-site identification, this patch release should work for you.
Also updated TransDecoder to latest release v5.2.0
Pipeliner updates to better follow operations
maybe a little more than a bugfix release, but intended to be mostly a bugfix release.
#############
## PASA Release v2.3.2 April 19, 2018
-removed cdbyank from most operations to reduce I/O bound issues w/ large chromosomes.
-updated multithreading around sqlite usage to avoid db locking issues.
########
## PASA Release v2.3.1 April 12, 2018
-bugfix: if users specified both gmap and blat alignments, only blat was being uploaded. This is unique to v2.3.0 and had to do w/ the new pipeliner and checkpointing system. This is now fixed, and both gmap and blat will be leveraged together as in earlier versions of PASA.
#####################################
## PASA Release v2.3.0 Mar 27, 2018
-Integration of SQLite (by Nathan Weeks)
-GTF and GFF3 annotation support.
-PasaWeb updates for using SQLite and lighttpd
-upgraded TransDecoder, now including as a submodule
-restructured build process, relocating cdbtools to pasa-plugins
-option --cufflinks_gtf changed to --trans_gtf to be more generic (ie. use stringtie or other)
-docker now supported for PASA
-pipeliner overhauled, now auto-resume integrated.
-added Docker support
####################################
## PASA Release v2.2.0 Oct 13, 2017
bugfix - split genes are now assigned unique gene identifiers.
The env var PASACONF can be used to point PASA towards a user-specific 'conf.txt' file for user-specific mysql login info.
#####################################
## PASA Release v2.1.0 Feb 28, 2017
Maintenance release
updates for improved compatibility with latest version of mysql and the gmap alignment software.
###################################
## PASA release v2.0.2 May 9, 2015
Minor bugfixes, and the sample data set now runs through properly if the sample mysql db doesn't already exist.
######################
## PASA v2.0.1 Release
compatibility with Trinity 2.0 accessions in comprehensive transcriptome build process (thanks to pull request from Evan Ernst)
######################################
## PASA v2.0.0 Release Feb 23, 2015
PASA is now moved here to github. Much has changed since the original PASA release back in 2003, and so we're going to start this release off with the 2.0.0 version and use semantic versioning from here out.
-GFF3_utils.pm: allow for multiple parent transcript features assigned to a given exon or CDS feature.
########################
Release r20140417
########################
Turned off extended donor consensus requirement for AT-AC introns.
Comprehensive transcriptome build compatible w/ latest Trinity (April 2014 release).
Minor tweaks to the web scripts
Works w/ Just Annotate My Genome (JAMg) // as per Alexie Papanicolaou
#######################
Release: r2013-09-07
#######################
Compatible with '|' in the genome accessions. ex. gi|xxxxx|yyyyyy
######################
Release: r2013-08-14
#####################
Improvements in memory usage during BLAT output processing, comprehensive transcriptome-generation, and assembly description writer.
Parallel cluster reassignment, and faster uploading to mysql.
##############################
# Release: PASA2-r20130605p1
#############################
alignment validation:
-alignment validation is now not case-sensitive (but introduced in the most recent release).
-includes a 'resume' mode in case the process breaks partially into this stage.
PASA_transcripts_and_assemblies_to_GFF3.dbi:
-include 'gene' clustering information in pasa assembly GTF output.
#################################
# Stable Release PASA2-r20130605
#################################
Web portal:
-all the CGI scripts in the PASA web portal are now compatible with PASA2
Alignment validation:
-uses memory-lean indexing of fasta files rather than storing all seqs in RAM. Additional improvements to multi-threading and mem-sharing.
Alignment assembly:
-individual alignment clusters capped at 5k alignments, sampled according to alignment score and alignment position.
BLAT alignments:
-use file-based sorting to reduce RAM requirements.
#############################################
## Beta release of PASA2 on April 25-2013:
#############################################
PASA2-related improvements:
1. both GMAP and BLAT can be run simultaneously, and multiple high quality validating alignments at different genome locations can be leveraged for defining gene structures via PASA assembly.
2. Lots of multithreading. Use the new --CPU parameter to take advantage of multiple threads. Overall, the system is much faster that earlier versions due to parallel processing. Also, database interaction has vastly sped up.
3. Transdecoder (use --TRANSDECODER) can be executed as part of running PASA. Transcripts that appear to be coding and full-length are flagged by PASA and treated accordingly.
4. A new process is available that combines Trinity de novo RNA-Seq assemblies along with genome-based transcript reconstructions (cufflinks and genome-guided Trinity) to generate a comprehensive transcriptome database. This facilitates the identification of 'missing' genes and captures those transcripts that may be insufficiently represented by a draft genome assembly. Instructions for this are available here:
http://pasa.sourceforge.net/#A_ComprehensiveTranscriptome
// Version: PASA2-v20130425beta
-schema restructured, allows for storing multiple mappings per transcript
-run blat and gmap simultaneously
-generic gff3 loader for alignments
-incorporation of Trinity full-denovo special status (--TDN parameter)
-include --CPU parameter and use parallel processing for assembly, validation, alignment
-much faster loading into mysql
-not including mysql user/pass info in pipeline commands, instead using pasa_conf/conf.txt settings internally.
-incorporates transdecoder to identify candidate full-length transcripts
-added process for building a comprehensive genome-based + de novo transcriptome assembly-based trancriptome database.
-including gtf along with bed and gff3 output formats.
// Version: 2012-06-25
-pass max intron length parameter to the alignment validation step.
-removed type=isam from the pasa database schema; caused problems during mysql db installation using the latest mysql version 5.6.
-validated_transcripts.gff3 file now contains the valid transcript alignments. It was inadvertently reporting the PASA assembly structures- only impacted previous release. Thanks to Shengqiang Shu @ JGI for pointing this out.
// Version: May-20-2011
-further refinement to pasa_asmbls_to_training_set.dbi: those ORFs with sum(log-likelihood) scores > 0 are examined to see if the coding score is maximal within the proposed frame of the ORF. If the score is better in any other reading frame, the ORF is excluded. This has been found to be very useful for de novo coding gene annotations based on RNA-Seq data.
-write gff3 and bed files for valid and failed transcript alignments (failing validation, that is) as separate files.
-reintroduced the sim4 chaser option based on user feedback
// Version: Jan-09-2011
-pasa_asmbls_to_training_set.dbi: now scores ORFs based on a Markov model and hexamer composition. These best ORFs (complete or partial) are extracted and reported separately.
-Launch_PASA_pipeline.pl:
-removed the blat/sim4 option and relying entirely on GMAP
-provides a simplified interface for both the alignment-assembly and for annotation comparison steps
-the --MAX_INTRON_LENGTH parameter is moved from the configuration file to a command-line parameter that is forwarded on to GMAP during the alignment stage and separately to the subsequent alignment validation stage.
-the analysis of alternative splicing now is included in the alignment-assembly stage. No reason to run it separately now.
-key output files generated by PASA are now separated from the voluminous other output files, which are now written to a separate log directory.
-BED format is now output along with GFF3 for many of the various alignment and gene structure output files. BED format has excellent support in browsers such as UCSC or IGV.
-annotation updates are improved:
-bugfix wrt reporting merged gene products in addition to the genes that were involved in the merge operation.
-improved logic in handling the merging of annotated genes based on transcript alignment evidence, requiring protein homology-tests to be passed for all input genes with the merged gene structure in addition to exon overlap.
-RNA-Seq: in filtering lowly expressed artifacts resulting from the Inchworm strand-specific assembly process, both valid and invalid best alignments are considered when pruning artifacts. This is important because an invalid alignment (non-consensus splice site or mapping to half the length) might be only partially invalid, but still properly represents the expression level at that locus.
-Sample data and pipeline:
-simplified pipeline execution illustrated in: sample_data/run_sample_pipeline.pl
-Web displays:
-the central status report now indicates the number of protein-coding sequences that change as a result of the annotation update (excluding UTR-only and alt-splice additions).
-since the alternative splicing is now mandatory as part of the alignment-assembly page, the corresponding data are immediately accessible from the alternative splicing report pages.
-in the alternative splicing pages, the transcript identifiers link to the full assembly report for that subcluster (gene).
-General:
-CdbTools will be treated as a third party tool and we'll require that users download and install this from the Dana Farber tools page, instead of including a version within the PASA distro.
-including this CHANGELOG file in the PASA distro to include a record of the updates to the software and related processes.