Releases: vastgroup/vast-tools

v2.1.2

07 Sep 08:17

NEW

  • A new variable (--use_all_excl_eej) in combine allows users to choose an alternative way of quantifying exclusion reads in the splice-site-based module. Together with --extra_eej, it may increase sensitivity, but also the number of false positives.

  • A new variable (--extra_eej) sets the number of additional junctions, further upstream for the C1 exons and further downstream for the C2 exons, that are considered when quantifying exclusion in the annotation-based module, and also in the splice-site-based module if --use_all_excl_eej is active. Default is 5.

  • NOTE: running combine v2.1.2 with default options should provide identical results to v2.1.1.

Updates and fixes

  • Updates in verbose text messages.

v2.1.1

31 Aug 01:59

Updates and fixes

  • Further improvements in the quantification of ANN events in combine, aimed at reducing false positives in real RNA-seq samples.
    With respect to v2.0.2, using a human (-sp Hsa -a hg19) sample (SRR3102173), these changes significantly (|deltaPSI| > 5) affect 79/52621 (0.15%) ANN exons with sufficient read coverage, while 99.22% have a |deltaPSI| < 1. NOTE: The original impact summary in release v2.1.0 was incorrect. Please check the updated notes for comparison.
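
The impact figures quoted here (the share of exons with |deltaPSI| > 5 and the share with |deltaPSI| < 1) can be reproduced for any pair of runs with a few lines of Python. This is only a sketch: the function name `impact_summary` and the toy values are hypothetical, and it assumes you have already extracted one PSI difference per ANN exon with sufficient coverage.

```python
def impact_summary(delta_psi, big=5.0, small=1.0):
    """Summarise per-exon PSI differences between two versions.

    delta_psi: iterable of PSI differences (new minus old), one per exon
    with sufficient read coverage (how coverage is decided is up to you).
    Returns the count and percentage of exons with |deltaPSI| > big and
    the percentage with |deltaPSI| < small.
    """
    values = [abs(d) for d in delta_psi]
    total = len(values)
    if total == 0:
        raise ValueError("no exons with sufficient coverage")
    n_big = sum(1 for v in values if v > big)
    n_small = sum(1 for v in values if v < small)
    return {
        "n_changed": n_big,
        "pct_changed": 100.0 * n_big / total,
        "pct_stable": 100.0 * n_small / total,
    }


# Toy data for illustration only; these are NOT the SRR3102173 values.
toy = [0.2, -0.4, 7.5, 0.0, -1.8, 0.9, -12.0, 0.3]
summary = impact_summary(toy)
```

As a check on the arithmetic in the note, 79 out of 52621 exons is 0.15% when rounded to two decimals.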

v2.1.0

30 Aug 18:05

NEW

  • The ANNOT (annotated exons; ANN) module from combine uses a slightly different strategy to define complex skipping reads, which may result in different PSIs for some events.
    In a human (-sp Hsa -a hg19) sample (SRR3102173), it significantly (|deltaPSI| > 5) impacts 86 (0.17%) ANN exons, while 99.16% have a |deltaPSI| < 1 in v2.1.0 with respect to the previous version. This change was implemented because it performed better with reads simulating transcripts with random skipping of constitutive exons. Therefore, it may improve quantifications particularly for 'artificial' conditions such as KDs of RNA binding proteins. NOTE: while it has been shown to decrease the false negative rate, it might slightly increase the false positive rate in real biological samples.

  • A new module compare_expr has been included to identify differentially expressed genes based on fold changes of cRPKMs between samples. It uses a similar logic to the one used by compare to identify differentially alternatively spliced events.

  • compare can provide all events (--print_all_ev) and all AS events (10<PSI<90 in at least one compared sample; --print_AS_ev) that pass the coverage criteria used in a given analysis. It can also print different sets of events to facilitate their downstream comparison using Matt (http://matt.crg.eu/) (--print_sets):

    • CS: all events with coverage and constitutively spliced (PSI>95 for AltEx, PSI<5 for IR).
    • CR: all events with coverage and cryptically spliced (PSI<5 for AltEx, PSI>95 for IR).
    • AS_NC: all events with coverage, alternative (10 < av_PSI < 90 in a group) and that do not change between the two conditions (abs(dPSI)< max_dPSI).
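
The three sets printed by --print_sets follow simple threshold rules, which the sketch below restates in Python. It is not the code used by compare: the function name `classify_event` is hypothetical, and it assumes that the CS and CR cut-offs are applied to the average PSI of both compared groups, which the note does not spell out. The value of max_dPSI is left to the caller.

```python
def classify_event(av_psi_a, av_psi_b, event_type, max_dpsi):
    """Assign an event with coverage to "CS", "CR", "AS_NC", or None.

    av_psi_a, av_psi_b: average PSI of the event in each condition.
    event_type: "AltEx" or "IR".
    max_dpsi: upper bound on abs(dPSI) for a non-changing event
              (chosen by the caller; no default is assumed here).
    Assumption: the CS/CR cut-offs must hold in both groups.
    """
    if event_type not in ("AltEx", "IR"):
        raise ValueError("event_type must be 'AltEx' or 'IR'")
    psis = (av_psi_a, av_psi_b)
    all_high = all(p > 95 for p in psis)
    all_low = all(p < 5 for p in psis)
    if event_type == "AltEx":
        # Constitutive exons are (almost) always included.
        if all_high:
            return "CS"
        if all_low:
            return "CR"
    else:
        # For IR the logic is mirrored: a constitutively spliced
        # intron is (almost) never retained.
        if all_low:
            return "CS"
        if all_high:
            return "CR"
    # Alternative in at least one group and not changing between groups.
    alternative = any(10 < p < 90 for p in psis)
    if alternative and abs(av_psi_a - av_psi_b) < max_dpsi:
        return "AS_NC"
    return None
```

For example, with a hypothetical max_dPSI of 5, an exon with average PSIs of 40 and 42 falls in AS_NC, whereas one with 40 and 60 is not assigned to any of the three sets.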

Updates and fixes

  • A trim5 option was added to align to skip the first X nucleotides of the forward read. This is handy when there are ambiguous nucleotides that would otherwise result in no mapping in the strand determination step as well as in the gene expression quantification.

  • Minor corrections and bug fixes.

  • Updates in help messages and README.

v2.0.2

16 Mar 14:21

Minor bug fixes:

  • Bug fix on merge. When using --move_to_PARTS, merged info files were moved to PARTS/ whereas those of the subsamples were left in the to_combine/ folder. This will be interpreted by combine as if the merged file is not strand-specific. It only affected merges of strand-specific samples when using the --move_to_PARTS option. Any other conditions were handled fine.

  • Bug fix on align with new versions of perl: Experimental pop on scalar is now forbidden.

v2.0.1

07 Mar 11:42

This is an important bug fix for align from v2.0.0. Non strand-specific reads were often detected as strand-specific, usually resulting in the loss of mappability for half of the reads. It is strongly recommended that non strand-specific reads mapped using vast-tools v2.0.0 are remapped with v2.0.1. This does not affect prior versions of vast-tools (v1).

v2.0.0

16 Feb 16:02
85e434b

NEW

  • align becomes strand-aware. Before mapping, reads are automatically tested to infer whether they are strand-specific or not, and in which direction (FR or RF). Mapping is then performed according to this information. It is possible to run any fastq file in the non-strand-aware mode (--ns), which is equivalent to running align from v1.

  • combine includes a new module that generates PSIs for all annotated exons (provided they fulfill some mappability and read balance requirements; see README for more information). This means the final INCLUSION table now contains tens of thousands of new exons, often with PSI ~ 100 (i.e. constitutive exons). They can be distinguished by the first digit of the event ID (e.g. HsaEX6000001).

  • IMPORTANT NOTE I: these changes require new VASTDB files to be installed. In particular, strand-specific mapping requires different mappability files and the annotation module uses a new template. It is recommended to delete the entire v1 libraries and re-install the new libraries from scratch. You may download the new libraries for each available species here (you will only need to untar them afterwards and make sure they are inside VASTDB/):

  • IMPORTANT NOTE II: v2 and v1 align outputs are still relatively compatible for merge and combine. Intermediate outputs from align in v2 include a *.info file, which contains information about strand awareness. When running merge or combine, vast-tools will first search for all *.info files. For samples with no info file, it will assume they have been mapped in a non-strand-aware manner (e.g. in v1). For combine, each sample is processed according to its own information; therefore, a final INCLUSION table may include both strand-specific and non-strand-specific samples. For merge, if one sample of a group is non-strand-specific or was mapped in the non-strand-aware mode (--ns or in v1), all samples from the group will be treated as non-strand-specific. Users should keep in mind that merging strand-specific and non-strand-specific RNA-seq samples is risky.
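
The merge rule in NOTE II amounts to an "all or nothing" check over the samples of a group, sketched below in Python. The labels "FR", "RF" and "NS" and both function names are hypothetical stand-ins; the note does not describe the actual contents of the *.info files, and `None` is used here to represent a sample with no info file.

```python
# Hypothetical labels: "FR"/"RF" for strand-specific mapping,
# "NS" for the non-strand-aware mode (--ns), None for a missing *.info file.
STRAND_SPECIFIC_MODES = {"FR", "RF"}


def sample_is_strand_specific(mode):
    """A sample without an info file (None) is assumed non-strand-aware."""
    return mode in STRAND_SPECIFIC_MODES


def group_is_strand_specific(modes):
    """Return True only if every sample in the merge group is strand-specific.

    A single non-strand-specific sample (NS, or no info file as in v1)
    makes the whole group be treated as non-strand-specific.
    """
    modes = list(modes)
    if not modes:
        raise ValueError("a merge group needs at least one sample")
    return all(sample_is_strand_specific(m) for m in modes)
```

Under this reading, a group of two "FR" samples stays strand-specific, while adding one v1 sample (no info file) downgrades the whole group. The sketch does not address groups that mix FR and RF samples, since the note is silent on that case.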

Updates and fixes

  • Quantification of multi-microexon events has been modified so that only reads fully covering a microexon are used to support inclusion. For Spu and Dre, exon-microexon or microexon-exon junctions were also removed for simple microexon events. [This update was done to avoid false positive microexon calls that overlap with longer exons]

  • It is possible to obtain the number of counts per exon-exon junction also for the microexon and transcript-based (exskX and MULTI3X) pipelines using the option -ec, --EEJ_counts in align.

  • The non-strand-specific mappability file for the MULTI pipeline in Mmu was updated.

v1.3.0

08 Nov 14:36

NEW

  • Two new species have been added: zebrafish, Danio rerio (assembly danRer10; species key: Dre), and sea urchin, Strongylocentrotus purpuratus (assembly Spur 3.1; species key: Spu). The associated VastDB libraries can be downloaded from http://vastdb.crg.eu/libs/vastdb.dre.10.03.17.tar.gz and http://vastdb.crg.eu/libs/vastdb.spu.10.03.17.tar.gz.

  • The coverage scoring for ALTA and ALTD events after combine was changed to better match those of EX and INT events:

    • VLOW: 15 <= X < 25 (previously 10 <= X < 20)
    • LOW: 25 <= X < 40 (previously 20 <= X < 40)

  • Resume option in align. If a run stops before all the steps are finalized properly, align can be run using the option --resume and it will identify the last step finished successfully and resume from there.

  • New module tidy implemented to filter and clean INCLUSION tables from combine. The output of tidy is a simpler table with only PSIs (no quality scores) for each event that passes certain filters (including coverage in a minimum number of samples, minimum PSI variation across samples, etc). PSIs for samples that do not reach the minimum coverage threshold are converted into NA. The output of tidy is designed to be loaded directly into R. Finally, summary statistics by sample are provided (% of events without coverage, etc.).

  • plot can now make plots for cRPKM values for gene expression ( --expr=TRUE ).
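
The revised VLOW and LOW bins for ALTA and ALTD events listed above can be written as a small lookup, shown here next to the previous bins for comparison. This is a sketch only: the function name `alta_altd_low_bin` is hypothetical, X is taken as given in the note (which does not define it further), and scores outside these two bins are not covered because the note does not list them.

```python
def alta_altd_low_bin(x, legacy=False):
    """Return "VLOW", "LOW", or None for a coverage value x.

    legacy=False uses the bins introduced in v1.3.0;
    legacy=True uses the previous bins quoted in the note.
    None means x falls outside the two bins described there.
    """
    if legacy:
        vlow, low = (10, 20), (20, 40)
    else:
        vlow, low = (15, 25), (25, 40)
    if vlow[0] <= x < vlow[1]:
        return "VLOW"
    if low[0] <= x < low[1]:
        return "LOW"
    return None
```

For instance, a value of 22 was scored LOW under the previous bins and is scored VLOW under the new ones, while a value of 12 no longer reaches VLOW.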

Updates and fixes

  • Updates in align:

    • fastA files can be used for all steps.
    • fastq soft links are handled as inputs.
    • Option to map only to the intron retention libraries ( --onlyIR ).

  • Updates in combine:

    • "Last donor" is used for recursive exon-exon junction generation to improve complex PSI quantification (minor impact).
    • The minimum number of mappable positions per junction for an AltEx event to be valid in combi increases from 1 to 2 (minor impact).
    • Other minor improvements in combi sub-module quantification for AltEx events.
    • The actual number of reads is shown after the @ in the quality score when the number of reads is 0, 1 or 2 (previously, all were rounded down to 0).

  • merge was updated to handle nested merges and to have an initial check for inconsistencies.

  • Problems with installation of VASTDB folder fixed.

  • Updates in README to incorporate links to VastDB web server and information.

Citation update:

  • Main vast-tools paper, including benchmarking:

Tapial, J., Ha, K.C.H., Sterne-Weiler, T., Gohr, A., Braunschweig, U., Hermoso-Pulido, A., Quesnel-Vallières, M., Permanyer, J., Sodaei, R., Marquez, Y., Cozzuto, L., Wang, X., Gómez-Velázquez, M., Rayón, M., Manzanares, M., Ponomarenko, J., Blencowe, B.J., Irimia, M. (2017). An Alternative Splicing Atlas Reveals New Regulatory Programs and Genes Simultaneously Expressing Multiple Major Isoforms in Vertebrates. Genome Res, 27(10):1759-1768

  • Zebrafish and sea urchin databases:

Burguera, D., Marquez, Y., Racioppi, C., Permanyer, J., Torres-Mendez, T., Esposito, R., Albuixech, B., Fanlo, L., D'Agostino, Y., Gohr, A., Navas-Perez, E., Riesgo, A., Cuomo, C., Benvenuto, G., Christiaen, L.A., Martí, E., D'Aniello, S., Spagnuolo, A., Ristoratore, F., Arnone, M.I., Garcia-Fernàndez, J., Irimia, M. (2017). Evolutionary recruitment of flexible Esrp-dependent splicing programs into diverse embryonic morphogenetic processes. Nat Commun, In press.

v1.2.0

24 Dec 20:43

New

  • A new species has been added: planarian, Schmidtea mediterranea (assembly v31). The associated VastDB library can be downloaded from http://vastdb.crg.eu/libs/vastdb.sme.13.11.15.tar.gz. The species key is Sme.
  • It is no longer necessary to provide the read length for quantifying gene expression. Reads also no longer need to have the same length.

vast-tools v1.1.0

22 Jun 22:08

New

  • It is now possible to generate the INCLUSION_LEVELS_FULL table from combine for the newest mouse (mm10) and human (hg38) assemblies. For this, simply provide the assembly version using the -a [hg19|hg38|mm9|mm10] option. The default is still mm9/hg19.
    • Note: combine uses a conversion file for each species that is now included in a new VASTDB version (vastdb.hsa.22.06.16 and vastdb.mmu.22.06.16). If you already have VASTDB installed, the conversion files are also available for download from http://vastdb.crg.eu/libs/PATCH_mm10-hg38.tar.gz. They should be placed in the corresponding VASTDB/Sp/FILES/ folder.
    • Note: vast-tools still operates with mm9 and hg19 VASTDB versions. Only the coordinates are converted in the final output table.

Updates and fixes

  • Fix in merge: incorrect behavior to avoid overwriting merged files if already present.

vast-tools v1.0.0-beta.2

23 Nov 15:51

This update contains some bug fixes from v1.0.0-beta.1.

New

  • A new method to calculate intron retention is available (use --IR_version 2 in align and combine). It is a modification of the original one (as described in Braunschweig et al, 2014), but uses multi exon-exon junction read counts for skipping. It provides a more realistic estimate of the Percent Intron Retention (PIR) at the gene level. The original method can still be used with --IR_version 1.

    • New files are needed in the VASTDB/Species/FILES/ folder for the new IR option as well as to obtain gene IDs for GO analysis. Additional files for human, mouse and chicken can be downloaded here: http://vastdb.crg.eu/libs/PATCH_IRv2.tar.gz. To install the patch:
    tar -xzvf PATCH_IRv2.tar.gz
    rsync -av PATCH_IRv2/ /path/to/VASTDB/
    
  • merge: a new module to merge align outputs from multiple subsamples into new sample files.

  • compare: a new module to identify differentially spliced (DS) AS events based on average inclusion level differences. It also provides gene lists for GO analysis and directly plots DS AS events. This module is independent of diff and is not a replacement for it.

Updates and fixes

  • Fix in the calculation of gene expression (cRPKMs) to properly account for read length. A new option --readLen or a specific sample name format (Sample-readLen.fq.gz) is enforced in align.
  • Definition of species (--sp) is enforced in combine.
  • Fix in combine. Some Alt3 and Alt5 AS events were not being outputted if their coordinate matched that of a cassette exon.
  • Fix in ylim setting in plot.
  • Update of install.R for the new VASTDB libraries.
  • Update of documentation.
  • Misc. updates.
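
The sample name format mentioned in the cRPKM fix above (Sample-readLen.fq.gz) can be illustrated with a short parser. This is a hypothetical sketch, not the code used by align: the function name, the regular expression and the precedence given to an explicit read length are all assumptions. Note that, according to the v1.2.0 notes, later versions no longer require the read length at all.

```python
import os
import re

# Assumed pattern: anything, a dash, digits, then .fq or .fq.gz
_READLEN_NAME = re.compile(r"^(?P<sample>.+)-(?P<readlen>\d+)\.fq(?:\.gz)?$")


def read_length_for(filename, read_len=None):
    """Return the read length for a fastq file.

    read_len: explicit value (playing the role of --readLen); if given,
    it is used as-is. Otherwise the length is parsed from a file name of
    the form Sample-readLen.fq.gz. Raises ValueError if neither is available.
    """
    if read_len is not None:
        return int(read_len)
    match = _READLEN_NAME.match(os.path.basename(filename))
    if match is None:
        raise ValueError(
            "read length not given and not encoded in name: %s" % filename
        )
    return int(match.group("readlen"))
```

With this sketch, `read_length_for("Sample-50.fq.gz")` yields 50, and a file named `Sample.fq.gz` is only accepted when a read length is passed explicitly.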