Releases: vastgroup/vast-tools

v2.5.1

26 Jun 14:02

New VASTDB library release to improve PSI quantifications.

NEW

  • A new release of VASTDB libraries for all species (vastdb.sp.23.06.20).

    • These new libraries only involve changes in two files (Sp_COMBI-M-50-gDNA.eff and Sp_COMBI-M-50-gDNA-SS.eff), which are used by combine to quantify PSIs for alternative exons through the "a posteriori" and "annotated" modules, as well as for ALTA and ALTD events.
    • In these new files, a subset of pre-built exon-exon junctions was identified as likely to result in spurious mappings and thus excluded from the PSI quantifications by setting their mappability to 0. Spurious exon-exon junctions were identified using large sets of RNA-seq data for each species based on their mapped read distribution along the junction (in such spurious junctions, mapped reads accumulate only in the first or last positions of the exon-exon junction).
    • These new libraries are thus expected to reduce "skipping noise" in PSI estimates, particularly for Hs2 and Mm2 (with nearly no effect for Hsa and Mmu).
    • There is no need to rerun align to generate updated PSI estimates. Re-running only combine on an existing align output is enough.
    • Different vast-tools versions and VASTDB libraries are backward- and forward-compatible.
    • To report version information in your Methods section, simply state the vast-tools and VASTDB library version you have used for combine (e.g. "we have used vast-tools v2.5.1 with the VASTDB library vastdb.hs2.23.06.20"). This information is stored in the VTS_LOG_commands.txt file.
  • combine has a new option (--add_version) to add the vast-tools version to the output INCLUSION file name (e.g. INCLUSION_LEVELS_FULL-hg38-4-v251.tab). This option is off by default.
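The criterion described above for spurious exon-exon junctions (mapped reads accumulating only in the first or last positions of the junction) can be illustrated with a minimal Python sketch. It is not the procedure used to build the libraries: the function names, the `edge`, `max_edge_fraction` and `min_reads` parameters and their default values are all assumptions made for illustration.

```python
from typing import Sequence


def edge_read_fraction(counts: Sequence[int], edge: int = 1) -> float:
    """Fraction of a junction's reads that start in its first or last `edge` positions.

    `counts[i]` is the number of reads mapped at position i along the junction.
    """
    if edge < 1:
        raise ValueError("edge must be at least 1")
    total = sum(counts)
    if total == 0:
        return 0.0
    if 2 * edge >= len(counts):
        # Every position counts as an edge position.
        return 1.0
    edge_reads = sum(counts[:edge]) + sum(counts[-edge:])
    return edge_reads / total


def is_spurious(counts: Sequence[int], edge: int = 1,
                max_edge_fraction: float = 0.9, min_reads: int = 20) -> bool:
    """Flag a junction whose reads pile up almost only at its ends.

    All thresholds are hypothetical; junctions with too few reads are not judged.
    """
    if sum(counts) < min_reads:
        return False
    return edge_read_fraction(counts, edge) >= max_edge_fraction


# Reads only at the two ends of the junction: flagged.
print(is_spurious([40, 0, 0, 0, 0, 0, 0, 2]))
# Reads spread evenly along the junction: kept.
print(is_spurious([5, 5, 5, 5, 5, 5, 5, 5]))
```

In the released libraries, junctions identified as spurious are excluded by setting their mappability to 0 in the two .eff files, so no user-side filtering is needed.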

Updates and fixes

  • These new libraries also include updated exon VastIDs for Hs2 and Mm2 (~500 and ~300, respectively) to more accurately match the lifted ones from Hsa and Mmu.

  • The default value of the --extra_eej option in combine has been changed from 10 to 5.

v2.5.0

12 Jun 13:47

Correction for stack reads

NEW

  • combine now performs a new correction for reads that disproportionately map to individual positions of exon-exon junctions. This change may affect PSIs of exons quantified through the COMBI ("a posteriori") and ANN ("annotation") sub-modules, as well as those of ALT3 and ALT5 events, and it is expected to reduce false positive calls.

v2.4.2

29 May 09:48

Improvements in speed and performance.

Improvements

  • A batch of improvements has been implemented in combine to increase computing speed. First, combine can use up to 6 cores to parallelize the generation of inclusion tables. Second, the quantification of pseudo-PSIs for ALTA and ALTD events (added in v2.2.1 and used for the --min_ALT_use option) has been simplified to reduce computing time: it now uses a more local quantification, as in the ANN module. PSIs of individual alternative splice sites do not change. Finally, a few complex events were removed.

  • align now includes a sanity check for paired-end input: it will die if the numbers of R1 and R2 reads differ.

  • It is now possible to provide group names in compare (otherwise, they will still be automatically generated from the first replicate name).
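The paired-read sanity check in align mentioned above can be reproduced before launching a run with a short Python sketch. This is an independent illustration, not the code used by align; it assumes standard 4-line FASTQ records, and the function names are hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file (plain or gzipped), assuming 4 lines per read."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def check_paired(r1_path, r2_path):
    """Raise ValueError if R1 and R2 contain different numbers of reads."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    if n1 != n2:
        raise ValueError(f"R1 has {n1} reads but R2 has {n2}")
    return n1
```

Calling `check_paired("sample_1.fq.gz", "sample_2.fq.gz")` (hypothetical file names) returns the shared read count, or raises an error when the two files disagree.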

Updates and fixes

  • Silent change in align to use a more efficient hash system to store counts by position. This has seemingly given issues to a user with hg38.

v2.4.1

16 Apr 13:37

VASTDB version control and minor updates

NEW

  • align and combine now print the VASTDB version that is used in the LOG file. The VASTDB library for each species now includes a file called VASTDB_VERSION, which contains the version of the library (e.g. "vastdb.hs2.20.12.19"). If you had already downloaded a version of VASTDB, you can simply add a file called VASTDB_VERSION to the VASTDB/Sp/ folder and write the version name in it.

  • The help messages of align and combine now print the available species for vast-tools in the local VASTDB installation.
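For an existing VASTDB installation, the VASTDB_VERSION file described above can be created by hand or with a few lines of Python. The helper below is a hypothetical convenience, not part of vast-tools; it assumes the file holds only the version string followed by a newline.

```python
from pathlib import Path


def write_vastdb_version(vastdb_dir, species, version):
    """Create VASTDB/<species>/VASTDB_VERSION containing the library version."""
    species_dir = Path(vastdb_dir) / species
    if not species_dir.is_dir():
        raise FileNotFoundError(f"No such VASTDB species folder: {species_dir}")
    version_file = species_dir / "VASTDB_VERSION"
    # Assumption: a single line with the version name is all that is needed.
    version_file.write_text(version + "\n")
    return version_file
```

For example, `write_vastdb_version("VASTDB", "Hs2", "vastdb.hs2.20.12.19")` would write the version from the example above into VASTDB/Hs2/VASTDB_VERSION.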

Updates and fixes

  • For ALTA events, the alternative splice acceptors separated by "+" in the full coordinate (FullCO, 5th column in the INCLUSION table) are now sorted differently to make them consistent with the sorting of ALTD events. In both cases, the coordinates are now sorted from internal to external.

v2.4.0

10 Apr 10:16
26e83ec

Updates on species and assemblies.

NEW

  • All modules that require the species key to be provided (i.e. align, combine, merge with the --expr option, and compare with the --GO option) now take the assembly (e.g. hg19, mm10, danRer10) as preferred input. For instance, it is possible and recommended to provide -sp hg19 instead of -sp Hsa. However, the 3-letter species key can still be provided as in previous versions. vast-tools will still use the 3-letter species key internally, but the output tables will show the assembly instead of the species key. For instance, an inclusion table named INCLUSION_LEVELS_FULL-Hsa5.tab will now be named INCLUSION_LEVELS_FULL-hg19-5.tab.

  • VASTDB libraries for new species and assemblies have been released (more information about them in README):

    • Homo sapiens (hg38, Hs2).
    • Mus musculus (mm10, Mm2).
    • Bos taurus (bosTau6, Bta).
    • Gallus gallus (galGal4, Gg4).
    • Xenopus tropicalis (xenTro3, Xt1).
    • Arabidopsis thaliana (araTha10, Ath).
  • In particular, newer versions for human and mouse are now available, with many more events. These have been built already in hg38 (internal species key "Hs2") and mm10 (internal species key "Mm2"), respectively. As with updates for other species, the EventIDs are lifted and maintained across versions.

  • The option -a in combine has been deprecated. This option lifted the hg19 and mm9 coordinates in the INCLUSION tables to hg38 and mm10. To avoid issues with the new assemblies, this option is now called -lift_coord and it applies only to hg19 and mm9 (Hsa and Mmu). If used, the name of the INCLUSION table will be, e.g. INCLUSION_LEVELS_FULL-hg19-3-lifted_hg38.tab.

  • The installer script install.R has been updated to better accommodate the larger number of available species and assemblies (currently 17).

  • PSI quantifications and the content of the files do NOT change in this version.
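The naming changes above can be summarized with a small Python sketch that resolves a species key or assembly and builds the expected INCLUSION table name. It only covers the key/assembly pairs named in these notes, the function names are hypothetical, and the number in the file name is assumed here to be the number of samples.

```python
# Species key -> assembly, restricted to the pairs named in these release notes.
ASSEMBLY_BY_KEY = {
    "Hsa": "hg19", "Mmu": "mm9", "Hs2": "hg38", "Mm2": "mm10",
    "Bta": "bosTau6", "Gg4": "galGal4", "Xt1": "xenTro3", "Ath": "araTha10",
}
KNOWN_ASSEMBLIES = set(ASSEMBLY_BY_KEY.values())
# -lift_coord applies only to hg19 and mm9.
LIFT_TARGET = {"hg19": "hg38", "mm9": "mm10"}


def resolve_assembly(sp):
    """Accept either an assembly (preferred) or a 3-letter species key."""
    if sp in KNOWN_ASSEMBLIES:
        return sp
    if sp in ASSEMBLY_BY_KEY:
        return ASSEMBLY_BY_KEY[sp]
    raise KeyError(f"Unknown species key or assembly: {sp}")


def inclusion_table_name(sp, n_samples, lifted=False):
    """Build the INCLUSION table name following the examples in the notes."""
    assembly = resolve_assembly(sp)
    name = f"INCLUSION_LEVELS_FULL-{assembly}-{n_samples}"
    if lifted:
        if assembly not in LIFT_TARGET:
            raise ValueError(f"Coordinates cannot be lifted for {assembly}")
        name += f"-lifted_{LIFT_TARGET[assembly]}"
    return name + ".tab"


print(inclusion_table_name("Hsa", 5))
print(inclusion_table_name("hg19", 3, lifted=True))
```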

Updates and fixes

  • Correction in compare_expr of a bug due to which a few genes were missed in the GO backgrounds.

  • Change in tidy: when using groups, the --min_SD is calculated for the two groups together, not each group individually.
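The effect of pooling the two groups for --min_SD can be seen in a small Python sketch. It assumes that an event is kept when its standard deviation is at least the threshold and that a sample standard deviation is used; neither detail is stated in these notes, and the function name is hypothetical.

```python
from statistics import stdev


def passes_min_sd(psi_group_a, psi_group_b, min_sd):
    """Check the SD of the PSIs of both groups taken together."""
    pooled = list(psi_group_a) + list(psi_group_b)
    return stdev(pooled) >= min_sd


group_a = [10, 12]
group_b = [80, 82]
# Within each group the PSIs barely vary...
print(stdev(group_a), stdev(group_b))
# ...but the event varies strongly across the pooled samples.
print(passes_min_sd(group_a, group_b, min_sd=5))
```

An event that differs between groups but is stable within each one would fail a per-group SD filter, yet it passes when the SD is computed on the pooled samples.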

v2.3.0

22 Dec 12:18
7e74797

Changes in combine (related to ANN exons) and compare (related to ALTA and ALTD events). This release includes updated VASTDB files for all species.

NEW

  • compare implements a different logic to define differentially used alternative donors (ALTD) and acceptors (ALTA). In previous versions, to make it more comparable to AltEx and IR events, only the donor/acceptor sites that resulted in inclusion of alternative sequence (i.e. length >= 0 or excluding the most internal [1/X] sites) were considered, and they could go up or down in the comparison. While this is useful for events involving only two alternative sites, it is not optimal for ALTA/ALTD events involving more sites. Therefore, compare now evaluates each alternative site independently by default and only reports those with increased usage. The previous mode can still be invoked with the option --legacy_ALT.
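The per-site logic can be sketched in Python as follows. This is only an illustration of the idea, not the compare implementation: the site labels, the `min_dpsi` threshold, and the choice to measure the increase from group A to group B are assumptions.

```python
def sites_with_increased_usage(psi_a, psi_b, min_dpsi=15.0):
    """Return (site, dPSI) for each site whose usage rises from group A to group B.

    psi_a and psi_b map a site label to its mean PSI in each group.
    """
    hits = []
    for site, usage_a in psi_a.items():
        if site not in psi_b:
            continue
        dpsi = psi_b[site] - usage_a
        if dpsi >= min_dpsi:
            hits.append((site, dpsi))
    return hits


# Hypothetical event with three alternative acceptor sites.
group_a = {"1/3": 60.0, "2/3": 30.0, "3/3": 10.0}
group_b = {"1/3": 20.0, "2/3": 35.0, "3/3": 45.0}
print(sites_with_increased_usage(group_a, group_b))
```

In this example only the third site is reported: the first site decreases, and the second changes by less than the threshold.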

Updates and fixes

  • combine includes some improvements and fixes with respect to ANN exons:
  1. A bug was corrected that reported incorrect C1 (reference upstream) and C2 (reference downstream) exon coordinates for ANN exons in the FullCO column, especially for genes in the positive strand. Given that vast-tools uses multiple neighboring exon-exon junctions for quantification of PSIs, this has little to no effect on the quantification of the vast majority of ANN exons (most of which have PSI ~ 100). Other events are not affected.
  2. The way to define the closest local upstream donor and downstream acceptor for PSI quantification is slightly modified to make exons with and without associated ALTA/ALTD events more comparable. In addition, the default number of additional donors/acceptors to consider for PSI quantification (--extra_eej) was set to 10, instead of 5.
  3. Some exons were deprecated in the current VASTDB files due to various new filters applied (more strict coordinate overlaps, overlap with ALTA and ALTD events and alternative polyAdenylation/start sites).

COMPATIBILITY NOTES: While it is recommended that the new VASTDB libraries are installed (vastdb.*.20.12.19.tar.gz), the old libraries give nearly identical results when used with vast-tools v2.3.0. Similarly, old versions of combine also give very similar results if run with the new VASTDB libraries. That is: different versions of combine and VASTDB libraries are backward compatible. Older libraries are also accessible in the github webpage (README). Only the following files have changed for each species (Sp) VASTDB folder: Sp.Event-Gene.IDs.txt, New_ID-Sp.txt.gz, Sp.ANNOT.Template.txt and Sp.FULL.Template.txt.gz, and lftOvr_dict_from_hg19_to_hg38.pdat (Hsa) and lftOvr_dict_from_mm9_to_mm10.pdat (Mmu).

  • The species key for chicken galGal3 in the new VASTDB library is Gg3 (formerly Gga).

v2.2.2

12 May 16:11

This is a minor update incorporating a new function in vast-tools secondary modules.

NEW

  • compare and tidy have a new option, --noB3, by which exons are filtered out if they contain 0 reads supporting the upstream or downstream inclusion and at least 15 supporting the other. These cases are usually due to alternative first or last exons that in some cases behave as true cassette exons.

  • To allow the --noB3 option, the fourth score for alternative exons (excluding those quantified by the microexon pipeline), quantifying the imbalance between inclusion read sets, now includes the B3 score (exons now considered B3 were previously considered B2). In addition, the third and fourth scores of other event types have been modified to provide more meaningful information (specifically, the number of exon-exon junction reads). These format changes are silent for all modules, as this information is not used by any module. See README for further information about the scores.
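The B3 condition used by --noB3 (0 reads supporting one inclusion junction and at least 15 supporting the other) can be written as a short Python predicate. The function and parameter names are hypothetical; only the 0 and 15 values come from the description above.

```python
def is_b3(upstream_inc_reads, downstream_inc_reads, min_other=15):
    """True if one inclusion junction has no reads and the other has at least `min_other`."""
    no_up = upstream_inc_reads == 0 and downstream_inc_reads >= min_other
    no_down = downstream_inc_reads == 0 and upstream_inc_reads >= min_other
    return no_up or no_down


print(is_b3(0, 15))   # B3: no upstream support, 15 downstream reads
print(is_b3(0, 14))   # not B3: too few reads on the supported side
print(is_b3(3, 40))   # not B3: imbalanced, but both sides have reads
```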

Updates and fixes

  • Fixed an incompatibility by which IR files generated with IR_version 1 in release v1 or earlier could not be processed by the current combine module.

v2.2.1

05 May 22:26

NEW

  • The version of vast-tools used by each module is now printed when each process starts. In addition, a log file (VTS_LOG_commands.txt) is created in the output folder and registers version, date and main options used for each vast-tools run.

  • compare has a new option to remove Alt3 (ALTA) and Alt5 (ALTD) events with differential splice site usage if their overall impact in the transcript pool is predicted to be minor. Therefore, compare now requires that the alternative splice sites belong to an exon with a minimum inclusion level (~PSI) across ALL compared samples. This minimum PSI is set to 25 by default, and can be modified using the --min_ALT_use option. The equivalent value to previous versions is --min_ALT_use 0.
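The --min_ALT_use requirement amounts to a check on the inclusion level of the host exon in every compared sample, as in the Python sketch below. The function name and the use of "greater than or equal to" are assumptions; only the default of 25 and the "ALL compared samples" rule come from the description above.

```python
def passes_min_alt_use(host_exon_psis, min_alt_use=25.0):
    """True if the host exon reaches the minimum PSI in all compared samples."""
    psis = list(host_exon_psis)
    return bool(psis) and all(psi >= min_alt_use for psi in psis)


print(passes_min_alt_use([80.0, 75.5, 30.0, 41.2]))                 # kept
print(passes_min_alt_use([80.0, 75.5, 12.0, 41.2]))                 # removed: one sample below 25
print(passes_min_alt_use([80.0, 75.5, 12.0, 41.2], min_alt_use=0))  # previous behaviour
```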

v2.2.0

12 Apr 11:24
ef9889f

NEW

Updates and fixes

  • The VASTDB libraries for zebrafish (Dre) and sea urchin (Spu) have been updated (http://vastdb.crg.eu/libs/vastdb.dre.01.12.18.tar.gz and http://vastdb.crg.eu/libs/vastdb.spu.01.12.18.tar.gz). A few new microexons are now included in the MIC module.
  • Quality control to make sure the expression files have the same number of genes when combining.
  • Improvements in README and help messages.
  • Various other minor fixes and improvements.

v2.1.3

17 Oct 19:42
0e72448

NEW

  • compare has a new option, --use_int_reads, to increase the stringency when calling differentially regulated intron retention events. It requires that the average number of corrected intron body read counts of the group with the higher PIR is at least 0.4 times the average of the exon-intron junction reads (used to calculate PIR). This fraction can be modified using --fr_int_reads. Intron body reads come from mapping reads to 200bp in the middle of the intron, or the whole intron when shorter. (This mapping was already implemented from the first release of vast-tools, but only used when doing the binomial test for the balance score [5th score in IR]. Therefore, there is no need to re-run align to have access to this feature, only combine; see next).

  • combine has a few changes to include the corrected number of reads in the quality score for IR (3rd score). Also, the 4th score now has the corrected number of reads for EI, IE and EE, not the raw counts.
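The --use_int_reads criterion described above can be expressed as a simple ratio test, sketched here in Python. The function name and the per-replicate input format are assumptions; only the 0.4 default (adjustable with --fr_int_reads) and the comparison of averages come from the description.

```python
from statistics import mean


def passes_int_reads(intron_body_reads, ei_junction_reads, fr_int_reads=0.4):
    """Apply the intron body read check to the group with the higher PIR.

    Both arguments hold one corrected read count per replicate of that group.
    """
    return mean(intron_body_reads) >= fr_int_reads * mean(ei_junction_reads)


# Average body reads 13 vs. 0.4 * 30 = 12 required: the event is kept.
print(passes_int_reads([10, 16], [28, 32]))
# Average body reads 6: the event is filtered out.
print(passes_int_reads([5, 7], [28, 32]))
```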