Releases: iquasere/reCOGnizer

Increase maximum SMPs per database

08 Aug 19:43

Set option -max_smp_vol 1000000 for the makeprofiledb command.

Context: the blast package was updated, and the makeprofiledb tool now outputs one database for every 1000 HMM profiles by default.
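
As a minimal sketch of how such a call could be assembled, the snippet below builds the argument list without running it. The function name and the COG.pn / COG file names are hypothetical; -in, -title and -out are standard makeprofiledb options, and -max_smp_vol is taken from this note.

```python
import shlex


def build_makeprofiledb_cmd(pn_file, out_prefix, max_smp_vol=1_000_000):
    """Return the argument list for a makeprofiledb call (not executed here)."""
    return [
        "makeprofiledb",
        "-in", pn_file,          # file listing the SMP profiles
        "-title", out_prefix,
        "-out", out_prefix,
        # Raise the per-volume limit so all profiles land in one database
        "-max_smp_vol", str(max_smp_vol),
    ]


# Hypothetical file names, for illustration only
cmd = build_makeprofiledb_cmd("COG.pn", "COG")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) on a machine where blast is installed.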

Fix on COG2KO

07 Jul 14:43

COG2KO is blocked for now, so that reCOGnizer can finish its workflow.

Major improvements on reporting results

22 Jun 09:55

Columns have been standardized to have the same names, regardless of database
For example, the COG functional category and cog columns were renamed to functional category and DB ID, respectively
This provides a simpler report, with far fewer NA values
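
The renaming described above can be sketched with a plain dictionary lookup. This is only an illustration, not reCOGnizer's actual code: the function name, the qseqid column and the example values are assumptions; only the two renamed column pairs come from this note.

```python
# Old database-specific column names mapped to the standardized names
COLUMN_RENAMES = {
    "COG functional category": "functional category",
    "cog": "DB ID",
}


def standardize_columns(row):
    """Rename database-specific columns; unknown columns are left untouched."""
    return {COLUMN_RENAMES.get(name, name): value for name, value in row.items()}


# Hypothetical report row, for illustration only
row = {"qseqid": "protein_1", "cog": "COG0001", "COG functional category": "H"}
standardized = standardize_columns(row)
print(standardized)
```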

Databases now inputted as comma-separated values

This causes no problem when using a single database or all default databases (without specifying values), but it breaks backwards compatibility, so the version was bumped to 1.8.
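
A minimal sketch of parsing such a comma-separated value follows. The function name and the list of known databases are assumptions (the list only contains databases named in these notes, not the full default set).

```python
# Assumption: illustrative subset of databases, not the tool's full default list
KNOWN_DATABASES = ["CDD", "Smart", "COG", "KOG"]


def parse_databases(value=None):
    """Turn 'COG,KOG' into ['COG', 'KOG']; no value means all known databases."""
    if value is None:
        return list(KNOWN_DATABASES)
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in KNOWN_DATABASES]
    if unknown:
        raise ValueError(f"Unknown database(s): {', '.join(unknown)}")
    return names


print(parse_databases("COG,KOG"))
print(parse_databases())
```

A single database name contains no comma, so it parses the same way as before, which matches the compatibility remark above.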

Also some miscellaneous fixes

Krona plots are no longer created when there is no annotation for the respective database (COG or KOG)
Removed Biopython as a dependency

Intermediates now removed

23 May 13:57
93602d1

Files in the asn, blast and rpsbproc directories are removed again.

Fixes in versions

So that reCOGnizer can be integrated easily with other tools, the version requirements for krona and Biopython were relaxed.
Because of a previous bug in blast 2.11, the blast version was set to >=2.12.

BLAST version relaxed

20 May 17:12

Any blast version can now be used, as newer releases fix the bug that prevented their use in reCOGnizer.

EC numbers obtained from CDD and Smart

10 Mar 16:16

EC numbers are now obtained from parsing database descriptions of CDD and Smart.

For Smart, all EC numbers are obtained, as they always refer to the domain described.

In the case of CDD, only EC numbers in the form "(EC:X.X.X.X)" are obtained, as many more EC numbers are referenced in other formats, and those refer to other proteins in the same domain family rather than to the domain in question.
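
The difference between the two rules can be sketched with regular expressions. This is an assumption-laden illustration, not the parser used by reCOGnizer: the function names, the exact patterns (including accepting "-" in partial EC numbers) and the example descriptions are all hypothetical.

```python
import re

# Four dot-separated fields; "-" is accepted for partial EC numbers (assumption)
EC_NUMBER = r"\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)"

# CDD rule: only the strict "(EC:X.X.X.X)" form counts
CDD_EC = re.compile(r"\(EC:(" + EC_NUMBER + r")\)")
# Smart rule: any EC number preceded by "EC", with optional colon or spaces
SMART_EC = re.compile(r"\bEC[:\s]*(" + EC_NUMBER + r")")


def cdd_ec_numbers(description):
    """Return EC numbers written exactly as (EC:X.X.X.X)."""
    return CDD_EC.findall(description)


def smart_ec_numbers(description):
    """Return every EC number mentioned in the description."""
    return SMART_EC.findall(description)


# Hypothetical description mixing both formats
desc = "Alcohol dehydrogenase (EC:1.1.1.1); the family also includes EC 1.1.1.2 enzymes"
print(cdd_ec_numbers(desc))
print(smart_ec_numbers(desc))
```

On the example description, the CDD rule keeps only the number in the strict form, while the Smart rule returns both.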

A working Continuous Integration

02 Feb 10:51

Added a mini cdd.tar.gz with only some HMMs for all databases

New parameter of reCOGnizer, --skip-downloaded, mainly for CI: if set, files already downloaded will be skipped, and reCOGnizer no longer asks about the files one at a time
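
The behaviour of --skip-downloaded could look roughly like the sketch below. The function names, the prompt text and the example file names are hypothetical; only the flag's effect (skip existing files without asking) comes from this note.

```python
import tempfile
from pathlib import Path


def ask_user(path):
    """Interactive fallback: ask whether an existing file should be fetched again."""
    answer = input(f"{path} already exists. Download again? [y/N] ")
    return answer.strip().lower() == "y"


def should_download(path, skip_downloaded=False, confirm=ask_user):
    """Decide whether a resource file has to be downloaded."""
    if not Path(path).exists():
        return True              # missing files are always fetched
    if skip_downloaded:
        return False             # flag set: keep the existing file, no prompt
    return confirm(path)         # flag not set: ask, one file at a time


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "cdd.tar.gz"
    existing.write_bytes(b"")
    missing = Path(tmp) / "missing_file.txt"
    decisions = {
        "existing, flag set": should_download(existing, skip_downloaded=True),
        "missing, flag set": should_download(missing, skip_downloaded=True),
    }
print(decisions)
```

Passing the prompt as a parameter keeps the decision testable in CI, where no one can answer an interactive question.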

Also simplified some intermediate tasks

  • "Organize COGs to each tax ID" is now limited to when taxonomy is relevant
  • cog2ko downloads are simplified: silenced with the -q parameter of wget

Removal of artifacts and bug fixes

18 Jan 15:20

Removal of artifacts

Now removes the CDD tarball
Now removes all helper directories: fasta, asn, blast, rpsbproc and tmp
Integrated cog2ec.py code

Bug fixes

Fixed the path so that it points to the directory where SMPs are now stored
Fixed time reporting, which only covered hours, minutes and seconds: it now also reports days
Removed the redundant prompt for resources download
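
For the time reporting fix, one possible way to include days is sketched below. The function name and the output format are assumptions, not the format reCOGnizer prints.

```python
def human_time(seconds):
    """Format a duration as [Nd]HHhMMmSSs, adding days only when needed."""
    seconds = int(seconds)
    days, rest = divmod(seconds, 86400)    # 24 * 60 * 60 seconds per day
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    clock = f"{hours:02d}h{minutes:02d}m{secs:02d}s"
    return f"{days}d{clock}" if days else clock


print(human_time(3661))
print(human_time(90061))
```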

Also changed default of --max-target-seqs from 1 to 20

Now downloads RPSBPROC files

Implemented COG taxonomic workflow

19 Nov 19:01

COG annotation can now follow an alternative workflow based on taxonomy.

  • Used when --tax-file is provided and --species-taxids is set
  • --species-taxids is a new parameter, added just for this workflow
  • Each SMP becomes its own database
  • The mapping from tax ID to list of COGs is estimated from NOG.members.tsv
  • If a tax ID from the tax file is present in the tax ID to COGs mapping, those COGs are used as the reference database
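
The last step of the list above can be sketched as a simple lookup. This is a hedged illustration only: the function name, the in-memory mapping and the example tax IDs and COGs are hypothetical, and building the mapping from NOG.members.tsv is not shown.

```python
def select_reference_cogs(sample_taxids, taxid_to_cogs):
    """Split sample tax IDs into those with known COGs and those without."""
    selected = {}
    missing = []
    for taxid in sample_taxids:
        if taxid in taxid_to_cogs:
            # These COGs would serve as the reference database for this tax ID
            selected[taxid] = sorted(taxid_to_cogs[taxid])
        else:
            missing.append(taxid)
    return selected, missing


# Hypothetical mapping and tax file content, for illustration only
taxid_to_cogs = {"83333": {"COG0002", "COG0001"}, "224308": {"COG0002"}}
selected, missing = select_reference_cogs(["83333", "9606"], taxid_to_cogs)
print(selected)
print(missing)
```

How tax IDs without an entry in the mapping are handled is not stated in this note, so the sketch only reports them.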