ktUpdateTaxonomy.sh runs into infinite recursion and gets killed at memory limit #192

Open
phyden opened this issue Apr 12, 2024 · 0 comments

Hi,

I tried to update the taxonomy on our system today and it failed. The command ktUpdateTaxonomy.sh produced the following output:

(cge_tools) [seqsphere@wsps1152 cge_tools]$ ktUpdateTaxonomy.sh
Fetching taxdump.tar.gz...
   Fetching checksum...
   Checksum for taxdump.tar.gz matches server.
Extracting taxonomy...
make: *** [/proj/seqsphere/conda/mambaforge/envs/cge_tools/opt/krona/scripts/taxonomy.make:14: taxonomy.tab] Killed
make: *** Deleting file 'taxonomy.tab'

Update failed.
   Building taxonomy table failed (see errors above). Issues can be tracked and reported at https://github.com/marbl/Krona/issues.

Initially I ran the command on a machine with only 15G of RAM, but running it on a larger compute node of our system showed that even 500G of RAM is not enough: perl "panics" at approx. 268G of RAM usage:

panic: memory wrap at /proj/seqsphere/conda/mambaforge/envs/cge_tools/opt/krona/scripts/extractTaxonomy.pl line 87.
Command exited with non-zero status 25
        Command being timed: "/proj/seqsphere/conda/mambaforge/envs/cge_tools/opt/krona/scripts/extractTaxonomy.pl /proj/seqsphere/conda/mambaforge/envs/cge_tools/opt/krona/taxonomy/"
        User time (seconds): 325.75
        System time (seconds): 103.17
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 7:11.34
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 280181080
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 45
        Minor (reclaiming a frame) page faults: 57501650
        Voluntary context switches: 175
        Involuntary context switches: 822
        Swaps: 0
        File system inputs: 865640
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 25
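
As a quick sanity check on the "approx. 268G" figure, the peak resident set size reported above can be converted by hand. This is only a sketch of the arithmetic; it assumes the `kbytes` that `time` prints are units of 1024 bytes.

```python
# Peak resident set size as printed by `time` above.
# Assumption: the "kbytes" unit means 1024 bytes.
max_rss_kib = 280_181_080

max_rss_gib = max_rss_kib / 1024**2        # binary gigabytes (GiB)
max_rss_gb = max_rss_kib * 1024 / 10**9    # decimal gigabytes (GB)

print(f"{max_rss_gib:.1f} GiB, {max_rss_gb:.1f} GB")
```

Under that assumption the process peaked at roughly 267 GiB, far above anything a taxonomy table should need.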

The error message points to line 87 of that script, which is part of a recursive function.
Running the same command on the newly downloaded taxdump in my old Docker image gives the same error.
Comparing the (incomplete) taxonomy.tab file with the one from my old image, the first taxonomy ID that is missing is

49      6       3031712 family  Polyangiaceae

Even though this might be an issue with the NCBI taxonomy file, I think this error should be handled in the extractTaxonomy.pl script: it looks like the getParent() function receives undef and returns undef again, which leads to infinite recursion for this family.
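
To illustrate the kind of handling meant here, a guarded walk up the parent chain could look like the sketch below. It is written in Python for brevity and is not the code in extractTaxonomy.pl; `depth_of`, the `parents` mapping and the root ID `1` are hypothetical names and assumptions for this example, and the toy data attaches Bacteria directly to the root as a simplification.

```python
def depth_of(taxid, parents, root=1):
    """Count parent steps from taxid to root.

    Raises instead of looping forever when a parent entry is
    missing or when the chain runs into a cycle.
    """
    seen = set()
    depth = 0
    current = taxid
    while current != root:
        if current in seen:
            raise ValueError(f"cycle in parent chain at taxid {current}")
        seen.add(current)
        parent = parents.get(current)
        if parent is None:
            raise KeyError(f"taxid {current} has no parent entry")
        current = parent
        depth += 1
    return depth


# Toy parent table built from the IDs quoted in this issue
# (simplified: Bacteria hangs directly off the assumed root 1).
parents = {2: 1, 2818505: 2, 3031711: 2818505, 3031712: 3031711, 49: 3031712}
print(depth_of(49, parents))  # 5 steps in this toy table

# Dropping one node makes the walk fail fast instead of recursing.
broken = {k: v for k, v in parents.items() if k != 3031711}
try:
    depth_of(49, broken)
except KeyError as err:
    print("stopped:", err)
```

With a check like this, a broken entry in the taxdump would produce a clear error message naming the offending taxonomy ID rather than exhausting memory.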

I could not fully work out what is wrong with this family, but I assume the issue lies in its rather unusual hierarchy, as 3031711 is missing a standard rank:
superkingdom:Bacteria(2)->phylum:Myxococcota(2818505)->**Polyangia(3031711)**->order:Polyangiales(3031712)->family:Polyangiaceae(49)
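
A lineage like the one above can be reproduced from nodes.dmp-style lines with a small helper. This is a rough sketch, not part of Krona: `parse_nodes` and `lineage` are hypothetical names, the sample lines are hand-written and keep only the first three columns (tax ID, parent tax ID, rank), Bacteria is attached directly to the root for brevity, and the `no rank` label for 3031711 and the rank given for Bacteria are assumptions for illustration.

```python
def parse_nodes(lines):
    """Map taxid -> (parent taxid, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def lineage(taxid, nodes):
    """List (taxid, rank) from taxid upwards.

    Stops at the root (which is its own parent), at a missing
    entry, or when a taxid repeats, so it always terminates.
    """
    path, seen = [], set()
    while taxid in nodes and taxid not in seen:
        seen.add(taxid)
        parent, rank = nodes[taxid]
        path.append((taxid, rank))
        if parent == taxid:
            break
        taxid = parent
    return path


# Hand-written sample; ranks for 1, 2 and 3031711 are assumptions.
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tsuperkingdom\t|\n",
    "2818505\t|\t2\t|\tphylum\t|\n",
    "3031711\t|\t2818505\t|\tno rank\t|\n",
    "3031712\t|\t3031711\t|\torder\t|\n",
    "49\t|\t3031712\t|\tfamily\t|\n",
]

nodes = parse_nodes(sample)
chain = lineage(49, nodes)
unranked = [taxid for taxid, rank in chain if rank == "no rank" and taxid != 1]
print(chain)
print("nodes without a standard rank:", unranked)
```

Running the same helper on the real nodes.dmp would show directly which rank 3031711 carries in the current taxdump.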
