ntmax not obeyed #340

Open
digitalwright opened this issue Oct 30, 2024 · 7 comments

@digitalwright

I am using IQ-TREE2 on a computer grid that monitors CPU usage. If a process uses more threads than it requested, the grid raises an error, even when more threads are available. I am receiving these errors for IQ-TREE2 (v2.3.6) when using "-T AUTO -ntmax 1", so I believe a step within IQ-TREE2 is not obeying ntmax. Judging by the output, the step at fault might be reading the input file:

Kernel: AVX+FMA - auto-detect threads (8 CPU cores detected)
Reading alignment file X.fas.gz ... Fasta format detected
Reading fasta file: done in 7.37892e-05 secs using 48.79% CPU

The output above suggests all 8 CPUs may have been used to read the file even though ntmax was 1. Thanks in advance for taking a look at this issue.

@roblanf
Collaborator

roblanf commented Oct 31, 2024

@digitalwright I don't have the definitive answer here, but I have two observations. First, core detection is automatic and simply queries the machine for how many cores exist, so it doesn't in itself indicate CPU usage. Second, reading the alignment used ~50% of one CPU, which also suggests that IQ-TREE was obeying ntmax in this case.

One thing I would try (while waiting for @bqminh or others who actually know the answer to reply - it's a busy time here at the moment) is to remove -T AUTO. That serves no function when -ntmax is 1, and could be overriding it (though I'd be surprised if that were the case!)

Can you try that and report back? Also if you can show the full command line, input files, and output files, that will be helpful for diagnosing the issue.

@digitalwright
Author

Thanks @roblanf

I will try to obtain a reproducible case and report back to you if I figure it out. It only happens with jobs running iqtree2. The grid manager seems to catch this issue stochastically, so I need to figure out exactly how they are flagging these instances. If you don't hear back from me soon, and don't think the issue stems from iqtree2, please feel free to close this issue.

@corneliusroemer
Contributor

corneliusroemer commented Nov 13, 2024

I have a reproduction of what sounds like the same issue: #312 (comment)

I can definitely see something like 210% CPU in htop, which should never happen if --max-threads were respected.

@digitalwright how are you invoking iqtree? As part of augur tree? We set -T auto by default there, but you can override it by passing -T 1 or whatever you want. This is how I worked around the issue.

@roblanf
Collaborator

roblanf commented Nov 14, 2024

@corneliusroemer that sounds like a good option to try. If I get a chance I'll try something here as well - different combinations of -ntmax and -T to try and help narrow down the problem.
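
One rough way to compare flag combinations is to run each variant as a child process and divide its total CPU time by the elapsed wall time. The sketch below is only an illustration: the helper name cpu_to_wall_ratio is hypothetical, it relies on Python's Unix-only resource module, and it reports an average over the whole run, so a brief multi-threaded burst inside a long single-threaded run can still hide behind a ratio close to 1.

```python
import resource
import subprocess
import time


def cpu_to_wall_ratio(command):
    """Run `command` and return (user + system CPU time) / wall time.

    A ratio clearly above 1.0 means the child used more than one core on
    average. Hypothetical helper for illustration; Unix-only.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    # check=False: still report usage even if the command exits non-zero.
    subprocess.run(command, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / wall if wall > 0 else 0.0


# Example variants, using only flags mentioned in this thread:
# cpu_to_wall_ratio(["./iqtree2", "-T", "AUTO", "-ntmax", "1", "-s", "path_to_file"])
# cpu_to_wall_ratio(["./iqtree2", "-T", "1", "-s", "path_to_file"])
```

Comparing the ratio between the "-T AUTO -ntmax 1" and "-T 1" variants on the same input would show whether the overshoot depends on -T AUTO.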

@corneliusroemer
Contributor

Just to share what some initial investigation revealed: my reproduction also triggers on current master, on 2.3.6, and on 2.3.5. So it doesn't seem to be a very recent change.

@digitalwright
Author

digitalwright commented Nov 14, 2024

Thank you for looking into the issue, @roblanf and @corneliusroemer

I can confirm that exceeding 1 CPU is reproducible, although our computer grid only catches the issue a fraction of the time. I am guessing this is because 1 CPU is exceeded for a duration shorter than the grid's CPU polling interval.
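
To make that guess concrete, here is a minimal sketch of why a periodic monitor would flag a short overshoot only some of the time. It assumes the monitor takes one instantaneous sample per polling interval and that the burst starts at a random point relative to the polling clock. Neither assumption has been confirmed for our grid, and the function name and numbers are purely illustrative.

```python
def detection_probability(burst_seconds: float, poll_interval_seconds: float) -> float:
    """Chance that one periodic, instantaneous sample lands inside a burst.

    Assumes the burst start is uniformly random relative to the polling
    clock (an illustrative model, not the grid's documented behaviour).
    """
    if burst_seconds < 0 or poll_interval_seconds <= 0:
        raise ValueError("burst must be >= 0 and poll interval must be > 0")
    # A burst at least as long as the interval is always sampled once.
    return min(1.0, burst_seconds / poll_interval_seconds)


# Hypothetical numbers: a 0.5 s burst with a 5 s polling interval.
print(detection_probability(0.5, 5.0))
```

Under this model, a burst much shorter than the polling interval would be caught in only a small fraction of jobs, which matches the stochastic flagging I am seeing.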

I am invoking IQ-TREE2 on the command line with:

./iqtree2 -T AUTO -ntmax 1 --seqtype AA -m JTT+G4 --seed 123 -s path_to_file

The two attached files are examples of culprits.

Caryophyllales_at_cc5667-1.inclade1.ortho1.aln-cln.fas.gz
Hanseniaspora_at_ORTHOMCL623_aa_aln_trimmed.fas.gz

@thomaskf
Collaborator

Thanks, @digitalwright, @corneliusroemer, and @roblanf, for looking into the issue and providing reproducible examples.

These examples and information will be very useful for fixing the issue.
I am not available for the next two weeks, but I can look into the issue and fix it around the end of this month. Thanks again for all the information!
