some level_1 output is not generated #9

Open
alexyfyf opened this issue Jun 6, 2023 · 2 comments

alexyfyf commented Jun 6, 2023

Hi team,

I am using this as part of https://github.com/epi2me-labs/wf-transcriptomes/
I was able to run the batch-creation step, which generated batches 0-48, but the subsequent clustering step failed.
The error message was: slurmstepd: error: Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.

When I examined the log files, all job_level_0 output had been generated, but most level_1 output had not.
I then ran one of the failed commands from level_1.sh by hand:

isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_0.cer -r clusters/isONcluster_1.cer -o clusters/isONcluster_49.cer ; sync

It ended with a segmentation fault (core dumped):

Loaded input batch from clusters/isONcluster_0.cer:
        Batch number: 0
        Batch range: [0,16883]
        Depth: 0
        Nr sequences: 16884
        Nr bases: 50287830
        Nr clusters: 1
        Nr nontrivial clusters: 1
        Minimizers in database: 22619
Loaded input batch from clusters/isONcluster_1.cer:
        Batch number: 1
        Batch range: [16884,39157]
        Depth: 0
        Nr sequences: 22274
        Nr bases: 50286904
        Nr clusters: 2
        Nr nontrivial clusters: 2
        Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Segmentation fault (core dumped)

Some level_1 jobs did run successfully, for example:

isONclust2 cluster -x sahlin -v -Q -l clusters/isONcluster_8.cer -r clusters/isONcluster_9.cer -o clusters/isONcluster_53.cer ; sync
Loaded input batch from clusters/isONcluster_8.cer:
	Batch number: 8
	Batch range: [219434,255621]
	Depth: 0
	Nr sequences: 36188
	Nr bases: 50286190
	Nr clusters: 38
	Nr nontrivial clusters: 38
	Minimizers in database: 23082
Loaded input batch from clusters/isONcluster_9.cer:
	Batch number: 9
	Batch range: [255622,293538]
	Depth: 0
	Nr sequences: 37917
	Nr bases: 50287047
	Nr clusters: 33
	Nr nontrivial clusters: 32
	Minimizers in database: 0
Generating consensus using spoa algorithm: semi-global
Clustering mode: sahlin
Filtered out 0 input clusters smaller than 2.
Finished clustering!
Alignment invocation count: 0 (0%)
Consensus invocation count: 33 (100%)
Number of clusters larger than 1: 38
Output batch statistics:
	Batch number: 8
	Batch range: [219434,293538]
	Depth: 1
	Nr sequences: 74105
	Nr bases: 100573237
	Nr clusters: 38
	Nr nontrivial clusters: 38
	Minimizers in database: 24370
Output batch written to: clusters/isONcluster_53.cer
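
Both commands above come from level_1.sh, so one way to see exactly which level_1 merges failed is to list the `-o` targets named in that script that are not on disk. This is only a minimal sketch: it assumes level_1.sh holds one isONclust2 command per line in the form shown above, with a single `-o` flag, and that it is run from the directory the paths are relative to. `missing_outputs` is a hypothetical helper name, not part of isONclust2.

```shell
# List every "-o <file>" target named in a job script that does not exist yet.
# Assumption: one command per line, each with a single "-o <path>" argument.
missing_outputs() {
    script="$1"
    # Extract the token that follows " -o " on each line, one per line.
    sed -n 's/.* -o \([^ ;]*\).*/\1/p' "$script" |
    while IFS= read -r out; do
        # Print the path only if nothing exists at that location.
        [ -e "$out" ] || printf '%s\n' "$out"
    done
}
```

Calling `missing_outputs level_1.sh` would then print one path per failed merge, which gives the list of commands worth rerunning by hand.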

I noticed that the minimizer count is 0 for the right-hand (-r) batch, though it is also 0 in the successful run above, so I am not sure whether this is related. This error then caused all of the subsequent failures. The files seem small, and I requested 16 GB per core through Slurm.
I would appreciate some help getting this to run, if you could kindly take a look at the issue.

Thanks a lot.

alexyfyf changed the title from "minimizer counts is not read after self clustering" to "some level_1 output is not generated" Jun 6, 2023

ksahlin commented Jun 11, 2023

Hi @alexyfyf,

It looks like your job was killed because it ran out of memory (OOM), judging from this message: slurmstepd: error: Detected 3526 oom_kill events in StepId=12206690.batch. Some of the step tasks have been OOM Killed.

So giving the job more memory may help. I am not the developer of this tool, so I cannot give detailed advice or insights on its implementation, but I developed isONclust, which should give identical results to isONclust2. I can help you with isONclust if you decide to try that tool as well.

Note, though, that isONclust2 was developed mainly to improve speed over isONclust, as the original isONclust is implemented in Python. How many reads do you have?
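
To check whether memory really was the limit, the requested and peak memory of the finished job can be compared with Slurm's `sacct` accounting tool. The following is a rough sketch, assuming job accounting is enabled on the cluster; `show_job_memory` is a hypothetical wrapper name, and the job ID is the one from the error message above.

```shell
# Print requested vs. peak memory for each step of a finished Slurm job.
# Assumption: Slurm accounting is enabled, so sacct has data for the job.
show_job_memory() {
    job_id="$1"
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found; run this on a node with the Slurm client tools" >&2
        return 127
    fi
    # ReqMem is the requested memory, MaxRSS the peak resident memory per step.
    sacct -j "$job_id" --format=JobID,JobName%20,State,ReqMem,MaxRSS,Elapsed
}

# Example: show_job_memory 12206690
```

If MaxRSS for the batch step sits close to ReqMem, the request was probably too small; if it is far below, the OOM kills may have another cause.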

@alexyfyf

Hi Kristoffer, thank you for your suggestion. I'll try isONclust and see if it runs.
The OOM issue looks odd, as these level_1 files are typically a few MB each (definitely <100 MB). I am not sure why those specific ones failed.

Alex
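
Input file size does not necessarily predict how much memory a merge needs, so one option is to measure the peak memory of a single failing command directly. The sketch below assumes GNU time is installed at /usr/bin/time (its `-v` and `-o` options are not available in the shell's built-in `time`); `peak_rss_kb` is a hypothetical helper name.

```shell
# Run a command under GNU time and print its peak resident set size in kB.
# Assumption: /usr/bin/time is GNU time, which supports -v and -o.
peak_rss_kb() {
    report=$(mktemp)
    # -v gives the verbose resource report; -o writes it to a file so it
    # does not mix with the command's own stderr. "|| true" keeps going if
    # the command itself crashes, since the report is still written.
    /usr/bin/time -v -o "$report" "$@" >/dev/null 2>&1 || true
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$report"
    rm -f "$report"
}

# Example, using the failing command from the first comment:
# peak_rss_kb isONclust2 cluster -x sahlin -v -Q \
#     -l clusters/isONcluster_0.cer -r clusters/isONcluster_1.cer \
#     -o clusters/isONcluster_49.cer
```

A small peak value just before the segmentation fault would suggest that the crash in the manual run is not a memory-limit problem, whereas a value near the 16 GB request would point back to the OOM explanation.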
