Crash during alignment stage after 3.1 update #57
Comments
I tried to replicate the issue and found that we don't have proteins for the taxonomy branch of your sequence. I see that you supplied the proteins yourself, but as far as I can tell that does not work as reliably as when the taxonomy branch is covered by our own protein sets. If you can provide me with a link to download your proteins, I can try to replicate it again. Thanks, Victor.
Hi Victor, thanks for getting back! Attached is the proteome I used. Originally I had a collection of UniProt-formatted proteins for all Trypanosomatids (~500MB), but that proved to be too much, so I restricted it to a functional annotation I previously performed for this species. Side note: these organisms don't have introns. Is there a way to account for this in the annotation process?
Thanks, I will try it again with your protein data. About introns: I don't know, but I will ask my colleagues. There should be a way, since we annotate many different kinds of organisms.
Protists, including Trypanosomes, are currently out of scope for EGAPx, as stated on the home page. It's not just a matter of the protein sets: we need to do additional development to adequately support protists and fungi. It's on our roadmap, but it'll likely be a while before we are ready to support Trypanosomes. That doesn't explain the
I did a run with your proteins and NCBI's version of the sequence and SRA reads. It ran through STAR successfully, and even the memory requirements for it were not extreme. It failed later for me, with a message suggesting that the sequence has too many similarities to prokaryotes, so it is probably contaminated. But anyway, as already mentioned, we don't support this taxonomy branch yet, so even if it runs successfully after cleaning the sequence with another of our products, FCS (Foreign Contamination Screening), the results are not going to be valid.
Yeah, I did see the lack of support for Trypanosomes but wanted to see if I could sneak it through anyway. Perhaps this is why I was seeing a slightly different failure in v3.0 during gnomon training. I have a full annotation for this species already but am having a hard time getting it to table2asn standards. Mostly this was meant as an easy (hopefully) test run before trying to push through several much larger genomes for our consortium project. We'll likely have to run most of those on the HPC, but it would be nice to be able to do some of the smaller ones locally. I can rerun the example files to see if the run_star failure persists in v3.1 and report back if you think that will be helpful.
Thanks, Victor! Any thoughts as to why I'm having the run_star failure with 3.1 and did not have it with 3.0?
It may be an accidental fault in STAR. An error of this kind can happen if STAR fails and samtools can't read a full data chunk. On the other hand, the task should be retried, and if it is just a fluke it should complete. I don't see these retries in your run.trace.txt file. Can you send me the config file you have for Singularity, please? And what are the parameters of the machine you're running it on (CPUs, RAM)?
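For anyone who wants to check for retries themselves, here is a minimal sketch in Python. It assumes that run.trace.txt is a tab-separated Nextflow trace file with a header row containing `name` and `status` columns (the Nextflow defaults; the EGAPx trace may be configured differently) and that each attempt at a task is written as its own row under the same task name. The function names are hypothetical and not part of EGAPx.

```python
import csv
import io
from collections import defaultdict


def summarize_trace(trace_text, process_substring="run_star"):
    """Group trace rows by task name and collect their statuses in file order.

    Assumes a tab-separated trace with 'name' and 'status' header columns.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    statuses = defaultdict(list)
    for row in reader:
        name = row.get("name") or ""
        if process_substring in name:
            statuses[name].append((row.get("status") or "").strip())
    return dict(statuses)


def retried_tasks(summary):
    """Return task names that appear more than once, i.e. were submitted again."""
    return sorted(name for name, seen in summary.items() if len(seen) > 1)
```

A possible use is `retried_tasks(summarize_trace(open("run.trace.txt").read()))`. An empty list would be consistent with the observation above that no retries were recorded, provided the assumptions about the trace layout hold.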
If I remember correctly, I tried to continue after the first time it failed, and it failed again. I then deleted everything in the working directory, deleted the project directory, and started everything fresh after a restart and a general update check. I set the Docker config to 31 CPUs and 120 GB RAM, with a 20 GB swap. When running on 3.0, there seemed to be no problems until the end, after completing ~480 tasks. The only thing that might be unusual is that I have Nextflow installed as a mamba environment stacked on the Python environment for egapx, but it seemed to work fine until 3.1 was installed. The only other thing I found was that my samtools version was slightly out of date, so I updated it. I didn't change anything in the Singularity config file so it just says: I did edit the docker config file to:
Hello! I was running 3.0 almost successfully yesterday but it crashed during gnomon training so I updated to 3.1. Today it appears to not make it through the alignment stages with the following error:
ERROR ~ Error executing process > 'egapx:rnaseq_short_plane:star:run_star (4)'
Caused by:
  Process egapx:rnaseq_short_plane:star:run_star (4) terminated with an error exit status (3)
I'm attempting to annotate a small trypanosomatid genome (~34MB) with a proteome and ample RNAseq data. I was able to execute the example files with no problems for 3.0, though I haven't checked for 3.1. Attached are the various log files.
Thank you!
issue.zip