Problems with empty annotation intersection #150
Comments
Hey, |
Hey, I used the default GTF file provided by iGenomes, which I believe should have the correct format. Regarding the pipeline, I did update it using nextflow pull nf-core/circrna, but it's possible that the update didn't complete properly due to issues with the HPC environment. I'll look into it to ensure the pipeline is fully updated. Thanks, |
I am sure the GTF will have the correct format; otherwise, errors will look different. The problem occurs because the GTF contains regions on sequences not present in the FASTA file. This problem will also occur on the latest pipeline version, as I have not yet had time to fix it - this was just a side note. EDIT: This message was a mixup - forget about it |
The FASTA file is also provided by iGenomes... |
Oh I'm sorry, I got mixed up between two issues. This issue does not have anything to do with the FASTA file. The one with the FASTA file compatibility problems is #151. Still, the error you encounter is due to missing |
I tried using another GTF file and encountered an error while running CIRIquant because it is unable to find the GTF file, whereas other tools, such as circRNA_finder, are able to do so. I have written about the issue in #155. Please feel free to delete or close that entry if you prefer to resolve the issue here. Thank you very much for your time and assistance. |
This error persists despite using different GTF files. Could it be because there are no circRNAs in those samples? |
You are absolutely right, this can also occur if no circRNAs are found. I should have thought about this earlier. You can confirm this is the case by switching to If it is really the case, I will implement a clear error message pointing this out for future users. |
I cannot find the GTF file in that directory, but the intersect.bed file is empty. |
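For anyone who wants to check the same thing on their own samples, here is a minimal sketch that tests whether an intersect.bed file has any rows at all. The function name `has_no_intersections` is hypothetical and not part of the pipeline; the path is simply wherever your run wrote the file.

```python
from pathlib import Path


def has_no_intersections(bed_path):
    """Return True if the file at bed_path contains no non-blank lines.

    Hypothetical helper for illustration only; it is not part of nf-core/circrna.
    """
    with Path(bed_path).open() as handle:
        # any() stops at the first line that holds something other than whitespace
        return not any(line.strip() for line in handle)
```

A file that contains only blank lines is treated the same as a zero-byte file, since neither holds an intersection row.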
Yes, okay, this is the reason then. Is the data you used confidential? If not, I would like to use it as test data for coming up with a clean solution. |
Hey @ZabalaAitor, please re-execute the pipeline with the branch connected to the PR I just opened (#159) and provide me with the updated error message |
An error occurred because transcript_id is absent from the rows of the GTF (Gene Transfer Format) file whose feature type is gene. Also, which branch should be regarded as the most up to date, dev or 150-problems-with-empty-annotation-intersection?
df_incomplete = df_incomplete[df_incomplete != ""]
1 1223243 1223968 1:1223243-1223968:- 11.0 - 1 ensembl_havana gene 1216908 1232067 . - . gene_id "ENSG00000078808"; gene_version "18"; gene_name "SDF4"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; |
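To illustrate why a gene row trips up code that expects a transcript_id, here is a small sketch that parses the attribute column of a GTF row into a dictionary and returns None when the key is missing. The names `parse_gtf_attributes` and `transcript_id_or_none` are hypothetical, and this is not how the pipeline itself handles the column; it only shows the missing-key case on the attribute string quoted above.

```python
import re

# Matches pairs such as: gene_id "ENSG00000078808";
ATTRIBUTE_PATTERN = re.compile(r'(\w+) "([^"]*)"')


def parse_gtf_attributes(attribute_field):
    """Turn a GTF attribute string ('key "value"; ...') into a dict."""
    return dict(ATTRIBUTE_PATTERN.findall(attribute_field))


def transcript_id_or_none(attribute_field):
    """Return the transcript_id if the row has one, otherwise None."""
    return parse_gtf_attributes(attribute_field).get("transcript_id")


# Attribute column of the gene row quoted in this thread
gene_attributes = (
    'gene_id "ENSG00000078808"; gene_version "18"; gene_name "SDF4"; '
    'gene_source "ensembl_havana"; gene_biotype "protein_coding";'
)

print(parse_gtf_attributes(gene_attributes)["gene_name"])
print(transcript_id_or_none(gene_attributes))
```

Rows with feature type gene describe the whole gene, so they carry gene-level keys only; using `.get` rather than indexing avoids a KeyError on such rows.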
Description of the bug
Hello,
I am trying to run nf-core/circRNA on sncRNA samples, and I encountered an error during the annotation part for some of the samples. I noticed that the samples with errors have an empty intersect.bed file.
I am wondering what information is supposed to be in the intersect.bed file and what biological reasons could cause it to be empty.
Thank you very much,
Aitor Zabala
Command used and terminal output
Relevant files
No response
System information
Nextflow: 23.04.2
Hardware: HPC
Executor: slurm
Container: Apptainer
OS: Linux
nf-core/circrna: dev