Fatal INPUT FILE error, no valid exon lines in the GTF file: genes.gtf when using --genome GRCh38 #129

Closed
ianyfchang opened this issue Jun 8, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ianyfchang

Description of the bug

Running nf-core/circrna with --genome GRCh38 fails in the STAR_GENOMEGENERATE process with the error shown in the title.
I checked the fasta.fa downloaded from AWS, and the chromosome names are not what I expected. The first 10 sequence headers:

$ ~/circrna_test/work/b0/e5217e456114a3a1f67f633ddf2687$ grep ">" fasta.fa

chr1gi:568336023LN:248956422rl:ChromosomeM5:6aef897c3d6ff0c78aff06ac189178ddAS:GRCh38
chr2gi:568336022LN:242193529rl:ChromosomeM5:f98db672eb0993dcfdabafe2a882905cAS:GRCh38
chr3gi:568336021LN:198295559rl:ChromosomeM5:76635a41ea913a405ded820447d067b0AS:GRCh38
chr4gi:568336020LN:190214555rl:ChromosomeM5:3210fecf1eb92d5489da4346b3fddc6eAS:GRCh38
chr5gi:568336019LN:181538259rl:ChromosomeM5:a811b3dc9fe66af729dc0dddf7fa4f13AS:GRCh38hm:47309185-49591369
chr6gi:568336018LN:170805979rl:ChromosomeM5:5691468a67c7e7a7b5f2a3a683792c29AS:GRCh38
chr7gi:568336017LN:159345973rl:ChromosomeM5:cc044cc2256a1141212660fb07b6171eAS:GRCh38
chr8gi:568336016LN:145138636rl:ChromosomeM5:c67955b5f7815a9a1edfaa15893d3616AS:GRCh38
chr9gi:568336015LN:138394717rl:ChromosomeM5:6c198acf68b5af7b9d676dfdd531b5deAS:GRCh38
chr10gi:568336014LN:133797422rl:ChromosomeM5:c0eeee7acfdaf31b770a509bdaa6e51aAS:GRCh38

Command used and terminal output

Command used: nextflow run nf-core/circrna -r dev --input sample_sheet_for_circrna.csv --outdir circrna_output  -profile docker  --phenotype phenotype_for_circrna_diff_expr.csv --module 'circrna_discovery,mirna_prediction,differential_expression' --tool_filter 2 --tool 'ciriquant,circexplorer2,find_circ,circrna_finder,mapsplice,dcc,segemehl' --genome GRCh38

Terminal output:
ERROR ~ Error executing process > 'NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:STAR_GENOMEGENERATE (fasta.fa)'

Caused by:
  Process `NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:STAR_GENOMEGENERATE (fasta.fa)` terminated with an error exit status (104)


Command executed:

  samtools faidx fasta.fa
  NUM_BASES=`gawk '{sum = sum + $2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' fasta.fa.fai`
  
  mkdir star
  STAR \
      --runMode genomeGenerate \
      --genomeDir star/ \
      --genomeFastaFiles fasta.fa \
      --sjdbGTFfile genes.gtf \
      --runThreadN 24 \
      --genomeSAindexNbases $NUM_BASES \
      --limitGenomeGenerateRAM 154518822656 \
      --sjdbOverhang 100
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:STAR_GENOMEGENERATE":
      star: $(STAR --version | sed -e "s/STAR_//g")
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
      gawk: $(echo $(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*$//')
  END_VERSIONS

Command exit status:
  104

Command output:
        STAR --runMode genomeGenerate --genomeDir star/ --genomeFastaFiles fasta.fa --sjdbGTFfile genes.gtf --runThreadN 24 --genomeSAindexNbases 14 --limitGenomeGenerateRAM 154518822656 --sjdbOverhang 100
        STAR version: 2.7.10a   compiled: 2022-01-14T18:50:00-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
  Jun 08 01:49:15 ..... started STAR run
  Jun 08 01:49:15 ... starting to generate Genome files
  Jun 08 01:49:53 ..... processing annotations GTF

Command error:
        STAR --runMode genomeGenerate --genomeDir star/ --genomeFastaFiles fasta.fa --sjdbGTFfile genes.gtf --runThreadN 24 --genomeSAindexNbases 14 --limitGenomeGenerateRAM 154518822656 --sjdbOverhang 100
        STAR version: 2.7.10a   compiled: 2022-01-14T18:50:00-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
  Jun 08 01:49:15 ..... started STAR run
  Jun 08 01:49:15 ... starting to generate Genome files
  Jun 08 01:49:53 ..... processing annotations GTF
  
  Fatal INPUT FILE error, no valid exon lines in the GTF file: genes.gtf
  Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.
  
  Jun 08 01:49:57 ...... FATAL ERROR, exiting

Work dir:
  /home/ian/NGSDATA_FOR_NEXTFLOW/Dr.LaiCH_RNASeq/work/b0/e5217e456114a3a1f67f633ddf2687

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`


 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details
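
STAR's message above names a chromosome-naming mismatch between the GTF and the FASTA as one likely cause. A minimal way to test that, sketched here under the assumption that the sequence name is the header text up to the first whitespace (the common convention), is to list the names that appear in both files. `shared_names` is a hypothetical helper name and is not part of the pipeline:

```shell
# Sketch only: shared_names is a hypothetical helper, not part of nf-core/circrna.
# Prints every sequence name that occurs both in the FASTA headers ($1)
# and in column 1 of the GTF ($2).
shared_names() {
    awk '
        # First file (FASTA): record the first whitespace-delimited token
        # of each header line, without the leading ">".
        FNR == NR {
            if (/^>/) {
                sub(/^>/, "", $1)
                fa[$1] = 1
            }
            next
        }
        # Second file (GTF): skip comment lines.
        /^#/ { next }
        # Print each GTF sequence name once if the FASTA also has it.
        ($1 in fa) && !seen[$1]++ { print $1 }
    ' "$1" "$2"
}
```

Run from the process work dir, this would be `shared_names fasta.fa genes.gtf`. Empty output would mean that no names match, which is the situation STAR's hint describes. Non-empty output would point to a different problem with genes.gtf.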

Relevant files

No response

System information

Nextflow version : 24.04.2
Hardware: Linux desktop
Executor: local
Container engine: Docker
OS: CentOS 7
Version of nf-core/circrna: dev

@ianyfchang ianyfchang added the bug Something isn't working label Jun 8, 2024
@nictru
Contributor

nictru commented Jun 8, 2024

Please consider downloading your own reference genome data as described here

@nictru nictru closed this as not planned Jun 14, 2024