Limited input length. #86

Closed
Fabian-Boehm opened this issue Nov 23, 2023 · 2 comments

Fabian-Boehm commented Nov 23, 2023

### Description of the bug

From what I understood from the error messages, the maximum input length of this pipeline is 650 bases per read.

For reads longer than 650 bases, the RNA sequence seems to be cropped to 650 bases while the quality string is not, leading to unequal quality string and sequence lengths.
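
One way to check this independently of the pipeline is to scan the trimmed FASTQ for records whose sequence and quality lines differ in length. The following is only a minimal sketch, assuming strict four-line FASTQ records (no wrapped sequences). The names `find_length_mismatches` and `scan_fastq` are hypothetical helpers, not part of nf-core/circrna or STAR.

```python
import gzip
from itertools import islice
from typing import Iterator, List, TextIO, Tuple


def find_length_mismatches(handle: TextIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (read_id, seq_len, qual_len) for records whose lengths differ.

    Assumption: every record is exactly four lines
    (header, sequence, '+', quality).
    """
    while True:
        record = list(islice(handle, 4))
        # Stop at end of file or on trailing blank lines.
        if not record or not any(line.strip() for line in record):
            break
        if len(record) < 4:
            raise ValueError(f"truncated record: {record[0].strip()!r}")
        header, seq, _plus, qual = (line.rstrip("\r\n") for line in record)
        if len(seq) != len(qual):
            # Read ID is the header up to the first space, without the '@'.
            yield header[1:].split(" ")[0], len(seq), len(qual)


def scan_fastq(path: str) -> List[Tuple[str, int, int]]:
    """Collect all mismatches from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(find_length_mismatches(handle))
```

Calling `scan_fastq("input1/elegans_unselected_1_trimmed.fq.gz")` from the process work dir would list any offending reads; an empty list would suggest that the file itself is consistent under this check.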

### Command used and terminal output

nextflow run /nfs/data3/CIRCEST/pipeline -profile apptainer,cluster -params-file ./configs/params.yaml

PARAMS:
input: './samplesheet.csv'
outdir: './results/'
save_reference: true
save_intermediates:  true
hisat2_build_memory:  '200.GB'
genome:  'WBcel235'
star: null
module: 'circrna_discovery'


executor >  slurm (7)
[54/b6824f] process > NFCORE_CIRCRNA:CIRCRNA:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)                                     [100%] 1 of 1 ✔
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CAT_FASTQ                                                                           -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:BOWTIE_BUILD                                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:BOWTIE2_BUILD                                                        -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:BWA_INDEX                                                            -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:HISAT2_EXTRACTSPLICESITES                                            -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:HISAT2_BUILD                                                         -
[5b/f8b873] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:STAR_GENOMEGENERATE (genome.fa)                                      [100%] 1 of 1 ✔
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:PREPARE_GENOME:SEGEMEHL_INDEX                                                       -
[00/9b89ee] process > NFCORE_CIRCRNA:CIRCRNA:FASTQC_TRIMGALORE:FASTQC (elegans_unselected_1)                                     [100%] 1 of 1 ✔
[95/3c54c8] process > NFCORE_CIRCRNA:CIRCRNA:FASTQC_TRIMGALORE:TRIMGALORE (elegans_unselected_1)                                 [100%] 1 of 1 ✔
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:SEGEMEHL_ALIGN                                                    -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:SEGEMEHL_FILTER                                                   -
[17/6ccdf3] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:STAR_1ST_PASS (elegans_unselected_1)                              [100%] 2 of 2, failed: 2..
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:STAR_SJDB                                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:STAR_2ND_PASS                                                     -
[39/0919bd] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRCEXPLORER2_REF (genes.gtf)                                     [100%] 1 of 1 ✔
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRCEXPLORER2_PAR                                                 -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRCEXPLORER2_ANN                                                 -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRCEXPLORER2_FLT                                                 -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRCRNA_FINDER_FILTER                                             -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:FIND_CIRC_ALIGN                                                   -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:SAMTOOLS_INDEX                                                    -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:SAMTOOLS_VIEW                                                     -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:FIND_CIRC_ANCHORS                                                 -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:FIND_CIRC                                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:FIND_CIRC_FILTER                                                  -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRIQUANT_YML                                                     -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRIQUANT                                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CIRIQUANT_FILTER                                                  -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_MATE1_1ST_PASS                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_MATE1_SJDB                                                    -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_MATE1_2ND_PASS                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_MATE2_1ST_PASS                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_MATE2_SJDB                                                    -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_MATE2_2ND_PASS                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC                                                               -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:DCC_FILTER                                                        -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:MAPSPLICE_REFERENCE                                               -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:MAPSPLICE_ALIGN                                                   -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:MAPSPLICE_PARSE                                                   -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:MAPSPLICE_ANNOTATE                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:MAPSPLICE_FILTER                                                  -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:COUNTS_SINGLE                                                     -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:REMOVE_HEADER                                                     -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:SPLIT_ANNOTATION                                                  -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:ANNOTATION                                                        -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:CAT_ANNOTATION                                                    -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:SORT_ANNOTATION                                                   -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:FASTA                                                             -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:PSIRC_INDEX                                                       -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:PSIRC_QUANT                                                       -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:PSIRC_COMBINE                                                     -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:MIRNA_PREDICTION:TARGETSCAN_DATABASE                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:MIRNA_PREDICTION:TARGETSCAN                                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:MIRNA_PREDICTION:MIRANDA                                                            -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:MIRNA_PREDICTION:MIRNA_TARGETS                                                      -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:HISAT2_ALIGN                                                -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT                       -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_INDEX                      -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS   -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAG... -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:BAM_SORT_STATS_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXS... -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:STRINGTIE_STRINGTIE                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:STRINGTIE_PREPDE                                            -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:DESEQ2_DIFFERENTIAL_EXPRESSION                              -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:PARENT_GENE                                                 -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:PREPARE_CLR_TEST                                            -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:DIFFERENTIAL_EXPRESSION:CIRCTEST                                                    -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:CUSTOM_DUMPSOFTWAREVERSIONS                                                         -
[-        ] process > NFCORE_CIRCRNA:CIRCRNA:MULTIQC                                                                             -
ERROR ~ Error executing process > 'NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:STAR_1ST_PASS (elegans_unselected_1)'

Caused by:
  Process `NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:STAR_1ST_PASS (elegans_unselected_1)` terminated with an error exit status (104)

Command executed:

  STAR \
      --genomeDir star \
      --readFilesIn input1/elegans_unselected_1_trimmed.fq.gz  \
      --runThreadN 24 \
      --outFileNamePrefix elegans_unselected_1. \
      --outSAMtype BAM Unsorted \
       \
      --outSAMattrRGline 'ID:elegans_unselected_1'  'SM:elegans_unselected_1'  \
      --chimOutType Junctions WithinBAM --outSAMunmapped Within --outFilterType BySJout --outReadsUnmapped None --readFilesCommand zcat --alignSJDBoverhangMin 10 --chimJunctionOverhangMin 10 --chimSegmentMin 10



  if [ -f elegans_unselected_1.Unmapped.out.mate1 ]; then
      mv elegans_unselected_1.Unmapped.out.mate1 elegans_unselected_1.unmapped_1.fastq
      gzip elegans_unselected_1.unmapped_1.fastq
  fi
  if [ -f elegans_unselected_1.Unmapped.out.mate2 ]; then
      mv elegans_unselected_1.Unmapped.out.mate2 elegans_unselected_1.unmapped_2.fastq
      gzip elegans_unselected_1.unmapped_2.fastq
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_CIRCRNA:CIRCRNA:CIRCRNA_DISCOVERY:STAR_1ST_PASS":
      star: $(STAR --version | sed -e "s/STAR_//g")
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
      gawk: $(echo $(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*$//')
  END_VERSIONS

Command exit status:
  104

Command output:
        STAR --genomeDir star --readFilesIn input1/elegans_unselected_1_trimmed.fq.gz --runThreadN 24 --outFileNamePrefix elegans_unselected_1. --outSAMtype BAM Unsorted --outSAMattrRGline ID:elegans_unselected_1 SM:elegans_unselected_1 --chimOutType Junctions WithinBAM --outSAMunmapped Within --outFilterType BySJout --outReadsUnmapped None --readFilesCommand zcat --alignSJDBoverhangMin 10 --chimJunctionOverhangMin 10 --chimSegmentMin 10
        STAR version: 2.7.9a   compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
  Nov 23 14:00:32 ..... started STAR run
  Nov 23 14:00:32 ..... loading genome
  Nov 23 14:00:34 ..... started mapping

Command error:

  EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
  @SRR19055922.2.1
  AGAATTGGCTCTAGAGAATGCAGATATCATTGAGGTCGAGACCAAAAAGCCTTACAAGACTAAAGAATAAGAAAAACTGTTTTCACAGCAATAACAGAATTGAAAAGATCCATGATTACGCACCTCACTGGTCTCGAGAATGTCATGCAAGAGCTTTCTCTGTCAAGATAACTTGAAAGAGGTTCCATTCCTCAGCTTTCGCTGGACTCAAGTGCTCAACATTCCAGCCAAATGCAACAAAATCGAAAACATCCACCAAGGCTTTGTCAACATGACTTCCCTCATCGATGTCAACTTGGGATGCAATCAAATCAGCATGGCAGCTGATACTTTCGCCAACGTTCAAGATGTCTCCAGAACTTGATTCTTGATAATAACTGCATGACTGAATTCCCAAGCAAAGCTGTGAGAAACATGAACAACTTGATTGCTCTCAAATATAACAAGATCAACGCCATTAGACAAACGACTTTGTTAACCTCTCCTCCCTCTCCATGCTCTTAATGGAAACATTTTCTTGGCTTTAAAGGAGGAGCCCTCCAGAACCATCCAAATCTTCATATCTGTATTTGAATCAGGAACAATCTGCAAACTCGACAACGGAGTCTTGGAGCAATCAAGCAACTCCTGAGGTTTCGATCTATCTTCAA

  SOLUTION: fix your fastq file

  Nov 23 14:00:36 ...... FATAL ERROR, exiting

Work dir:
  /nfs/data3/CIRCEST/runs/benchmarking/work/17/6ccdf33d1780d19e7b6bbe0973cdf6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

-[nf-core/circrna] Pipeline completed with errors-


### Relevant files


### System information

_No response_
Fabian-Boehm added the bug label Nov 23, 2023
nictru self-assigned this Nov 23, 2023
Fabian-Boehm changed the title from "Incorrect fastq reading leads to different sequence and quality string lengths." to "Limited input length and unequal quality string lengths." Nov 24, 2023
Fabian-Boehm changed the title from "Limited input length and unequal quality string lengths." to "Limited input length." Nov 24, 2023
Contributor

nictru commented Nov 24, 2023

Hey,
the cropping is probably only done in the error message. I can hardly imagine STAR doing something as substantial as truncating reads without at least giving a warning.

For the length inequality problem there are some related issues - maybe one of them applies to your data:

However, this problem is most likely not related to our pipeline, so it should be checked by running STAR on your data without the pipeline.
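
To tell whether the cropping exists in the FASTQ file or only in STAR's error message, one could look up the read named in the error and compare its actual sequence and quality lengths. This is a rough sketch, assuming four-line FASTQ records; `read_lengths` is a hypothetical helper, not something STAR or the pipeline provides.

```python
import gzip
from itertools import islice
from typing import Optional, Tuple


def read_lengths(path: str, read_id: str) -> Optional[Tuple[int, int]]:
    """Return (seq_len, qual_len) for the first record with this read ID.

    Returns None if the read is not found. Assumes four-line FASTQ records
    and that the read ID is the header text up to the first space.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                return None  # end of file (or trailing partial record)
            header, seq, _plus, qual = (line.rstrip("\r\n") for line in record)
            if header[1:].split(" ")[0] == read_id:
                return len(seq), len(qual)
```

For the read in the error above, that would be `read_lengths("input1/elegans_unselected_1_trimmed.fq.gz", "SRR19055922.2.1")`. If the sequence length in the file is larger than what STAR printed, the truncation only happens in the message. If the two returned numbers differ, the file itself is inconsistent before STAR ever sees it.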

Contributor

nictru commented Jan 13, 2024

Closing this for now. Feel free to re-open if the problems persist.

nictru closed this as completed Jan 13, 2024