
GSEA_GSEA error:: Missing output file(s) *butterfly_plot.png? #143

Closed
paolo-kunderfranco opened this issue Jun 28, 2023 · 6 comments
Labels
bug Something isn't working


@paolo-kunderfranco

paolo-kunderfranco commented Jun 28, 2023

Description of the bug

Dear All,
When I set the GSEA option to true to run the analysis, I encounter the following problem.

Could it be because the provided GMT file uses gene names (gene_name), whereas my input matrix uses Ensembl gene IDs (gene_id)?
Or is it perhaps an issue related to Java?

Any suggestions? Thanks!
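On the identifier question: the executed command in the error report passes `-chip genes.anno.feature_metadata.chip -collapse true`, so GSEA collapses the matrix IDs to symbols through the CHIP file (the log mentions `Brain_CP_collapsed_to_symbols`) before matching them against the GMT. One quick check is to count how many GMT genes appear among the CHIP symbols. This is a sketch: `gmt_chip_overlap` is a hypothetical helper, not part of the pipeline, and it assumes the standard GSEA layouts (a tab-separated CHIP file with a header line and the gene symbol in column 2, and a GMT file whose fields from column 3 onward are gene symbols).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many distinct GMT genes appear in the CHIP
# file's symbol column. Assumes standard GSEA layouts (tab-separated):
#   CHIP: header line, then "Probe Set ID", "Gene Symbol", "Gene Title"
#   GMT:  set name, description, then one gene symbol per field
# Also assumes the CHIP file is non-empty (the FNR == NR idiom relies on it).
gmt_chip_overlap() {
  local gmt="$1" chip="$2"
  awk -F'\t' '
    # First file (CHIP): skip the header and remember each non-empty symbol.
    FNR == NR { if (FNR > 1 && $2 != "") chip[$2] = 1; next }
    # Second file (GMT): fields 3..NF are genes; count each distinct gene once.
    {
      for (i = 3; i <= NF; i++) {
        if ($i == "" || ($i in seen)) continue
        seen[$i] = 1
        total++
        if ($i in chip) matched++
      }
    }
    END { printf "%d of %d GMT genes found in CHIP symbols\n", matched + 0, total + 0 }
  ' "$chip" "$gmt"
}

# Example, using the file names from this report:
# gmt_chip_overlap Mouse_Human_Reactome_June_03_2023_symbol.gmt genes.anno.feature_metadata.chip
```

The comparison is exact and case-sensitive, so mouse vs human casing (e.g. `Gpr101` vs `GPR101`) counts as a mismatch here; a very low count would suggest an identifier problem in the CHIP or GMT file.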


Error executing process > NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (Cortex_Striatum)

Caused by:
  Missing output file(s) `*butterfly_plot.png` expected by process NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (Cortex_Striatum)

Command executed:

  # Run GSEA
  
  gsea-cli GSEA \
      -res Brain_CP.gct \
      -cls Cortex_Striatum.cls#Striatum_versus_Cortex \
      -gmx Mouse_Human_Reactome_June_03_2023_symbol.gmt \
      -chip genes.anno.feature_metadata.chip -collapse true \
      -out . \
      --rpt_label Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol \
      -nperm 1000 -permute gene_set -scoring_scheme weighted -metric Signal2Noise -sort real -order descending -set_max 500 -set_min 15 -norm meandiv -rnd_type no_balance -make_sets true -median false -num 100 -plot_top_x 20 -rnd_seed timestamp -save_rnd_lists false -zip_report true
  
  # Un-timestamp the outputs for path consistency
  mv Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol.Gsea.*/* .
  timestamp=$(cat *.rpt | grep producer_timestamp | awk '{print $2}')
  
  for pattern in _${timestamp} .${timestamp}; do
      find . -name "*${pattern}*" | sed "s|^\./||" | while read -r f; do
          mv $f ${f//$pattern/}
      done
  done
  sed -i.bak "s/[_\.]$timestamp//g" *.rpt *.html && rm *.bak
  
  # Prefix files that currently lack it
  ls -p | grep -v / | grep -v "Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol." | while read -r f; do
      mv $f Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol.${f}
      sed -i.bak "s/$f/Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol.${f}/g" *.rpt *.html && rm *.bak
  done
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA":
      gsea: 4.3.2
  END_VERSIONS

Command exit status:
  0

Command output:
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  shuffleGeneSet for GeneSet 1126/1140 nperm: 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  shuffleGeneSet for GeneSet 1131/1140 nperm: 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  shuffleGeneSet for GeneSet 1136/1140 nperm: 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  [1687940710522] [INFO] Finished permutations ... creating reports
  [1687940710523] [INFO] Extracting ds: Brain_CP_collapsed_to_symbols by template: Cortex_Striatum.cls#Striatum_versus_Cortex
  [1687940712436] [INFO] Creating marker selection reports ...
  [1687940714256] [SEVERE] No Probe called: Gpr101 on this chip (chip name is >genes.anno.feature_metadata.chip<)
  [1687940714258] [SEVERE] Turning off subsequent error notifications
  [1687940714871] [INFO] Creating FDR reports ...
  [1687940720004] [INFO] Done FDR reports for positive phenotype
  [1687940722399] [INFO] Done FDR reports for negative phenotype
  [1687940722768] [INFO] Creating global reports ...
  [1687940722773] [INFO] Done all reports!!
  [1687940724753] [INFO] Zipping: Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol.Gsea.1687940616338 to /tmp/Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol.Gsea.1687940616338.rpt12676184735624301463.zip
  Time taken: 109 secs

Command error:
  openjdk version "17.0.3-internal" 2022-04-19
  OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
  OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)
  WARNING: package com.apple.laf not in java.desktop
  WARNING: package com.sun.java.swing.plaf.windows not in java.desktop
  WARNING: package sun.awt.windows not in java.desktop
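The `*butterfly_plot.png` pattern is checked after the module's clean-up steps, and those steps only rename existing files: they strip `_<timestamp>` / `.<timestamp>` and add the report-label prefix. Renaming cannot create a file, so the error indicates that no matching file ended up in the task directory. The dry-run sketch below approximates those two steps for a list of names without touching the filesystem. `untimestamp_names` is a hypothetical helper, the example names are made up, and its prefix check (name must start with `<prefix>.`) is slightly stricter than the module's `grep -v`, which matches the prefix anywhere in the name.

```shell
#!/usr/bin/env bash
# Hypothetical dry run of the module's clean-up: for each file name, strip
# "_<timestamp>" and ".<timestamp>", then add "<prefix>." if the name does not
# already start with it. Prints "old -> new"; nothing is renamed on disk.
untimestamp_names() {
  local prefix="$1" timestamp="$2"
  shift 2
  local f new pattern
  for f in "$@"; do
    new="$f"
    for pattern in "_${timestamp}" ".${timestamp}"; do
      new="${new//"$pattern"/}"   # quoted pattern: literal substring removal
    done
    case "$new" in
      "$prefix".*) ;;             # already prefixed, keep as is
      *) new="${prefix}.${new}" ;;
    esac
    printf '%s -> %s\n' "$f" "$new"
  done
}

# Example with a made-up timestamp and file names:
# untimestamp_names Cortex_Striatum.Mouse_Human_Reactome_June_03_2023_symbol 1687940616338 \
#   index_1687940616338.html butterfly_plot.png
```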


@paolo-kunderfranco paolo-kunderfranco added the bug Something isn't working label Jun 28, 2023
@paolo-kunderfranco paolo-kunderfranco changed the title GSEA_GSEA error GSEA_GSEA error:: Missing output file(s) *butterfly_plot.png? Jun 28, 2023
@paolo-kunderfranco
Author

Any suggestions, please?

@WackerO
Collaborator

WackerO commented Oct 5, 2023

@paolo-kunderfranco Sorry for the suuuper late response!

Do you by any chance have a small dataset that I could use to reproduce the error? Or at least the command you used to run the pipeline? From the output you posted I can't really tell what the problem is; I think the Java stuff is just a couple of info messages and warnings...

@rekren

rekren commented Jan 26, 2024

I have also faced the same issue, and I think it is somehow related to the https://nf-co.re/differentialabundance/1.4.0/parameters#gsea_permute parameter being set to "gene_set".

First I ran the pipeline with --gsea_permute phenotype (the default), and it completed successfully.
When I changed the parameter to --gsea_permute gene_set, I received the error message "Missing output file(s) *butterfly_plot.png expected by process".

Jan-26 14:07:08.956 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 3741175; id: 12; name: NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (ADvsControl); status: COMPLETED; exit: 0; error: -; workDir: /work/user/rekren/06_collabAllergyVacLR/work/ef/1cc4458e72a0b0f434541d85fee9bc started: 1706274313963; exited: 2024-01-26T13:07:05.677783238Z; ]
Jan-26 14:07:09.014 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (ADvsControl); work-dir=/work/user/rekren/06_collabAllergyVacLR/work/ef/1cc4458e72a0b0f434541d85fee9bc
  error [nextflow.exception.MissingFileException]: Missing output file(s) `*butterfly_plot.png` expected by process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (ADvsControl)`
Jan-26 14:07:09.035 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (ADvsControl)'

Caused by:
  Missing output file(s) `*butterfly_plot.png` expected by process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA (ADvsControl)`

Command executed:

  # Run GSEA
  
  gsea-cli GSEA \
      -res ADvsControl.gct \
      -cls ADvsControl.cls#AD_versus_Vehicle \
      -gmx m2.cp.v2023.2.Mm.symbols.gmt \
      -chip hsILsinsertedmm10.anno.feature_metadata.chip -collapse true \
      -out . \
      --rpt_label ADvsControl.m2.cp.v2023.2.Mm.symbols \
      -nperm 1000 -permute gene_set -scoring_scheme weighted -metric Signal2Noise -sort real -order descending -set_max 500 -set_min 15 -norm meandiv -rnd_type no_balance -make_sets true -median false -num 100 -plot_top_x 20 -rnd_seed timestamp -save_rnd_lists false -zip_report false
  
  # Un-timestamp the outputs for path consistency
  mv ADvsControl.m2.cp.v2023.2.Mm.symbols.Gsea.*/* .
  timestamp=$(cat *.rpt | grep producer_timestamp | awk '{print $2}')
  
  for pattern in _${timestamp} .${timestamp}; do
      find . -name "*${pattern}*" | sed "s|^\./||" | while read -r f; do
          mv $f ${f//$pattern/}
      done
  done
  sed -i.bak "s/[_\.]$timestamp//g" *.rpt *.html && rm *.bak
  
  # Prefix files that currently lack it
  ls -p | grep -v / | grep -v "ADvsControl.m2.cp.v2023.2.Mm.symbols." | while read -r f; do
      mv $f ADvsControl.m2.cp.v2023.2.Mm.symbols.${f}
      sed -i.bak "s/$f/ADvsControl.m2.cp.v2023.2.Mm.symbols.${f}/g" *.rpt *.html && rm *.bak
  done
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:GSEA_GSEA":
      gsea: 4.3.2
  END_VERSIONS

Command exit status:
  0

Command output:
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  shuffleGeneSet for GeneSet 991/1005 nperm: 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  shuffleGeneSet for GeneSet 996/1005 nperm: 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  shuffleGeneSet for GeneSet 1001/1005 nperm: 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  GeneSetCohorted: 501 / 1000
  GeneSetCohorted_scored: 501 / 1000
  [1706274403597] [INFO] Finished permutations ... creating reports
  [1706274403598] [INFO] Extracting ds: ADvsControl_collapsed_to_symbols by template: ADvsControl.cls#AD_versus_Vehicle
  [1706274404543] [INFO] Creating marker selection reports ...
  [1706274407265] [INFO] Creating FDR reports ...
  [1706274411987] [INFO] Done FDR reports for positive phenotype
  [1706274414351] [INFO] Done FDR reports for negative phenotype
  [1706274414728] [INFO] Creating global reports ...
  [1706274414729] [INFO] Done all reports!!
  Time taken: 101 secs

Command error:
  openjdk version "17.0.3-internal" 2022-04-19
  OpenJDK Runtime Environment (build 17.0.3-internal+0-adhoc..src)
  OpenJDK 64-Bit Server VM (build 17.0.3-internal+0-adhoc..src, mixed mode, sharing)
  WARNING: package com.apple.laf not in java.desktop
  WARNING: package com.sun.java.swing.plaf.windows not in java.desktop
  WARNING: package sun.awt.windows not in java.desktop

Work dir:
  /work/user/rekren/06_collabAllergyVacLR/work/ef/1cc4458e72a0b0f434541d85fee9bc

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
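Following that tip, a small helper can summarise a failed task directory: it reads the `-permute` value from `.command.sh` and counts files matching `*butterfly_plot.png` at the top level of the directory, which is where the module moves the GSEA outputs. This is a sketch; `inspect_gsea_workdir` is a hypothetical name, and it assumes `.command.sh` contains a `gsea-cli` command line like the one quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a GSEA_GSEA task directory by reporting the
# -permute mode found in .command.sh and how many *butterfly_plot.png files
# sit at the top level of the directory.
inspect_gsea_workdir() {
  local dir="$1" mode plots
  # grep -o prints only the "-permute <mode>" match; "--" ends grep's options.
  mode=$(grep -o -- '-permute [a-z_]*' "$dir/.command.sh" 2>/dev/null | head -n 1 | awk '{print $2}')
  plots=$(find "$dir" -maxdepth 1 -type f -name '*butterfly_plot.png' | wc -l | tr -d ' ')
  printf 'permute=%s butterfly_plots=%s\n' "${mode:-unknown}" "$plots"
}

# Example, using the work dir printed above:
# inspect_gsea_workdir /work/user/rekren/06_collabAllergyVacLR/work/ef/1cc4458e72a0b0f434541d85fee9bc
```

For the gene_set run reported here, one would expect `permute=gene_set butterfly_plots=0`; running it on a phenotype-permutation work dir gives a direct comparison.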

@WackerO
Collaborator

WackerO commented Feb 5, 2024

Many thanks @rekren! I could also reproduce this error now.
@pinin4fjords I think the butterfly output channel might need to be made optional; when I did this locally, the module executed perfectly fine and all other necessary outputs were created. I can open a PR about that. Do you by any chance know how this output file works? I was unable to find any GSEA documentation about when the plot is generated and when it isn't...

@pinin4fjords
Member

Thanks for reproducing, @WackerO. If it's just not produced with specific parameter combinations, then let's just make the output optional; either of you feel free to PR the module in nf-core/modules and ping me :-)
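Making the output optional corresponds to Nextflow's `optional: true` output qualifier: a missing optional file no longer raises the `MissingFileException` seen in the logs, while required outputs are still enforced. The shell sketch below only mimics that distinction as a post-run check, for illustration; `check_outputs` and its `required:` / `optional:` spec syntax are hypothetical and not part of Nextflow or the module.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check mimicking required vs optional outputs.
# Each spec is "required:<glob>" or "optional:<glob>", matched against files
# at the top level of <dir>. Returns 1 if any required glob matches nothing.
check_outputs() {
  local dir="$1" status=0 spec kind glob matches
  shift
  for spec in "$@"; do
    kind="${spec%%:*}"
    glob="${spec#*:}"
    matches=$(find "$dir" -maxdepth 1 -type f -name "$glob" | wc -l | tr -d ' ')
    if [ "$matches" -eq 0 ]; then
      if [ "$kind" = "required" ]; then
        echo "Missing output file(s) \`$glob\`" >&2
        status=1
      else
        echo "Optional output \`$glob\` not produced; continuing" >&2
      fi
    fi
  done
  return "$status"
}

# Example: the butterfly plot may be absent without failing the check.
# check_outputs . 'required:*.rpt' 'optional:*butterfly_plot.png'
```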

@WackerO
Copy link
Collaborator

WackerO commented Feb 27, 2024

This should be solved by the GSEA update.

@WackerO WackerO closed this as completed Feb 27, 2024

4 participants