feat: Adding somatic neoepitope preparation workflow #495

giacuong171 · 2024-04-04T09:12:32Z

Adding a draft workflow for somatic neoepitope preparation, to be used with pvactools later

github-actions · 2024-04-04T09:14:07Z

Please format your Python code with ruff: make fmt
Please check your Python code with ruff: make check
Please format your Snakemake code with snakefmt: make snakefmt

You can trigger all lints locally by running make lint

coveralls · 2024-04-04T22:31:54Z

coverage: 85.894% (+0.1%) from 85.778%
when pulling 03c9714 on 472-adding-neoepitope-prediction-pipeline
into fbb3de1 on main.

ericblanc20

This is very good, thanks a lot for your efforts. I think that there are still a few points to be fixed, but otherwise it is great.

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

snappy_wrappers/wrappers/pvactools/combining/comb_rna.py

snappy_wrappers/wrappers/pvactools/combining/wrapper.py

tests/snappy_pipeline/workflows/test_workflows_somatic_neoepitope_prediction.py

ericblanc20

One last bit: ensure that the app can deal with columns selected by number rather than by name.
Perhaps it would be good to offer a choice of 4 columns:

The gene or transcript id (with or without version)
The TPMs
If there is no value for the TPM column, then the counts, and
If there is no value for the TPM column, then the feature lengths.

If there is no TPM column and no feature length column, then the GTF is used to get the length of transcripts, and an approximate value for the gene lengths.
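
A minimal sketch of the column logic described above, assuming a tab-separated table without a header line and 1-based column numbers. The function `expression_from_table` and its parameter names are hypothetical and are not part of snappy or pvactools; the GTF fallback for missing feature lengths is left out.

```python
import csv
import io


def expression_from_table(handle, id_col, tpm_col=None, count_col=None, length_col=None):
    """Return {feature_id: TPM}; columns are 1-based numbers (hypothetical helper)."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    ids = [row[id_col - 1] for row in rows]
    if tpm_col is not None:
        # TPMs are given directly, nothing to compute.
        return {i: float(row[tpm_col - 1]) for i, row in zip(ids, rows)}
    if count_col is None or length_col is None:
        raise ValueError("without a TPM column, counts and feature lengths are both needed")
    # TPM: counts per kilobase, rescaled so that the values sum to one million.
    rates = [float(row[count_col - 1]) / (float(row[length_col - 1]) / 1000.0) for row in rows]
    total = sum(rates)
    return {i: (1e6 * r / total if total > 0 else 0.0) for i, r in zip(ids, rates)}


# Made-up two-gene table: id, counts, feature length.
table = io.StringIO("ENSG01.1\t10\t1000\nENSG02.3\t30\t1000\n")
tpm = expression_from_table(table, id_col=1, count_col=2, length_col=3)
```

Selecting columns by number keeps the helper independent of header names, which is what the request above asks for.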

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

ericblanc20 · 2024-05-31T16:09:55Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+                                        mapper="star",
+                                        library_name=rna_library,
+                                    )
+                                    ext = {"expression", "bam", "bai"}


What is the purpose of this statement?

The nesting is also pretty deep; perhaps it can be restructured (by breaking it into smaller chunks or moving common code into dedicated functions) to make it more palatable.

ericblanc20 · 2024-05-31T16:13:47Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+            "{mapper}.{var_caller}.{anno_caller}.{tumor_library}"
+        )
+        # Need to change for work on many different tools
+        key_ext = {"vcf": ".full.vcf.gz", "vcf_tbi": ".full.vcf.gz.tbi"}


The choice between the full annotation .full.vcf.gz and the selected transcript only .vcf.gz may be left to the user.
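
One way to leave that choice to the user, as a sketch only: the flag name `full_annotation` and the helper `annotation_extensions` are assumptions made for illustration, not existing snappy configuration.

```python
def annotation_extensions(full_annotation=True):
    """Map output keys to VCF extensions (hypothetical helper and flag)."""
    # ".full.vcf.gz" carries the full annotation, ".vcf.gz" the selected transcript only.
    suffix = ".full.vcf.gz" if full_annotation else ".vcf.gz"
    return {"vcf": suffix, "vcf_tbi": suffix + ".tbi"}


key_ext = annotation_extensions(full_annotation=False)
```

Defaulting the flag to the full annotation would keep the behaviour of the current diff unless the user opts out.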

ericblanc20 · 2024-05-31T16:16:27Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+    tools_somatic_variant_calling: [] # deafult to those configured for somatic_variant_calling
+    max_depth: "4000"
+    preparation:
+      format: 'star' # REQUIRED - The file format of the expression file to process. (stringtie,kallisto,cufflinks,custom)


As discussed before, snappy now outputs a two-column GeneCounts.tab file: the gene id in the first column, and the counts in the second. Therefore, I suggest that the defaults should be:

format: "custom"
id-column: 1
expression-column: 2

snappy_wrappers/wrappers/pvactools/combining/comb_rna.py

snappy_wrappers/wrappers/pvactools/combining/wrapper.py

tedil

Some change suggestions from a different perspective than Eric's ;)

snappy_pipeline/workflows/somatic_neoepitope_prediction/Snakefile

tedil · 2024-06-03T14:37:26Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+                                        mapper="star",
+                                        library_name=rna_library,
+                                    )
+                                    ext = {"expression", "bam", "bai"}


The nesting is also pretty deep; perhaps it can be restructured (by breaking it into smaller chunks or moving common code into dedicated functions) to make it more palatable.
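
A rough sketch of that kind of restructuring, assuming a simplified donor, sample, library layout held in plain dictionaries. The data layout and both function names are hypothetical and do not mirror the actual sample sheet classes used by snappy.

```python
def iter_libraries(donors):
    """Flatten donor -> sample -> library nesting into one stream (hypothetical layout)."""
    for donor in donors:
        for sample in donor["samples"]:
            for library in sample["libraries"]:
                yield donor["name"], sample, library


def rna_libraries_of_tumors(donors):
    """Yield names of RNA libraries attached to tumor samples."""
    for _donor_name, sample, library in iter_libraries(donors):
        if sample["is_tumor"] and library["type"] == "mRNA_seq":
            yield library["name"]


# Made-up sample sheet with one donor, a normal sample and a tumor sample.
donors = [
    {
        "name": "P001",
        "samples": [
            {"is_tumor": False, "libraries": [{"name": "P001-N1-DNA1-WGS1", "type": "WGS"}]},
            {
                "is_tumor": True,
                "libraries": [
                    {"name": "P001-T1-DNA1-WGS1", "type": "WGS"},
                    {"name": "P001-T1-RNA1-mRNA_seq1", "type": "mRNA_seq"},
                ],
            },
        ],
    }
]
rna_libs = list(rna_libraries_of_tumors(donors))
```

The traversal lives in one generator, so each caller only states its own filter and stays at a single level of indentation.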

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

snappy_wrappers/wrappers/pvactools/combining/comb_rna.py

snappy_wrappers/wrappers/pvactools/combining/environment.yaml

snappy_wrappers/wrappers/pvactools/combining/wrapper.py


ericblanc20

Many comments here are general musings about how to improve snappy. We should discuss them at some point, and involve Till as much as possible.

ericblanc20 · 2024-07-22T10:40:51Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+        )
+        yield "combine_vcf", prepare_tpl
+        hla_typing = self.parent.sub_workflows["hla_typing"]
+        hla_tpl = "output/optitype.{ngs_library}/out/optitype.{ngs_library}.txt"


Ideally, the HLA caller tools shouldn't be hard-coded.
However, to keep it general, we should consider the possibility that some HLA caller tools require mapped data.
So there is a possible file naming issue there, and the code needs to do something like the pseudo-code below:

hla_caller_prefix = ""
# Test if the selected HLA caller gets its input from the ngs_mapping step
if "path_ngs_mapping" in self.w_config.step_config.hla_typing[self.config.hla_caller].keys():
    hla_caller_prefix += f"{self.config.mapper}."
hla_caller_prefix += f"{self.config.hla_caller}"

This is ugly, likely to fail (I believe that path_ngs_mapping is defined regardless of its need in the hla_typing step model) and I don't like it.
Ideally, I think that each step should provide a facility to get the naming prefix, or something like that.

But perhaps there are better ideas, or perhaps we have to live with ugly solutions.
Ideas, comments?
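
To make the "each step provides a facility to get the naming prefix" idea concrete, here is a small sketch. The class `StepNaming`, its method, and the assumption that the step itself declares which tools read mapped data are all hypothetical; nothing like this exists in snappy as far as this thread shows.

```python
class StepNaming:
    """Hypothetical per-step facility returning the file name prefix for a tool."""

    def __init__(self, tools_needing_mapping):
        # Names of tools that read mapped data (assumed to be declared by the step).
        self.tools_needing_mapping = frozenset(tools_needing_mapping)

    def prefix(self, tool, mapper=None):
        """Return "<mapper>.<tool>" for tools using mapped data, "<tool>" otherwise."""
        if tool in self.tools_needing_mapping:
            if not mapper:
                raise ValueError(f"{tool} needs mapped input, but no mapper was given")
            return f"{mapper}.{tool}"
        return tool


# Which tools need mapped input is an assumption made for this example.
hla_naming = StepNaming(tools_needing_mapping={"arcashla"})
optitype_prefix = hla_naming.prefix("optitype")
arcashla_prefix = hla_naming.prefix("arcashla", mapper="star")
```

With such a facility the downstream step would not have to probe for path_ngs_mapping in the configuration model.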

ericblanc20 · 2024-07-22T11:01:07Z

snappy_pipeline/workflows/hla_typing/Snakefile

I don't understand why the rules for tools must be put under an if statement.

giacuong171 · 2024-07-23

I have discussed this bug with Till. The rule hla_typing_arcashla_run gets evaluated even though arcashla is not defined, so this is a quick fix for that bug.

ericblanc20 · 2024-07-22T12:43:17Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+                            key,
+                            ngs_mapping(
+                                (rna_tpl + ext).format(
+                                    mapper=self.w_config.step_config["ngs_mapping"].tools.rna[0],


In case there are many tools to quantify expression, I think it's better to give the choice of mapper to the user, rather than take the first one.
Ideally, I would have an rna_mapper entry in the configuration, so the user can choose the one she likes. If it's empty, then snappy takes the first RNA mapper defined in the ngs_mapping configuration step.
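
The fallback described above could look roughly like this. The function `select_rna_mapper` is a hypothetical helper, and the tool name "other_mapper" is a placeholder rather than a mapper supported by snappy.

```python
def select_rna_mapper(rna_mapper, configured_rna_tools):
    """Return the user's choice, or the first configured RNA mapper when none is given."""
    if not configured_rna_tools:
        raise ValueError("no RNA mapper is configured in ngs_mapping")
    if not rna_mapper:
        # Empty or missing entry: fall back to the first configured tool.
        return configured_rna_tools[0]
    if rna_mapper not in configured_rna_tools:
        raise ValueError(f"rna_mapper {rna_mapper!r} is not among {configured_rna_tools}")
    return rna_mapper


default_mapper = select_rna_mapper("", ["star", "other_mapper"])
chosen_mapper = select_rna_mapper("other_mapper", ["star", "other_mapper"])
```

Rejecting a mapper that is not configured in ngs_mapping is an extra check added in this sketch, so that a typo fails early instead of producing missing-input errors later.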

ericblanc20 · 2024-07-22T12:55:20Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+    def _get_output_files_prepare(self):
+        if self.config["preparation"]["mode"] == "gene":
+            prefix = (
+                "work/prepare/{mapper}.{var_caller}.{anno_caller}.GX.{tumor_library}/out/"


I think it would be better to store the prepared vcf in work/{mapper}.{var_caller}.{anno_caller}.(GX|TX).{tumor_library}/prepare.
It is more like the rest of the pipeline, where the rule is one subdirectory per sample in work.
The only exceptions are input_links, docker containers, R packages, and any files used during the creation of every output.
(Actually, when I think of it, there should not be an input_links directory; it would be better to have work/xxx.{library}/input_links. What do you think?)

ericblanc20 · 2024-07-22T13:04:00Z

snappy_pipeline/workflows/somatic_neoepitope_prediction/__init__.py

+    @dictify
+    def _get_output_files_predict(self):
+        key_ext = {"all_epitopes": ".all_epitopes.tsv", "filtered_epitopes": ".filtered.tsv"}
+        prefix = "work/predict/{mapper}.{var_caller}.{anno_caller}.{mode}.{tumor_library}.epitopes/out/MHC_Class_I/{tumor_library}"


Same here, work/{mapper}.{caller}.{anno_caller}.pvacseq.{tumor_library}/out/{mapper}.{caller}.{anno_caller}.pvacseq.{tumor_library}.{mode}.MHC_Class_I might be a better prefix.
Another comment: I am not too keen on including all tools in directories & filenames. If we want to follow the rule strictly, we should add the HLA typing tool, and the RNA mapper. We might also add adapter trimming, for example.
I think it is unmanageable to add all the tools, and we might need to think about alternative naming schemes.
Ideas, comments?

ericblanc20 · 2024-07-22T16:13:12Z

snappy_wrappers/wrappers/pvactools/pvacseq/wrapper.py

+    else ""
+)
+
+op_dir = "/".join(snakemake.output.all_epitopes.split("/")[:-2])


Please replace "/".join with os.path.join
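
For illustration, the string-splitting expression and two os.path alternatives, applied to a made-up relative output path (the path below is hypothetical and only mimics the layout in the diff):

```python
import os

all_epitopes = "work/x/out/MHC_Class_I/sample.all_epitopes.tsv"  # hypothetical path

# Original approach: drop the last two path components by string splitting.
by_split = "/".join(all_epitopes.split("/")[:-2])

# Same result with os.path: go up two levels from the output file.
by_dirname = os.path.dirname(os.path.dirname(all_epitopes))

# os.path.join, as requested above, rebuilds the path from its components.
by_join = os.path.join(*all_epitopes.split("/")[:-2])
```

For an absolute path, splitting on "/" yields a leading empty component that os.path.join discards, so the nested os.path.dirname form is probably the safer of the two.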

ericblanc20 · 2024-07-22T16:16:04Z

snappy_wrappers/wrappers/pvactools/pvacseq/wrapper.py

+
+op_dir = "/".join(snakemake.output.all_epitopes.split("/")[:-2])
+
+files_to_bind = {


I am not sure that you currently need to bind any directory. My understanding is that the pvacseq command requires only files within the step directory, both for input and output.

ericblanc20 · 2024-07-22T16:21:55Z

snappy_wrappers/wrappers/pvactools/pvacseq/wrapper.py

+__author__ = "Pham Gia Cuong"
+__email__ = "[email protected]"
+
+step = snakemake.config["pipeline_step"]["name"]


step & config should not be needed anymore, is that correct?

ericblanc20 · 2024-07-22T16:23:43Z

snappy_wrappers/wrappers/pvactools/pvacseq/wrapper.py

+    {maximum_transcript_support_level}"
+echo 'TMPDIR=/bindings/d2' > $TMPDIR/{snakemake.wildcards.tumor_library}.sh
+echo $cmd >> $TMPDIR/{snakemake.wildcards.tumor_library}.sh
+apptainer exec --home $PWD -B $TMPDIR:/bindings/d2 {bindings} {config[path_container]} bash /bindings/d2/{snakemake.wildcards.tumor_library}.sh


The path to the container should be passed to the wrapper as a parameter.

ericblanc20 · 2024-07-22T16:24:55Z

snappy_wrappers/wrappers/pvactools/pvacseq/wrapper.py

+    {NORMAL_VAF}\
+    {exclude_NAs}\
+    {maximum_transcript_support_level}"
+echo 'TMPDIR=/bindings/d2' > $TMPDIR/{snakemake.wildcards.tumor_library}.sh


Why is the command written to a temporary script? Isn't it possible to pass it as a string?
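
A sketch of passing the command as a string, assuming that `bash -c` inside the container is acceptable. The helper `apptainer_argv`, the container file name and the stand-in command are hypothetical; the code only builds the argument list and does not run anything.

```python
import shlex


def apptainer_argv(container, command, binds=()):
    """Build an argv list running `command` inside the container via `bash -c`."""
    argv = ["apptainer", "exec"]
    for source, target in binds:
        argv += ["--bind", f"{source}:{target}"]
    # The whole command travels as one argument, so no temporary script is needed.
    argv += [container, "bash", "-c", command]
    return argv


argv = apptainer_argv(
    "pvactools.sif",  # hypothetical container path
    "echo hello",  # stand-in for the real pvacseq command line
    binds=[("/tmp/work", "/bindings/d2")],
)
# Quoted form, suitable for embedding in a shell snippet.
shell_line = shlex.join(argv)
```

Whether this can replace the temporary script depends on how the real pvacseq command quotes its own arguments; shlex.join takes care of quoting the command string as a whole.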

Adding somatic neoepitope preparation workflow

72717b8

giacuong171 linked an issue Apr 4, 2024 that may be closed by this pull request

Adding neoepitope prediction pipeline #472

Open

giacuong171 changed the title ~~Adding somatic neoepitope preparation workflow~~ feat: Adding somatic neoepitope preparation workflow Apr 4, 2024

giacuong171 added 4 commits April 4, 2024 13:20

Satisfying code format

4eab17a

Satisfying black

4e08a50

Satisfying snakefmt

0c35c66

make isort comfort

b4b36ba

giacuong171 self-assigned this Apr 10, 2024

giacuong171 requested a review from ericblanc20 April 10, 2024 16:38

ericblanc20 requested changes Apr 16, 2024

giacuong171 added 2 commits May 22, 2024 13:08

update neoepitope prediction

f1af42c

Make black satisfy

2fcf1bf

giacuong171 requested a review from ericblanc20 May 22, 2024 11:33

ericblanc20 requested changes May 31, 2024

tedil requested changes Jun 3, 2024

giacuong171 and others added 6 commits June 4, 2024 15:18

Update snappy_pipeline/workflows/somatic_neoepitope_prediction/Snakefile

cfa36ac

Co-authored-by: Till Hartmann <[email protected]>

Update snappy_pipeline/workflows/somatic_neoepitope_prediction/__init…

8d9155d

…__.py Co-authored-by: Till Hartmann <[email protected]>

Merge branch 'main' into 472-adding-neoepitope-prediction-pipeline

9776587

Update preparation for somatic neoepitope prediction

40cb936

Merge branch '472-adding-neoepitope-prediction-pipeline' of github.co…

e645608

…m:bihealth/snappy-pipeline into 472-adding-neoepitope-prediction-pipeline

Update test

23101e8

giacuong171 requested review from ericblanc20 and tedil June 18, 2024 20:23

sellth force-pushed the main branch 3 times, most recently from 9664352 to bf39678 Compare June 28, 2024 16:18

giacuong171 added 2 commits July 1, 2024 14:39

Adding plugins option to VEP

0920803

Merge branch 'main' into 472-adding-neoepitope-prediction-pipeline

9d8a660

giacuong171 added 12 commits July 3, 2024 16:02

Adding plugin options for vep

0d95ea3

Reformat neoepitope prediction for new snappy version

1d72562

Satisfies lint

6d9257c

Adding test for somatic neoepitope prediction preparation substep

999c6ea

Preparation for pvactool

a140e28

Fix HLA_typing pipeline

03bd9e4

Adding test for neoepitope prediction

d43a11b

Adding pvacseq pipeline

73e08c6

Merge branch 'main' into 472-adding-neoepitope-prediction-pipeline

ceb3ae6

make lint happy

9670292

Make linting satisfying

76e0c97

Reformat hla_typing snakefile

03c9714

ericblanc20 reviewed Jul 22, 2024

giacuong171 Jul 23, 2024
