Error in rule aggregate_absplice_scores #128

Open
Antonio-Nappi opened this issue Aug 23, 2024 · 4 comments

Comments

@Antonio-Nappi

Hi all, thanks for the amazing pipeline for annotating variants. However, I am having some difficulty running it, even though I am testing it with the data you provide. After some manual bug-fixing (I can post the errors and how I fixed them here), I get an error in the following rule:
aggregate_absplice_scores
Here is the traceback of the error:
Traceback (most recent call last):
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/bin/deeprvat_annotations", line 33, in <module>
    sys.exit(load_entry_point('deeprvat', 'console_scripts', 'deeprvat_annotations')())
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/ictstr01/groups/casale/code/users/antonio.nappi/deeprvat_main/deeprvat/annotations/annotations.py", line 1077, in aggregate_abscores
    ca_shortened = current_annotations[["id", "Gene", "chrom", "pos", "ref", "alt"]]
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/pandas/core/frame.py", line 3813, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6070, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/aih/antonio.nappi/miniconda3/envs/mymamba/envs/deeprvat_annotations/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6133, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['Gene'] not in index"

Any idea on how to fix it?

@Marcel-Mueck
Collaborator

Hello Antonio,
Sorry for the late answer.
Thank you for the input. If you ran into issues that you were able to fix yourself, we would still be interested to hear about them so that we can improve the usability of the pipeline.
About the error you got in aggregate_absplice_scores:
The 'Gene' column is created by VEP and is renamed only in the rule add_gene_ids.
So it is odd that the column cannot be found in aggregate_absplice_scores, a rule that runs before add_gene_ids.
This is especially surprising because the rule that runs just before aggregate_absplice_scores, merge_deepsea_pcas, uses this column (without renaming it). Can you check which columns are in the annotations.parquet dataframe? You can do this, for example, with parquet-tools: parquet-tools inspect annotations.parquet
That would help narrow things down.
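
If parquet-tools is not at hand, the same check can be sketched in Python. This assumes pyarrow is installed in the environment (it is imported lazily, only when a file is actually read). The helper names parquet_columns and missing_columns are made up for illustration and are not part of DeepRVAT; the list of required columns is copied from the line in the traceback that raises the KeyError.

```python
# Columns selected in aggregate_abscores, copied from the traceback above.
REQUIRED = ["id", "Gene", "chrom", "pos", "ref", "alt"]


def parquet_columns(path):
    """Return the column names of a parquet file by reading only its schema."""
    # Assumption: pyarrow is available; no row data is loaded here.
    import pyarrow.parquet as pq

    return list(pq.read_schema(path).names)


def missing_columns(available, required=REQUIRED):
    """List the required columns that are absent from `available`, in order."""
    present = set(available)
    return [col for col in required if col not in present]
```

Calling missing_columns(parquet_columns("annotations.parquet")) would then list whichever of the expected columns are absent, which should show whether 'Gene' was never written or was already renamed.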

Thank you and regards,

Marcel Mück

@Antonio-Nappi
Author

Hi Marcel,
So far I have found these "bugs" in the pipeline (the fix I applied follows each arrow):

  1. tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.6/dist-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so: undefined symbol: XXXXXXXXXX -> I removed the libtfkernel_sobol_op.so file (this is not a DeepRVAT error as such but rather a matter of cluster configuration or an old TensorFlow version; I am reporting it in case it is useful)

  2. snakemake is not loaded as a dict in deeprvat_main/pipelines/resources/mmsplice_splicemap.py,
    deeprvat_main/pipelines/resources/spliceai.py and deeprvat_main/pipelines/resources/absplice_dna.py -> I commented out the import and it worked

  3. The default number of PCA components is too large for running the pipeline on the example data

  4. Somehow, in calculate_scores_max(scores), the scores obtained with the provided data are None, so I added the following if statement (before the else):
    if np.isnan(scores): return np.NaN

  5. Sometimes a timeout while downloading files leads to gzip.BadGzipFile: Not a gzipped file (b'<h') -> I remove the bad file and it's fine
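
For item 5, the b'<h' in the error message suggests that an HTML page (probably an error page) was saved in place of the archive. A rough way to catch this before a rule consumes the file is to check the two-byte gzip magic number and delete files that fail the check so they are downloaded again. This is only a sketch using the Python standard library; looks_like_gzip and drop_if_not_gzip are hypothetical helpers, not functions from the pipeline.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def drop_if_not_gzip(path):
    """Delete `path` when it is not gzip data; return True if it was deleted."""
    path = Path(path)
    if looks_like_gzip(path):
        return False
    path.unlink()
    return True
```

The magic-number check only inspects the header, so a download that was cut off partway through would still pass it; fully decompressing the file is the stricter test.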

@Antonio-Nappi
Author

Regarding my specific problem, I will post further updates later.

@Marcel-Mueck
Collaborator

Alright, thank you for the updates so far.
