Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AmpliconArchitect should use AmpliconSuite-pipeline as mode of entry #52

Closed
jluebeck opened this issue Apr 8, 2023 · 8 comments
Closed
Labels
enhancement New feature or request

Comments

@jluebeck
Copy link

jluebeck commented Apr 8, 2023

Description of feature

Thank you Daniel and the rest of the nf-circdna team for putting together this nextflow pipeline!

From my understanding of the nf-circdna workflow file, https://github.com/nf-core/circdna/blob/master/workflows/circdna.nf#L395, the following steps are taken

  1. Run CNVKit (mode with matched normal not currently supported?)
  2. Feed .cns file into collect_seeds.py, which is ported from the PrepareAA.py script.
  3. Run amplified_intervals.py on the output bed from step 2.
  4. Run AmpliconArchitect.py on the output bed from step 3.

However, one important function from PrepareAA is missing in this workflow. PrepareAA.py calls a function between steps 2 and 3 from above called cnv_prefilter (line 828 of PrepareAA). This applies an additional set of filters needed before using amplified_intervals.py. This isn't reflected in the current workflow and for best reproducibility of AmpliconSuite-pipeline on nextflow, it should be included either by adding it in a custom way, as other parts of PrepareAA.py have been added, or by just using the PrepareAA.py wrapper script to take the .cns file, convert and filter it (prefilter + amplified_intervals.py).

It may be simplest to instead of breaking up the PrepareAA.py code into pieces, place AmpliconSuite-pipeline (PrepareAA) in the workflow, give the CNVKit .cns file to PrepareAA.py - let it make the seeds and then you can give the AA_CNV_SEEDS.bed file to AmpliconArchitect as is currently done.

I am happy to provide additional guidance on this if needed.

Thanks again,
Jens

@jluebeck jluebeck added the enhancement New feature or request label Apr 8, 2023
@DSchreyer
Copy link
Collaborator

HI Jens,
yes I agree this should be added.
By including PrepareAA, also future updates can be easily incorporated. This should take much time,

You are correct, cnvkit currently only runs in tumour-only mode, but I will try to include normals as well in the next update. I am not sure how much work this will be, but I will keep this issue open until it is fixed.

Cheers,
Daniel

@jluebeck
Copy link
Author

Fantastic, that sounds great. Thank you Daniel! Please let me know if I can assist in any way.

Cheers,
Jens

@DSchreyer
Copy link
Collaborator

Hi Jens,
yes I might need assistance because I have trouble getting PrepareAA into the pipeline due to mosek, again. I believe using mosek9 is still not an option for either PrepareAA or AmpliconArchitect.

To run the pipeline with singularity or docker, a multi-package-container needs to be created containing all conda packages necessary to run PrepareAA. However, the only combination of package versions i was able to find which can use Mosek8 is based on python2. Python2 is now not supported anymore and the maintainers of the multi-package-container repo is concerned due to security issues coming from mosek and they are, so far, not willing to create the container.

For the previous versions a container exists, but the newer version requires intervaltree, so a new container is needed.

Questions:

  • When is mosek9 required? Only for AmpliconArchitect or also for PrepareAA. I am currently testing PrepareAA with mosek9 and it seems to work ?
  • Do you plan to implement mosek9, soon ?
  • Is it possible to split PrepareAA into multiple parts. One requiring mosek and the other requiring intervaltree ?

Let me know if you have answers for these. If mosek9 is fine to use, I am happy to update PrepareAA, but leave AmpliconArchitect with mosek8. At least until new modules will be added and a new container is needed O.O

Cheers,
Daniel

@DSchreyer
Copy link
Collaborator

To note: This is the current pull request containing the conversation with the multi-package-container maintainer:
BioContainers/multi-package-containers#2588

Many tries failed using python3 and mosek8. Python3 and mosek9 seems to work and it also works for PrepareAA. Is it possible to substitute mosek9 with mosek8 in PrepareAA? This might circumvent the security issue.

@jluebeck
Copy link
Author

Hi Daniel,

Thanks for working on this update. Yes, let us definitely resolve this.

  • Mosek9 is only needed inside AA.
  • We already added Mosek 9 + 10 support in 1.3.r3 (this is available on the jluebeck fork, but the PR is still pending on the virajbdesphande github). AA 1.3.r1 adds support for both python2 and python3. Which version of AA is currently used by nf-core?
  • For simplicity, I am hoping that we will not need to split it into multiple parts, but it is indeed possible to run PrepareAA.py without setting --run_AA, and it will stop at the point of generating the AA_CNV_SEEDS.bed input which is a required input file for AA. AA can then be called in a separate module. The important point I think is that all the data preparation steps be done by PrepareAA.py before running AA. Intervaltree is used in PrepareAA.py and AmpliconClassifier, but not AA.

We now have a singularity container available here for AmpliconSuite-pipeline (PrepareAA). https://github.com/jluebeck/AmpliconSuite-pipeline#option-b-singularity--docker-images. This singularity container uses python3 for everything.

You can do everything in the pipeline with python3, so it is certainly possible (and probably recommended) to drop support for python2 and mosek8.

Happy to take a look at any error logs you are receiving to troubleshoot this update.

Thanks and please let me know what other information I can provide.
Jens

@jluebeck
Copy link
Author

I think I didn't understand the BioContainers dev's question about the source of the Mosek package being a security issue. Here is the Conda page for the Mosek package: https://anaconda.org/MOSEK/mosek

@DSchreyer
Copy link
Collaborator

Hi Jens,
thanks for the quick reply! Currently the AA version is "1.3_r1" which I believe is the most updated on the deshpande github.

That's good news that mosek9 is already implemented in PrepareAA and will be implemented in AA soon. This might fix some issues, because it is currently on its own conda channel "mosek" instead of conda-forge or bioconda, which is usually used in the multi-package containers.

We will see if we can get the multi-package container now through with mosek9 and python3.

I keep you updated on this in this issue!

Best,
Daniel

@DSchreyer
Copy link
Collaborator

I will close this issue here as it is in progress and will be included in v.1.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants