Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Referencing python package files as paths in PEP #350

Open
afrendeiro opened this issue May 27, 2020 · 5 comments
Open

Referencing python package files as paths in PEP #350

afrendeiro opened this issue May 27, 2020 · 5 comments

Comments

@afrendeiro
Copy link
Contributor

I propose a new type of string substitution in a PEP that is capable of referencing files existing inside a Python package.

Take for example a PEP produced with geofetch. This is a package that has both a pipeline and interfaces for it. But (correct me if I'm wrong) in order for looper to use the amendment adding the interface, the user would need to additionally, clone the repository and either change the PEP manually to point to their newly cloned interface file, or set up the CODE environment variable:

...
project_modifiers:
  amend:
    sra_convert:
      looper:
        results_subdir: sra_convert_results
      sample_modifiers:
        append:
          SRR_files: SRA
          pipeline_interfaces: ${CODE}/geofetch/pipeline_interface_convert.yaml
...

The thing is, if the interface file is distributed with the pipeline, it would be great to do something like this to reference it (nomenclature/syntax to be discussed):

...
          pipeline_interfaces: {pypi:geofetch/pipeline_interface_convert.yaml}
...

The only thing the geofetch package has to change is add pipeline_interfec_convert.yaml to the Manifest.in.
Now pipeline packages are pip installable and the user doesn't have to either change the PEP or point

Internally, this seems straightforward to implement too:

import pkg_resources
pkg_resources.resource_filename("geofetch", 'pipeline_interface_convert.yaml')
@stolarczyk
Copy link
Member

I've experienced this inconvenience as well.. That's an interesting concept, but I have 2 questions:

  1. It seems like geofetch is one of a kind as pipelines go, and we do not distribute pipelines as software packages. These are kind of ever changing codebases. Are there any other scenarios that would benefit from this feature?

  2. PEP is a generic concept, currently implemented in Python and R, but possibly in more languages. It would not be trivial to implement a PyPi interface in every PEP implementation. Likewise, peppy would need to implement other ones too, e.g. to R package data? So we could not introduce this as a generic PEP feature, but rather an implementation-specific one? Is that desirable? Maybe looper would be a better place to implement a feature that would solve this kind of problems?

@nsheff
Copy link
Contributor

nsheff commented May 27, 2020

You are right about geofetch and this is a good idea. I have also been thinking about something similar. I would like to make it so you could install geofetch, and you wouldn't also have to clone it to get the interface, as you point out.

As a side point -- pipeline interfaces is a looper key, so I think this would be handled by looper, not peppy. But anyway, the point still stands.

I think the python-focused seems reasonable, but we can do better and make it more generic, such that the PEP is not tightly coupled to python. For looper sections it's less relevant since looper is in python, but for generic PEPs there are other implementations.

Anyway, the alternative idea I had was that the pipeline_interfaces key could be a command template. Maybe it would be a new key, like pipeline_interfaces_command, which would be a shell command that returns a pipeline interface. Then, I'd write a new geofetch command, something like geofetch iface, which would print the interface to stdout. Now, I'd put in my PEP:

sample_modifiers:
  append:
    pipeline_interface_command: geofetch iface

This is a more universal solution because I can use any command -- so it doesn't restrict us to python.

Edit: Just saw @stolarczyk's comment. We are on the same lines. You are exactly right -- this would be a looper thing (at least as suggested). The generic solution I propose, though, is universal enough to be integrated into peppy. Still, I'd probably put it in looper.

@nsheff
Copy link
Contributor

nsheff commented May 27, 2020

This would actually use almost the same functionality we already wrote for dynamic_variables_command_template

@afrendeiro
Copy link
Contributor Author

It seems like geofetch is one of a kind as pipelines go, and we do not distribute pipelines as software packages. These are kind of ever changing codebases. Are there any other scenarios that would benefit from this feature?

@stolarczyk Maybe not currently, but initially the epigen/open_pipelines repo was pip installable, only later we switched in order for them to be cloned by the user.

Referencing language-specific packages and in the language-agnostic PEP is indeed not ideal,.

One option would be to let the PEP parser tool handle that.
E.g. if looper finds a reference to an R package pipeline_interfaces: {bioc:XYZ/pipeline_interface.yaml} it warns looper cannot use bioc:XYZ interface because it's an R package. Hmm.. probably still not great.

A pipeline shell command that returns an interface sounds interesting though.

@nsheff
Copy link
Contributor

nsheff commented May 27, 2020

A pipeline shell command that returns an interface sounds interesting though.

I don't see any disadvantages. it's exactly what you're suggesting, but more universal -- now it's up to the developer to produce a command that can return what you want -- but anything can do this. so it's much easier to code on the looper side, and much more flexible, as it can accommodate anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants