Define a new format for the biobox.yaml input data #207
I have been thinking about our discussion on Friday. I agree that it makes sense to have the arguments in a specific order, for example when a tool uses multiple FASTA files in two different contexts. With this in mind, I will update the profiling format to:

    version: 1.0.0
    arguments:
      - reads:
        - type: fastq
          value: /path/to/fastq
      - databases:
        - value: /path/to/ncbi_dump
          type: bioboxes.org:/taxonomy_ncbi_dumps

There is still one thing that I think is quite important: We could extend the bioboxes-py library/interface like this:

    version: 1.0.0
    arguments:
      [...]
      - cache:
        - value: /path/to/cache
          type: cache

or this:

    version: 1.0.0
    arguments:
      [...]
    cache: /path/to/cache

I prefer the second, since there will be just one cache directory anyway.
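To make the two cache proposals concrete, here is a minimal Python sketch of how a consumer might read the ordered `arguments` list and look up the cache directory under either layout. It works on plain dicts and lists, the shape a YAML loader would typically return. The helper names `iter_arguments` and `find_cache` are hypothetical and not part of bioboxes-py, and the top-level placement of `cache` in the second layout is an assumption.

```python
def iter_arguments(doc):
    """Yield (name, entries) pairs in the order they appear under `arguments`."""
    for item in doc.get("arguments", []):
        # Each list item is assumed to be a single-key mapping, e.g. {"reads": [...]}.
        for name, entries in item.items():
            yield name, entries


def find_cache(doc):
    """Return the cache path from either proposed layout, or None if absent."""
    # Second proposal: a single `cache` key next to `arguments` (assumed top-level).
    if "cache" in doc:
        return doc["cache"]
    # First proposal: a `cache` entry inside the ordered `arguments` list.
    for name, entries in iter_arguments(doc):
        if name == "cache" and entries:
            return entries[0]["value"]
    return None


# Parsed form of the profiling example from this comment.
profiling = {
    "version": "1.0.0",
    "arguments": [
        {"reads": [{"type": "fastq", "value": "/path/to/fastq"}]},
        {"databases": [{"value": "/path/to/ncbi_dump",
                        "type": "bioboxes.org:/taxonomy_ncbi_dumps"}]},
    ],
}

# First proposal: cache as one more entry in the arguments list.
cache_in_arguments = dict(
    profiling,
    arguments=profiling["arguments"]
    + [{"cache": [{"value": "/path/to/cache", "type": "cache"}]}],
)

# Second proposal: cache as a single key outside the arguments list.
cache_top_level = dict(profiling, cache="/path/to/cache")
```

With the second layout the lookup is a single key access, which fits the point that there will only ever be one cache directory. The first layout needs a scan through the argument list.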
I'll make the following three observations, which I see as problems for the
I raise this because I think the format of the
Yes, I also think it is difficult to read, but since we can use the command line tool, I don't think it is a problem anymore.
Yes, it is difficult for someone who is not familiar with creating bioboxes.
The cache keyword is, in my opinion, the only solution for tools that use their own custom database.
Every time I ask a developer to build a biobox, I always provide an example first.
Thanks for your feedback Peter. I suggest the following solutions to these Create a tool that takes a biobox signature string and generates the validation
I agree. When I referred to this as a symptom I was speaking to the point of
Where the
Or for lists:
So for example with
Or for lists such as
In the short-term, a tool to do this doesn't exist in the format we would
I agree, a tool to make extracting the inputs from the biobox yaml would I
@michaelbarton I updated the profiling interface in PR #210 according to our discussion. Please merge if you agree.
Thanks, Peter. I've merged this. It might be worth discussing what we should do with the ID field and how useful it still is. I don't see any tools using it so far.
I agree. I think we introduced the id field for the fragment size parameters.
The existing version has shortcomings for more complex bioinformatics tasks
such as read profiling. I've created this issue to start the discussion for a
new version of the biobox.yaml format.
@pbelmann, before we discuss what the implementation of the new format could
look like, I think it would be good to first describe the current problems we
are having with the existing one. This will help us determine what problems the
new format should solve. Could you describe the following as a starting point?
What are the problems the existing format creates?
What does the existing format prevent the bioboxes/users from doing?