Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include docstrings, maybe a schema? #10

Open
tlambert03 opened this issue Jan 12, 2021 · 10 comments
Open

Include docstrings, maybe a schema? #10

tlambert03 opened this issue Jan 12, 2021 · 10 comments

Comments

@tlambert03
Copy link

tlambert03 commented Jan 12, 2021

Love that there is now a central kernels repo. I think it would be great to put as much of the universal information here as possible.

I know it's a lot of work... but it would be great if the kernels here could be documented with a good (parseable) format ... I guess doxygen is the defacto C standard? (not sure there). And then that info wouldn't need to be duplicated (for instance, pyclesperanto.absolute and clij2 Absolute). Adding docstrings here could be a first step.

a related idea (maybe an alternative?) would be to have a very easily parseable schema that that does not require doxygen to parse. The downside there is that we'd need to make sure the kernels were in sync with the schema (we could write tests for that ... but it's possible that the logic for those tests would be as complicated as simply using doxygen to parse in the first place). An example schema:

version: 2.0.0.10
operations:
  - name: absolute
    description: Computes the absolute value of every individual pixel x in a given image.
    details: <pre>f(x) = |x| </pre>
    categories: [filter, in_assistant]
    dimension_support: [2, 3]
    tier: 1
    parameters:
      - name: source
        type: IMAGE_src_TYPE
        description: The input image to be processed.
      - name: destination
        type: IMAGE_dst_TYPE
        description: The output image where results are written into.
    references: ...
  - name: add_image_and_scalar
    ...

(this sort of thing would also make it much easier to create templates that let you convert scripts between languages very easily... as well as facilitate autogeneration of most of the .py files in pyclesperanto)

@tlambert03
Copy link
Author

@haesleinhuepf ... curious how these thoughts strike you? Playing around with plugin autogeneration and general code-deduplication in pyclesperanto is definitely something I'd like to work on, but the solutions there will depend very much on the structure in this submodule.

@StRigaud
Copy link
Member

I guess doxygen is the defacto C standard?

Indeed, that would be the first choice on my side.

@haesleinhuepf
Copy link
Member

Hey @tlambert03 , your suggestion is amazing! We could copy-paste the information from the current pyclesperanto docstrings, curate it here and in the future, we auto-generate the doc-strings (in Python, Java, C++, HTML/website) from this repository.

Side-topic: How about the "kernels" which have no opencl-code associated directly. Do you think we could maintain their documentation here as well or should their documentation live somewhere else? I'm talking about filters like top-hat, for example.

@tlambert03
Copy link
Author

tlambert03 commented Jan 28, 2021

Hey @tlambert03 , your suggestion is amazing! We could copy-paste the information from the current pyclesperanto docstrings, curate it here and in the future, we auto-generate the doc-strings (in Python, Java, C++, HTML/website) from this repository.

that's the dream! You can tell me how easy/hard that will be for Java or C++ ... but for python and HTML, it should be no sweat. And then we all know we're in sync :)

Side-topic: How about the "kernels" which have no opencl-code associated directly. Do you think we could maintain their documentation here as well or should their documentation live somewhere else? I'm talking about filters like top-hat, for example.

good question. I'd hope we can find a place for them here too. It'd be pretty trivial to do that with the schema idea:

version: 2.0.0.10
operations:
  - name: top-hat
    description: Applies a top-hat filter for background subtraction to the input image.
    tier: 2
    parameters:
      - name: input
        ...
    comprises:
       - minimum_box
       - maximum_box
       - add_images_weighted

and with doxygen, there's probably some sort of stub concept we can use to the same effect. I'm not that familiar with doxygen though. @StRigaud do you have thoughts here? Does anyone have strong feelings for or against an independent schema vs strictly putting it in the cl code itself?

@haesleinhuepf
Copy link
Member

but for python and HTML, it should be no sweat.

I think it's also pretty straightforward in Java. I was thinking of translating the current code-generation machinery from Java to Python anyway. Just another code translation project ... this clEsperanto project turns out to be translating a lot of code. I hope it will spare at least that amount of code translation afterwards.🤣

Joking aside: This massive refactoring gives me the chance to clean up some messy parts. Thanks for being with me. And thanks for your patience.

Minor comment: Can we leave out the version from the scheme you suggested? I think git tags / versions can do the job better.

@tlambert03
Copy link
Author

tlambert03 commented Jan 29, 2021

Can we leave out the version from the scheme you suggested? I think git tags / versions can do the job better.

In the sense that we don't need to hard-code it sure. We can pull the actual value from version control, but I do think the actual "schema instance" (i.e. the yml or json file "product" that other repos/users use) should have a version string in it.

If this concept of a schema outside of the CL code itself (but in this repo) is at least moderately attractive, I can start putting together a little more concrete example... which we can then populate more fully once we like the pattern. Sound good? We basically need two files: one to define the schema itself (i would probably use JSON Schema unless you have any objections... it has good support and validators for all of the languages at play here), and then we need the actual schema "instance" like the toy examples I put above. That too could be in JSON, but yaml obviously has very attractive human-readability.

Ultimately, if we regret putting this info outside of the CL code itself, I don't think it will be that hard to covert (and at that point all of the info will be nicely in one place anyway).

I was thinking of translating the current code-generation machinery from Java to Python anyway

I had some thoughts on that too, if you have time, I'd be curious to hear what your strategy is before you write it all out?

@StRigaud
Copy link
Member

It looks good for me. 👍

This will also help us define the filter name and parameters across all the language (Python, C++, Fiji, Java...).

@haesleinhuepf
Copy link
Member

I'm just adding an idea here: How about aiming compatibility with the common workflow language?
https://www.commonwl.org/

@StRigaud
Copy link
Member

StRigaud commented Feb 4, 2021

Got two questions from #12 :

  • What's the output: true at the end of your example in the PR?
  • Is this format allows newlines in the same field?

For the last, I assume yes but prefer to ask. Mainly for the description field.

@tlambert03
Copy link
Author

What's the output: true at the end of your example in the PR?

that was a super-preliminary way to indicate that this "input" parameter also serves as a kernel output. was going to raise that as a thing to discuss... Happy to represent that concept differently!

Is this format allows newlines in the same field?

yep, the easiest way to indicate that in YAML is a pipe character:

operation:
  description: |
    any long string
    with line breaks, etc...

Though, there are alternate syntaxes that can also be used. see: https://yaml-multiline.info/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants