
add metadata field for custom tests #194

Open: wants to merge 4 commits into develop
Conversation

@sstrong99 (Author)

Checklist

  • Pull request details were added to HISTORY.rst

@sstrong99 (Author)

@rhpvorderman See the updated documentation I added justifying this feature. Let me know if you think it makes sense to include.

@sstrong99 (Author)

Also, depending on what you think about this, I'm curious what you think about adding a similar "metadata" field to each item that generates a test, i.e. files, stdout, and stderr.

@rhpvorderman (Member)

What is the use case here? I see it declared that "it can be used in custom tests" yet I see no examples, nor do I see test cases. Without a proper use case I cannot make a proper evaluation.

@sstrong99 (Author)

Most generally, it is a compromise solution to this issue: #184

That is, it allows setting the parameters for custom tests (e.g. the location of a file to check against) in the test yaml, so that the test is more fully specified in a single place. This makes the tests easier to read and maintain.

Another use case that is relevant for us is the ability to automatically link test runs to external requirements-tracking software. If we can encode the information about which test links to which requirement in the test.yml itself, it is easier to read and maintain, as opposed to having that information stored somewhere separate from where the test is specified.

It also helps with use cases like this: #182, in which you could pass in a piece of metadata indicating that no extra files should be found in a certain directory.

In my opinion it makes the custom test interface significantly cleaner, while adding very little, if any, risk to the core functionality.

@rhpvorderman (Member)

On further thought I do like the hook proposal better. The custom interface is clunky, and adding metadata to the YAML is not going to fix that.

@sstrong99 (Author)

I agree with that. But your proposal in #184 doesn't solve the use case I listed regarding linking tests to requirements. I think that the proposal here to slightly open up the schema allows for a lot of flexibility with pretty much no risk.

@rhpvorderman (Member)

I think that the proposal here to slightly open up the schema allows for a lot of flexibility with pretty much no risk.

"Pretty much no risk" is never true for a test framework. How many errors do you allow in your test framework? Say you are running tests, they fail. It turns out it is not your tests, it is a bug in the framework. Do you ditch the framework? It happens a second time, do you ditch the framework?

Adding features in a test framework is therefore always a risk. Every code change should have tests backing it up. Every feature should have all its use cases tested. When bugs occur, regression tests should be added immediately. New features should be considered very carefully.

This is not to shoot down your feature; I merely add this so you know the angle from which I look at new functionality. There is no "procedure" for adding features to pytest-workflow, but there are things to look at.

  1. Is this feature a quality of life improvement for many users?
  2. Can this feature be easily attained by using functionality that is not in pytest-workflow's scope? (Environment variables, execution environment, etc.)
  3. Can this feature already be attained by leveraging pytest-workflow's existing features?
  4. Is the feature's technical cost worth the benefits?

On point 4, I think the technical cost in this case is very low, so in that respect I agree with you.

That is, it allows setting the parameters for custom tests (e.g. the location of a file to check against) in the test yaml, so that the test is more fully specified in a single place. This makes the tests easier to read and maintain.

The example you give here is already covered by the path keyword as that already states the location of a file.

Another use case that is relevant for us is the ability to automatically link test runs to external requirements-tracking software. If we can encode the information about which test links to which requirement in the test.yml itself, it is easier to read and maintain, as opposed to having that information stored somewhere separate from where the test is specified.

Have you tried using the tags keyword for this? It seems appropriate.

It also helps with use cases like this: #182, in which you could pass in a piece of metadata indicating that no extra files should be found in a certain directory.

Again, adding a "no_extra_files" tag to the workflow should achieve the same result. The "free-form" part of the meta tag is not fully motivated, and if it is not free-form, tags is already present. Heck, you could even do raw JSON strings in the tags field.

In my opinion it makes the custom test interface significantly cleaner, ...

Allowing a free-form field would make the YAMLs significantly less clean, which is a disadvantage to weigh here too. Also, I do not see how it would make the custom test interface cleaner, as no examples are provided.

    metadata:                          # A list of metadata to associate with the test. It can be used,
      - "some metadata"                # for example, to parameterize custom workflows. Each metadata entry
      - structured_metadata:           # can be a simple string or a complex structured object.
          key: value

It is unclear to me what is meant by "parametrizing custom workflows" here. Pytest-workflow does not allow workflow parametrization by design. All test cases should be explicitly written out. This is a little tedious, I agree, but pytest is a difficult beast to tame, and the cost of building in parametrization is not worth the benefit (ctrl-c and ctrl-v make duplicating and modifying tests a single-digit-minute job).

@sstrong99 (Author) commented Apr 15, 2024

Thanks for taking the time to explain your perspective. Here are my responses to your questions:

That is, it allows setting the parameters for custom tests (e.g. the location of a file to check against) in the test yaml, so that the test is more fully specified in a single place. This makes the tests easier to read and maintain.

The example you give here is already covered by the path keyword as that already states the location of a file.

Sorry for being unclear. I meant the location of the "expected" file in the case where you want to do some custom comparison that is not already supported by pytest-workflow (md5sum, contains, must_not_contain). For example, if you only want to compare a certain part of a file, or if you want to give more verbose output when a mismatch is found.

Another use case that is relevant for us is the ability to automatically link test runs to external requirements-tracking software. If we can encode the information about which test links to which requirement in the test.yml itself, it is easier to read and maintain, as opposed to having that information stored somewhere separate from where the test is specified.

Have you tried using the tags keyword for this? It seems appropriate.

In this case it is common to need more structure than a simple string tag to convey the information. For example, you might need to convey the test name, a list of requirements satisfied, a long-form test description, a test group that the test is part of, etc. It would be possible to encode all of this in a single string, but that would not be very readable.

It also helps with use cases like this: #182, in which you could pass in a piece of metadata indicating that no extra files should be found in a certain directory.

Again, adding a "no_extra_files" tag to the workflow should achieve the same result. The "free-form" part of the meta tag is not fully motivated, and if it is not free-form, tags is already present. Heck, you could even do raw JSON strings in the tags field.

In the no_extra_files case, you also need to specify a directory, so that's a very simple example requiring some structure. In that simple case I agree it could easily be parsed out of a string tag, but that quickly becomes hard to read (e.g. in the example above).

In my opinion it makes the custom test interface significantly cleaner, ...

Allowing a free-form field would make the YAMLs significantly less clean, which is a disadvantage to weigh here too.

It only makes the YAML less clean to the extent the user wants. I would imagine that many use cases would not make use of this and therefore not make the YAML less clean. And I could imagine that in some use cases, users would consider the tradeoff and decide to go with another solution, because a solution using the metadata field would make the YAML too hard to read. This feature simply gives the user the choice.

Also, I do not see how it would make the custom test interface cleaner, as no examples are provided.

Here is a concrete example. Note that this involves the version of this feature where you can add the metadata field to "path" objects as well as the overall test objects, per my previous comment:

- name: some_test
  command: ...
  files:
    - path: "file1.txt"
      metadata: 
        file_to_compare: "expected_file1.txt"  # see the discussion above about expected files
        expected_lines: 5

# also thought I'd throw in an example regarding linking to requirements, per the other discussion
- name: some_test
  command: ...
  metadata: 
    test_group: test_group_1
    satisfies:
      - req1
      - req2
    description: description of test    
  files:
    - path: "file1.txt"

Then there would be two custom tests (I can provide a concrete example if necessary, but I think you can imagine what I mean), which parse the yml files, look for the expected_lines or file_to_compare metadata, and, if they are present, do the relevant comparison.
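
Roughly, one of those custom tests might look like the following sketch. It is illustrative only: the test.yml location and the metadata keys are assumptions taken from the example above, while the workflow marker and workflow_dir fixture are pytest-workflow's existing custom-test hooks.

import pathlib

import pytest
import yaml

# Assumed location of the workflow test YAML, next to this test module.
TEST_YAML = pathlib.Path(__file__).parent / "test.yml"


def _file_entries(test_name):
    # Parse the test YAML and yield the "files" entries of the named test.
    tests = yaml.safe_load(TEST_YAML.read_text())
    for test in tests:
        if test.get("name") == test_name:
            yield from test.get("files", [])


@pytest.mark.workflow("some_test")
def test_metadata_checks(workflow_dir):
    for entry in _file_entries("some_test"):
        meta = entry.get("metadata", {})
        produced = pathlib.Path(workflow_dir, entry["path"])
        if "expected_lines" in meta:
            # Line-count check driven by the proposed metadata field.
            assert len(produced.read_text().splitlines()) == meta["expected_lines"]
        if "file_to_compare" in meta:
            # Whole-file comparison against the referenced expected file.
            expected = pathlib.Path(meta["file_to_compare"])
            assert produced.read_text() == expected.read_text()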

It is unclear to me what is meant by "parametrizing custom workflows" here. Pytest-workflow does not allow workflow parametrization by design. All test cases should be explicitly written out. This is a little tedious, I agree, but pytest is a difficult beast to tame, and the cost of building in parametrization is not worth the benefit (ctrl-c and ctrl-v make duplicating and modifying tests a single-digit-minute job).

Oh, sorry, I meant to say parametrizing custom tests, not workflows.

@rhpvorderman (Member)

In this case it is common to need more structure than a simple string tag to convey the information. For example, you might need to convey the test name, a list of requirements satisfied, a long-form test description, a test group that the test is part of, etc. It would be possible to encode all of this in a single string, but that would not be very readable.

Behold the power of YAML:

tags:
  - >-
    {
      "What is this?":
        [
          "OMG",
          "This is readable JSON!",
          "What dark magic was used to conjure this up?"
        ]
    }

This will be collapsed in a single string by the YAML parser. (YAML is quite a complex file format that allows all sorts of nice things.)
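
For instance, a small script could recover the structure by attempting json.loads on each tag. This is a sketch; the test.yml location and the tag layout are assumptions.

import json

import yaml

# Assumed location of the workflow test YAML.
with open("test.yml") as handle:
    tests = yaml.safe_load(handle)

for test in tests:
    for tag in test.get("tags", []):
        try:
            structured = json.loads(tag)  # plain string tags are skipped
        except json.JSONDecodeError:
            continue
        # Hand the recovered structure to, e.g., external requirements tracking.
        print(test["name"], structured)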

It only makes the YAML less clean to the extent the user wants. I would imagine that many use cases would not make use of this and therefore not make the YAML less clean. And I could imagine that in some use cases, users would consider the tradeoff and decide to go with another solution, because a solution using the metadata field would make the YAML too hard to read. This feature simply gives the user the choice.

I don't want to give the user the choice to make the YAML less clean. Code is read more than it is written. Pytest-workflow is meant to cover 90% of your test cases with a simple, easy-to-write YAML. It also means that any user can come in, read your YAML, and completely comprehend 90% of your tests, because the keywords are limited, not free-form, and self-explanatory.

For the other 10%, custom test hooks are provided. Making the 90% use case less clean by user choice in order to enable a slightly cleaner 10% is not a good tradeoff in my opinion.

I suggest you work with structured tags as provided above first, and see how that works out in practice. If other users follow your example and see it as a good way to structure testing, we can discuss how (and if) to make a more integral test feature. For now I only see one user who wants this. Free-form fields would diverge too much from the "easy and clear to get to 90%" philosophy, in my opinion.

@sstrong99 (Author)

Very cool! I didn't know you could do that with YAML. Unfortunately, this doesn't meet the need described by the examples I provided in my previous reply, since those rely on a metadata field within the "files" objects.

For the other 10%, custom test hooks are provided. Making the 90% use case less clean by user choice in order to enable a slightly cleaner 10% is not a good tradeoff in my opinion.

Respectfully, I think this is incorrect. It would have no effect on the 90% use case and would make the 10% use case cleaner. Maybe there has been a miscommunication about the feature I'm suggesting? The new free-form metadata field is entirely optional, and I imagine it will be left out of the majority of tests. In the examples I give above, it would be used in 1) custom tests (the 10%), and 2) cases where users are integrating with an entirely separate system for requirement tracking, and are therefore willing to support some extra complexity in order to manage that interface.

@rhpvorderman (Member)

Adding optional free-form fields makes the tests less clean, regardless of whether they are used or not. Seven years of working in a group has taught me that having tight constraints is a good thing. Bells and whistles create cognitive overhead. If a not-previously-used keyword shows up in a test, this means the person reading your test will have to look it up. By adding a free-form field, you are basically introducing a limitless number of new keywords.

By not allowing this, I can come into any project that uses pytest-workflow and have the guarantee that I immediately understand its tests. This is vital for workflows that more than one person works on.

The custom test interface is merely a recognition that such a simple interface cannot cover all use cases. It is an escape hatch. It is by no means the primary focus of this project. In our group we try to use as few custom tests as possible.

This is because there are already a myriad of other test frameworks available that are attuned to specific workflow systems and do allow you to write advanced parametrized tests etc. There is no reason for pytest-workflow to become such a test system, as that would remove its defining feature: the simplicity.

Pytest-workflow is just one tool. If you have very extensive test requirements that require more hands-on programming to make them work, maybe pytest-workflow is not a good fit for your use case. Pytest-workflow can be used in conjunction with other test frameworks if its features need to be complemented.

@sstrong99 (Author)

If a not-previously-used keyword shows up in a test, this means the person reading your test will have to look it up.

By not allowing this, I can come into any project that uses pytest-workflow and have the guarantee that I immediately understand its tests.

You can immediately understand any of the non-custom tests, and this addition would not change that. You cannot (I assume) immediately understand any custom tests. You already have to go look them up, so this addition doesn't change that either. In fact, this addition makes it easier to look those tests up, because it's easier to find all the relevant information.

The custom test interface is merely a recognition that such a simple interface cannot cover all use cases. It is an escape hatch. It is by no means the primary focus of this project. In our group we try to use as few custom tests as possible.

I totally agree with that philosophy. But I don't think that's a reason to avoid making the custom tests work more smoothly. I agree that any modification to the custom test framework should be weighed carefully against any detriment to the core test framework. And in my opinion (see my point above) this PR does not have any effect on the core test framework.

Are you worried that users will abuse this new field and use it for things that would be better suited to tags (e.g. comments about the test)? If that is your concern, then I agree that you might end up in a situation where a user has created a very hard-to-read test for no good reason. I can see several solutions to that problem. Let me know what you think:

  1. Remove the metadata field from the outermost test object, and only include it in the individual test specs (i.e. the file objects, stdout, stderr, and exit_code). This way users are forced to use the tags construct when possible.
  2. Rename the metadata field to something more descriptive, like custom_test_metadata or similar, to make users less likely to abuse it.
  3. Limit the metadata field to only a list of strings. This would make it identical to the tags construct, but allow it to be included within every test spec (files, stdout, stderr, and exit_code). This alleviates your concern about the limitless number of new keywords. And then any need for further structure could be handled using the YAML syntax you described earlier.
