Implement bad inputs for input/output format validators #86

Alicipy · 2018-07-26T13:30:23Z

Now it's possible to specify invalid inputs and outputs which should be declined by input/output format validators. See #51 for details.

Usage:

Input Format Validator: place .in files into a folder input_format_validators/bad_inputs, this input files should be declined by input validators
Output Format Validator: place {testcasename}.ans.wrong-{something} into data/{category}/ for wrong answers to inputfile {testcasename}.in. This wrong answers should be declined by the output validators. Interactive problems are currently not supported. A warning will occur if the inputfile doesn't exist (in case of misspelling)

…nput validator Place them into a folder input_format_validators/bad_inputs, the input format validators should decline them. Signed-off-by: Stefan Kraus <[email protected]>

Files placed at data/{sample,secret,whatever}/{testcasename}.ans.wrong-{something} are files which should be declined by the output checker for inputfile {testcasename}.in and result in a wrong answer judgement. Signed-off-by: Stefan Kraus <[email protected]>

simonlindholm

Nice PR! This seems like a neat feature to have. @austrin maintains this repo, and is often quite late in responding, but in the meantime I made some comments in the hope that they're useful.

simonlindholm · 2018-07-26T19:32:49Z

problemtools/verifyproblem.py

        for f in ansfiles:
            if not f[:-4] + '.in' in infiles:
                self.error("No matching input file for answer '%s'" % f)

+        for f in wafiles:
+            if f not in wafiles_with_input:


I think if f.split('.ans.wrong')[0] + '.in' not in infiles would be better, to avoid repeated globbing

You are right, this is also much easier. I will change this (soon)

simonlindholm · 2018-07-26T19:53:27Z

problemtools/verifyproblem.py

+                continue
+
+            for val in self._validators:
+                status, _ = val.run(invalid_infile, args=None)


Hm, it's a bit sad not to be able to pass any flags here... It means we can't use this feature with IOI-style problems, where testgroup-specific limits are checked though flags. (Not that this is a feature I suspect we'll use for at least the Swedish IOI qualifiers, but still...)

I guess that's an argument for using *.in.wrong-* files in data/ instead, even if I agree that they don't completely belong there.

I noticed that at implementing, I didn't use this feature myself, but I didn't know that this is important for IOI. Of course, implementing it at wrong input files like the output validator is possible. Like you said, it doesn't completely belong there. Another way is to have the same directory-structure like in /data, but I don't know if this is worthwhile.

I think a third opinion here would be nice.

I don't know what the best choice is here either. Agreed that a third opinion would be nice.

Simon, what flags would you pass here?

IMHO, the data directory should be cluttered as less as possible. The bad inputs don't have any relationship to the other test cases so they don't belong in here.
The bad outputs are related to a single test case so it makes more sense to store them next to the real thing although storing them in the output_validator section with the same data/{sample,secret,whatever} structure may also be an option to clearly separate between test data relevant for the contest(ants) and only for testing the validators.

Flags either has to be None or taken from a .yaml file. I don't know the best option either.

I think the solution I'd lean towards is to put everything in data/. Say we add a directory to data/validator-tests with the same structure as secret and sample, and additionally files *.{in,ans}.{invalid,valid} (or -wrong or -wrong-* or whatever). This gives us input- and output validator flags, makes the structure uniform for both sorts of tests, is somewhat self-explanatory when you see it for the first time, and also makes it possible to add output validator tests without adding them as actual test cases. (For the Swedish IOI qualifiers for instance, we keep the secret/ directory entirely auto-generated, while a validator-tests directory would be under version control.) We might want to support .ans.invalid in the actual data directories as well, of course.

FWIW, I recently tried out Polygon and it has the following UI for validator tests (same for input and output):

After some testing I think that putting everything in data with the same structure as the other testcases is kind of hard. We would have to exclude this directory from the other testcases, implement that data/testcase-validator/sample should use the same flags as sample, think if wrong-answer-files should be able to refer to input-files from the testfiles and more. That is all possible, but require some changes on the current testcase class (like expand it with wrong answers, but that couldn't model invalid inputs) or other major restructures.

Should we check if the current state is good enough for a merge, or check some time more if we can fulfill all requested features here?

simonlindholm · 2018-07-26T19:57:29Z

problemtools/verifyproblem.py

+
+                # If an input validator declines the input file, everything is fine and we break.
+                if os.WEXITSTATUS(status) == 42:
+                    break


This logic is repeated 4 times now, it really should be a function (taking file, flags, and probably some bool about whether to error on failure)

(fwiw, github hid this as "outdated" but it's still relevant)

simonlindholm · 2018-07-26T20:04:12Z

problemtools/verifyproblem.py

+                    return
+
+                if self._problem.config.get('validation') == 'default':
+                    self.warning('Output validator check with wrong answers and default validator is unnecessary (skipping)')


FWIW, I could see use cases for this, like declaratively saying that a trivial answer isn't the answer to any test case without having to manually check all the .ans files (which might even be hard to do with diff with floats involved). So if it's easy to remove this check it would be nice to do so.

The problem here is that the default validator is not at self._validators (At least not in my tests), so every wrong output file is accepted.

Do you think that it's worthwhile to think about a solution here? With default validating, the trivial solution shouldn't be accepted because there is only one single solution normally.

It's not super important, but if you can get it to work at the same time as factoring output validation out into a single function it might be worth doing. Otherwise skip it.

(The trivial solution might be accepted if it turns out that it is in fact correct, because test data was generated in a broken way, which IME happens surprisingly often. Of course, this is why one writes greedy/stupid solutions and checks that they get WA, but it might be theoretically useful to declare that on a per-test case level. So yeah, not a super strong use-case, but not completely pointless either.)

Hm, _actual_validators?

How could i miss that? Yes, that solves my problem. Works for default validation now.

Thank you very much!

simonlindholm · 2018-07-26T20:07:07Z

problemtools/verifyproblem.py

+                # .in-file doesn't exist, already reported by testcase-check, so 'ignore' here
+                continue
+
+            wrong_answers = glob.glob(OutputValidators.WA_GLOB % testcase.ansfile)


(I know this isn't in the hot path, but this linear-time, syscall-based, noop-in-99%-of-cases globbing still tears at me. Maybe we could loop over all wrong answers and pick out their associated testcases, or store wrong answer lists as part of testcase objects.)

Like in the check if there is a corresponding input file? Yeah, nice idea. I thought this works fine and with 100 testcases at most, this won't be a big problem. Of course, your solution is better here.

simonlindholm · 2018-07-26T20:12:47Z

problemtools/verifyproblem.py

+                else:
+                    # Will only be executed if loop wasn't ended by break
+                    # No output validator declined wrong answer, this is an error.
+                    self.error("Wrong answer file %s was not declined by output validators" % wafile)


Like with the input validator, I don't like this duplication much (pretty complex code, too).

I think a function here would be nice. Maybe an abstract base class 'Validators' would help here, this would also make the check on noncompiling validators easier.

To abstract both input and output validators? I'm not sure, they don't share that much logic, and inheritance tends to be confusing. Maybe a helper function could be better (if anything; some small amount of duplication is fine IMO).

This way, no globbing is needed, it's shorter and easier to understand. Signed-off-by: Stefan Kraus <[email protected]>

The result of _actual_validators() also contains the default validator which declines wrong answers in validation mode "default" Signed-off-by: Stefan Kraus <[email protected]>

austrin · 2018-09-19T09:15:43Z

I've been thinking a bit about how to incorporate this. I agree with @meisterT about not cluttering the data directory too much. I also find the input_format_validators/bad_inputs a bit problematic, because subdirectories in input_format_validators typically means a program.

My suggestion would be to have a new top level directory called something like validator_tests. There, one can place a whole test case group hierarchy similar to the one under data/, and in particular have:

testdata.yaml files controlling the input/output validator flags used for each test case group
X.in, X.ans, X.out.wrong triples of files to test the output validator.
X.in.wrong files to test the input validator.

Some downsides of this would be:

It doesn't allow for adding output validator tests to the actual test cases (without making copies of them). I personally think this is fine.
Even if one can test the validators with different flags, if one wants to test them with the same groups as the actual test case groups of the problem, one would have to make sure that this parallel test case group tree is identical to the real one, which sounds annoying. On the other hand I think problems with complex test case group structures often has the data tree generated, and then that generator can equally well generate an identical tree in validator_tests/. @simonlindholm is this hypothesis correct?

simonlindholm · 2018-09-19T09:27:15Z

That sounds like a good suggestion to me.

@simonlindholm is this hypothesis correct?

It's true for testdata_tools, yes. I don't know if there is anyone else out there doing test groups with the problem format. One could also symlink testdata.yaml files if doing things manually. Or .in files.

godmar · 2018-11-19T17:21:51Z

I've been looking for this for a long time, particularly for input validators. Personally, I would just place them under data/badinput/*.in

I wasn't aware that there are input validator flags. Are they documented somewhere?

simonlindholm · 2018-11-19T17:27:29Z

http://www.problemarchive.org/wiki/index.php/Problem_Format, grep for "input_validator_flags". There are no default ones, it's purely something you can use if you implement your own (non-ctd) validators. They are declared in testdata.yaml and passed as command line arguments to the validator.

godmar · 2018-11-19T17:41:18Z

Ok. FWIW, does Per @austrin suggest to name the directory data/validator_tests ? If so, a risk is that it's not clear that all of these must be rejected. I'd rather have something explicit like 'badinput' or even 'must_reject' - similar to how 'submissions/{accepted|wrong_answer|...}' makes it clear what goes there.

austrin · 2018-11-19T19:00:45Z

@godmar the (current, still not completely finalized) plan is a directory /validator_tests/ under the root of the problem package, with file names clearly indicating whether the expected result is that the input/output validator accepts/rejects, along the lines of what is described in #86 (comment)

meisterT · 2018-11-19T19:29:20Z

What's necessary to move the finalize the plan? I think your proposal is good and the downsides are not too bad.

Alicipy · 2018-11-19T19:36:42Z

The described format of @austrin sound pretty good. I have lost sight of this feature, I hope I have some time after NWERC and can implement this. So, technically, the finalization of this plan is just time :D

godmar · 2018-11-19T19:48:39Z

I think it's good enough and it's definitely better than the status quo, although I aesthetically dislike

double suffixes like .in.wrong
mixing input and output validators in the same directory because they have, IMO, little/nothing in common
not staying under /data since we're talking about input data to the problem, and a separate root directory is to easy to overlook especially for new problem setters copying from a skeleton.

niemela · 2018-11-19T20:23:40Z

if one wants to test them with the same groups as the actual test case groups of the problem, one would have to make sure that this parallel test case group tree is identical to the real one, which sounds annoying.

Wouldn't symlinks work in that case?

meisterT · 2018-11-26T08:32:29Z

Did you talk about that this weekend in person? I totally forgot to bring it up but would love to use this feature.

austrin · 2018-11-26T16:19:07Z

Alas no, I also forgot. @AliceMurray if you feel like working on this in the near future let me know, otherwise I'm probably going to go ahead and do this now that there is a reasonable concrete plan.

Alicipy · 2018-11-26T16:23:48Z

Well, I also forgot discussing this in person this weekend.

If you want to, you can implement this, I won't have time in the near future. If you need someone to test it, let me know.

Alicipy · 2018-12-27T16:46:25Z

@austrin do you have already implemented something or need some help? In about a month, we have a small onside contest here and could test this feature on our problemset for you (under real conditions)

austrin · 2019-01-05T15:38:30Z

@AliceMurray did not get around to it. Would you have time to also implement it, or only to test it? If the latter I can make an effort to get it implemented.

Alicipy · 2019-01-05T16:03:28Z

@austrin Currently, preparing the contest, studying and work is taking a lot of time. I think testing will be done by using it in our contest preperation, At the moment, I won't find time to implement it myself. I think it would be better if you know someone or can make an effort to implement it.

pehrsoderman · 2021-04-26T12:46:27Z

@AliceMurray @austrin What is the status here? What is needed to get this PR moving forward?

RagnarGrootKoerkamp · 2021-04-26T13:03:59Z

I have a version of this working in BAPCtools:

All bad IO is in data/bad/.
Bad inputs go in data/bad/*in, and do not have a corresponding .ans.
Pairs of data/bad/X.{in,ans} indicate that X.ans is a bad output for X.in.

I don't really like the not-so-obvious mixing of bad inputs and outputs here though, so separating into something like data/bad_input and data/bad_output is probably nicer.

One benefit of keeping things in the data/ directory is that it will work better in combination with generators/generators.yaml. In particular, I now have support for writing the contents of (per testgroup) testdata.yml in the generators.yaml. This means that it's possible to replicate a directory structure without doing too much duplication of code/config, since yaml objects can be reused.

I don't see too much value in allowing e.g. .wrong.ans right next to a .in in the data/{sample,secret}/ tree. Usually I only add a few bad outputs for one/some small minimal testcases, where creating a new .in or possibly copying one to data/bad is fine.

Alicipy added 2 commits July 26, 2018 15:15

Handle invalid input files which should be declined by at least one i…

d81ac40

…nput validator Place them into a folder input_format_validators/bad_inputs, the input format validators should decline them. Signed-off-by: Stefan Kraus <[email protected]>

Alicipy mentioned this pull request Jul 26, 2018

Support 'wrong' as test data category #51

Open

Decline output at 43, not 42

c951e76

simonlindholm reviewed Jul 26, 2018

View reviewed changes

Alicipy added 2 commits July 27, 2018 10:57

Make check for existence of inputfile clearer

0fa96f7

This way, no globbing is needed, it's shorter and easier to understand. Signed-off-by: Stefan Kraus <[email protected]>

Support wrong answer files for default validation.

c30fd01

The result of _actual_validators() also contains the default validator which declines wrong answers in validation mode "default" Signed-off-by: Stefan Kraus <[email protected]>

RagnarGrootKoerkamp mentioned this pull request Jan 22, 2020

data/bad Validators tests RagnarGrootKoerkamp/BAPCtools#35

Closed

Implement bad inputs for input/output format validators #86

Are you sure you want to change the base?

Implement bad inputs for input/output format validators #86

Conversation

Alicipy commented Jul 26, 2018

simonlindholm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

austrin commented Sep 19, 2018

simonlindholm commented Sep 19, 2018

godmar commented Nov 19, 2018

simonlindholm commented Nov 19, 2018

godmar commented Nov 19, 2018

austrin commented Nov 19, 2018

meisterT commented Nov 19, 2018

Alicipy commented Nov 19, 2018

godmar commented Nov 19, 2018

niemela commented Nov 19, 2018

meisterT commented Nov 26, 2018

austrin commented Nov 26, 2018

Alicipy commented Nov 26, 2018

Alicipy commented Dec 27, 2018

austrin commented Jan 5, 2019

Alicipy commented Jan 5, 2019

pehrsoderman commented Apr 26, 2021

RagnarGrootKoerkamp commented Apr 26, 2021