
askmarilyn: Can she (as the Judge) tell me how much I drank? #15

Closed
thorehusfeldt opened this issue Oct 19, 2019 · 10 comments
Labels: question (Further information is requested)

@thorehusfeldt (Owner)

The infrastructure with interactive problems is still a bit of a mystery to me. (It’s hard to simulate the user experience from the command line.)

I would love Marilyn to end the interaction with “You ended up with 659 beers. Well done.” (for AC) or “You ended up with only 502 beers. Too bad.” (for WA). This would be particularly useful for solvers who implement the wrong strategy. I know there is some way for Kattis to show hints, but I’m not sure how to write judge messages that end up in the right place.

@thorehusfeldt thorehusfeldt added the question Further information is requested label Oct 19, 2019
@simonlindholm (Collaborator)

The way to do it is to read a directory name from argv[3], then create $dir/teammessage.txt with the output message (this is vaguely documented at https://www.problemarchive.org/wiki/index.php/Output_validator). I think Kattis then shows you the message for the test case you fail on (if any), but not for earlier accepted test cases.
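For reference, the mechanism described above can be sketched in a few lines. (Python here for readability, though validate.h itself is C++; the function name is mine, not part of any validator API.)

```python
import sys
from pathlib import Path

def write_team_message(message: str) -> None:
    # An output validator is invoked roughly as
    #   validator input_file answer_file feedback_dir < team_output
    # so sys.argv[3] names the feedback directory. Anything written to
    # teammessage.txt there may be shown to the submitter.
    feedback_dir = Path(sys.argv[3])
    (feedback_dir / "teammessage.txt").write_text(message + "\n")
```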

I agree that it's nice to tell users whether their strategy is bad or whether they are just outputting things in the wrong format.

@thorehusfeldt (Owner) commented Oct 20, 2019

Turns out there is an author_feedback function in validate.h that talks to that directory for me. I’ve tried to cook something up in feat/marilynfeedback, but it feels extremely fragile, and I have no way of telling whether it actually works.

Well… I can of course run one-round interactions by hand (and that works and puts teammessage.txt in the right directory), but what is needed is to run all the input files against all the submissions and look at their feedback. Do I have to write that script myself? (I can, and would consider it a good exercise. But I’d prefer to learn problemtools.)
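The script being asked about would essentially pair every submission with every input file and run each pair, then inspect the feedback directory. A minimal sketch of the enumeration step (the helper name is hypothetical, not part of problemtools; directory names follow the standard problem-package layout):

```python
from itertools import product
from pathlib import Path

def feedback_jobs(problem_dir):
    """Pair every submission with every secret input file. Each pair is
    one interactive run whose feedback directory should be inspected
    afterwards for teammessage.txt. (Hypothetical helper, not a
    problemtools API.)"""
    root = Path(problem_dir)
    submissions = sorted(p for p in root.glob("submissions/*/*") if p.is_file())
    inputs = sorted(root.glob("data/secret/*.in"))
    return list(product(submissions, inputs))
```

Actually running each job (compiling the submission, wiring it to the interactive validator, collecting the feedback directory) is the fiddly part that problemtools normally handles internally.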

@simonlindholm (Collaborator)

I don't know a good way to test it; I would be inclined to take your commit as is and then test it on Kattis when the problem has been installed (which could happen far in advance of the contest if you email Kattis people and tell them that this requires early testing). If you feel the need for manual testing, then yeah, you probably do need to write that script yourself, or hack problemtools to leave temporary files around -- there's no built-in functionality for it AFAIK.

@thorehusfeldt (Owner)

I’ll write my own script. ’twill be good fun. Where do I put it without confusing problemtools?

@simonlindholm (Collaborator)

Basically anywhere that isn't input_format_validators/, output_validators/, graders/, submissions/<subdir>/ or data/<subdir>/. In particular it's fine just to put it at toplevel.

@thorehusfeldt (Owner) commented Oct 20, 2019

OK, I’m going this route:

# verifyproblem askmarilyn -l info
Loading problem askmarilyn
[...]
Checking submissions
INFO : Check AC submission sl.cpp (C++)
[...]
INFO : Running on test case group data/secret
INFO : Test file result: AC [message: Congratulations! You got 653 drinks., CPU: 0.01s @ test case secret/1-0]
[...]

I find this extremely useful for development. Does it make sense to polish it and make it a pull request or am I wasting everybody’s time with something like that?

@thorehusfeldt (Owner)

It now summarises over test data and reports the last message for each submission, just like the rest of the verdicts. Seems to work great and makes the problem so much more accessible.

   AC submission sl.cpp (C++) OK: AC [message: Congratulations! You got 650 drinks., CPU: 0.10s @ test case secret/2-2]
   AC submission thore.py (Python 3) OK: AC [message: Congratulations! You got 649 drinks., CPU: 0.16s @ test case secret/3-1]
   Slowest AC runtime: 0.160, setting timelim to 1 secs, safety margin to 2 secs
   WA submission always_door_A.py (Python 3) OK: WA [message: 330 drinks in 1000 rounds. Too bad., test case: test case secret/1-0, CPU: 0.04s @ test case secret/1-0]
   WA submission break_protocol.py (Python 2 w/PyPy) OK: WA [message: Your guess must be a valid door name, such as A., test case: test case secret/1-0, CPU: 0.02s @ test case secret/1-0]
   WA submission first_a.py (Python 3) OK: WA [message: 502 drinks in 1000 rounds. Too bad., test case: test case secret/1-1, CPU: 0.14s @ test case secret/1-0]
   WA submission first_b.py (Python 3) OK: WA [message: 532 drinks in 1000 rounds. Too bad., test case: test case secret/1-2, CPU: 0.13s @ test case secret/1-2]
   WA submission first_c.py (Python 3) OK: WA [message: 506 drinks in 1000 rounds. Too bad., test case: test case secret/1-3, CPU: 0.14s @ test case secret/1-3]
   WA submission ignore_positive_hint.py (Python 3) OK: WA [message: 0 drinks in 1000 rounds. Too bad., test case: test case secret/1-0, CPU: 0.14s @ test case secret/1-0]
   WA submission plays-1001-rounds.py (Python 3) OK: WA [message: You won't stop talking!, test case: test case secret/1-0, CPU: 0.05s @ test case secret/1-0]
   WA submission plays-999-rounds.py (Python 3) OK: WA [message: You must begin round 1000 by guessing a door., test case: test case secret/1-0, CPU: 0.13s @ test case secret/1-0]
   WA submission plays-forever.py (Python 3) OK: WA [message: You won't stop talking!, test case: test case secret/1-0, CPU: 0.05s @ test case secret/1-0]
   WA submission random_door.py (Python 3) OK: WA [message: 342 drinks in 1000 rounds. Too bad., test case: test case secret/1-0, CPU: 0.12s @ test case secret/1-0]
   WA submission silent-death.py (Python 2 w/PyPy) OK: WA [message: You must begin round 1 by guessing a door., test case: test case secret/1-0, CPU: 0.02s @ test case secret/1-0]
   WA submission spam.py (Python 3) OK: WA [message: Your guess must be a valid door name, such as A., test case: test case secret/1-0, CPU: 0.01s @ test case secret/1-0]
   TLE submission forever-silent.py (Python 2 w/PyPy) OK: TLE [test case: test case secret/1-0, CPU: 4.00s @ test case secret/1-0]
askmarilyn tested: 0 errors, 0 warnings

@thorehusfeldt (Owner)

Hehe. The stubborn WA player ignore_positive_hint, who sternly refuses to take a positive hint from Marilyn, does maximally badly against the super-friendly Marilyn, who always shows him where the beer is. “No beers for you.”

@simonlindholm (Collaborator)

Seems reasonable to submit a problemtools PR for! Have it read judgemessage.txt as well, though, and prefer that one if it exists; teammessage.txt is rarer and generally gives vaguer errors.
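The suggested preference could look something like this sketch (the helper name is mine; the two file names are the ones discussed above):

```python
from pathlib import Path

def read_feedback_message(feedback_dir):
    # Prefer the judge-facing message over the team-facing one, since
    # judgemessage.txt is usually the more specific of the two.
    for name in ("judgemessage.txt", "teammessage.txt"):
        path = Path(feedback_dir) / name
        if path.is_file():
            return path.read_text().strip()
    return None
```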

@thorehusfeldt (Owner)

Solved by patching verifyproblem and updating the feedback accordingly in 526f6ca. Patch submitted as PR Kattis/problemtools#140.
