
askmarilyn: Can she (as the Judge) tell me how much I drank? #15

Closed
thorehusfeldt opened this issue Oct 19, 2019 · 10 comments
Labels: question (Further information is requested)

@thorehusfeldt (Owner)

The infrastructure with interactive problems is still a bit of a mystery to me. (It’s hard to simulate the user experience from the command line.)

I would love Marilyn to end the interaction with “You ended up with 659 beers. Well done.” (for AC) or “You ended up with only 502 beers. Too bad.” (for WA). This would be particularly useful for solvers who implement the wrong strategy. I know there is some way for Kattis to show hints, but I’m not sure how to write judge messages that end up in the right place.

@thorehusfeldt thorehusfeldt added the question Further information is requested label Oct 19, 2019
@simonlindholm (Collaborator)

The way to do it is to read a directory name from argv[3], then create $dir/teammessage.txt with the output message (this is vaguely documented at https://www.problemarchive.org/wiki/index.php/Output_validator). I think Kattis then shows you the message for the test case you fail on (if any), but not for earlier accepted test cases.
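For reference, the mechanism described above can be sketched in a few lines. (Python here for readability, though validate.h itself is C++; the function name is mine, not part of any validator API.)

```python
import sys
from pathlib import Path

def write_team_message(message: str) -> None:
    # An output validator is invoked roughly as
    #   validator input_file answer_file feedback_dir < team_output
    # so sys.argv[3] names the feedback directory. Anything written to
    # teammessage.txt there may be shown to the submitter.
    feedback_dir = Path(sys.argv[3])
    (feedback_dir / "teammessage.txt").write_text(message + "\n")
```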

I agree that it's nice to tell users whether their strategy is bad or whether they are just outputting things in the wrong format.

@thorehusfeldt (Owner) commented Oct 20, 2019

Turns out there is an author_feedback function in validate.h that talks to that directory for me. I’ve tried to cook something up in feat/marilynfeedback, but it feels extremely fragile, and I have no way of telling whether it actually works.

Well… I can of course run one-round interactions by hand (and that works and puts teammessage.txt in the right directory), but what is needed is to run all the input files against all the submissions and look at their feedback. Do I have to write that script myself? (I can, and would consider it a good exercise. But I’d prefer to learn problemtools.)
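The script being asked about would essentially pair every submission with every input file and run each pair, then inspect the feedback directory. A minimal sketch of the enumeration step (the helper name is hypothetical, not part of problemtools; directory names follow the standard problem-package layout):

```python
from itertools import product
from pathlib import Path

def feedback_jobs(problem_dir):
    """Pair every submission with every secret input file. Each pair is
    one interactive run whose feedback directory should be inspected
    afterwards for teammessage.txt. (Hypothetical helper, not a
    problemtools API.)"""
    root = Path(problem_dir)
    submissions = sorted(p for p in root.glob("submissions/*/*") if p.is_file())
    inputs = sorted(root.glob("data/secret/*.in"))
    return list(product(submissions, inputs))
```

Actually running each job (compiling the submission, wiring it to the interactive validator, collecting the feedback directory) is the fiddly part that problemtools normally handles internally.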

@simonlindholm (Collaborator)

I don't know a good way to test it; I would be inclined to take your commit as is and then test it on Kattis when the problem has been installed (which could happen far in advance of the contest if you email Kattis people and tell them that this requires early testing). If you feel the need for manual testing, then yeah, you probably do need to write that script yourself, or hack problemtools to leave temporary files around -- there's no built-in functionality for it AFAIK.

@thorehusfeldt (Owner)

I’ll write my own script. ’twill be good fun. Where do I put it without confusing problemtools?

@simonlindholm (Collaborator)

Basically anywhere that isn't input_format_validators/, output_validators/, graders/, submissions/<subdir>/ or data/<subdir>/. In particular it's fine just to put it at toplevel.

@thorehusfeldt (Owner) commented Oct 20, 2019

OK, I’m going this route:

# verifyproblem askmarilyn -l info
Loading problem askmarilyn
[...]
Checking submissions
INFO : Check AC submission sl.cpp (C++)
[...]
INFO : Running on test case group data/secret
INFO : Test file result: AC [message: Congratulations! You got 653 drinks., CPU: 0.01s @ test case secret/1-0]
[...]

I find this extremely useful for development. Does it make sense to polish it and make it a pull request or am I wasting everybody’s time with something like that?

@thorehusfeldt (Owner)

It now summarises over test data and reports the last message for each submission, just like the rest of the verdicts. Seems to work great and makes the problem so much more accessible.

   AC submission sl.cpp (C++) OK: AC [message: Congratulations! You got 650 drinks., CPU: 0.10s @ test case secret/2-2]
   AC submission thore.py (Python 3) OK: AC [message: Congratulations! You got 649 drinks., CPU: 0.16s @ test case secret/3-1]
   Slowest AC runtime: 0.160, setting timelim to 1 secs, safety margin to 2 secs
   WA submission always_door_A.py (Python 3) OK: WA [message: 330 drinks in 1000 rounds. Too bad., test case: test case secret/1-0, CPU: 0.04s @ test case secret/1-0]
   WA submission break_protocol.py (Python 2 w/PyPy) OK: WA [message: Your guess must be a valid door name, such as A., test case: test case secret/1-0, CPU: 0.02s @ test case secret/1-0]
   WA submission first_a.py (Python 3) OK: WA [message: 502 drinks in 1000 rounds. Too bad., test case: test case secret/1-1, CPU: 0.14s @ test case secret/1-0]
   WA submission first_b.py (Python 3) OK: WA [message: 532 drinks in 1000 rounds. Too bad., test case: test case secret/1-2, CPU: 0.13s @ test case secret/1-2]
   WA submission first_c.py (Python 3) OK: WA [message: 506 drinks in 1000 rounds. Too bad., test case: test case secret/1-3, CPU: 0.14s @ test case secret/1-3]
   WA submission ignore_positive_hint.py (Python 3) OK: WA [message: 0 drinks in 1000 rounds. Too bad., test case: test case secret/1-0, CPU: 0.14s @ test case secret/1-0]
   WA submission plays-1001-rounds.py (Python 3) OK: WA [message: You won't stop talking!, test case: test case secret/1-0, CPU: 0.05s @ test case secret/1-0]
   WA submission plays-999-rounds.py (Python 3) OK: WA [message: You must begin round 1000 by guessing a door., test case: test case secret/1-0, CPU: 0.13s @ test case secret/1-0]
   WA submission plays-forever.py (Python 3) OK: WA [message: You won't stop talking!, test case: test case secret/1-0, CPU: 0.05s @ test case secret/1-0]
   WA submission random_door.py (Python 3) OK: WA [message: 342 drinks in 1000 rounds. Too bad., test case: test case secret/1-0, CPU: 0.12s @ test case secret/1-0]
   WA submission silent-death.py (Python 2 w/PyPy) OK: WA [message: You must begin round 1 by guessing a door., test case: test case secret/1-0, CPU: 0.02s @ test case secret/1-0]
   WA submission spam.py (Python 3) OK: WA [message: Your guess must be a valid door name, such as A., test case: test case secret/1-0, CPU: 0.01s @ test case secret/1-0]
   TLE submission forever-silent.py (Python 2 w/PyPy) OK: TLE [test case: test case secret/1-0, CPU: 4.00s @ test case secret/1-0]
askmarilyn tested: 0 errors, 0 warnings

@thorehusfeldt (Owner)

Hehe. The stubborn WA player ignore_positive_hint, who sternly refuses to take a positive hint from Marilyn, does maximally badly against the super-friendly Marilyn, who always shows him where the beer is. “No beers for you.”

@simonlindholm (Collaborator)

Seems reasonable to submit a problemtools PR for! Have it read judgemessage.txt as well, though, and prefer that one if it exists; teammessage.txt is rarer and generally gives vaguer errors.
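The suggested preference could look something like this sketch (the helper name is mine; the two file names are the ones discussed above):

```python
from pathlib import Path

def read_feedback_message(feedback_dir):
    # Prefer the judge-facing message over the team-facing one, since
    # judgemessage.txt is usually the more specific of the two.
    for name in ("judgemessage.txt", "teammessage.txt"):
        path = Path(feedback_dir) / name
        if path.is_file():
            return path.read_text().strip()
    return None
```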

@thorehusfeldt (Owner)

Solved by patching verifyproblem and updating the feedback accordingly in 526f6ca. Patch submitted as PR Kattis/problemtools#140.
