problem_format: add C++ templates
Many templates have been floating around in the DMOJ community for
validation and input handling in checkers. This commit aims to
consolidate them. It has two main goals:

- Correct. Duh.
- Simple. Other templates that circulate, including the ones I have
  published, are too complex. People naively try and write their own. I
  am sick and tired of reading over incorrect validators.

  These templates forgo some principles of good design (such as
  object-oriented programming) in favour of pure simplicity. They should
  be simple enough that they are understandable by the broader
  community, and are not a black box. Hopefully this also dissuades
  re-writing.
Riolku committed Sep 9, 2024
1 parent 430149b commit 8fcb74a
Showing 68 changed files with 1,044 additions and 0 deletions.
21 changes: 21 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,21 @@
name: build
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Install clang-format 12
      run: |
        wget -O clang-format https://github.com/DMOJ/clang-tools-static-binaries/releases/download/master-5ea3d18c/clang-format-12_linux-amd64
        chmod a+x ./clang-format
    - name: Run clang-format
      run: find sample_files/problem_setting \( -name '*.h' -or -name '*.cpp' -or -name '*.c' \) -print0 | xargs -0 ./clang-format --dry-run -Werror --color
  cpp_template_tests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Run C++ template tests
      run: |
        cd sample_files/problem_setting/test
        ./run_test.sh
1 change: 1 addition & 0 deletions docs/_sidebar.md
@@ -24,6 +24,7 @@
- [Custom graders](problem_format/custom_graders.md)
- [Generators](problem_format/generator.md)
- [Problem examples](problem_format/problem_examples.md)
- [C++ Problem Setting Templates](problem_format/cpp_psetting_templates.md)

- About
- [License](about/LICENSE.md)
90 changes: 90 additions & 0 deletions docs/problem_format/cpp_psetting_templates.md
@@ -0,0 +1,90 @@
# C++ Problem Setting Templates - `cpp_psetting_templates`

There are three C++ input-handling templates provided for aiding problem setters. They are as follows:

- [Validator Template](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/validator.cpp)
- [Identical Checker/Interactor Template](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/identical_checker_interactor.cpp)
- [Standard Checker/Interactor Template](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/standard_checker_interactor.cpp)

Examples of their use are as follows:

- A validator for <https://dmoj.ca/problem/aplusb> is [here](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/examples/validator.cpp).
- An identical-style checker for <https://dmoj.ca/problem/seq3> is [here](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/examples/identical_checker.cpp).
- A standard-style interactor for <https://dmoj.ca/problem/seed2> is [here](https://github.com/DMOJ/docs/blob/master/sample_files/problem_setting/examples/standard_interactor.cpp).

## Validator

This is a template for validating the input data of problems. It aims to be simple and, of course, correct. It contains eight functions. The first three are whitespace functions:

- `void readSpace()` expects a space at the current position in the input, and aborts the program if there is not a space.
- `void readNewLine()` expects a newline at the current position in the input.
- `void readEOF()` expects the input file to end immediately at the current position.

The remaining five are for actual content:

- `std::string readToken(char min_char = 0, char max_char = 127)` returns the next token in the input stream. A token is defined as a whitespace-separated string. If the next character in the input is a whitespace character, this function aborts the program. The optional arguments `min_char` and `max_char` can be used to enforce a range on the characters in the token. For instance, `readToken('a', 'z')` reads a string of lowercase English letters.
- `std::string readLine(char min_char = 0, char max_char = 127)` returns the next line in the input stream. Specifically, it reads until it encounters a `\n`, and discards it (the newline is not part of the returned string). `min_char` and `max_char` are the same as for `readToken`. If `readLine` encounters an EOF, it fails.
- `long long readInt(long long lo, long long hi)` calls `readToken()` and parses the token as an integer. It aborts on overflow, malformed integers, and if the resultant integer is not in the range [lo, hi], inclusive. Leading zeroes and `-0` are not accepted.
- `long double readFloat(long double lo, long double hi, long double eps = 1e-9)` calls `readToken()` and parses the token as a float. It aborts on overflow, malformed floats, and if the resultant float is not in the range [lo, hi], inclusive, using the provided epsilon to perform the comparison. Scientific notation and NaNs are not accepted, nor are leading zeroes. `-0` is allowed. Trailing zeroes in the decimal portion are also permitted.
- `std::vector<T> readIntArray(size_t N, long long lo, long long hi)` parses the next space-separated N integers into an array, and then reads a final newline. It must be given a template argument, which is the type of the array elements. For example, `readIntArray<int>(5, 1, 10)` reads five space-separated integers into a `std::vector<int>`, where each integer is in the range [1, 10], inclusive.
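
The integer rules above (no leading zeroes, no `-0`, overflow and range checks) can be sketched as a standalone function. This is a minimal illustration rather than the template's code: `parseStrictInt` and `isStrictIntToken` are hypothetical names, and the sketch returns `std::nullopt` where the template would abort. The accepted grammar mirrors the pattern `^(0|-?[1-9][0-9]*)$` used by `readInt()` in this commit's checker example.

```cpp
#include <algorithm>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if the token is "0", or an optional '-' followed
// by a nonzero digit and then any number of digits.
bool isStrictIntToken(const std::string &token) {
  if (token == "0") return true;
  size_t start = (!token.empty() && token[0] == '-') ? 1 : 0;
  if (start >= token.size()) return false;
  if (token[start] < '1' || token[start] > '9') return false;
  return std::all_of(token.begin() + start + 1, token.end(),
                     [](char c) { return '0' <= c && c <= '9'; });
}

// Hypothetical helper: returns the value, or std::nullopt if the token is
// malformed, overflows long long, or lies outside [lo, hi].
std::optional<long long> parseStrictInt(const std::string &token, long long lo,
                                        long long hi) {
  if (!isStrictIntToken(token)) return std::nullopt;
  long long value;
  try {
    value = std::stoll(token);
  } catch (const std::out_of_range &) {
    return std::nullopt; // well-formed, but does not fit in a long long
  }
  if (value < lo || value > hi) return std::nullopt;
  return value;
}
```

Checking the token's shape before calling `std::stoll` matters because `std::stoll` on its own accepts inputs such as `+5`, `007`, or a number with leading whitespace.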

A small caveat: `readToken` and `readLine` will throw if the string exceeds 10 million characters.

`readFloat()` and `readLine()` will likely be of no use for many validators, and can be safely deleted. Similarly, `readIntArray` can be deleted if unneeded.
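
To show how the whitespace and token functions combine, here is a small sketch of a validator for a hypothetical input format: exactly two lowercase words on one line, separated by a single space. The functions below are stand-ins written for this illustration, not the template itself. They take a `std::istream` and throw a hypothetical `InvalidInput` exception instead of reading `stdin` and aborting, so the sketch can be exercised in a test.

```cpp
#include <cctype>
#include <cstdio>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical error type; the real validator aborts the program instead.
struct InvalidInput : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Consume one character and require it to be `expected` (which may be EOF).
void expectChar(std::istream &in, int expected, const char *message) {
  if (in.get() != expected) throw InvalidInput(message);
}
void readSpace(std::istream &in) { expectChar(in, ' ', "expected a space"); }
void readNewLine(std::istream &in) { expectChar(in, '\n', "expected a newline"); }
void readEOF(std::istream &in) { expectChar(in, EOF, "expected end of file"); }

// Read a non-empty run of non-whitespace characters, each of which must lie
// in [min_char, max_char].
std::string readToken(std::istream &in, char min_char = 0, char max_char = 127) {
  std::string token;
  for (int c = in.peek(); c != EOF && !std::isspace(c); c = in.peek()) {
    if (c < min_char || c > max_char) throw InvalidInput("character out of range");
    token.push_back(static_cast<char>(in.get()));
  }
  if (token.empty()) throw InvalidInput("expected a token");
  return token;
}

// Hypothetical input format: "<word> <word>\n" and nothing else.
void validateTwoWords(std::istream &in) {
  readToken(in, 'a', 'z');
  readSpace(in);
  readToken(in, 'a', 'z');
  readNewLine(in);
  readEOF(in);
}

// Convenience wrapper for trying the validator on a string.
bool isValid(const std::string &text) {
  std::istringstream in(text);
  try {
    validateTwoWords(in);
    return true;
  } catch (const InvalidInput &) {
    return false;
  }
}
```

Under these assumptions, `isValid("hello world\n")` holds, while a doubled space, a missing final newline, or an extra trailing newline each make the input invalid. That strictness is the point of a validator.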

## Checkers/Interactors

The next pair of templates are for checkers/interactors. The difference is the type of whitespace handling: the identical checker/interactor expects whitespace to match exactly. The standard checker/interactor handles whitespace like the `standard` checker.

The checkers and interactors are designed for the `coci` bridged checker/interactor type. However, adapting the exit codes and the order of command-line parameters to work with other types should not be challenging.

Both files can be used for either checkers/interactors, with the following caveat: interactors MUST close `stdout` BEFORE calling `readEOF()`, so that the user process can terminate in case it _also_ expects an EOF. Checker stdout is used for feedback displayed to the user, and as such `stdout` should not be closed in this case. Validators also do not need to worry about this - only interactors do, and they should only call `readEOF()` once they have finished communicating with the user, to clean up and assert that the user didn't send any trailing data.
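
The ordering requirement for interactors can be sketched as follows. `finishInteraction` is a hypothetical helper, not part of the templates; it takes the two streams as parameters so it can be tested, and it performs an identical-style EOF check (no whitespace skipping). In a real interactor the arguments would be `stdout` and `stdin`.

```cpp
#include <cstdio>

// Hypothetical helper. `toUser` plays the role of the interactor's stdout and
// `fromUser` the role of its stdin (the user's output).
// Returns true if the user sent no trailing data.
bool finishInteraction(std::FILE *toUser, std::FILE *fromUser) {
  // Close our side first, so a user program that is itself waiting for EOF
  // can terminate instead of blocking forever.
  std::fclose(toUser);
  // Only now wait for the user's stream to end.
  return std::fgetc(fromUser) == EOF;
}
```

Calling the EOF check before closing the outgoing stream risks a deadlock: the interactor waits for the user to finish while the user waits for the interactor's stream to end.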

### Identical Checker/Interactor

This template expects whitespace to match exactly, just as in the validator. The template is simpler, but it is less forgiving to contestants.

The same functions are provided, but have slightly different behaviour:

- `readSpace(), readNewLine(), readEOF()`: These exit with a Presentation Error if the check fails.
- `readToken()`: This exits with a Presentation Error if the token is empty, and WA if any character is not in range.
- `readLine()`: This exits with a Presentation Error if an EOF is encountered, and WA if any character is not in range.
- `readInt(), readIntArray(), readFloat()`: These exit with WA on failure.

Four new functions are provided:

- `exitWA(), exitPE()`: These functions exit immediately with the specified code.
- `assertWA(bool), assertPE(bool)`: These functions exit if the provided condition is false. Useful as a replacement for `assert()`.

Finally, there is an empty function `errorHook()`, which is called whenever any of the functions in the API would exit with an error. This can be used to implement functionality such as partial points, or outputting `-1` to signal errors in interactors.

### Standard Checker/Interactor

This template is much more complex, but is more lenient for submissions. It matches the whitespace of the `standard` builtin checker.

The behaviour of the non-whitespace functions is the same as for the identical checker template, with the following caveats:

- `exitPE()` and `assertPE()` don't exist, since the builtin `standard` checker never uses the Presentation Error code. Checker writers are discouraged from using it.
- `readToken()` always exits with WA on failure.
- `readLine()` doesn't exist, since the way it should process whitespace is not clear for the standard checker. Checker writers reaching for this method should consider the identical checker template instead, or rethink their output format entirely.

The remaining functions have the same behaviour as with the identical checker.

#### Whitespace functions

These functions all exit with `WA` on failure instead of `PE`, for the reasons described above.

The code maintains a flag of the type of whitespace it expects, one of `NONE, SPACE, NEWLINE, ALL`, initially ALL. `readSpace()` sets the flag to `SPACE` and `readNewLine()` sets it to `NEWLINE`.

`readToken()` sets the flag to `NONE` and consumes all the whitespace, and exits with WA if either:

- The current flag is `SPACE` and a newline was found.
- The current flag is `NEWLINE` and no newline was found.

It never exits with WA if the flag is `ALL`. It causes an IE if the flag is `NONE`, which only happens when `readToken()` is called twice in a row without an intervening whitespace function. Note that `readInt()` and `readFloat()` call `readToken()` internally.

`readEOF()` sets the flag to `ALL`, consumes all whitespace, and then exits with WA if any character remains in the stream.

No two whitespace functions can be called back to back, except for `readNewLine()` followed by `readEOF()`. The reason for the exception is that the canonical form for output should have a trailing newline, and so this exception allows checker writers to think in terms of the canonical form of the output. It also allows calling `readIntArray()` right before `readEOF()`, since `readIntArray()` internally calls `readNewLine()`.

Note that this scheme is lazy. This is intentional; it allows the same code to be used by interactors without difficulty.
124 changes: 124 additions & 0 deletions sample_files/problem_setting/examples/identical_checker.cpp
@@ -0,0 +1,124 @@
#include <algorithm>
#include <cctype> // isspace, used by readToken()
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <numeric>
#include <regex.h>
#include <stdexcept>
#include <string>
#include <vector>

namespace regex_helpers {
regex_t compile(const char *pattern) {
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    throw std::runtime_error("Pattern failed to compile.");
  }
  return re;
}
bool match(regex_t re, const std::string &text) {
  return regexec(&re, text.c_str(), 0, NULL, 0) == 0;
}
} // namespace regex_helpers

void errorHook();
void exitWA() {
  errorHook();
  std::exit(1);
}
void exitPE() {
  errorHook();
  std::exit(2);
}
void assertWA(bool condition) {
  if (!condition) {
    exitWA();
  }
}
void assertPE(bool condition) {
  if (!condition) {
    exitPE();
  }
}
void readSpace() { assertPE(getchar() == ' '); }
void readNewLine() { assertPE(getchar() == '\n'); }
void readEOF() { assertPE(getchar() == EOF); }
std::string readToken(char min_char = 0, char max_char = 127) {
  static constexpr size_t MAX_TOKEN_SIZE = 1e7;
  std::string token;
  int c = getchar();
  assertPE(!isspace(c));
  while (!isspace(c) && c != EOF) {
    assertWA(token.size() < MAX_TOKEN_SIZE);
    assertWA(min_char <= c && c <= max_char);
    token.push_back(char(c));
    c = getchar();
  }
  ungetc(c, stdin);
  return token;
}
long long readInt(long long lo, long long hi) {
  static regex_t re = regex_helpers::compile("^(0|-?[1-9][0-9]*)$");
  std::string token = readToken();
  assertWA(regex_helpers::match(re, token));

  long long parsedInt;
  try {
    parsedInt = stoll(token);
  } catch (const std::invalid_argument &) {
    exitWA();
  } catch (const std::out_of_range &) {
    exitWA();
  }
  assertWA(lo <= parsedInt && parsedInt <= hi);
  return parsedInt;
}
template <typename T>
std::vector<T> readIntArray(size_t N, long long lo, long long hi) {
  std::vector<T> arr;
  arr.reserve(N);
  for (size_t i = 0; i < N; i++) {
    arr.push_back(readInt(lo, hi));
    if (i != N - 1) {
      readSpace();
    }
  }
  readNewLine();
  return arr;
}
void errorHook() {}

// readFloat() and readLine() removed for brevity.

int main(int argc, char **argv) {
  std::ifstream judge_input(argv[1]);
  freopen(argv[2], "r", stdin);
  std::ifstream judge_answer(argv[3]);

  int N, K;
  judge_input >> N >> K;

  // If any integer is greater than K, we give an immediate WA.
  std::vector<int> arr = readIntArray<int>(N, 0, K);
  // We can read EOF now, since we are done with the input. This makes it easier
  // to remember.
  readEOF();

  // Note that we must store the sum in a long long, since it may overflow a
  // 32-bit integer.
  long long sum = std::accumulate(arr.begin(), arr.end(), 0LL);

  assertWA(sum == K);

  // It turns out that the minimum product is always zero, since [0, 0, ..., K]
  // is always valid.
  // Thus, it suffices to check for a zero in the array.
  if (std::find(arr.begin(), arr.end(), 0) == arr.end()) {
    // No zero found. Give partial points:
    // Output to stderr for the coci contrib module to grant partial AC.
    fprintf(stderr, "partial 50/100\n");
    // Output to stdout to give user feedback.
    printf("50/100 points");
    return 7; // 7 is the code for partial AC under the coci checker type.
  }
}