The goal is to find substrings of the target document that represent clauses analogous (semantically and formally equivalent) to the examples provided from other documents.
Clauses can consist of a single sentence, multiple sentences, or parts of sentences. The exact type of clause is not important during evaluation: no full-featured training is allowed, so a system has to rely on the few example clauses provided at execution time.
The input file consists of up to 6 tab-separated fields, e.g.:
ID of the document to search in | Entity considered | Example #1 | ... | Example #N |
---|---|---|---|---|
NDA_057 | governing-law | NDA_059 15215-15453 | NDA_033 7890-8032 | NDA_009 12797-13364 |
Each example consists of a document ID (NDA_059, NDA_033, NDA_009) and a character range (15215-15453, and so on). Ranges can be discontinuous; in that case, their parts are separated with a comma, e.g., 4103-4882,12127-12971.
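The input format above can be parsed with a few lines of Python. This is a minimal sketch, not an official loader: the helper names (`parse_input_line`, `parse_example`, `parse_ranges`) are hypothetical, and it assumes that the document ID and the range inside an example field are separated by whitespace, as in the table.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]


@dataclass
class Example:
    doc_id: str
    spans: List[Span]  # one (start, end) pair per part of a possibly discontinuous range


def parse_ranges(text: str) -> List[Span]:
    """Parse '4103-4882,12127-12971' into [(4103, 4882), (12127, 12971)]."""
    spans = []
    for part in text.split(","):
        start, end = part.split("-")
        spans.append((int(start), int(end)))
    return spans


def parse_example(field: str) -> Example:
    """Parse an example field such as 'NDA_059 15215-15453'.

    Assumption: document ID and range are separated by whitespace.
    """
    doc_id, ranges = field.split(maxsplit=1)
    return Example(doc_id, parse_ranges(ranges.strip()))


def parse_input_line(line: str):
    """Split one in.tsv line into target document ID, entity and examples."""
    fields = line.rstrip("\n").split("\t")
    target_id, entity = fields[0], fields[1]
    # Skip empty fields, e.g. caused by a trailing tab.
    examples = [parse_example(f) for f in fields[2:] if f.strip()]
    return target_id, entity, examples
```

For the row in the table, `parse_input_line` returns `"NDA_057"`, `"governing-law"` and three `Example` objects, the first one with `spans == [(15215, 15453)]`.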
The same annotation may appear in multiple lines because evaluation is performed using a repeated random sub-sampling validation procedure. Sub-samples drawn from a particular set of annotations were split into k-1 seed documents and one target document. The range chosen for k (2 to 6) results in 1-shot to 5-shot learning. Note that this 1–5 range refers to the number of annotated seed documents available, not the number of clauses: the same clause type may appear more than once in a document, resulting in a higher number of example clause instances.
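The sub-sampling procedure can be illustrated with a short Python sketch. It is not the authors' script: the function name `draw_subsamples`, the uniform choice of k between 2 and 6, and the input shape (a mapping from document ID to the annotated spans of one clause type) are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def draw_subsamples(annotations: Dict[str, List[Span]], n_draws: int,
                    k_min: int = 2, k_max: int = 6, seed: int = 0):
    """Hypothetical sketch of repeated random sub-sampling for one clause type.

    Each draw picks k documents, uses k-1 of them as seeds (the few-shot
    examples) and the remaining one as the target. Draws are independent,
    so the same annotation can end up in several draws, i.e. several lines.
    Requires at least k_min annotated documents.
    """
    rng = random.Random(seed)
    doc_ids = sorted(annotations)
    draws = []
    for _ in range(n_draws):
        # randint bounds are inclusive: k in [2, 6] gives 1 to 5 seed documents.
        k = rng.randint(k_min, min(k_max, len(doc_ids)))
        chosen = rng.sample(doc_ids, k)
        target, seeds = chosen[0], chosen[1:]
        draws.append((target, [(d, annotations[d]) for d in seeds]))
    return draws
```

Fixing `seed` makes the draws reproducible, which is convenient when regenerating dev or test splits.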
The expected file contains one answer per line, consisting of the entity name (copied from the input) and the character range, in the same format as described above. The reference file contains two tab-separated fields: the document ID and its content.
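A minimal sketch of reading the compressed reference file and turning a character range back into clause text is shown below. The helper names are hypothetical, and two details are assumptions not stated above: that offsets are 0-based and end-exclusive (Python slice indices into the content field as stored), and that an answer line joins the entity name and the range with a single space.

```python
import lzma
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def load_reference(path: str) -> Dict[str, str]:
    """Read reference.tsv.xz: one 'document ID <TAB> content' pair per line."""
    docs = {}
    with lzma.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            doc_id, content = line.rstrip("\n").split("\t", 1)
            docs[doc_id] = content
    return docs


def extract_clause(content: str, spans: List[Span]) -> List[str]:
    """Return the text of each part of a (possibly discontinuous) range.

    Assumption: offsets are 0-based and end-exclusive, i.e. slice indices.
    """
    return [content[start:end] for start, end in spans]


def format_answer(entity: str, spans: List[Span]) -> str:
    """Format an answer as 'entity start-end[,start-end...]'.

    Assumption: entity name and range are separated by a single space.
    """
    ranges = ",".join(f"{start}-{end}" for start, end in spans)
    return f"{entity} {ranges}"
```

For example, `format_answer("governing-law", [(4103, 4882), (12127, 12971)])` yields `governing-law 4103-4882,12127-12971`.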
- `README.md`: this file
- `config.txt`: configuration file (compatible with the GEval command-line tool)
- `dev-0/`: directory with dev data
- `dev-0/in.tsv`: input data for the dev set
- `dev-0/expected.tsv`: expected (reference) data for the dev set
- `dev-0/reference.tsv.xz`: file with the documents considered in the dev set
- `test-A/`: directory with test data
- `test-A/in.tsv`: input data for the test set
- `test-A/expected.tsv`: expected (reference) data for the test set
- `test-A/reference.tsv.xz`: file with the documents considered in the test set
Please refer to the paper for details regarding the annotation process and evaluation procedure.