# Contract Discovery: Few-Shot Semantic Retrieval Challenge

The aim is to return substrings of the requested document that represent clauses analogous (semantically and formally equivalent) to examples provided from other documents.

Clauses can consist of a single sentence, multiple sentences, or parts of a sentence. The exact kind of clause does not matter during evaluation, since no full-featured training is allowed: one has to rely on only a few sample clauses available at execution time.

The input file consists of up to 6 tab-separated fields, e.g.:

| ID of the document to search in | Entity considered | Example #1 | ... | Example #N |
|---|---|---|---|---|
| NDA_057 | governing-law | NDA_059 15215-15453 | NDA_033 7890-8032 | NDA_009 12797-13364 |

Each example consists of a document ID (NDA_059, NDA_033, NDA_009) and a character range (15215-15453, and so on). Ranges can be discontinuous; in such a case, their parts are separated with a comma, e.g., 4103-4882,12127-12971.
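
A minimal parsing sketch in Python may make the layout concrete. It assumes each example field holds a document ID and a character range separated by a single space, as in the sample row above; none of the names below come from the challenge tooling:

```python
# Hypothetical parser for one input line (an assumption-based sketch,
# not part of the challenge code).
from dataclasses import dataclass

@dataclass
class Example:
    doc_id: str                    # e.g. "NDA_059"
    spans: list[tuple[int, int]]   # e.g. [(15215, 15453)]

def parse_range(text: str) -> list[tuple[int, int]]:
    """Parse "4103-4882,12127-12971" into [(4103, 4882), (12127, 12971)]."""
    spans = []
    for part in text.split(","):
        start, end = part.split("-")
        spans.append((int(start), int(end)))
    return spans

def parse_input_line(line: str):
    fields = line.rstrip("\n").split("\t")
    target_doc, entity = fields[0], fields[1]
    examples = []
    for field in fields[2:]:       # remaining fields are the examples
        doc_id, rng = field.split(" ", 1)
        examples.append(Example(doc_id, parse_range(rng)))
    return target_doc, entity, examples
```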

The same annotation may occur in multiple lines because the evaluation is performed with a repeated random sub-sampling validation procedure. Each sub-sample drawn from a particular set of annotations was split into k-1 seed documents and one target document; the selected values of k thus result in 1-shot to 5-shot learning. Note that the 1-5 range denotes the number of annotated documents available: the same clause type may appear twice in one document, resulting in a higher number of clause instances.
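
For illustration only, the split described above could be reproduced along these lines (a sketch; the function and variable names are assumptions, not the actual evaluation code):

```python
import random

def subsample(annotated_docs: list[str], k: int, rng: random.Random):
    """Draw k annotated documents: k-1 become seeds, the last is the target."""
    drawn = rng.sample(annotated_docs, k)
    return drawn[:-1], drawn[-1]

# With k = 4 this yields a 3-shot setting: three seed documents, one target.
seeds, target = subsample(["NDA_009", "NDA_033", "NDA_057", "NDA_059"],
                          k=4, rng=random.Random(0))
```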

The expected file contains one answer per line, consisting of the entity name (copied from the input) and a character range in the same format as described above. The reference file contains two tab-separated fields: the document ID and its content.
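
Correspondingly, an answer line could be produced as follows (a sketch; whether the entity and the range are separated by a space or a tab is an assumption here, so check the expected.tsv files for the exact convention):

```python
def format_answer(entity: str, spans: list[tuple[int, int]]) -> str:
    """Format ("governing-law", [(4103, 4882), (12127, 12971)]) as
    "governing-law 4103-4882,12127-12971"."""
    rng = ",".join(f"{start}-{end}" for start, end in spans)
    return f"{entity} {rng}"
```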

## Directory structure

- `README.md` — this file
- `config.txt` — configuration file (compatible with the GEval command-line tool)
- `dev-0/` — directory with dev data
- `dev-0/in.tsv` — input data for the dev set
- `dev-0/expected.tsv` — expected (reference) data for the dev set
- `dev-0/reference.tsv.xz` — file with documents considered in the dev set (see the loading sketch after this list)
- `test-A/` — directory with test data
- `test-A/in.tsv` — input data for the test set
- `test-A/expected.tsv` — expected (reference) data for the test set
- `test-A/reference.tsv.xz` — file with documents considered in the test set
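
The compressed reference files can be read with Python's standard library; only the two-field, one-document-per-line layout stated above is assumed here:

```python
import lzma

def load_reference(path: str) -> dict[str, str]:
    """Map document ID -> document content from a reference.tsv.xz file."""
    docs = {}
    with lzma.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            doc_id, content = line.rstrip("\n").split("\t", 1)
            docs[doc_id] = content
    return docs

docs = load_reference("dev-0/reference.tsv.xz")
# Slice out one example clause by its character range
# (assuming end-exclusive, Python-style offsets).
clause = docs["NDA_059"][15215:15453]
```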

Please refer to the paper for details regarding the annotation process and evaluation procedure.