This project shows how to use Prodigy to annotate data for the spancat component.

The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the Weasel documentation.
The following commands are defined by the project. They can be executed using `weasel run [name]`. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| `download` | Download the required spaCy model. |
| `span_manual` | Mark entity spans in a text by highlighting them and selecting the respective labels. |
| `span_manual_pattern` | Mark entity spans in a text with patterns. |
| `train_spancat` | Train a spancat model. |
| `span_correct` | Correct entity spans predicted by the trained spancat model. |
| `db_drop` | Drop the Prodigy database defined in the `project.yml`. |
| `db_export` | Export the database defined in the `project.yml` to `.spacy` files. |
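
The rule that commands are only re-run when their inputs have changed can be illustrated with a small, self-contained Python sketch. This is not Weasel's implementation. It assumes that a command's inputs are plain files, fingerprints them with MD5, and stores the fingerprints in a hypothetical `demo.lock` JSON file. The function names and file names are invented for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_rerun(name: str, deps: List[Path], lock_path: Path) -> bool:
    """Return True if any input of command `name` changed since the last check.

    When a change is detected, the new checksums are written to the lock file
    so that the next call with unchanged inputs returns False.
    """
    current: Dict[str, str] = {str(p): file_checksum(p) for p in deps}
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    if lock.get(name) == current:
        return False
    lock[name] = current
    lock_path.write_text(json.dumps(lock, indent=2))
    return True


# Demo in a temporary directory (hypothetical file names and contents).
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.jsonl"
    lock_file = Path(tmp) / "demo.lock"

    data.write_text('{"text": "Mix the flour and sugar."}\n')
    first = needs_rerun("train_spancat", [data], lock_file)   # no lock entry yet
    second = needs_rerun("train_spancat", [data], lock_file)  # inputs unchanged

    data.write_text('{"text": "Whisk the eggs."}\n')
    third = needs_rerun("train_spancat", [data], lock_file)   # input changed

print(first, second, third)
```

In this sketch the first and third checks report that a re-run is needed, while the second does not, because the input file is byte-for-byte identical to the recorded version.
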

The following assets are defined by the project. They can be fetched by running `weasel assets` in the project directory.

| File | Source | Description |
| --- | --- | --- |
| `assets/food_recipes.jsonl` | Local | Extract of the Food.com Recipe & Review dataset with 25,000 entries. |
| `assets/instructions.html` | Local | Example HTML file for annotation instructions. |
| `assets/patterns.jsonl` | Local | Example patterns for pre-selecting spans in text. |
| `prodigy.json` | Local | Example `prodigy.json` file for using instruction files. |
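
To give a feel for what a patterns file can look like, here is a minimal Python sketch that builds and checks a few pattern entries in JSONL form. It assumes the usual Prodigy match-pattern layout: one JSON object per line with a `label` and a `pattern`, where the pattern is either a list of token descriptions or an exact string. The `INGREDIENT` label and the patterns themselves are hypothetical and are not taken from `assets/patterns.jsonl`.

```python
import json
from typing import Any, Dict, List

# Hypothetical patterns, for illustration only.
example_patterns: List[Dict[str, Any]] = [
    # Token-based pattern: matches "olive oil" case-insensitively.
    {"label": "INGREDIENT", "pattern": [{"lower": "olive"}, {"lower": "oil"}]},
    # String pattern: matches the exact string "butter".
    {"label": "INGREDIENT", "pattern": "butter"},
]


def to_jsonl(records: List[Dict[str, Any]]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


def parse_patterns(jsonl_text: str) -> List[Dict[str, Any]]:
    """Parse JSONL text and check that each entry has a label and a pattern."""
    patterns = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if "label" not in entry or "pattern" not in entry:
            raise ValueError(f"line {line_no}: expected 'label' and 'pattern' keys")
        if not isinstance(entry["pattern"], (str, list)):
            raise ValueError(f"line {line_no}: 'pattern' must be a string or a list")
        patterns.append(entry)
    return patterns


jsonl_text = to_jsonl(example_patterns)
loaded = parse_patterns(jsonl_text)
labels = sorted({entry["label"] for entry in loaded})
print(jsonl_text, end="")
print(labels)
```

A check like this can catch malformed lines before the file is passed to an annotation command, though the project's own patterns file remains the reference for the labels actually used.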