Every transformation should have keywords that identify the type of transformation being written. The following list can be used as a reference.
- the type of linguistic change that the transformation attempts, e.g. morphological, lexical, syntactic, word-order, discourse, noise, etc.
- the type of algorithm used, e.g. rule-based, model-based, api-based, external-knowledge-based, transformer-based, parser-based, etc.
- the naturalness of the generation, e.g. unnatural-sounding, unnaturally-written, etc.
- the potential accuracy & precision of the generation, e.g. high-precision, high-coverage, etc.
- the text-specific modality being covered (if applicable), e.g. visual, aural, written, etc.
- the skills required to differentiate between the input and the output of the transformation (if applicable), e.g. causal-reasoning, visual-reasoning, social-reasoning, humor, figurative-language, grammar, world-knowledge, etc.
- other transformation-specific keywords
Some of these categories can be subjective and are meant to be used as best estimates. It is okay to include multiple keywords of each type, so feel free to pick all that apply. For example, many transformer-based generators might fit most of the linguistic categories. If your transformation is best described by keywords not yet included in the list below, please add the additional keywords to this file as part of your pull request.
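As a concrete illustration, a transformation could declare its keywords as a class-level list. The sketch below is a minimal, hypothetical example (the class name, `keywords` attribute, and `generate` signature are assumptions for illustration, not a prescribed interface), with its keywords drawn from the reference table that follows.

```python
# Minimal, hypothetical sketch of declaring keywords on a transformation.
# The class name, `keywords` attribute, and `generate` signature are
# assumptions for illustration, not a prescribed interface.
from typing import List


class LowercaseNoisePerturbation:
    # Pick every keyword from the reference table that applies.
    keywords = [
        "noise",                      # linguistic change: perturbs casing
        "rule-based",                 # algorithm: plain string manipulation
        "unnaturally-written",        # all-lowercase text looks odd to a reader
        "highly-meaning-preserving",  # casing rarely changes the meaning
        "high-coverage",              # returns an output for almost any input
        "high-precision",             # generates exactly what it intends
    ]

    def generate(self, sentence: str) -> List[str]:
        # Lowercase the whole sentence as a simple case-noise perturbation.
        return [sentence.lower()]
```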
Keyword | Description |
---|---|
**Type of Linguistic Change** | |
`morphological` | character-level or inflectional/morphological changes |
`lexical` | the transformation substitutes different words or phrases |
`syntactic` | the transformation changes the syntax of the input (e.g. active to passive) |
`word-order` | the order of words or phrases is changed (e.g. topicalisation, changing the order of semantic roles) |
`discourse` | the transformation operates on or generates units of language longer than a sentence (e.g. a paragraph, multiple utterances in a dialogue setting) |
`noise` | the generation adds random noise that perturbs the examples (e.g. removing punctuation, changing case, repeating words/characters) |
**Type of Algorithm** | |
`rule-based` | the implementation uses a heuristic or rule-based approach |
`model-based` | the implementation uses a machine learning model |
`api-based` | the implementation uses an external API or tool |
`external-knowledge-based` | the implementation uses a corpus, a separate knowledge base, etc. |
`transformer-based` | the implementation uses a transformer-based model (e.g. BERT, T5) |
`parser-based` | the implementation uses a syntactic parser (dependency, constituency, etc.), a shallow semantic parser, or another semantic parser (e.g. semantic role labelling) |
`tokenizer-required` | the implementation uses a tokenizer |
`chunker-required` | the implementation uses a chunker |
`discourse-segmenter-required` | the implementation uses a discourse segmenter |
**Naturalness of the Generation** | |
`unnatural-sounding` | the generations might not sound natural, e.g. if synthesized to speech they would not sound as natural as their source sentences; adding noun definitions might keep a sentence factually correct, but people would not use the same language while speaking |
`unnaturally-written` | generations in written form that might not look natural to a reader (e.g. homophonic translations, filler words) |
`visual` | the outputs are visually similar to the inputs (e.g. leet letters) |
`aural` | the outputs sound similar to the inputs (e.g. homophones) |
`written` | the transformation makes font changes, or changes that would look different in different written formats |
**Potential Accuracy & Precision of the Generation** | |
`highly-meaning-preserving` | the output preserves the meaning of the source text |
`possible-meaning-alteration` | the output might change the meaning of the source text |
`meaning-alteration` | the meaning is completely altered (e.g. supervised tasks other than sentence operations, which change the label) |
`high-precision` | the transformation generates what it intends to generate |
`low-precision` | the transformation sometimes does not generate what it intends to generate (e.g. it often changes the meaning) |
`high-coverage` | the transformation can return an output for almost all types of inputs |
`low-coverage` | the transformation returns an output only for a handful of input types, not for all |
`high-generations` | the transformation generates many (possibly diverse) outputs for a single input (e.g. beam search, or substitution with a large number of alternate choices) |
`low-generations` | the transformation generates few outputs for a single input (e.g. substitution with a smaller number of alternate choices) |
**Skills Required to Differentiate Between the Input and the Output (if applicable)** | |
`causal-reasoning` | the ability to reason about cause and effect |
`social-reasoning` | the ability to understand and reason about human social interactions |
`figurative-language` | the ability to understand figures of speech |
`domain-knowledge` | the ability to understand domain knowledge (e.g. the financial domain) |
`world-knowledge` | the ability to understand world knowledge |
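Because keywords missing from the table must be added to this file via a pull request, it can help to check a transformation's declared keywords against the table's vocabulary first. The helper below is a hypothetical sketch: `ALLOWED_KEYWORDS` simply mirrors the reference table above, and the function name is illustrative.

```python
# Hypothetical helper: flag declared keywords that are not yet in the table.
# ALLOWED_KEYWORDS mirrors the reference table above.
ALLOWED_KEYWORDS = {
    # type of linguistic change
    "morphological", "lexical", "syntactic", "word-order", "discourse", "noise",
    # type of algorithm
    "rule-based", "model-based", "api-based", "external-knowledge-based",
    "transformer-based", "parser-based", "tokenizer-required",
    "chunker-required", "discourse-segmenter-required",
    # naturalness of the generation
    "unnatural-sounding", "unnaturally-written", "visual", "aural", "written",
    # potential accuracy & precision of the generation
    "highly-meaning-preserving", "possible-meaning-alteration",
    "meaning-alteration", "high-precision", "low-precision", "high-coverage",
    "low-coverage", "high-generations", "low-generations",
    # skills required to differentiate between the input and the output
    "causal-reasoning", "social-reasoning", "figurative-language",
    "domain-knowledge", "world-knowledge",
}


def check_keywords(keywords):
    """Return declared keywords that are not yet part of the reference table."""
    unknown = sorted(set(keywords) - ALLOWED_KEYWORDS)
    if unknown:
        print(f"Not in the table yet (add via pull request): {unknown}")
    return unknown
```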