TEFE

TEFE - TimeBankPT Event Frame Extraction

DESCRIPTION

TEFE is a closed-domain event extraction system for sentences in Portuguese. It performs event detection (i.e., event trigger identification and classification) and argument role prediction (i.e., argument identification and role classification). The event types are based on the typology of the FrameNet project (BAKER; FILLMORE; LOWE, 1998). The models were trained on an enriched version of the TimeBankPT corpus (COSTA; BRANCO, 2012).

The system outputs the extracted events in the following JSON format:

[
 {
   "trigger":   { 
       "text":   "disse",
       "start":  58,
       "end":    63
   },  
   "arguments":  [
       {
           "role":  "Statement#Speaker",
           "text":  "presidente",
           "start": 66,
           "end":   76
        },
        ...
   ],
   "event_type": "Statement"
 },
 ...
]
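
As an illustration of how this output might be consumed, the sketch below loads a result shaped like the example above and checks each span against the input sentence. It assumes that `start` and `end` are character offsets into the sentence with an exclusive `end`; this is consistent with the sample values shown here but is not stated by the project. The `check_spans` helper and the hand-written `raw_output` string are hypothetical and are not part of TEFE.

```python
import json
import unicodedata

# Sample sentence from this README, normalized so that "ç" counts as one character.
sentence = unicodedata.normalize(
    "NFC",
    "A Petrobras aumentou o preço da gasolina para 2,30 reais, disse o presidente.",
)

# Hand-written stand-in for TEFE output, shaped like the example above.
raw_output = """
[
  {
    "trigger": {"text": "disse", "start": 58, "end": 63},
    "arguments": [
      {"role": "Statement#Speaker", "text": "presidente", "start": 66, "end": 76}
    ],
    "event_type": "Statement"
  }
]
"""


def check_spans(sentence, events):
    """Return (text, start, end) for every span whose offsets do not match its text.

    Assumption: offsets are character positions with an exclusive end.
    """
    mismatches = []
    for event in events:
        for span in [event["trigger"], *event["arguments"]]:
            if sentence[span["start"]:span["end"]] != span["text"]:
                mismatches.append((span["text"], span["start"], span["end"]))
    return mismatches


events = json.loads(raw_output)
for event in events:
    # Collect (role, text) pairs for a compact one-line summary per event.
    roles = [(arg["role"], arg["text"]) for arg in event["arguments"]]
    print(event["event_type"], event["trigger"]["text"], roles)

# An empty list means every span matched the sentence under the stated assumption.
print(check_spans(sentence, events))
```

If the offsets turn out to follow a different convention (for example, byte offsets or an inclusive end), `check_spans` will report the mismatching spans rather than fail silently.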
  

Five different trained models are currently available in this repository, identified as 0, 100, 0-0, 100-0, and 100-100. They cover, respectively: 514 event types (ET) and 1936 argument roles (AR); 7 ET and 93 AR; 214 ET and 477 AR; 5 ET and 42 AR; and 5 ET and 12 AR.

Local Execution

Prerequisites

  1. Download the BERTimbau Base (SOUZA; NOGUEIRA; LOTUFO, 2020) model checkpoint and vocabulary file:

    $ wget https://neuralmind-ai.s3.us-east-2.amazonaws.com/nlp/bert-base-portuguese-cased/bert-base-portuguese-cased_tensorflow_checkpoint.zip
    $ wget https://neuralmind-ai.s3.us-east-2.amazonaws.com/nlp/bert-base-portuguese-cased/vocab.txt

    Then unzip the archive and place its contents, along with vocab.txt, in the models directory as follows:

    ├──models
    |      └── BERTimbau
    |               └── bert_config.json
    |               └── bert_model.ckpt.data-00000-of-00001
    |               └── bert_model.ckpt.index
    |               └── bert_model.ckpt.meta
    |               └── vocab.txt
    |
    |...
    
  2. Install the required packages:

    $ pip install -r requirements.txt
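
Before running the extractor, it can be useful to confirm that the BERTimbau files ended up where the layout above expects them. The following is a minimal sketch of such a check; the function name `missing_bertimbau_files` is hypothetical, and the file names are copied from the directory tree shown in step 1. It is not part of TEFE itself.

```python
from pathlib import Path

# File names taken from the directory layout shown in step 1.
EXPECTED_FILES = [
    "bert_config.json",
    "bert_model.ckpt.data-00000-of-00001",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "vocab.txt",
]


def missing_bertimbau_files(models_dir="models"):
    """List the expected BERTimbau files that are absent from models_dir/BERTimbau."""
    target = Path(models_dir) / "BERTimbau"
    return [name for name in EXPECTED_FILES if not (target / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root so that "models" resolves correctly.
    missing = missing_bertimbau_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All BERTimbau files are in place.")
```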

OPTIONS

-h, --help                           Print this help text and exit.
--sentence SENTENCE                  Sentence string to extract events from.
--dir INPUT-DIR OUTPUT-DIR           Extract events from the files in the input directory
                                     (one sentence per line) and write the output JSON
                                     files to the output directory.
--model ID                           Identifier of the model to use: 0, 100, 0-0, 100-0 or
                                     100-100. The default model is 100.

EVENT EXTRACTION FROM A DIRECTORY OF FILES

The text files in the input directory are expected to have the format:

* all text files end with the extension .txt
* sentences are separated by newlines (one sentence per line)

Run the extractor with the input and output directories as arguments:
$ python3 src/tefe.py --dir /tmp/input-dir /tmp/output-dir
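
An input directory in this format could be prepared with a few lines of Python. The sketch below is only an illustration: `write_input_file` is a hypothetical helper, UTF-8 encoding is an assumption (the expected encoding is not stated here), and collapsing whitespace inside a sentence is a choice made so that no sentence spans more than one line.

```python
from pathlib import Path


def write_input_file(path, sentences):
    """Write sentences to a .txt file, one per line, and return how many were written."""
    path = Path(path)
    if path.suffix != ".txt":
        raise ValueError("input files must end with the extension .txt")
    # Collapse every run of whitespace (including line breaks) into a single
    # space so that each sentence occupies exactly one line; skip empty entries.
    lines = [" ".join(s.split()) for s in sentences if s.strip()]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: UTF-8 with a trailing newline after the last sentence.
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, `write_input_file("/tmp/input-dir/news.txt", sentences)` would create a file that the `--dir` command above can read. Because whitespace is collapsed, character offsets in the output refer to the line as written to the file, not to the original string.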

EVENT EXTRACTION FROM A SENTENCE

$ python3 src/tefe.py --sentence 'A Petrobras aumentou o preço da gasolina para 2,30 reais, disse o presidente.'
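
The same call can be made from Python through `subprocess`. This is a hedged sketch: `build_command` and `run_tefe` are hypothetical wrappers, and combining `--sentence` with `--model` in one call is assumed from the options list above. How TEFE delivers its result for a single sentence (standard output or a file) is not stated here, so the sketch returns the completed process without parsing anything.

```python
import subprocess
import sys

# Model identifiers listed in this README.
MODEL_IDS = ("0", "100", "0-0", "100-0", "100-100")


def build_command(sentence, model="100", script="src/tefe.py"):
    """Build the argument list for a single-sentence TEFE run."""
    if model not in MODEL_IDS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODEL_IDS}")
    # Passing a list avoids shell quoting problems with accents and commas.
    return [sys.executable, script, "--sentence", sentence, "--model", model]


def run_tefe(sentence, model="100"):
    """Run TEFE from the repository root and return the CompletedProcess."""
    return subprocess.run(
        build_command(sentence, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
```

Calling `run_tefe("A Petrobras aumentou o preço da gasolina para 2,30 reais, disse o presidente.")` from the repository root would then run the default model; the captured `stdout` and `stderr` attributes can be inspected to see what the script prints.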

How to cite this work

Peer-reviewed accepted paper:

  • Sacramento, A., Souza, M.: Joint Event Extraction with Contextualized Word Embeddings for the Portuguese Language. In: 10th Brazilian Conference on Intelligent Systems (BRACIS), São Paulo, Brazil, November 29 to December 3, 2021.
