
Writing a Custom Prediction Reader


As an alternative to converting your predictions into one of the formats supported by ELEVANT, you can write your own prediction reader so that the link_benchmark.py script can use your prediction files directly. This requires three steps. Note: Make sure you perform the following steps outside of the Docker container; otherwise, your changes will be lost when you exit the container.

  1. Implement a prediction reader in src/elevant/prediction_readers/ that inherits from src.elevant.prediction_readers.abstract_prediction_reader.AbstractPredictionReader. You must either implement the predictions_iterator() method or the get_predictions_with_text_from_file() method.

    Implement predictions_iterator() if you are sure that the order in which the predictions are read corresponds to the article order in the benchmark. Set predictions_iterator_implemented = True when calling super().__init__(). See the existing readers in src/elevant/prediction_readers/ for an example.

    Implement get_predictions_with_text_from_file() if you are not sure that the order in which the predictions are read corresponds to the article order in the benchmark, provided the prediction file contains the original article texts. Set predictions_iterator_implemented = False when calling super().__init__(). See the existing readers in src/elevant/prediction_readers/ for an example; a minimal sketch of a custom reader is shown below.
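
    The following is a minimal sketch of such a reader, not a reference implementation: it assumes a JSONL prediction file with one line per benchmark article, where each line holds mention objects with hypothetical start, end and entity_id fields (the latter containing Wikidata QIDs), and it uses the EntityPrediction model and base class interface the way the built-in readers do.

     import json
     from typing import Dict, Iterator, Tuple

     from elevant.models.entity_prediction import EntityPrediction
     from elevant.prediction_readers.abstract_prediction_reader import AbstractPredictionReader


     class MyCustomPredictionReader(AbstractPredictionReader):
         def __init__(self, input_filepath: str, entity_db):
             # entity_db is only needed if you predict Wikipedia entities and
             # have to map them to Wikidata QIDs, see step 3 below.
             self.entity_db = entity_db
             self.filepath = input_filepath
             super().__init__(input_filepath, predictions_iterator_implemented=True)

         def predictions_iterator(self) -> Iterator[Dict[Tuple[int, int], EntityPrediction]]:
             # Yield one predictions dictionary per article, in benchmark article order.
             with open(self.filepath, "r", encoding="utf8") as file:
                 for line in file:
                     article = json.loads(line)
                     predictions = {}
                     for mention in article["predictions"]:
                         span = (mention["start"], mention["end"])
                         entity_id = mention["entity_id"]  # a Wikidata QID in this sketch
                         # The last argument is the set of candidate entities;
                         # pass an empty set if your system yields no candidates.
                         predictions[span] = EntityPrediction(span, entity_id, {entity_id})
                     yield predictions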

  2. Add your custom prediction reader name to the src.elevant.linkers.linkers.PredictionFormats enum, e.g. MY_FORMAT = "my_format".
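
    In src/elevant/linkers/linkers.py this is a one-line addition (a sketch; the enum's existing members are elided):

     class PredictionFormats(Enum):
         ...
         MY_FORMAT = "my_format"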

  3. In src.elevant.linkers.linking_system.LinkingSystem._initialize_linker, add an elif case in which you load the necessary mappings (if any) and initialize the LinkingSystem's prediction_reader. For example:

     elif linker_type == PredictionFormats.MY_FORMAT.value:
         self.load_missing_mappings({MappingName.WIKIPEDIA_WIKIDATA, MappingName.REDIRECTS})
         self.prediction_reader = MyCustomPredictionReader(prediction_file, self.entity_db)
    

    where prediction_file is the path to the prediction file. The load_missing_mappings() call is only necessary if your system predicts Wikipedia entities, which then have to be converted to Wikidata entities. The mappings are loaded into self.entity_db. You can then get the Wikidata QID for a Wikipedia title by calling

     entity_id = KnowledgeBaseMapper.get_wikidata_qid(entity_reference, self.entity_db)
    

You can then convert your linking results into ELEVANT's internal format by running:

     python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat my_format -pname <linker_name> -b <benchmark_name>
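
For example, with hypothetical values for the placeholders (predictions.jsonl as the results file, "My Linker" as the linker name and wiki-ex as the benchmark):

     python3 link_benchmark.py my-experiment -pfile predictions.jsonl -pformat my_format -pname "My Linker" -b wiki-ex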