This repository provides underlying code and materials for the paper "A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching".
- Installation
- Data directory and structure
- Citation
- Future work and contributing
- Get in touch
- Acknowledgements
- License
Please follow the instructions in the installation section of DeezyMatch to set up a Python environment and install all the packages required to run DeezyMatch.
Once working Python and DeezyMatch environments are available, the following additional libraries need to be installed:
pip install spacy
pip install geopy
pip install pandarallel
pip install python-Levenshtein
pip install pyxDamerauLevenshtein
pip install haversine
pip install mysql-connector-python
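To confirm that the additional libraries can be found in the active environment, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the helper name `check_modules` is hypothetical, and the import names are assumptions about how each pip package is exposed (for example, `python-Levenshtein` as `Levenshtein` and `mysql-connector-python` as `mysql.connector`).

```python
import importlib.util

# Assumed import names for the pip packages listed above
# (an assumption for illustration, not taken from the repository).
ASSUMED_IMPORT_NAMES = [
    "spacy",
    "geopy",
    "pandarallel",
    "Levenshtein",
    "pyxdameraulevenshtein",
    "haversine",
    "mysql.connector",
]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be located."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when the parent package of a dotted
            # name (e.g. "mysql" for "mysql.connector") is not installed.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_modules(ASSUMED_IMPORT_NAMES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any module reported as missing can then be installed with the corresponding `pip install` command above.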
In our code, we assume the following directory structure:
LwM_SIGSPATIAL2020_ToponymMatching/
├── datasets
│ ├── candidate_mentions_sets
│ ├── candidate_ranking_datasets
│ ├── gazetteers
│ ├── query_mentions_sets
│ └── toponym_matching_datasets
├── experiments
│ ├── inputs
│ │ ├── characters_v001.vocab
│ │ └── dataset-string-similarity_test.txt
│ ├── levdam_results
│ ├── mapped_results
│ ├── models
│ └── ranker_results
└── processing
  ├── candidate_ranking_datasets
  ├── candselection
  ├── gazetteers
  ├── toponym_matching_datasets
  └── resources
Description of main directories:
- processing/: contains code for preparing or generating the different datasets.
- datasets/: contains datasets used in the experiments, resulting from running the processing/ code.
- experiments/: contains the experiment code and generated files.
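Before running anything, it can be useful to check that a local checkout matches the assumed layout. The following is a minimal sketch rather than code from the repository: the helper `missing_dirs` is a hypothetical name, and the last five entries assume that those folders sit under `processing/`, as the tree above suggests.

```python
from pathlib import Path

# Sub-directories copied from the tree above, relative to the repository root.
# Placing the last five under processing/ is an assumption based on that tree.
EXPECTED_DIRS = [
    "datasets/candidate_mentions_sets",
    "datasets/candidate_ranking_datasets",
    "datasets/gazetteers",
    "datasets/query_mentions_sets",
    "datasets/toponym_matching_datasets",
    "experiments/inputs",
    "experiments/levdam_results",
    "experiments/mapped_results",
    "experiments/models",
    "experiments/ranker_results",
    "processing/candidate_ranking_datasets",
    "processing/candselection",
    "processing/gazetteers",
    "processing/toponym_matching_datasets",
    "processing/resources",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print(f"  {d}")
    else:
        print("All expected directories are present.")
```

Some of these folders (for example the results directories) may only appear once the processing and experiment code has been run, so a missing entry is not necessarily an error.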
The experiments/ folder contains two notebooks with the experiments reported in the paper:
- Toponym_Matching_Experiments.ipynb has the experiments summarized in Table 1.
- Candidate_Ranking_Experiments.ipynb has the experiments summarized in Table 2.
(The results presented in this paper were generated by DeezyMatch v1.2.0 (Released: Sep 15, 2020).)
Make sure that you have run the processing steps (described here) and that you have all the data needed before you run the experiments.
If you use or adapt this code in your paper, please cite:
Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, and Federico Nanni. "A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching." In Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL): Poster papers, pp. 385-388. 2020.
@inproceedings{collardanuy2020sigspatial,
title={A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching},
author={Coll Ardanuy, Mariona and Hosseini, Kasra and McDonough, Katherine and Krause, Amrey and van Strien, Daniel and Nanni, Federico},
booktitle={Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL): Poster papers},
pages={385--388},
year={2020}
}
A longer version of the article is available on arXiv:
Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien, and Federico Nanni. "A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching." arxiv:2009.08114. 2020.
@article{collardanuy2020geocandidateArxiv,
title={A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching},
author={Coll Ardanuy, Mariona and Hosseini, Kasra and McDonough, Katherine and Krause, Amrey and van Strien, Daniel and Nanni, Federico},
journal={arXiv e-prints},
pages={arxiv:2009.08114},
year={2020}
}
The authors of the paper plan to further develop the code and extend the experiments. We welcome pull requests for improvements, and issue reports if you encounter any errors.
Contacts of the corresponding authors:
- Mariona Coll Ardanuy, mcollardanuy[at]turing.ac.uk
- Kasra Hosseini, khosseini[at]turing.ac.uk
- Federico Nanni, fnanni[at]turing.ac.uk
Work for this paper was produced as part of Living with Machines. This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London. This work was also supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. Newspaper data was kindly shared by Findmypast.
- The source code is licensed under the MIT License.
- Copyright (c) 2020 The Alan Turing Institute, British Library Board, Queen Mary University of London, University of Exeter, University of East Anglia and University of Cambridge.
- The datasets hosted on Zenodo are licensed under CC-BY-4.0.