Geographically Local Representation Learning with a Spatial Prior for Visual Localization, Z. Xia, O. Booij, M. Manfredi, J.F.P. Kooij, ECCV 2020 Workshops. Lecture Notes in Computer Science, vol 12536. Springer, Cham. doi: 10.1007/978-3-030-66096-3_38
Cross-View Matching for Vehicle Localization by Learning Geographically Local Representations, Z. Xia, O. Booij, M. Manfredi, J.F.P. Kooij, IEEE Robotics and Automation Letters (RA-L), 2021, vol. 6(3), 5921-5928. doi: 10.1109/LRA.2021.3088076
We revisit end-to-end representation learning for cross-view self-localization, the task of retrieving for a query camera image the closest satellite image in a database by matching them in a shared image representation space. Previous work tackles this task as a global localization problem, i.e. assuming no prior knowledge of the location, so the learned image representation must distinguish far-apart areas of the map. However, in many practical applications such as self-driving vehicles, it is already possible to discard distant locations through well-known localization techniques using temporal filters and GNSS/GPS sensors. We argue that learned features should therefore be optimized to be discriminative within the geographically local neighborhood, instead of globally. We propose a simple but effective adaptation to the common triplet loss used in previous work to consider a prior localization estimate already in the training phase. We evaluate our approach on the existing CVACT dataset, and on a novel localization benchmark based on the Oxford RobotCar dataset which tests generalization across multiple traversals and days in the same area. For the Oxford benchmark we collected corresponding satellite images. With a localization prior, our approach improves recall@1 by 9 percentage points on CVACT, and reduces the median localization error by 2.45 meters on the Oxford benchmark, compared to a state-of-the-art baseline approach. Qualitative results underscore that with our approach the network indeed captures different aspects of the local surroundings compared to the global baseline.
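As a rough illustration of the idea in the abstract, the sketch below restricts the negatives of a triplet loss to satellite images that lie within a given radius of the anchor, so that far-away locations (which a localization prior would already rule out) do not contribute. This is not the implementation used in the notebooks: the function name, the soft-margin form of the loss, the weighting factor `gamma`, and the 100 m radius are all assumptions made for the example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def geo_local_triplet_loss(ground_emb, sat_emb, positions,
                           radius_m=100.0, gamma=10.0):
    """Soft-margin triplet loss over geographically local negatives only.

    ground_emb[i] and sat_emb[i] are a matching pair taken at positions[i],
    given as (x, y) in meters. All names and defaults are hypothetical.
    """
    losses = []
    for i, anchor in enumerate(ground_emb):
        d_pos = sq_dist(anchor, sat_emb[i])
        for j, negative in enumerate(sat_emb):
            if j == i:
                continue
            # Skip negatives that lie outside the assumed prior radius.
            if math.dist(positions[i], positions[j]) > radius_m:
                continue
            d_neg = sq_dist(anchor, negative)
            losses.append(softplus(gamma * (d_pos - d_neg)))
    # An anchor without any local negative contributes nothing.
    return sum(losses) / len(losses) if losses else 0.0
```

With this formulation, a visually similar satellite image several hundred meters away no longer acts as a hard negative, which leaves the embedding free to focus on cues that separate nearby locations.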
Given an input image, our model extracts features at geographically local discriminative objects, e.g. streetlights and vegetation, while the baseline mostly focuses on the road structure, which is globally distinct but locally ambiguous.
Consequently, our model is more discriminative in local areas. In the above localization heat maps, each dot represents a satellite image, and the ground truth location is indicated by the cross. Darker colors indicate a smaller embedding distance between the satellite image at that location and the ground-level query taken at the center location. Inside a local neighborhood with a 100 m radius, our approach results in a single peak (left), while the baseline distribution is more spread out (right).
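The heat maps correspond to a retrieval step that can be sketched as follows: keep only the satellite images within a radius of the prior location estimate, then pick the one whose embedding is closest to the query embedding. This is a minimal sketch under assumed names and data layout (`localize_with_prior`, plain Python lists, positions in meters); it is not taken from the notebooks.

```python
import math

def localize_with_prior(query_emb, sat_embs, sat_positions,
                        prior_xy, radius_m=100.0):
    """Return the index of the best-matching satellite image near the prior.

    Only satellite images within radius_m of prior_xy are considered.
    Returns None when no image lies inside that neighborhood.
    """
    best_idx, best_dist = None, math.inf
    for idx, (emb, pos) in enumerate(zip(sat_embs, sat_positions)):
        # Discard candidates that the prior already rules out.
        if math.dist(pos, prior_xy) > radius_m:
            continue
        d = math.dist(query_emb, emb)  # Euclidean embedding distance
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx

# Toy example: the exact embedding match is 900 m away and is ignored.
sat_embs = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
sat_positions = [(10.0, 0.0), (20.0, 0.0), (900.0, 0.0)]
match = localize_with_prior([1.0, 0.0], sat_embs, sat_positions, (0.0, 0.0))
```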
The CVACT dataset can be accessed from: https://github.com/Liumouliu/OriCNN. Please download both ACT_small and ACT_test and put them together to form the dataset we use.
We collected satellite images for the Oxford RobotCar dataset. They are available at: https://surfdrive.surf.nl/files/index.php/s/kF4NlGOeQT1sIpV. We release the images in accordance with the "fair use" policy in the Google Maps/Google Earth Terms of Service, version: Jan 12, 2022. The released data is for the reproducibility of scientific research and cannot be used for commercial purposes. Google retains the copyright of the images.
The overview of the traversals we use:
The models used in the ECCV workshop and RA-L papers can be found via this link: https://drive.google.com/drive/folders/1c65BThElwZAN0Pqzdqfn8jJ11LcD8Hse?usp=sharing.
To train and validate the proposed method, run: CVACT_training_and_validation_our_model.ipynb
To train and validate the baseline, run: CVACT_training_and_validation_baseline.ipynb
Before running either notebook, run the pre-processing provided by the SAFA baseline to obtain the polar-transformed satellite images.
We also provide code for downloading satellite images from the Google Maps Static API: Download_satellite_images.ipynb
Note that you need to insert your own Google Maps Static API key into the notebook.
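For orientation, a request to the Maps Static API is a plain URL with the location, zoom level, image size, map type, and API key as query parameters. The sketch below only builds such a URL and does not send a request. The helper name and the default zoom and size values are assumptions for illustration and may differ from what Download_satellite_images.ipynb uses.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"

def satellite_image_url(lat, lon, api_key, zoom=20, size_px=640):
    """Build a Maps Static API URL for a satellite tile centered at (lat, lon).

    zoom and size_px are illustrative defaults, not the notebook's settings.
    """
    params = {
        "center": f"{lat:.6f},{lon:.6f}",
        "zoom": zoom,
        "size": f"{size_px}x{size_px}",
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"

# Replace YOUR_KEY with your own API key before requesting the image.
url = satellite_image_url(51.752, -1.258, "YOUR_KEY")
```

Keeping the key as a function argument, rather than hard-coding it, makes it easier to avoid committing it to version control.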
Our notebook uses the Development Kit provided by the Oxford RobotCar dataset. Please download the code and place it in ./Visual-Localization-with-Spatial-Prior/GeolocalRepresentationLearning/scripts/
If you use this code in your research, please cite our papers:
@InProceedings{10.1007/978-3-030-66096-3_38,
author="Xia, Zimin and Booij, Olaf and Manfredi, Marco and Kooij, Julian F. P.",
editor="Bartoli, Adrien and Fusiello, Andrea",
title="Geographically Local Representation Learning with a Spatial Prior for Visual Localization",
booktitle="Computer Vision -- ECCV 2020 Workshops",
year="2020",
publisher="Springer International Publishing",
address="Cham",
pages="557--573" }
@ARTICLE{9449965,
author={Xia, Zimin and Booij, Olaf and Manfredi, Marco and Kooij, Julian F. P.},
journal={IEEE Robotics and Automation Letters},
title={Cross-View Matching for Vehicle Localization by Learning Geographically Local Representations},
year={2021},
volume={6},
number={3},
pages={5921-5928},
doi={10.1109/LRA.2021.3088076}}