Rewrite tile expiry (management of expired tiles) #747
Do you mind rebasing on the latest `master`? That will fix the AppVeyor build.
Expired tiles are now managed in a set containing the IDs of the tiles as 64-bit integers instead of a self-written tree structure. The tile expiry is calculated on the maximum zoom level, and all lower zoom levels are calculated using simple bit shifts while the tile expiry list is written.
* use unsigned integers
* add additional comments
* prevent overflows on 32-bit machines

* don't try to insert a tile if the last insertion into the set was the same tile
* switch loops in output method
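The "don't insert the same tile twice in a row" optimization from the second commit can be pictured with the minimal C++ sketch below. The class name `expired_tile_set` and the choice of `std::unordered_set` are assumptions for illustration, not the PR's actual code; the idea is only that consecutive nodes of a way usually fall into the same tile, so comparing with the previously added ID skips a redundant set insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical sketch: collects the IDs of expired tiles at the maximum zoom level.
class expired_tile_set
{
public:
    void add(std::uint64_t tile_id)
    {
        // Consecutive nodes of a way usually fall into the same tile, so skip
        // the hash-set insertion if this ID equals the previously added one.
        if (m_has_prev && tile_id == m_prev) {
            return;
        }
        m_tiles.insert(tile_id);
        m_prev = tile_id;
        m_has_prev = true;
    }

    std::size_t size() const noexcept { return m_tiles.size(); }

    bool contains(std::uint64_t tile_id) const
    {
        return m_tiles.count(tile_id) != 0;
    }

private:
    std::unordered_set<std::uint64_t> m_tiles;
    std::uint64_t m_prev = 0;
    bool m_has_prev = false;
};
```

The shortcut only avoids repeated work; correctness still comes from the set itself, which ignores duplicates that are not adjacent.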
@lonvia wrote:
I rebased my work onto master yesterday and found out today that some tests failed. That failure was not (only) caused by a misconfiguration of AppVeyor. I fixed it today and rebased onto master again.
The constructor of `expire_tiles_t` expects `uint32_t`, so the conversion to unsigned has to happen somewhere. In addition, some more checks on the arguments supplied by the user and a unit test for option `-e` do no harm. This commit adapts `output_pgsql_t` and `output_multi_t` because zoom level 0 for the tile expiry now means that no expiry output is requested.
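The kind of argument check described in this commit could look like the following sketch for a `[min_zoom-]max_zoom` value such as `10-16`. `parse_expire_zoom`, `expire_zoom_range`, the upper limit of 31 and the rule that a single value sets both minimum and maximum are hypothetical; only the convention that a maximum zoom of 0 means "no expiry output" comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical result type; maxzoom == 0 means "no expiry output requested".
struct expire_zoom_range
{
    std::uint32_t minzoom;
    std::uint32_t maxzoom;
};

// Hypothetical helper: parses "[min-]max" and rejects invalid input instead of
// silently converting it. A single value is assumed to set both min and max.
std::optional<expire_zoom_range> parse_expire_zoom(std::string const &arg,
                                                   std::uint32_t max_supported = 31)
{
    // Accepts one or two decimal digits only, so no overflow is possible.
    auto parse_uint = [](std::string const &s, std::uint32_t &out) -> bool {
        if (s.empty() || s.size() > 2) {
            return false;
        }
        std::uint32_t value = 0;
        for (char c : s) {
            if (c < '0' || c > '9') {
                return false;
            }
            value = value * 10 + static_cast<std::uint32_t>(c - '0');
        }
        out = value;
        return true;
    };

    expire_zoom_range r{0, 0};
    auto const dash = arg.find('-');
    if (dash == std::string::npos) {
        if (!parse_uint(arg, r.maxzoom)) {
            return std::nullopt;
        }
        r.minzoom = r.maxzoom;
    } else if (!parse_uint(arg.substr(0, dash), r.minzoom) ||
               !parse_uint(arg.substr(dash + 1), r.maxzoom)) {
        return std::nullopt;
    }

    // Reject zoom levels that are out of range or in the wrong order.
    if (r.maxzoom > max_supported || r.minzoom > r.maxzoom) {
        return std::nullopt;
    }
    return r;
}
```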
cc @zerebubuth @woodpeck as people who I believe are using osm2pgsql tile expiry
Code looks good and simplifies the tile expiry a lot, so overall I'm in favour of merging this. There is still the open question of the changed output format (outputting all changed tiles vs. only outputting the lowest zoom level). #709 did not yield any comments strongly in favour of or against changing the output format. Given that, I'm leaning slightly towards taking this PR as is, with the output of all tiles. It simplifies the code in osm2pgsql and potentially also on the consumer side. Any other opinions? @pnorman?
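To make the "output all changed tiles" format concrete, here is a hedged C++ sketch of an output step that writes every expired tile of every requested zoom level as `z/x/y` lines, starting from a set of tile IDs at the maximum zoom level. The function names, the bit layout (x in the even bits, y in the odd bits) and the order of the output lines are assumptions for illustration, not the PR's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Assumed bit layout: bit i of x is stored at bit 2i of the ID,
// bit i of y at bit 2i + 1.
static void decode_tile_id(std::uint64_t id, std::uint32_t &x, std::uint32_t &y)
{
    x = 0;
    y = 0;
    for (unsigned i = 0; i < 32; ++i) {
        x |= static_cast<std::uint32_t>((id >> (2 * i)) & 1U) << i;
        y |= static_cast<std::uint32_t>((id >> (2 * i + 1)) & 1U) << i;
    }
}

// Hypothetical sketch: writes every expired tile from minzoom to maxzoom as
// "z/x/y" lines, deriving the lower zoom levels by shifting the IDs.
std::string write_expiry_list(std::set<std::uint64_t> const &tiles_at_maxzoom,
                              std::uint32_t minzoom, std::uint32_t maxzoom)
{
    assert(minzoom <= maxzoom && maxzoom <= 31);
    std::ostringstream out;
    for (std::uint32_t z = minzoom; z <= maxzoom; ++z) {
        unsigned const shift = 2 * (maxzoom - z);
        bool have_last = false;
        std::uint64_t last = 0;
        // std::set iterates in ascending order and a right shift preserves
        // that order, so tiles sharing a parent are adjacent: comparing with
        // the previous parent is enough to avoid duplicate lines.
        for (std::uint64_t const id : tiles_at_maxzoom) {
            std::uint64_t const parent = id >> shift;
            if (have_last && parent == last) {
                continue;
            }
            have_last = true;
            last = parent;
            std::uint32_t x = 0;
            std::uint32_t y = 0;
            decode_tile_id(parent, x, y);
            out << z << '/' << x << '/' << y << '\n';
        }
    }
    return out.str();
}
```

Keeping the IDs sorted lets duplicate parents be skipped without a second set per zoom level; with an unsorted container such as `std::unordered_set`, a per-level set (or a sort before writing) would be needed instead.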
Hello. Do you plan to add tile expiry to
@andrew-aladev wrote:
No. Nominatim is not tile-based, is it?
Yes, but how do you determine whether anything in the database has been updated? Today Nominatim uses
This pull request contains the changes announced in #709.
This pull request is the first part of a rewrite of the tile expiry. It ports the tile expiry of my own PostgreSQL import tool, which I wrote as part of my master's thesis, to osm2pgsql. I needed an importer whose tile expiry handles relations in a very different way than osm2pgsql does, and the tile expiry had to be more precise.
Each tile is represented by a 64-bit integer (the bits of the Y and X indices interleaved). The set only contains the tiles at the highest zoom level for which the list of expired tiles should be generated. All lower requested zoom levels are calculated while writing the tile IDs to the output file, by shifting the bits of the tile IDs.
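A minimal C++ sketch of such an interleaved tile ID and of the bit shift that yields a tile's ancestor at a lower zoom level follows. The PR does not state which index occupies the odd bits; this sketch assumes x in the even and y in the odd bits, and the function names `tile_to_id`, `id_to_tile` and `parent_id` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the bits of x and y: bit i of x goes to bit 2i of the ID,
// bit i of y to bit 2i + 1 (assumed ordering).
std::uint64_t tile_to_id(std::uint32_t x, std::uint32_t y) noexcept
{
    std::uint64_t id = 0;
    for (unsigned i = 0; i < 32; ++i) {
        id |= static_cast<std::uint64_t>((x >> i) & 1U) << (2 * i);
        id |= static_cast<std::uint64_t>((y >> i) & 1U) << (2 * i + 1);
    }
    return id;
}

// Inverse of tile_to_id: de-interleave the even and odd bits.
void id_to_tile(std::uint64_t id, std::uint32_t &x, std::uint32_t &y) noexcept
{
    x = 0;
    y = 0;
    for (unsigned i = 0; i < 32; ++i) {
        x |= static_cast<std::uint32_t>((id >> (2 * i)) & 1U) << i;
        y |= static_cast<std::uint32_t>((id >> (2 * i + 1)) & 1U) << i;
    }
}

// Going up one zoom level halves x and y, i.e. drops the lowest bit of each,
// which are the lowest two bits of the interleaved ID.
std::uint64_t parent_id(std::uint64_t id, std::uint32_t levels_up) noexcept
{
    assert(levels_up <= 32);
    // Shifting a 64-bit value by 64 is undefined, so handle that case apart.
    return levels_up == 32 ? 0 : id >> (2 * levels_up);
}
```

Because moving up `n` zoom levels only drops the lowest `2n` bits, the set never needs to store tiles of lower zoom levels. Using `std::uint64_t` explicitly (rather than `unsigned long`) matters here: above zoom 16 the interleaved ID needs more than 32 bits, which would overflow a 32-bit type on 32-bit platforms.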
I did a performance test with an import of the Europe extract by Geofabrik to ensure that the change has no negative impact on performance. Time and memory consumption were measured using `/usr/bin/time -v`. The input data was located on a hard disk; the flat nodes file and the database were located on SSDs.

* Check out the `master` branch.
* `osm2pgsql --merc --slim --flat-nodes /ssd/michael-flatnodes/michael-flat-nodes.cache -C 40000 --style ../default.style --unlogged --number-processes 2 --database gis europe-170217.osm.pbf` took 24 hours and 35 minutes and needed 26,765,608 kB. During these 24 hours the machine was busy producing a subset of Geofabrik's extracts offered on download.geofabrik.de for internal purposes, i.e. don't take these numbers seriously.
* Copy the database `gis` to a backup location.
* `osm2pgsql -d gis --slim --merc -S ../default.style --flat-nodes /ssd/michael-flatnodes/michael-flat-nodes.cache --append --number-processes 2 -C 25000 -o master-433.log --expire-bbox-size 20000 -e 10-16 433.osc.gz` (56 minutes 28 seconds, 9,143,644 kB RAM)
* Check out `expiry-as-set` and build osm2pgsql. The original import could be reused because the differences between both branches were small.
* Restore `gis` from the backup.
* `osm2pgsql -d gis --slim --merc -S ../default.style --flat-nodes /ssd/michael-flatnodes/michael-flat-nodes.cache --append --number-processes 2 -C 25000 -o master-433.log --expire-bbox-size 20000 -e 10-16 433.osc.gz` (57 minutes 34 seconds, 9,389,628 kB RAM)

The cronjob which produces the OSM planet extracts day by day on this machine did not run during steps 8 and 14. The difference between the two runs is within measurement accuracy. The code line responsible for writing the expiry file was removed in both test cases to ensure that writing larger output files (see #709) does not influence the measurements.
If you request tile expiry for lower zoom levels (e.g. 8 to 12), both branches consume about 500 MB of RAM.
This is the first part of a large rewrite of the tile expiry. If you are interested in the ongoing work, have a look at the branch `tile-selection-rewrite` in my fork of this repository. That branch will rewrite the selection of the tiles which will be marked as expired.