Rewrite tile expiry (management of expired tiles) #747

Merged on Jul 3, 2017 (11 commits)

Nakaner (Contributor) commented Apr 20, 2017

This pull request contains the changes announced in #709.

This pull request is the first part of a rewrite of the tile expiry. It is a port to osm2pgsql of the tile expiry from my own PostgreSQL import tool, which I wrote as part of my master's thesis. I needed an importer whose tile expiry handles relations in a much different way than osm2pgsql does, and the tile expiry had to be more exact.

Each tile is represented by a 64-bit integer (bits of Y and X index interleaved). The set of expired tiles only contains the tiles at the highest zoom level for which the list of expired tiles should be generated. All lower requested zoom levels are calculated while writing the tile IDs to the output file by shifting the bits of the tile IDs.
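
A minimal sketch of this encoding, assuming a Morton-style interleaving with the X bits on even positions and the Y bits on odd positions. The function names and the exact bit order are illustrative assumptions, not taken from the osm2pgsql source; the sketch is in Python rather than the C++ of the pull request.

```python
def xy_to_id(x, y):
    """Interleave the bits of x and y into a single integer tile ID.

    Assumption: x occupies the even bits and y the odd bits.
    """
    tile_id = 0
    for bit in range(32):
        tile_id |= ((x >> bit) & 1) << (2 * bit)
        tile_id |= ((y >> bit) & 1) << (2 * bit + 1)
    return tile_id


def id_to_xy(tile_id):
    """Inverse of xy_to_id: split an interleaved ID back into (x, y)."""
    x = y = 0
    for bit in range(32):
        x |= ((tile_id >> (2 * bit)) & 1) << bit
        y |= ((tile_id >> (2 * bit + 1)) & 1) << bit
    return x, y


def ancestor_id(tile_id, levels=1):
    """ID of the tile `levels` zoom levels further out.

    Each zoom level contributes one x bit and one y bit,
    so two bits are dropped per level.
    """
    return tile_id >> (2 * levels)
```

For example, `id_to_xy(ancestor_id(xy_to_id(x, y)))` equals `(x >> 1, y >> 1)`, which is the tile one zoom level lower that contains tile `(x, y)`.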

I did a performance test with an import of the Europe extract by Geofabrik to ensure that the change has no negative impact on performance. Time and memory consumption were measured using /usr/bin/time -v. The input data was located on a hard disk; the flat nodes file and the database were located on SSDs.

  1. Check out the master branch.
  2. Import using osm2pgsql --merc --slim --flat-nodes /ssd/michael-flatnodes/michael-flat-nodes.cache -C 40000 --style ../default.style --unlogged --number-processes 2 --database gis europe-170217.osm.pbf. This took 24 hours and 35 minutes and needed 26,765,608 kB of RAM. During these 24 hours the machine was busy producing a subset of Geofabrik's extracts offered on download.geofabrik.de for internal purposes, i.e. don't take these numbers seriously.
  3. Save a copy of the flat nodes file.
  4. Shut down the PostgreSQL database.
  5. Copy the tablespace of the database gis to a backup location.
  6. Start the database.
  7. Apply a diff using osm2pgsql -d gis --slim --merc -S ../default.style --flat-nodes /ssd/michael-flatnodes/michael-flat-nodes.cache --append --number-processes 2 -C 25000 -o master-433.log --expire-bbox-size 20000 -e 10-16 433.osc.gz. (56 minutes 28 seconds, 9,143,644 kB RAM)
  8. Check out branch expiry-as-set and build osm2pgsql. The original import could be reused because the differences between both branches were small.
  9. Restore the backup of the flat nodes file.
  10. Shut down the PostgreSQL database.
  11. Restore the tablespace of the database gis from the backup.
  12. Start the database.
  13. Apply a diff using osm2pgsql -d gis --slim --merc -S ../default.style --flat-nodes /ssd/michael-flatnodes/michael-flat-nodes.cache --append --number-processes 2 -C 25000 -o master-433.log --expire-bbox-size 20000 -e 10-16 433.osc.gz. (57 minutes 34 seconds, 9,389,628 kB RAM)

The cronjob which produces the OSM planet extracts day by day on this machine did not run during steps 7 and 13 (the two diff runs). The difference between the two runs is within measurement accuracy. The line of code responsible for writing the expiry file was removed in both test cases to ensure that writing larger output files (see #709) does not influence the measurements.
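
For reference, the relative difference between the two diff runs can be worked out from the figures quoted above. This is only arithmetic on the reported numbers; the variable names are illustrative.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


master_time = to_seconds(56, 28)   # diff applied on the master branch
branch_time = to_seconds(57, 34)   # diff applied on the expiry-as-set branch
master_mem_kb = 9_143_644
branch_mem_kb = 9_389_628

time_increase = (branch_time - master_time) / master_time
mem_increase = (branch_mem_kb - master_mem_kb) / master_mem_kb

print(f"time: +{time_increase:.1%}, memory: +{mem_increase:.1%}")
# prints "time: +1.9%, memory: +2.7%"
```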

If you request tile expiry for smaller zoom levels (e.g. 8 to 12), both branches consume about 500 MB RAM.

This is the first part of a large rewrite of the tile expiry. If you are interested in the ongoing work, have a look at the branch tile-selection-rewrite in my fork of this repository. That branch will rewrite the selection of the tiles which will be marked as expired.

lonvia (Collaborator) commented Apr 21, 2017

Do you mind rebasing on the latest master version? That will fix the appveyor build.

Nakaner and others added 10 commits April 21, 2017 15:02
Expired tiles are now managed in a set containing the IDs of the tiles as
64-bit integers instead of a self-written tree structure. The
tile expiry is calculated on the maximum zoom level and all lower zoom
levels are calculated using simple bit shifts during the output of the
tile expiry list.
* use unsigned integers
* add additional comments
* prevent overflows on 32-bit machines
* don't try to insert a tile if the last insertion into the set was the
same tile
* switch loops in output method
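
The approach described in these commit messages could be sketched roughly as follows. This is a hypothetical Python illustration, not the C++ code of the pull request; the class name, method names and the treatment of tile IDs as plain integers are assumptions.

```python
class ExpirySet:
    """Set of expired tile IDs, stored only at the maximum zoom level."""

    def __init__(self, max_zoom):
        self.max_zoom = max_zoom
        self.tiles = set()
        self.last_id = None  # ID of the most recent insertion

    def add(self, tile_id):
        # Skip the set lookup if the same tile was just inserted;
        # consecutive points of a geometry often fall into the same tile.
        if tile_id == self.last_id:
            return
        self.tiles.add(tile_id)
        self.last_id = tile_id

    def ids_at_zoom(self, zoom):
        """Derive the expired tile IDs at a lower zoom level by bit shifts.

        Two bits are dropped per zoom level; duplicates collapse
        because several tiles share one ancestor.
        """
        shift = 2 * (self.max_zoom - zoom)
        return sorted({tile_id >> shift for tile_id in self.tiles})
```

With `max_zoom=2`, adding the IDs `0b1101`, `0b1101` and `0b1110` stores two tiles, and `ids_at_zoom(1)` returns the single shared ancestor `[0b11]`.
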
Nakaner (Contributor, Author) commented Apr 21, 2017

@lonvia wrote:

Do you mind rebasing on the latest master version? That will fix the appveyor build.

I rebased my work yesterday on master and found out today that some tests failed. That failure was not (only) caused by a misconfiguration of AppVeyor. I fixed that today and rebased onto master again.

The constructor of expire_tiles_t expects uint32_t and the conversion to
unsigned has to happen somewhere. In addition, some more checks on the
arguments supplied by the user and a unit test for option `-e` do no harm.

This commit adapts output_pgsql_t and output_multi_t because zoom level 0
for the tile expiry now means that no expiry output is requested.
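
As an illustration of the kind of argument checks meant here, the following is a hypothetical Python sketch of parsing a `-e` value such as `10-16` or `16`. The function name, the error messages and the specific validation rules are assumptions and do not reproduce the checks in osm2pgsql; the only behaviour taken from the text above is that zoom level 0 means no expiry output.

```python
def parse_expire_zooms(arg):
    """Parse '[min_zoom-]max_zoom'.

    Returns (min_zoom, max_zoom), or None if expiry output is disabled.
    """
    parts = arg.split("-")
    if len(parts) > 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid zoom range: {arg!r}")
    min_zoom = int(parts[0])
    max_zoom = int(parts[-1])
    if max_zoom == 0:
        return None  # zoom level 0: no expiry output requested
    if min_zoom > max_zoom:
        raise ValueError("minimum zoom must not exceed maximum zoom")
    return min_zoom, max_zoom
```

Under these assumptions `parse_expire_zooms("10-16")` yields `(10, 16)`, `parse_expire_zooms("16")` yields `(16, 16)`, and `parse_expire_zooms("0")` yields `None`.
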
pnorman (Collaborator) commented Apr 21, 2017

cc @zerebubuth @woodpeck as people who I believe are using osm2pgsql tile expiry

lonvia (Collaborator) commented May 6, 2017

Code looks good and simplifies the tile expiry a lot. So overall I'm in favour of merging this.

There is still the open question of the changed output format (outputting all changed tiles vs. only outputting the lowest zoom level). #709 did not yield any comments strongly in favour of or against changing the output format. Given that, I'm leaning slightly towards taking this PR as is with the output of all tiles. It simplifies the code in osm2pgsql and potentially also on the consumer side. Any other opinions? @pnorman?

andrew-aladev commented

Hello. Do you plan to add tile expiry to output-gazetteer?

Nakaner (Contributor, Author) commented Jun 9, 2017

@andrew-aladev wrote:

Hello. Do you plan to add tile expiry to output-gazetteer?

No. Nominatim is not tile-based, is it?

andrew-aladev commented

Yes, but how can one determine whether anything in the database has been updated? Today Nominatim uses update --index after any update.
