Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite tile expiry (management of expired tiles) #709

Open
Nakaner opened this issue Mar 10, 2017 · 24 comments
Open

Rewrite tile expiry (management of expired tiles) #709

Nakaner opened this issue Mar 10, 2017 · 24 comments

Comments

@Nakaner
Copy link
Contributor

Nakaner commented Mar 10, 2017

I am currently rewriting the tile expiry of osm2pgsql because its code is not nice, difficult to understand and it seems to me that it does not what it is expected to do. As part of my master thesis I wrote a database import tool which has a tile expiry. @lonvia asked me a few months ago to port its tile expiry to osm2pgsql.

The work is done in my fork of this repository in the expire-as-set branch.

I have already replaced the self-made tree structure by an unordered set of uint64_t.

Because the new tile expiry is not fully backward-compatible, I'm opening this issue. The format of the written expiry file is the same – one tile per line, each line follows the pattern zoom/x/y. In difference to the old code, my code writes out all tiles on all requested zoom levels if something in the tile[1] has changed. However, the old tile expiry writes out a tile on a zoom level if all its subtiles are expired or it is a tile on the maximum zoom level.

If you convert the expiry log files into shape files and visualize them using QGIS, it will look like this (area fill is semi-transparent):

osm2pgsql's current tile expiry converted to a shape file and visualized using QGIS

My code would generate an expiry list which will look like this (rendered without an area fill):

asset-expiry

EDIT: The lower image does not show all zoom levels because otherwise the image would only contain black lines and it would be almost opaque.

The raw data of the images can be downloaded here.

I can see only one advantage in this behaviour – reducing the size of the file. But this behaviour is not documented anywhere and there is a disadvantage: Every program which reads the tile expiry list, has to derive the list of subtiles on its own. I think that the size of the file does not matter that much because you usually don't transfer it over a network.

[1] i.e. the method expire_tiles::expire_tile(uint32_t x, uint32_t y) has been called. The selection of tiles is also handled by my rewrite but out of scope of this issue.

@Nakaner Nakaner changed the title Rewrite tile expiry Rewrite tile expiry (management of expired tiles) Mar 10, 2017
@SomeoneElseOSM
Copy link

Something that'd be really nice to have would be an osm2pgsql that didn't run out of memory with relatively high values of EXPIRY_MAXZOOM set in openstreetmap-tiles-update-expire script. I particularly noticed this when using zoom levels > 20 but from looking at the expiry code (and then backing away very quickly) I suspect it could be made much more efficient at lower max zooms too. See links from https://wiki.openstreetmap.org/wiki/User:SomeoneElse/mod_tile_28 for more info.

@lonvia
Copy link
Collaborator

lonvia commented Mar 15, 2017

If only one zoom level is requested for expiry, then the behavior is the same. For multiple level, people using scripts for the combined tiles will suddenly do a lot of unnecessary expiries. So it will increase the load but won't break anything.

You mentioned an increase in size by factor 20. For how many zoom levels was that and what are the absolute file sizes in the result. An increase from 100 kB to 2MB probably doesn't matter. However, going from 100MB to 2GB is a problem.

@SomeoneElseOSM The number of tiles grows with exponentially with each zoom level. I don't think there is much we can do about expiry at zoom level 28.

@lonvia
Copy link
Collaborator

lonvia commented Mar 15, 2017

I've never used the expiry feature myself, so it is difficult to judge the impact of such a change. Could people who are using expiry comment here how they use it? What zoom parameters do you use and at what intervals do you update your database?

@pnorman
Copy link
Collaborator

pnorman commented Mar 15, 2017

The OSMF servers don't use osm2pgsql expiry, but instead have expire.rb. I'm not sure which method it is closer to.

render_expired is another way, written by @woodpeck which consumes a tile list. I'm not sure if it recurses up/down or not.

WMF uses tilerator in pyramid mode which parses the osm2pgsql output but requires just a single zoom level (i.e. osm2pgsql is run with --expire-tiles 15)

@zerebubuth, can you add some comment on how mapzen is using the osm2pgsql tile expiry? tilezen/vector-datasource#118 points at https://github.com/mapzen/vagrant-tiles but it's deprecated, so I'm not sure what is currently being done.

@pnorman
Copy link
Collaborator

pnorman commented Mar 15, 2017

@Nakaner if I'm reading your description right, the two methods are the same for outputting just a single zoom, right?

@Nakaner
Copy link
Contributor Author

Nakaner commented Mar 15, 2017

@giggls wrote me that the German tile server uses following:

MINZOOM=13
osm2pgsql ... -e $(($MINZOOM-3))-16 -o $EXPIRELOG
..
expiremeta.pl --map=mapnikde --minzoom=$MINZOOM <$EXPIRELOG

expiremeta.pl is part of tirex.

@Nakaner
Copy link
Contributor Author

Nakaner commented Mar 15, 2017

@pnorman wrote:
if I'm reading your description right, the two methods are the same for outputting just a single zoom, right?

Yes, they are not almost the same.

differences between different tile expiry implementations

The screen shot shows the expired tiles from the same dataset as used above on zoom level 12. Red tiles were written to the output file by both implementations. Blue tiles were only written by my implementation. Both implementations used the same code to detect which tiles should be marked. The database was set back to the state before the application of the diff, i.e. testing conditions were equal.

@pnorman
Copy link
Collaborator

pnorman commented Mar 15, 2017

Yes, they are not almost the same.

That looks like a difference in the selection of tiles (out of scope), not subtiles.

@zerebubuth
Copy link
Contributor

Mapzen has a process which takes the list of expired tiles which osm2pgsql produces at zoom 16, creates the pyramid of their ancestors, and intersects that with the "tiles of interest" - a set of every tile we've got rendered. The result is put into a queue for re-rendering.

@giggls
Copy link

giggls commented Mar 16, 2017

Using expiremeta.pl is not set in stone. With the current code my feeling is that way to many tiles are scheduled for re-rendering.

@nvkelso
Copy link

nvkelso commented May 12, 2017

Nathaniel here from Mapzen. It's always nice to see old code given some love, thanks for your proposal and PR :)

Mapzen currently use a pegged version of slightly older osm2pgsql for Tilezen in our production vector tile service. We don't have issues with the expiry file getting larger (it is just on one machine as a previous commenter pointed out so no network issues) – but I'm curious about the resulting list of changed files.

  1. Can you please add more detail on how the lists are different, please?
  2. Can you confirm we could still get a list of just the changed tiles at a set zoom (say 15 or 16)?
  3. Reading the comments it's not clear to me why this image is so different between the blue and red tiles, can you please elaborate?
    blue-red expired tiles

@Nakaner
Copy link
Contributor Author

Nakaner commented May 19, 2017

@nvkelso wrote:

  1. Can you please add more detail on how the lists are different, please?

The format is the same.

The only important difference is relevant if you produce tile expiry lists for multiple zoom levels. If you produce tile expiry lists for zoom levels 10 to 11, the tile expiry will only output tile 10/612/321 if all its four subtiles (11/1224/642, 11/1224/643, 11/1225/642, 11/1225/643) have changed. If there only changes in 11/1224/642, only this tile will be written to the output file. If you produce to the tile expiry list for zoom levels 10 to 12 and there are only changes in 12/2449/1285 but no other sub-sub-tiles of 10/612/321, only 12/2449/1285 will be written to the output file.

My code will output 10/612/321 and 11/1224/642 if you requested the zoom levels 10 to 11 and the only map change happened in 12/2449/1285. If you requested zoom level 10 to 12 and the only map change happened in 12/2449/1285, it will output 10/612/321, 11/1224/642 and 12/2449/1285.

Use the Tile Calculator of Geofabrik, if you need a map showing you the tile IDs on top of the map.

Have you had a look at the shape files (and the raw tile expiry lists) I linked in my initial posting? They are the source of the images I posted.

  1. Can you confirm we could still get a list of just the changed tiles at a set zoom (say 15 or 16)?

Yes.

  1. Reading the comments it's not clear to me why this image is so different between the blue and red tiles, can you please elaborate?

It seems that the current implementation of a tree drops some tiles. I did not investigate further because it is hidden in the old, ugly implementation. Red tiles were written to the output file by both implementations. Blue tiles were only written by my implementation.

@zerebubuth
Copy link
Contributor

It seems that the current implementation of a tree drops some tiles. I did not investigate further because it is hidden in the old, ugly implementation. Red tiles were written to the output file by both implementations. Blue tiles were only written by my implementation.

I notice that the blue tiles appear to be twice as large as the red tiles, suggesting that they're at one zoom level lower. Is that the case? Has the output changed if only a single zoom level is requested (e.g: --expire-tiles 16-16)?

What confused me is this exchange:

@pnorman wrote:
if I'm reading your description right, the two methods are the same for outputting just a single zoom, right?

Yes, they are not almost the same.

That looks like a difference in the selection of tiles (out of scope), not subtiles.

If the two methods are not (almost) the same for outputting just a single zoom, then I think I'd want to understand the difference in the selection of tiles (more tiles, fewer tiles, by what proportion?) and the reason why before I updated to the new expiry algorithm.

Not that a change is necessarily bad, and I don't think it should hold up this issue being resolved. I think it would be really useful to have a thorough description of what's changed and what differences users can expect in the changelog / release notes.

@Nakaner
Copy link
Contributor Author

Nakaner commented May 19, 2017

@zerebubuth wrote:

I notice that the blue tiles appear to be twice as large as the red tiles, suggesting that they're at one zoom level lower. Is that the case? Has the output changed if only a single zoom level is requested (e.g: --expire-tiles 16-16)?

The old tree implementation silently drops some tiles (the blue ones) at any point in the code. I don't know why it does so. The unordered set does not drop anything.

Dropping the tiles is not documented, neither in the user documentation nor in the source code. I don't know if it is a bug or a feature. I myself expected the tile expiry to behave as my implementation does.

If the two methods are not (almost) the same for outputting just a single zoom, then I think I'd want to understand the difference in the selection of tiles (more tiles, fewer tiles, by what proportion?) and the reason why before I updated to the new expiry algorithm.

Not that a change is necessarily bad, and I don't think it should hold up this issue being resolved. I think it would be really useful to have a thorough description of what's changed and what differences users can expect in the changelog / release notes.

Pull request #747 does not change the selection of the tiles based on the geometry of the changed features in the database. That is still done by the legacy code (expire_tiles::from_line(double, double, double, double), expire_tiles::from_bbox(double, double, double, double)). These methods call expire_tiles::expire_tile(uint32_t, uint32_t) (formerly expire_tiles::expire_tile(int, int)) which adds the tile to the set. The PR changed expire_tiles::expire_tile(uint32_t, uint32_t) and all methods which are called by this method.

Following text could be used for the release notes:

Rewrite of the Tile Expiry

The tile expiry now uses internally an unorderd set instead of a self-implemented tree. It now outputs all tiles on all requested zoom levels which contain a map change.

During the rewrite a bug in the old code was found (and fixed) which dropped about two thirds of the expired tiles. If you used the old tile expiry for a single zoom level, the list of expired tiles will increase by the factor 2 to 3.

If you used the old tile expiry for multiple zoom levels (e.g. z..z+n), you will now get all tiles which contain changes. The old implementation summarized the list in following manner: A tile on zoom level z was only written to the output file if all sub-tiles on zoom levels z+1..z+n contained a map change. The new implementation will output the tile on zoom level z even if only one sub-tile contains a map change. The following image illustrates the changes.

changes of osm2pgsql illustrated

license of the text and the image: CC-0

@nvkelso
Copy link

nvkelso commented May 19, 2017

@Nakaner Very helpful README.md text and image, thank you! :)

@pnorman
Copy link
Collaborator

pnorman commented Jul 24, 2017

@Nakaner, is there anything left on this after #747?

@Nakaner
Copy link
Contributor Author

Nakaner commented Jul 24, 2017

@pnorman wrote:

@Nakaner, is there anything left on this after #747?

Well, half of the work is still left. I am rewriting the part which determines which tiles intersect with a geometry. Work is done on branch tile-selection-rewrite. The branch might look stale but I continued work on it on Friday but its not in a state which can be pushed. (Some unit tests are missing, some are failing due to bugs in the implementation)

@Nakaner
Copy link
Contributor Author

Nakaner commented Jul 26, 2017

I pushed my work to my fork repository a few minutes ago. Feedback to the code is appreciated. Performance has not been tested yet.

@Nakaner
Copy link
Contributor Author

Nakaner commented Jul 27, 2017

My code has no special handling for boundary relations. They are treated as polygons. If the bounding box of a polygon is larger than the maximum bounding box, it is treated as a multilinestring and its outer and inner rings will be expired. This means that there is no change in behaviour compared to the old implementation. If someone wants to treat polygons always as linestrings, he just has to set the maximum bounding box size to 0.

@Nakaner
Copy link
Contributor Author

Nakaner commented Feb 12, 2018

I am currently testing my new code (branch tile-selection-rewrite in my fork of this repository) which selects which tiles should be marked as expired.

The branch has a brand new selection algorithm for tile expiry. Polygons will no longer be expired by their bounding box. The algorithm tries some approximation (but still uses bounding boxes in some kind). You won't see any differences on medium zoom levels (10 to 14) but higher zoom levels show a small difference.

z16: 5584865 (master) vs. 5168071 (my branch), i.e. -7.5%
z15: 2045309 (master) vs. 1958609 (my branch), i.e. -4.3 %
z14: 779726 (master) vs. 763382 (my branch)

The maximum bounding box size was set to 20,000 (projected Mercator units).

A first test pointed out that my branch takes about 25 % longer to apply a daily diff of Europe but I am not sure if there was anything else running on the machine which disturbed my measurements. Therefore, I will repeat the measurements and report my results here.

An example, zoom level 16, changes between 2018-02-04 20:00 UTC and 2018-02-05 20:00 UTC, blue is new, grey is old:

expiry-fine

After looking at the images, I got the impression that some performance improvements might help:

  • always expire only the perimeter of boundary relations or don't expire them at all
  • don't expire route relations – the changed members are important and they get expired, too
  • or, to go even further, don't expire any relation whose bounding box is larger than 300 x 300 projected units (default value – should cover buildings and smaller features but exclude large forests) because any change to a large forest can expire multiple counties at once

If we ignore relations, we will get much shorter expiry lists. I calculated expiry lists with a self-written PostgreSQL importer over a year ago. The following images are from my master thesis. One shows the expiries (summed up over more than a month of hourly diffs) without relations, one with relations. Please note that the algorithm being used there did only expire the perimeter of polygons, not the whole area.

expiry-deutschland-nordost
expiry-wr-deutschland-nordost

@pnorman
Copy link
Collaborator

pnorman commented Feb 14, 2018

Polygons will no longer be expired by their bounding box. The algorithm tries some approximation (but still uses bounding boxes in some kind).

This this mean that the current algorithm expires something like the blue box and the new one would do something the green box, assuming the black/pink shape is a new polygon?

image

@pnorman
Copy link
Collaborator

pnorman commented Feb 14, 2018

always expire only the perimeter of boundary relations or don't expire them at all

The perimeter is generally safe. Not doing them at all might be a problem.

don't expire route relations – the changed members are important and they get expired, too

This would be a problem in many cases. For example, if the ref, network, or other properties used for rendering change, then the tiles containing the linestring of the relation would need to be dirtied.

@Nakaner
Copy link
Contributor Author

Nakaner commented Feb 14, 2018

@pnorman wrote:

This this mean that the current algorithm expires something like the blue box and the new one would do something the green box, assuming the black/pink shape is a new polygon?

Yes, that's almost correct. Here is an improved drawing of your example (it is difficult to distinguish the green and the blue line in the lower right corner). More details can be found in the comments in the header and source files.

sketch how the new alogrithm works

This would be a problem in many cases. For example, if the ref, network, or other properties used for rendering change, then the tiles containing the linestring of the relation would need to be dirtied.

Some map styles don't use route relations at all. I would add a command line option and let the users decide (good explanation and documentation is necessary of course).

@Nakaner
Copy link
Contributor Author

Nakaner commented Feb 14, 2018

Here are the performance measurements. I used the same machine as for the performance tests of my first pull request.

The database was imported using ./osm2pgsql -d expirytest --merc --style ../default.style --flat-nodes /ssd/michael/flatnodes/flat.nodes --slim --unlogged --number-processes 2 -C 40000 /home/michael/jobs/tileexpirytest/europe-180204.osm.pbf (took 10 hours and 44 minutes and 28.9 GiB RAM)

The commands in the order I performed them. There was a break of more than a day between the second and the third test. After each test, I shut down the PostgreSQL daemon using Systemd, restored a backup of the tablespace and restarted the daemon again. I also restored the flatnodes file.

My branch (commit 2e612f5a00bb93eb8fb13d8b3ab7eed0908b589e):
Osm2pgsql took 2539s overall
        Command being timed: "./osm2pgsql -d expirytest --merc --style ../default.style --flat-nodes /ssd/michael/flatnodes/flat.nodes --append --slim --number-processes 2 -C 25000 -o /home/michael/jobs/tileexpirytest/expirylog-branch-785.txt --expire-bbox-size 20000 -e 10-16 /home/michael/jobs/tileexpirytest/785.osc.gz"
        User time (seconds): 442.10
        System time (seconds): 181.51
        Percent of CPU this job got: 24%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 42:20.39
        Maximum resident set size (kbytes): 34669536

Master branch (commit e5869274659ada90208927c54cde51b2b35d6bbf):
Osm2pgsql took 2150s overall
        Command being timed: "./osm2pgsql -d expirytest --merc --style ../default.style --flat-nodes /ssd/michael/flatnodes/flat.nodes --append --slim --number-processes 2 -C 25000 -o /home/michael/jobs/tileexpirytest/expirylog-master-785.txt --expire-bbox-size 20000 -e 10-16 /home/michael/jobs/tileexpirytest/785.osc.gz"
        User time (seconds): 395.18
        System time (seconds): 163.13
        Percent of CPU this job got: 25%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 35:51.19
        Maximum resident set size (kbytes): 36647904


Second attempt after ensuring that no other processes disturb the measurements any more:
Master branch (commit e5869274659ada90208927c54cde51b2b35d6bbf):
Osm2pgsql took 2553s overall
        Command being timed: "./osm2pgsql -d expirytest --merc --style ../default.style --flat-nodes /ssd/michael/flatnodes/flat.nodes --append --slim --unlogged --number-processes 2 -C 25000 -o /home/michael/jobs/tileexpirytest/expirylog-master-785-v2.txt --expire-bbox-size 20000 -e 10-16 /home/michael/jobs/tileexpirytest/785.osc.gz"
        User time (seconds): 411.40
        System time (seconds): 194.65
        Percent of CPU this job got: 23%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 42:34.11
        Maximum resident set size (kbytes): 34685544

My branch (commit 2e612f5a00bb93eb8fb13d8b3ab7eed0908b589e):
Osm2pgsql took 2440s overall
        Command being timed: "./osm2pgsql -d expirytest --merc --style ../default.style --flat-nodes /ssd/michael/flatnodes/flat.nodes --append --slim --number-processes 2 -C 25000 -o /home/michael/jobs/tileexpirytest/expirylog-branch-785-v2.txt --expire-bbox-size 20000 -e 10-16 /home/michael/jobs/tileexpirytest/785.osc.gz"
        User time (seconds): 420.88
        System time (seconds): 174.78
        Percent of CPU this job got: 24%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 40:40.53
        Maximum resident set size (kbytes): 36634188

My branch  (commit 2e612f5a00bb93eb8fb13d8b3ab7eed0908b589e) without tile expiry:
Osm2pgsql took 2493s overall
        Command being timed: "./osm2pgsql -d expirytest --merc --style ../default.style --flat-nodes /ssd/michael/flatnodes/flat.nodes --append --slim --number-processes 2 -C 25000 --expire-bbox-size 20000 /home/michael/jobs/tileexpirytest/785.osc.gz"
        User time (seconds): 336.77
        System time (seconds): 170.91
        Percent of CPU this job got: 20%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 41:33.65
        Maximum resident set size (kbytes): 36221356

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants