Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non moving node updates should not update related way/relation geometries #1046

Open
cquest opened this issue Jan 3, 2020 · 6 comments
Open

Comments

@cquest
Copy link
Contributor

cquest commented Jan 3, 2020

When a node is modified, osm2pgsql does not make a difference if it is just a tag change or a lat/lon change.

When the lat/lon is the same as the actual one in the database, there is no impact on ways and relations refering to that node and their geometry do not need to be updated.

I suggest to add a test in node_modify to check if the lat/lon has changed or not.

The actual code is very basic:

  • delete the node from the DB
  • set the node location in the cache
  • create the node in the DB
  • look for related ways/relations and mark them as pending for update as well

It could be more efficient:

  • update the node in the DB (UPDATE instead of DELETE/INSERT may also limit DB bloat)
  • if location has changed:
    • set the new location in the cache
    • mark related ways/relations as pending for update
@cquest
Copy link
Contributor Author

cquest commented Jan 4, 2020

I've implemented the test for location change, and it shows a 10x speedup in the node parsing step for nodes that did not move.

In the diffs I've used for my tests, a large portion of nodes had new coordinates, so the speedup is reduced but still there for the others (from 10 to 50% in my tests).

The benefit is not limited to the node parsing... but cascades to ways and relations as fewer pending ones need to be checked and updated.

@lonvia
Copy link
Collaborator

lonvia commented Jan 8, 2020

Do you like to do a quick PR? I can't promise that we take it as is but it would be easier to discuss the implications with the actual code.

@cquest
Copy link
Contributor Author

cquest commented Jan 9, 2020

I'll do. It's only a few lines change.

Be aware... I've not written much C++ in my life, it may hurt purists ;)

@cquest
Copy link
Contributor Author

cquest commented Jan 11, 2020

Just pushed the PR: #1059

@joto
Copy link
Collaborator

joto commented Oct 20, 2020

@cquest Can you elaborate on what you tested exactly and how the numbers changed with and without this patch? There are several issues here that could affect timing: With the patch fewer ways (and relations) will be marked as (possibly) changed. On the other hand we have to lookup the old location of all nodes, which is cheap if we are using flat node store, but possibly expensive when using the database. All of this, of course, depends on what kind of node changes there are usually.

To figure out the best way to do this we need some numbers not from isolated queries, but from real-world update runs. Does this, on avarage, speed up things considerably or not? How does the flat node factor in here? How does the bucket index? @cquest You speek of a 10% speed up which would be well worth it, but I am not sure what you have been measuring there.

joto added a commit to joto/osm2pgsql that referenced this issue Aug 17, 2024
When a new version of a node has the same location as the old version of
that node, we don't need to re-process ways and relations this node is
a member of.

See osm2pgsql-dev#1046

This is some preliminary experiments. It is not tested well. Currently
only works with flat node file.

Does this work in all cases? Can we be shure that the state reflected in
the middle tables/the flat node file is the same as in the output
tables? What happens if we re-start updating with an older diff file?
What happens if a nodes is deleted and then re-surrected?
@joto
Copy link
Collaborator

joto commented Aug 17, 2024

I did some statistics on node changes using the planet history file. Of the about 2 billion node changes (i.e. new versions of an existing non-deleted node) about one quarter did not change the location. Many of these will be for POIs which are not in a way or relation, so this is not so relevant here. But even if you only look at untagged nodes (no tags in old and new version), in about 10% of cases the location did not change. Presumably these are all members of a way or relation so they will trigger further processing unnecessarily.

I did some experimentation in a branch, but it is not totally straightforward to change the current code, so this would need a lot more testing and especially benchmarking. It makes the code more complex, so this has to be evaluated thoroughly. If anybody want to play around with it, go ahead. At the moment I don't have the time to work on that.

This is also part of a larger issue: How do we handle changed objects in a better and more flexible way instead of just deleting and inserting. And how do we detect early that changes in the OSM data do not actually result in changes in the database, thus allowing us to elide further processing through the processing chain from nodes to ways to relations, to generalization to tile invalidation. This needs a lot more thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants