Run update in transaction #2205

Open
joto opened this issue Jun 18, 2024 · 0 comments

joto commented Jun 18, 2024

Osm2pgsql currently doesn't use transactions. It opens and closes several connections to the database, reading and writing data as needed.

This is okay for the initial import, because in typical use you do the import first and when that's done, you start using the data. If something breaks during import, you start from scratch. Using a transaction (that would possibly be open for many hours) doesn't gain us anything.

But for updates the situation is different. Here the use of the database happens in parallel with updates, at least in many cases.
It would be easier for users to understand what state their database is in if we were using transactions. If something fails during the update, we could be sure not to have half of the data in the database. Note that the situation is not as bad as this might look: if an update fails, you will usually fix whatever led to the failure and restart the update from the beginning, and it will get you to a defined point again if it runs through this time. The half-updated data from the first try will get overwritten by the final data in most cases. (If you change the config file used, this might not always be the case, though.)
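
To illustrate the all-or-nothing behaviour described above, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL, and the `places` table, the `apply_update` function and the `fail_at` parameter are hypothetical names made up for this example; none of this is osm2pgsql code.

```python
import sqlite3


def apply_update(conn, rows, fail_at=None):
    """Apply all rows in one transaction; roll back everything on failure.

    `fail_at` is a hypothetical hook that simulates a crash at that row index.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, (osm_id, name) in enumerate(rows):
                if i == fail_at:
                    raise RuntimeError("simulated failure mid-update")
                conn.execute(
                    "INSERT OR REPLACE INTO places (osm_id, name) VALUES (?, ?)",
                    (osm_id, name),
                )
        return True
    except RuntimeError:
        return False


# Stand-in for a database that already holds imported data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (osm_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO places VALUES (1, 'old')")
conn.commit()
```

With this setup, calling `apply_update(conn, [(1, "new"), (2, "added")], fail_at=1)` leaves the table exactly as it was before the call, whereas the same call without `fail_at` applies both rows together.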

One problem with transactions is that they are usually tied to a database connection. And we use several of them in parallel for performance. But osm2pgsql has a mechanism that allows you to have several connections using the same transaction using the snapshot synchronization functions. This is certainly something we could try.
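
For reference, in PostgreSQL the snapshot synchronization functions are `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT`. The sketch below only builds the SQL statements each connection would send, in order; the helper names `coordinator_statements` and `worker_statements` are hypothetical and not part of osm2pgsql. One caveat worth testing: as far as I understand it, this makes all connections see the same data, but each connection still commits its own writes, so whether that is enough for our purposes is an open question.

```python
def coordinator_statements():
    """Statements for the connection that exports the snapshot.

    This transaction has to stay open until every worker has imported
    the snapshot, otherwise the snapshot id is no longer valid.
    """
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ",
        "SELECT pg_export_snapshot()",  # returns the snapshot id as text
    ]


def worker_statements(snapshot_id):
    """Statements for each additional connection sharing the snapshot.

    SET TRANSACTION SNAPSHOT must be the first statement in the
    transaction and needs REPEATABLE READ or SERIALIZABLE isolation.
    """
    quoted = "'" + snapshot_id.replace("'", "''") + "'"
    return [
        "BEGIN ISOLATION LEVEL REPEATABLE READ",
        f"SET TRANSACTION SNAPSHOT {quoted}",
    ]
```

The snapshot id passed to `worker_statements` is whatever the coordinator's `SELECT pg_export_snapshot()` returned, for example a string of the form `00000003-0000001B-1`.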

Then there is the question of the performance impact this would have. It could go either way, so we'd have to test this carefully.

The last issue I see is when osm2pgsql-gen is used. This is currently a separate program which cannot share the transaction. But it doesn't have to be a separate program; it was just easier to do it this way as long as it is experimental. We can change that later.

See also #2190, #2110
