-
I tried to make sense of this part. We have 4 workers (1-4) plus one main thread (0). On the 19th, ~20:50, all four workers

Then 2 anonymous tasks, one finishes in 1ms, and the second lingers...?

And immediately starts indexing

Still on the 19th, at ~23:32, worker 4 starts creating indexes for

Given all that, I added extra annotations that I think are the right ones to make sense of all that. I hope I can use
-
The mix of ms, s and hms is a little bit haphazard, and the reported times don't seem to reflect all post processing.
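For anyone post-processing the logs in the meantime, one workaround is to normalize every reported duration to seconds before comparing them. This is only a minimal sketch: the spellings it accepts ("1ms", "42s", "2h 7m 31s") are my assumption about what the mixed units look like, not a description of the exact osm2pgsql output, and `duration_to_seconds` is a hypothetical helper name.

```python
import re

# Assumed duration spellings (hypothetical, not copied from real osm2pgsql logs):
#   "1ms", "42s", "2h 7m 31s", "1h02m03s"
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

# "ms" is listed first so that "1ms" is not read as one minute followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|s|m|h)")


def duration_to_seconds(text: str) -> float:
    """Convert a duration string with mixed units into seconds."""
    parts = _PART.findall(text)
    if not parts:
        raise ValueError(f"no duration found in {text!r}")
    # Sum every (value, unit) pair found in the string.
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)
```

With everything in one unit, a 1ms task and a multi-hour indexing step can at least be put on the same axis.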
-
This basically goes in the same direction as #207. What is logged, and how, has changed over time, and there was never a grand plan for how to do this. I totally agree that the logging is hard to understand for somebody new to the project. You really have to know a lot about the internals of osm2pgsql processing to interpret the output.

Osm2pgsql's internal processing is complex, and the question is how much the user should actually see of how the sausage is made. Maybe we should just move all that logging to debug mode and only tell the user when we are done? Does the user actually need to know? What information is actually actionable for the user? On the other hand, we could add a lot more output to try to make things clearer, but that would be a lot of information. So the question is really: What is that output for? And for whom? Currently it is for experts who want to see what's going on, either in their own setups or, more importantly, when users report problems. @StyXman What do you expect of that output?

Coincidentally, I recently added https://osm2pgsql.org/contribute/how-osm2pgsql-processing-works.html to the website to help explain more about what goes on inside osm2pgsql. It could help with figuring things out, although it covers just a small part of what's going on.
-
I'm using the logs to generate annotations on a Grafana server like this:

So I don't want to know how the sausage is made, but at least I want the fabrication and expiration dates of each package I buy :)
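Roughly, turning a log line into an annotation can be sketched like this. It is a sketch under assumptions, not my actual script: it assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, treats that timestamp as UTC (adjust if your logs are in local time), and builds the `time`/`tags`/`text` fields that, as far as I know, Grafana's annotations HTTP API expects. `line_to_annotation` and the sample line are made up for illustration.

```python
from datetime import datetime, timezone
import json


def line_to_annotation(line: str, tags=("osm2pgsql",)) -> dict:
    """Build an annotation payload from one timestamped log line."""
    # Assumption: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
    stamp, message = line[:19], line[19:].strip()
    # Assumption: the timestamp is UTC; change tzinfo if it is local time.
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return {
        "time": int(when.timestamp() * 1000),  # epoch milliseconds
        "tags": list(tags),
        "text": message,
    }


if __name__ == "__main__":
    # Hypothetical sample line, not copied from a real run.
    sample = "2023-06-19 20:50:00  Starting task"
    print(json.dumps(line_to_annotation(sample)))
```

The resulting dictionary can then be sent to the Grafana server with whatever HTTP client and authentication you already use.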
-
But what are you creating those graphs for? What is it that you are trying to achieve in the end?
-
Right now it's to investigate how disk usage changes during the import. Later it will let me see how it changes during updates too. I hope to finish a write-up about it soon.
-
This level of logging could be done on a
-
@joto I'm starting to look into this. Can you confirm my findings are 100% correct?
-
What version of osm2pgsql are you using?
osm2pgsql version 1.8.0
What operating system and PostgreSQL/PostGIS version are you using?
Linux Debian Stable, PG Database version: 15.3 (Debian 15.3-0+deb12u1), PostGIS version: 3.3
Tell us something about your system
Small server, 8GiB RAM, 1TB consumer SATA SSD.
What did you do exactly?
osm2pgsql --verbose --database gis --cache 0 --number-processes 4 --slim --flat-nodes $(pwd)/nodes.cache --hstore --multi-geometry --style $osm_carto/openstreetmap-carto.style --tag-transform-script $osm_carto/openstreetmap-carto.lua europe-latest.osm.pbf
What did you expect to happen?
The logs about the middle processing are a little bit confusing:
I separated it into sections by worker. The messages about 'Starting task' and 'Done task' don't mention which task it is. Also, do I understand correctly that not all workers started working at the same time? If I'm right, why is that? If I isolate worker 3's logs:

At 20:49 it starts processing roads. ~2h later it's creating 2 indexes and analyzing it. It launches 2 tasks, but finishes one first (22:57), a long one? Which one? And then another, but one seems to still be running.

And suddenly it's creating indexes on ways. This seems to finish the next day and it's properly logged.

I also see the messages about 'Done postprocessing' for nodes (22:57), ways and rels (next day, 10:48), but for the other 4 tables the message is 'All postprocessing'.

Could tasks be more verbose about what exactly they're doing? I didn't try --log-level=debug because this is a 49h+ process and I really don't want to start it all over :)
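To show what I mean by a task that "seems to still be running", this is roughly the bookkeeping I do by hand. It is only a sketch: it assumes the log has already been split into (worker id, message) pairs, which is my own preprocessing and not something osm2pgsql provides, and `open_tasks` is a hypothetical helper. Since the messages don't name the task, it can only count, not tell which task is still open.

```python
from collections import Counter


def open_tasks(events):
    """Count tasks per worker that were started but never reported done.

    events: iterable of (worker_id, message) pairs, in log order.
    """
    pending = Counter()
    for worker, message in events:
        if message.startswith("Starting task"):
            pending[worker] += 1
        elif message.startswith("Done task"):
            pending[worker] -= 1
    # Keep only workers that still have at least one unfinished task.
    return {worker: count for worker, count in pending.items() if count > 0}


if __name__ == "__main__":
    # Hypothetical events mirroring the worker 3 situation described above.
    events = [
        (3, "Starting task"),
        (3, "Starting task"),
        (3, "Done task"),
        (4, "Starting task"),
        (4, "Done task"),
    ]
    print(open_tasks(events))
```

If the messages carried a task name, the same loop could key on (worker, task) and report exactly which one lingers.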