Replies: 5 comments 1 reply
-
Just for the record: according to DBeaver, the resulting physical size of the 'planet-osm-admin' table is only 1.9 GB, so way less than the > 40 GB jump in memory usage. There are 2,294,085 records in the table, with admin_level values 2-13.
-
You can't compare the amount of memory used by some Lua structure with how much it will take in the database; memory usage in Lua will be much larger. That's why there is the warning in the manual. So I do not rule out that the problem you are describing is due to the way we do two-stage processing. To figure out where the problem is, I suggest running the exact same config, but with the one line removed where you actually store anything in the global variable. The difference you see then should tell you something. There are also ways to ask Lua how much memory it has allocated, but that goes beyond the scope here.
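For that last point: a quick sketch of how one could watch Lua's own allocation from inside a flex style, using the standard `collectgarbage("count")` call. The `log_memory` helper and the 100,000-object interval are made up for illustration; nothing here comes from the actual style file.

```lua
-- Standard Lua: collectgarbage('count') returns the memory currently
-- allocated by the Lua state, in kilobytes. Calling it periodically
-- from a processing callback shows how fast the global table grows.
local processed = 0

local function log_memory()
    processed = processed + 1
    if processed % 100000 == 0 then
        print(string.format('objects: %d, Lua heap: %.1f MB',
                            processed, collectgarbage('count') / 1024))
    end
end
```

Calling something like this once per processed relation would show whether the heap grows roughly linearly with the data stored for stage 2.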
-
Yes, I realize that the type of data structure used, and the way the data is stored, can make a huge difference. I recently had to handle a slightly similar issue, where I needed to store unique IDs and the vertex count of polygons for a multi-threaded Python application. My first, naive, approach was to store this information as a nested Python lists-in-lists structure, with a separate sub-list for each polygon record. With a few hundred million records, memory soared to over 130 GB... After reading up on Python objects and memory consumption, I finally settled on re-implementing this as one big 2D NumPy array, which probably reduced memory consumption by a factor of about 20x.
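The same effect exists in Lua, which is relevant to the osm2pgsql case: a table-per-record layout carries substantial per-table overhead compared to flat parallel arrays. A minimal standalone sketch (record count and field names are made up):

```lua
-- Helper: force a full GC cycle, then report the Lua heap in MB.
local function heap_mb()
    collectgarbage('collect')
    return collectgarbage('count') / 1024
end

local base = heap_mb()

-- Layout A: one small Lua table per record (high per-table overhead).
local per_record = {}
for id = 1, 1000000 do
    per_record[id] = { id = id, vertex_count = id % 100 }
end
print(string.format('tables-in-table: %.1f MB', heap_mb() - base))

per_record = nil
base = heap_mb()

-- Layout B: two flat parallel arrays holding the same data.
local ids, counts = {}, {}
for id = 1, 1000000 do
    ids[id] = id
    counts[id] = id % 100
end
print(string.format('parallel arrays: %.1f MB', heap_mb() - base))
```

On a typical Lua build, layout A needs several times more memory than layout B for identical data, analogous to the Python lists vs. NumPy difference described above.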
Yes, thanks for the suggestion. I will attempt that. It will take some time before I can report the results though, as I have another process running that I would like to finish first. By the way, the code involved is this:
-
You will probably need to use an intermediate database to handle data of this size. Probably
-
Although slightly speculative, I have the feeling that a large part of this excessive memory usage was down to the flex style issue described here: gravitystorm/openstreetmap-carto@b82ba63#r672314283 where, in the OpenStreetMap route relation processing code of the flex style, the route member objects were added not only to the route table but also to the line and roads tables. Paul fixed this in the commit linked above, and I have the feeling this also solved the excessive memory usage.
-
Based on @pnorman's good work on the flex version of the openstreetmap-carto style (gravitystorm/openstreetmap-carto#4431), I have been experimenting with planet-size data to see the results and what potential issues might come up with the new options.
I have done two tests:
For the first test, using the Facebook Daylight PBF, I used an early version of the openstreetmap-carto flex file, in which Paul had not yet added the 'planet-osm-admin', 'planet-osm-transport-line' and 'planet-osm-transport-polygon' tables, but had only enhanced it with a non-spatial 'planet-osm-route' table that can be used to display routes via database joins.
My VM was configured with 100 GB RAM and 50 GB of swap in Ubuntu. Peak memory usage during this first test was about 95 GB RAM + 9 GB swap (with "-C 75000" set on the command line), so this first run completed successfully, despite the large PBF.
I then ran the second test with the smaller 58 GB official planet PBF file. This time, I used the latest state of Paul's work on the flex style, which adds the new 'planet-osm-admin', 'planet-osm-transport-line' and 'planet-osm-transport-polygon' spatial tables alongside the non-spatial 'planet-osm-route' table.
With the same VM configuration, the osm2pgsql process was killed once all RAM and swap had been consumed. I then attempted it with smaller cache settings, but even with "-C 10000" the process was killed. Switching to slim mode with a flat nodes file allowed the processing to succeed.
Now my question:
As far as I now understand Paul's code in the flex style, only the 'planet-osm-admin' table actually requires stage 2 processing; all the other tables are created purely in stage 1.
As documented in the osm2pgsql manual, stage 2 processing can add a considerable amount of extra memory usage, because all data from stage 1 that is needed in stage 2 must be kept in main memory ("All data stored in stage 1 for use in stage 2 in your Lua script will use main memory.").
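For reference, the pattern in question looks roughly like this. This is a sketch based on the documented two-stage mechanism; the variable name `w2r` and the exact bookkeeping are illustrative, not necessarily what Paul's file does. Everything placed in the global table during stage 1 stays in main memory until stage 2:

```lua
-- Global table: way id -> lowest admin_level seen among parent relations.
-- This is the data that "will use main memory" per the manual.
local w2r = {}

-- Tell osm2pgsql which member ways must be re-processed in stage 2.
function osm2pgsql.select_relation_members(relation)
    if relation.tags.boundary == 'administrative' then
        return { ways = osm2pgsql.way_member_ids(relation) }
    end
end

function osm2pgsql.process_relation(relation)
    if relation.tags.boundary == 'administrative' then
        local level = tonumber(relation.tags.admin_level) or 99
        for _, member in ipairs(relation.members) do
            if member.type == 'w' then
                -- One entry per member way accumulates here for the
                -- whole planet; this is where the memory goes.
                local old = w2r[member.ref]
                if not old or level < old then
                    w2r[member.ref] = level
                end
            end
        end
    end
end
```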
Clearly, administrative boundaries are something of a "worst case" here, as admin boundary relations are among the most complex and largest relations in the whole of OpenStreetMap. So a big jump in memory usage is probably no surprise.
Yet, seeing the memory usage of osm2pgsql jump by more than 40 GB before being killed still seems a bit too much, even for the admin boundaries?
If I understand the Lua code of the style correctly, only the member ways of the boundary relations need to be stored, and since the ways are de-duplicated in the process (this was the purpose of the new 'planet-osm-admin' spatial table), the additional memory usage should be relatively modest?
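That de-duplication would then happen in the way callback during stage 2, roughly like this. Again a sketch following the manual's two-stage example, with illustrative table and column names; it assumes a global `w2r` table mapping way id to the minimum admin_level, filled during relation processing:

```lua
-- Illustrative table definition; the real style's columns will differ.
local admin_table = osm2pgsql.define_way_table('planet_osm_admin', {
    { column = 'admin_level', type = 'int' },
    { column = 'geom', type = 'linestring' },
})

-- way id -> min admin_level, filled while processing boundary relations.
local w2r = {}

function osm2pgsql.process_way(way)
    local level = w2r[way.id]
    if level then
        -- Each shared boundary way is inserted exactly once, tagged
        -- with the most important (lowest) admin_level of its parent
        -- relations, instead of once per parent relation.
        admin_table:insert({
            admin_level = level,
            geom = way:as_linestring()
        })
    end
end
```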
What am I missing? Or is this level of memory usage simply expected and normal when two-stage processing is applied to OpenStreetMap administrative boundaries?