-
-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New middle tables experiments #1970
Comments
created should be fastupdate=off should not be needed on modern postgresql. When we added it, the pending list was set to work_mem, which was typically large on rendering servers (128MB or so). Since PostgreSQL 10, the value is set by What schema will be used for the various |
No. See #1785.
I just took those settings from the old setup. If we don't need those, we should remove them everywhere. There is an open issue #37 about fastupdates. I never got around to researching this, but having nothing against remove them.
Tag columns just have the obvious key-value structure. For relation members I showed the structure in the PR: |
We have a disagreement between your comment and the PostgreSQL wiki. I prefer the PostgreSQL wiki as a source for how to represent a UTC timestamp, as it agrees with all PostgreSQL experts I have talked to about the matter. Since we're making a breaking change for the middle, let's get it right. |
I agree with "the PostgreSQL experts" that theirs is the right choice for most cases and that if you don't want to think about what the best solution for your use case is than you should do as they say. But every use case is different. Basically it comes down to what the "operations" are that we want to do with the data. In our case I believe the most often used "operation" is to look at the data and compare it with timestamps in the same database (in which case type doesn't matter) or with timestamps we get from outside the database (most importantly OSM files which always use UTC or the osm.org web site which shows also in UTC (if you mouse over the "33 days ago" message)). Having the data show up in my local time zone which is the default when opening a pgsql session is rather annoying, especially with daylight saving shifts involved. And I never care about my local time when an edit was made. I might care about the local time zone of the editor, but that information is lost anyway. What I care about is comparing timestamps to see what comes before what. The only time I might care about comparing with local time is to get something like that "edited 3 hours ago" thing, not something that will happen a lot in an osm2pgsql database, but I agree that this use case becomes a little bit more complex. Depending on what you do if you write a program to do something with the data you might have to do a little bit more work to set the time zone, but that's okay, you are writing program anyway. The crucial part is when you do ad-hoc queries, and they are easier and more natural when storing the data without time zone. And yes, I know that you can change the time zone in your settings so that everything shows up in UTC anyway, but then I either have to remember to do this for every session or I have to put that into a config file which breaks every other case where I might have a database that contains timestamps that I do want to see in my local time. Unfortunately there is no way to say, this is a timestamp with UTC time and always show it to me in UTC time. |
Because we're using UTC timestamps is the reason to use timestamptz! Specifically don't use timestamp (without time zone) to store UTC times.
I've considered this use case, and there's nothing unusual about it. Storing UTC time is a common use-case that is well documented. Some practical problems
Your reason for wanting If you want to display timestamps in UTC in some DBs and in local time in others, set the timezone specifically for that database. This is breaking behavior for others to force your preferences on them. |
That's an argument for my position. You get the same result in both cases with |
The code has been merged a while back. Closing here. |
I have opened PR #1969 which shows my current work on a new database formats for the so-called "middle" tables, ie.
planet_osm_nodes/ways/rels
which contain the data needed to update the database from change files.But the middle tables can also contain useful information to do specialized processing of the OSM data. The new format makes them not only smaller but also easier to use. We'd be interested to get some feedback from the community on the new format.
We hope to make the new middle table format "official" at some point, but want to be sure we have the right format first.
(And of course we'll be supporting the old format for a while next to the new format.)
This is how the tables look like:
This is with with
-x/--extra-attributes
option. Without that thecreated
,version
,changeset_id
, anduser_id
are missing in all tables and theplanet_osm_users
table will not be created.If you want to try this, check out the branch from PR #1969 and compile it. You'll get the old format by default, you need the special command line options described in the PR to use it.
The new format is also slight more efficient. The database size drops from 256 GB for the planet to 237 GB. It is much much better if you use
-x/--extra-attributes
. In that case database size drops from nearly 400 GB to about 260 GB.The text was updated successfully, but these errors were encountered: