Duplicate rows with --expire-tiles option #766

Closed
ardean80 opened this issue Jun 16, 2017 · 3 comments

@ardean80

ardean80 commented Jun 16, 2017

I'm trying to find out which tiles are expired each time I apply an OSM daily diff to my planet db.
I noticed that, when I use osm2pgsql with the --expire-tiles option, the row count of my 4 planet tables is greater than the row count of the same tables obtained without the --expire-tiles option. It seems to duplicate modified rows, leaving the old ones in the db.
Here is the command I run:

osm2pgsql -d planetdb -s -C 4096 --number-processes 8 --hstore-all --append --expire-tiles 9-19 --expire-output /tmp/expired_tiles_list

I would expect this option to apply the changes exactly as a "normal" run does and only additionally produce an output file. Instead, running osm2pgsql with and without the --expire-tiles option leaves the db in two different states.
How could this be possible? Am I doing something wrong?

@ravhed

ravhed commented Jun 30, 2017

I've seen this behavior as well. Having the expire-tiles parameter set actually affects whether rows in the database are deleted. See this method, which deletes ways:

int output_pgsql_t::pgsql_delete_way_from_output(osmid_t osm_id)
{
    /* Optimisation: we only need this is slim mode */
    if( !m_options.slim )
        return 0;
    /* in droptemp mode we don't have indices and this takes ages. */
    if (m_options.droptemp)
        return 0;

    m_tables[t_roads]->delete_row(osm_id);
    if ( expire.from_db(m_tables[t_line].get(), osm_id) != 0)
        m_tables[t_line]->delete_row(osm_id);
    if ( expire.from_db(m_tables[t_poly].get(), osm_id) != 0)
        m_tables[t_poly]->delete_row(osm_id);
    return 0;
}

As you can see, it will only delete the way if it is expired. When expire-tiles is not set, the way will always be deleted, as from_db will return -1 in that case.

My guess is that the issue is related to the tile expiry functionality and specifically that it drops some tiles from the tile expiry tree that shouldn't be dropped. See #709.

As a workaround you can create triggers that make sure that the row is dropped. This will decrease the update performance somewhat.

@Nakaner
Contributor

Nakaner commented Jul 19, 2017

I cannot confirm this issue with the following setup (only a small extract):

  • Download the extract of Lower Saxony from Geofabrik, dated 2017-07-17
  • Import it using ./osm2pgsql --database expirytest --slim --style ../default.style --number-processes 8 --hstore-all --cache 2000 --merc niedersachsen-170717.osm.pbf
  • Check that the following objects exist: node 48644902, node 115966540, way 4473628, way 23741521, relation 84869
  • Apply diff 000001578 of the extract using osm2pgsql --database expirytest --slim --append --expire-tiles 9-19 --expire-output extest.txt --style ../default.style --number-processes 8 --hstore-all --cache 2000 --merc 578.osc.gz
  • Look for the objects again. Node 48644902 and way 23741521 should be gone, way 4473628 should now have more than two nodes, and relation 84869 should now have three tags.

osm2pgsql works as expected.

ravhed wrote:

I've seen this behavior as well. Having the expire-tiles parameter set actually affects whether rows in the database are deleted. See this method, which deletes ways:

int output_pgsql_t::pgsql_delete_way_from_output(osmid_t osm_id)
{
    /* Optimisation: we only need this is slim mode */
    if( !m_options.slim )
        return 0;
    /* in droptemp mode we don't have indices and this takes ages. */
    if (m_options.droptemp)
        return 0;

    m_tables[t_roads]->delete_row(osm_id);
    if ( expire.from_db(m_tables[t_line].get(), osm_id) != 0)
        m_tables[t_line]->delete_row(osm_id);
    if ( expire.from_db(m_tables[t_poly].get(), osm_id) != 0)
        m_tables[t_poly]->delete_row(osm_id);
    return 0;
}

As you can see, it will only delete the way if it is expired. When expire-tiles is not set, the way will always be deleted, as from_db will return -1 in that case.

I do not understand what's wrong here.

expire_tiles::from_db returns table_t::wkb_reader::get_count(), which in turn returns table_t::wkb_reader::m_count. table_t::wkb_reader::m_count is set in the constructor of table_t::wkb_reader and holds the number of rows the database returned for the prepared statement get_wkb. If the object to be deleted does not exist in the table, table_t::wkb_reader::get_count() returns 0, which avoids an unnecessary execution of DELETE FROM %1% WHERE osm_id = %2%.
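
To make the return-value contract discussed above easier to follow, here is a minimal, self-contained sketch. It is not osm2pgsql code: mock_table, the free function from_db and delete_way are hypothetical stand-ins, and the -1 return for disabled expiry is taken from ravhed's description rather than verified against the source.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an output table (not the real table_t).
struct mock_table {
    std::multimap<long, std::string> rows; // osm_id -> geometry placeholder
    int deletes_issued = 0;                // how many DELETEs would have run

    int count_rows(long osm_id) const {
        return static_cast<int>(rows.count(osm_id));
    }

    void delete_row(long osm_id) {
        ++deletes_issued;
        rows.erase(osm_id); // removes every row with this osm_id
    }
};

// Assumed contract, as described in this thread:
//   expiry disabled -> -1 (the table is not queried at all)
//   expiry enabled  -> number of rows found for osm_id (0 if none)
int from_db(bool expiry_enabled, mock_table const &table, long osm_id)
{
    if (!expiry_enabled) {
        return -1;
    }
    int const count = table.count_rows(osm_id);
    assert(count >= 0); // a row count can never be negative
    return count;
}

// Same decision rule as in pgsql_delete_way_from_output:
// issue the DELETE unless from_db reports exactly 0 rows.
void delete_way(bool expiry_enabled, mock_table &table, long osm_id)
{
    if (from_db(expiry_enabled, table, osm_id) != 0) {
        table.delete_row(osm_id);
    }
}
```

Under these assumptions, a DELETE is skipped only when expiry is enabled and no row exists for the id. So if rows really do survive with --expire-tiles, the place to look would be a case where from_db reports 0 although rows exist, not the != 0 check itself.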

EDIT: I used Git commit e9a55b7.

@pnorman pnorman added this to the 0.94.0 milestone Jul 24, 2017
@pnorman
Collaborator

pnorman commented Jul 24, 2017

I'm trying to find out which tiles are expired each time I apply an OSM daily diff to my planet db.

What version of osm2pgsql is this with?

lonvia added a commit to lonvia/osm2pgsql that referenced this issue Aug 12, 2017
tomhughes pushed a commit to tomhughes/osm2pgsql that referenced this issue Feb 12, 2020