Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tiger import: sql errors in log #2

Open
geoneutrino opened this issue Oct 21, 2020 · 3 comments
Open

Tiger import: sql errors in log #2

geoneutrino opened this issue Oct 21, 2020 · 3 comments

Comments

@geoneutrino
Copy link

Hi,

last week i imported a new planet with all additional data (master branch / postgres12)

In tiger data sql files there are more than 33 million "select tiger_line_import" statements. After import the postgresql logfile had a lot of syntax errors for approx 1.5% of these statements

  • 410.000 errors of type "invalid syntax for type integer"
    (second parameter should be integer) e.g.
    SELECT tiger_line_import(ST_GeomFromText('LINESTRING(-85.937244 31.384285,-85.937262 31.384142,-85.937319 31.383953,-85.937452 31.383749,-85.937794 31.383686,-85.938286 31.383649,-85.938534 31.383803,-85.938865 31.384175)',4326), 'UN98', 'UN00', 'all', 'Pearl St', 'Coffee, AL', '36351');

  • 60.000 errors of type "Road too short for number range"
    e.g. Road too short for number range 424 to 5364 on E State St, Clarke, AL (4.0973280119849804e-07)
    PL/pgSQL-Funktion tiger_line_import(geometry,integer,integer,text,text,text,text) Zeile 44 bei RAISE

While this doesn't affect nominatim from working maybe we should leave out the statements not matching database structure of the tiger-preprocessed files ?

Thanks

@mtmail
Copy link
Contributor

mtmail commented Oct 21, 2020

Those rows have been skipped (throwing warnings) for years. The conversion script in https://github.com/osm-search/TIGER-data should better cleanup/filter non-integers from house number ranges. It doesn't affect the search, at least not more than it always had (and I can't remember a bug that was tracked down to those lines).

zgrep -c 'invalid input syntax' postgresql-12-main.log.3.gz
821326
zgrep -ic 'road too short' postgresql-12-main.log.3.gz
119532

(I think I imported the data twice on the same day)

@lonvia lonvia transferred this issue from osm-search/Nominatim Oct 31, 2020
@mtmail
Copy link
Contributor

mtmail commented Nov 4, 2020

I had a look at a couple "road too short" warnings. That happens when the house-number range from start to finish is in the 1000s but the roads only covers a couple of houses. Examples below.

I used http://geojson.io/ to visualize the LINESTRING (Meta => Load WKT String). You can clearly see the road segments only cover a couple of houses. In the first example I think 5364 should instead be in the 5xx range (based on a house on the opposite side of the street). Note, the maps show Mapbox map tiles and the building outlines aren't in OSM data. Satellite view shows the same buildings.

  • 01025.sql:SELECT tiger_line_import(ST_GeomFromText('LINESTRING(-87.888051 31.517337,-87.889063 31.517328)',4326), '5364', '424', 'even', 'E State St', 'Clarke, AL', '36545');

image

  • SELECT tiger_line_import(ST_GeomFromText('LINESTRING(-90.118898 42.980792,-90.118886 42.980772,-90.118808 42.980660,-90.118759 42.980546,-90.118758 42.980376,-90.118885 42.980073,-90.118882 42.979993,-90.118857 42.979935,-90.118840 42.979902)',4326), '3499', '1401', 'odd', 'Bonnie Mae Blvd', 'Iowa, WI', '53533');

image

  • 55133.sql:SELECT tiger_line_import(ST_GeomFromText('LINESTRING(-88.117285 43.044837,-88.117284 43.044856,-88.117277 43.045278,-88.117283 43.045413,-88.117329 43.045508,-88.117395 43.045591,-88.117487 43.045674,-88.117556 43.045710)',4326), '6401', '9499', 'all', 'Leon Ter', 'Waukesha, WI', '53005');

image

@lonvia
Copy link
Member

lonvia commented Nov 4, 2020

I'm sure these are data errors in TIGER. I don't think it's worth trying to correct them. You can use the haversine library to compute the road length in the script and skip the roads which are too short.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants