Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lots of Shapefile SQL Errors #58

Open
esb opened this issue Sep 7, 2021 · 14 comments
Open

Lots of Shapefile SQL Errors #58

esb opened this issue Sep 7, 2021 · 14 comments

Comments

@esb
Copy link

esb commented Sep 7, 2021

I've been trying to load this datbase today, but I'm getting odd errors that I'm not sure about.

This is running on Centos 7 with Python 3.8.6 with Psycopg2 2.8.4 and Postgres 13.4 on x86_64-pc-linux-gnu and PostGIS 3.1.3 (with GEOS 3.9.1-CAPI-1.14.2).

I've downloaded the Geoscape Admin Boundaries - ESRI Shapefile - GDA2020(ZIP) file, which I'm hoping is the correct one.

The first error is

Importing ACT_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist

This is true. The "geom" column does not exist in the table.

Then we launch into an awful lot of these errors.

root        : WARNING   Importing WA_TOWN_POINT_shp.dbf - Couldn't run Shapefile SQL
shp2pgsql result was: current transaction is aborted, commands ignored until end of transaction block

Is this because of that first error?

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

The (now updated) readme was missing a note to download the GDA94 Shapefile version of the Admin Bdys (not the GDA2020 ones)!

Please try again with the GDA94 files

@esb
Copy link
Author

esb commented Sep 7, 2021

I downloaded the GDA94 files, but the same errors are coming up.

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

Odd error! A couple of things to check:

  1. The PostGIS extension is enabled on your database (using the following SQL - CREATE EXTENSION postgis); and
  2. The Postgres bin folder is in your $PATH (run shp2pgsql from the command line to confirm)

@esb
Copy link
Author

esb commented Sep 7, 2021

Yep. Both of those things have been done.

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

One more thing to double check - have you got the latest code? The August 2021 release PR was this morning, Sydney time, and there have been a couple of significant changes to the data (thanks Geoscape!)

Other than that - can you please share the full log with me and also the command line you're using to run it

@esb
Copy link
Author

esb commented Sep 7, 2021

Ok, my original hunch was right. The mass of errors is due to a locked transaction from the first failure.

I added the "-e" flag to the shp2pgsql command to treat each command individually, and not as part of a transaction.

After this was done most of the imports went smoothly, except for the one error for each state because of a missing column.

09/07/2021 07:04:49 PM Part 3 of 6 : Start raw admin boundary load : 2021-09-07 19:04:49.802582
09/07/2021 07:04:58 PM 	Importing ACT_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...                                                             ^
09/07/2021 07:08:32 PM 	Importing NSW_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:08:56 PM 	Importing NT_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:10:46 PM 	Importing QLD_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:11:44 PM 	Importing SA_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:13:09 PM 	Importing TAS_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:15:10 PM 	Importing VIC_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:16:32 PM 	Importing WA_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

Interesting - the code should import SHP files of the same layer (e.g town points) sequentially, so PG shouldn't be locking the table. I'll go with it being a knock-on effect from the 1st failure... but could be the code isn't doing that...

A couple of more avenues to investigate re: the initial error:

  1. When a shapefile load fails at the psql stage (like the above error appears to have) - it should output a <failed_shp_file>.sql file in the same folder as the code - can you attach it to this issue
  2. Does this table exist in Postgres: admin_bdys_202108.aus_town_point and does it have the geom column?

@esb
Copy link
Author

esb commented Sep 7, 2021

The error file for error_debug_ACT_TOWN_POINT_shp.shp.sql is

SET CLIENT_ENCODING TO UTF8;
SET STANDARD_CONFORMING_STRINGS TO ON;
INSERT INTO "raw_admin_bdys_202108"."aus_town_point" ("town_point","date_creat","date_retir","town_pid",geom) VALUES ('TWP8373','20130731',NULL,'TWN8373','0101000020BB1000005DDB641F27A2624072956B2D33C141C0');
INSERT INTO "raw_admin_bdys_202108"."aus_town_point" ("town_point","date_creat","date_retir","town_pid",geom) VALUES ('TWP8374','20130731',NULL,'TWN8374','0101000020BB100000E707B0F32FA262407145A1E6BD9541C0');
INSERT INTO "raw_admin_bdys_202108"."aus_town_point" ("town_point","date_creat","date_retir","town_pid",geom) VALUES ('TWP8375','20130731',NULL,'TWN8375','0101000020BB1000009879612129A4624083647620CFA341C0');
ANALYZE "raw_admin_bdys_202108"."aus_town_point";

The table exists, but is missing the geom column.

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

Ok, so shp2pgsql worked, but the table missing the geom field is very strange...

I'm short of ideas right now as I've run this multiple times over the last few days on 2 Macs without that issue; and it hasn't come up in the last 3-4 years. I can't see why CentOS would be any different. Linux != Unix, but close enough...

A corrupt PostGIS install maybe?! Can you double check this SQL works in your DB:

select st_makepoint(151.0, -33.0, 4283);

@esb
Copy link
Author

esb commented Sep 7, 2021

Yep. That SQL works fine.

Here's the dump of the table schema.

CREATE TABLE IF NOT EXISTS raw_admin_bdys_202108.aus_town_point
(
    gid integer NOT NULL DEFAULT nextval('raw_admin_bdys_202108.aus_town_point_gid_seq'::regclass),
    town_point character varying(15) COLLATE pg_catalog."default",
    date_creat date,
    date_retir date,
    town_pid character varying(15) COLLATE pg_catalog."default",
    CONSTRAINT aus_town_point_pkey PRIMARY KEY (gid)
)

I'm not sure I understand where the table schemas come from.

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

The table is created by the first state to be loaded for each layer. So ACT_TOWN_POINT_shp.shp should create the table aus_town_point with that file's structure. The -d flag in shp2pgsql forces a create table statement to be output with the SQL.

Another test! - uncomment line 379 in load-gnaf.py ([print(table) for table in create_list]) and send through the full list of dicts. A working thought is that the code is behaving differently on CentOS and is putting ACT_TOWN_POINTS_shp.dbf as the first aus_town_point input...

There should be no .dbf files for layers that have a .shp file.

BTW - I had thought about refactoring the Shapefile load to use SHP > GeoPandas > PG, now I'm wishing I had :-)

@esb
Copy link
Author

esb commented Sep 7, 2021

I can see the problem....

This file is processed first -

shp2pgsql -e -d -I -G -n -i "/vagrant_data/geoscape_202108/AUG21_Admin_Boundaries_ESRIShapefileorDBFfile/TownPoints_AUG13_GDA94_SHP/Town Points/Town Points AUGUST 2013/Standard/TAS_TOWN_POINT_shp.dbf" raw_admin_bdys_202108.aus_town_point

Then this file comes afterwards -

shp2pgsql -e -a -s 4283 -i "/vagrant_data/geoscape_202108/AUG21_Admin_Boundaries_ESRIShapefileorDBFfile/TownPoints_AUG13_GDA94_SHP/Town Points/Town Points AUGUST 2013/Standard/TAS_TOWN_POINT_shp.shp" raw_admin_bdys_202108.aus_town_point

The dbf file creates the table, but this does not create the geom column. When the shp file is appended, it will crash.

So the problem is that a .dbf file exists for the aus_town_point table.

@esb
Copy link
Author

esb commented Sep 7, 2021

Yes. The order seems to be wrong.

TAS_GCCSA_2011_POLYGON_shp.dbf
TAS_GCCSA_2011_POLYGON_shp.shp
TAS_GCCSA_2011_shp.dbf
TAS_MB_2011_POLYGON_shp.dbf
TAS_MB_2011_POLYGON_shp.shp
TAS_MB_2011_shp.dbf
TAS_SA1_2011_POLYGON_shp.dbf
TAS_SA1_2011_POLYGON_shp.shp
TAS_SA1_2011_shp.dbf
TAS_SA2_2011_POLYGON_shp.dbf
TAS_SA2_2011_POLYGON_shp.shp
TAS_SA2_2011_shp.dbf
TAS_SA3_2011_POLYGON_shp.dbf
TAS_SA3_2011_POLYGON_shp.shp
TAS_SA3_2011_shp.dbf
TAS_SA4_2011_POLYGON_shp.dbf
TAS_SA4_2011_POLYGON_shp.shp
TAS_SA4_2011_shp.dbf
TAS_IARE_2011_POLYGON_shp.dbf
TAS_IARE_2011_POLYGON_shp.shp
TAS_IARE_2011_shp.dbf
TAS_ILOC_2011_POLYGON_shp.dbf
TAS_ILOC_2011_POLYGON_shp.shp
TAS_ILOC_2011_shp.dbf
TAS_IREG_2011_POLYGON_shp.dbf
TAS_IREG_2011_POLYGON_shp.shp
TAS_IREG_2011_shp.dbf
TAS_REMOTENESS_2011_POLYGON_shp.dbf
TAS_REMOTENESS_2011_POLYGON_shp.shp
TAS_REMOTENESS_2011_shp.dbf
TAS_SEIFA_2011_shp.dbf
TAS_SUA_2011_POLYGON_shp.dbf
TAS_SUA_2011_POLYGON_shp.shp
TAS_SUA_2011_shp.dbf
TAS_SOSR_2011_POLYGON_shp.dbf
TAS_SOSR_2011_POLYGON_shp.shp
TAS_SOSR_2011_shp.dbf
TAS_SOS_2011_POLYGON_shp.dbf
TAS_SOS_2011_POLYGON_shp.shp
TAS_SOS_2011_shp.dbf
TAS_UCL_2011_POLYGON_shp.dbf
TAS_UCL_2011_POLYGON_shp.shp
TAS_UCL_2011_shp.dbf
TAS_IARE_2016_POLYGON_shp.dbf
TAS_IARE_2016_POLYGON_shp.shp
TAS_IARE_2016_shp.dbf
TAS_ILOC_2016_POLYGON_shp.dbf
TAS_ILOC_2016_POLYGON_shp.shp
TAS_ILOC_2016_shp.dbf
TAS_IREG_2016_POLYGON_shp.dbf
TAS_IREG_2016_POLYGON_shp.shp
TAS_IREG_2016_shp.dbf
TAS_GCCSA_2016_POLYGON_shp.dbf
TAS_GCCSA_2016_POLYGON_shp.shp
TAS_GCCSA_2016_shp.dbf
TAS_MB_2016_POLYGON_shp.dbf
TAS_MB_2016_POLYGON_shp.shp
TAS_MB_2016_shp.dbf
TAS_SA1_2016_POLYGON_shp.dbf
TAS_SA1_2016_POLYGON_shp.shp
TAS_SA1_2016_shp.dbf
TAS_SA2_2016_POLYGON_shp.dbf
TAS_SA2_2016_POLYGON_shp.shp
TAS_SA2_2016_shp.dbf
TAS_SA3_2016_POLYGON_shp.dbf
TAS_SA3_2016_POLYGON_shp.shp
TAS_SA3_2016_shp.dbf
TAS_SA4_2016_POLYGON_shp.dbf
TAS_SA4_2016_POLYGON_shp.shp
TAS_SA4_2016_shp.dbf
TAS_REMOTENESS_2016_POLYGON_shp.dbf
TAS_REMOTENESS_2016_POLYGON_shp.shp
TAS_REMOTENESS_2016_shp.dbf
TAS_SEIFA_2016_shp.dbf
TAS_SOSR_2016_POLYGON_shp.dbf
TAS_SOSR_2016_POLYGON_shp.shp
TAS_SOSR_2016_shp.dbf
TAS_SOS_2016_POLYGON_shp.dbf
TAS_SOS_2016_POLYGON_shp.shp
TAS_SOS_2016_shp.dbf
TAS_SUA_2016_POLYGON_shp.dbf
TAS_SUA_2016_POLYGON_shp.shp
TAS_SUA_2016_shp.dbf
TAS_UCL_2016_POLYGON_shp.dbf
TAS_UCL_2016_POLYGON_shp.shp
TAS_UCL_2016_shp.dbf
tas_gccsa_2021.shp
tas_mb_2021.shp
tas_sa1_2021.shp
tas_sa2_2021.shp
tas_sa3_2021.shp
tas_sa4_2021.shp
TAS_COMM_ELECTORAL_POLYGON_shp.dbf
TAS_COMM_ELECTORAL_POLYGON_shp.shp
TAS_COMM_ELECTORAL_shp.dbf
TAS_STATE_ELECTORAL_POLYGON_shp.dbf
TAS_STATE_ELECTORAL_POLYGON_shp.shp
TAS_STATE_ELECTORAL_shp.dbf
tas_lga.shp
tas_localities.shp
TAS_STATE_POLYGON_shp.dbf
TAS_STATE_POLYGON_shp.shp
TAS_STATE_shp.dbf
TAS_LOCALITY_shp.dbf
TAS_LOCALITY_TOWN_shp.dbf
TAS_TOWN_POINT_shp.dbf
TAS_TOWN_POINT_shp.shp
TAS_TOWN_shp.dbf
Authority_Code_MB_CATEGORY_CLASS_AUT_shp.dbf
Authority_Code_REMOTENESS_CATEGORY_AUT_shp.dbf
Authority_Code_MB_CATEGORY_CLASS_AUT_shp.dbf
Authority_Code_REMOTENESS_CATEGORY_AUT_shp.dbf
Authority_Code_STATE_ELECTORAL_CLASS_AUT_shp.dbf
Authority_Code_TOWN_CLASS_AUT_shp.dbf

@minus34
Copy link
Owner

minus34 commented Sep 7, 2021

Cool, we've got the problem nailed down.

Curious that the behaviour is different between operating systems, but not as important as finding a universal fix...

I've just updated the code to filter out *_POINT_shp.dbf files (line 340). However, in the list above the elif filter should exclude all *_POLYGON_shp.dbf files as well; so this may not be a fix on CentOS...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants