Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Write Unknown cartesian CRS when saving gdf without a CRS to GPKG #368

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
5 changes: 4 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,10 @@

### Improvements

- `read_arrow` and `open_arrow` now provide [GeoArrow-compliant extension metadata](https://geoarrow.org/extension-types.html), including the CRS, when using GDAL 3.8 or higher.
- `read_arrow` and `open_arrow` now provide
[GeoArrow-compliant extension metadata](https://geoarrow.org/extension-types.html),
including the CRS, when using GDAL 3.8 or higher (#366).
- Write "Unknown cartesian CRS" when saving gdf without a CRS to GPKG (#368).

### Bug fixes

Expand Down
21 changes: 13 additions & 8 deletions pyogrio/geopandas.py
Original file line number Diff line number Diff line change
Expand Up @@ -537,14 +537,19 @@ def write_dataframe(
geometry_type = f"{geometry_type} Z"

crs = None
if geometry_column is not None and geometry.crs:
# TODO: this may need to be WKT1, due to issues
# if possible use EPSG codes instead
epsg = geometry.crs.to_epsg()
if epsg:
crs = f"EPSG:{epsg}"
else:
crs = geometry.crs.to_wkt(WktVersion.WKT1_GDAL)
if geometry_column is not None:
if geometry.crs:
# TODO: this may need to be WKT1, due to issues
# if possible use EPSG codes instead
epsg = geometry.crs.to_epsg()
if epsg:
crs = f"EPSG:{epsg}"
else:
crs = geometry.crs.to_wkt(WktVersion.WKT1_GDAL)
elif driver == "GPKG":
# In GPKG, None crs must be replaced by "Undefined Cartesian SRS", otherwise
# the default "Undefined geographic SRS" will be used.
crs = 'LOCAL_CS["Undefined Cartesian SRS"]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enough? My understanding was that this is required:

ENGCRS["Undefined Cartesian SRS with unknown unit",
    EDATUM["Unknown engineering datum"],
    CS[Cartesian,2],
        AXIS["X",unspecified,
            ORDER[1],
            LENGTHUNIT["unknown",0]],
        AXIS["Y",unspecified,
            ORDER[2],
            LENGTHUNIT["unknown",0]]]

See the dicsussion in r-spatial/sf#2049.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on projections, but as far as I can tell both are equivalent: LOCAL_CS["Undefined Cartesian SRS"] is just a shorter name for the full wkt.

I did a manual check on this before, but to be sure I added an explicit check in the test that the srs_id in the GPKG written is -1, which is the main thing we are trying to do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that LOCAL_CS["Undefined Cartesian SRS"] assumes meters as units and axes as easting and northing, which is incorrect:

<Engineering CRS: LOCAL_CS["Undefined Cartesian SRS",UNIT["Meter",1] ...>
Name: Undefined Cartesian SRS
Axis Info [cartesian]:
- [east]: Easting (Meter)
- [north]: Northing (Meter)

The WKT above is explicit about making no assumptions about anything, because we don't know anything.

Copy link
Member Author

@theroggy theroggy Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using that WKT does not comply to the GeoPackage specs: https://www.geopackage.org/spec/#r11. Using that WKT leads to a new custom CRS being "created" in the GeoPackage with srs_id = 100.000 (or whatever is the first unused srs_id in that range in the geopackage) instead of using srs_id = -1, as should be used according to the specs.

I wonder what is the rational that they went for this solution in r-spatial? I don't really found one in the post above. Maybe it should be changed in GDAL that ENGCRS["Undefined Cartesian SRS with unknown unit" is (also) mapped to srs_id = -1 in geopackage?

I did some extra tests and proj does seem to treat them all as equivalent, so I suppose that in practice it probably won't matter to much with the different crs's being interused? Would this lead to it being relatively unimportant that r-spatial choose something different than the gpkg specs (and supposedly other libraries following that) and that the end user should have trouble with those differences, or is that a bit naive? Possibly other libraries than proj don't treat them as equal?

from pyproj import CRS

crs_undefined = CRS('LOCAL_CS["Undefined Cartesian SRS"]')
crs_undefined_unknown = CRS('LOCAL_CS["Undefined Cartesian SRS with unknown unit"]')
try:
    crs_undefined_unknown_eng = CRS(
        'ENGCRS["Undefined Cartesian SRS with unknown unit"]'
    )
except Exception as ex:
    print(f"Exception raised: {ex}")

crs_undefined_unknown_eng_wkt = """
    ENGCRS["Undefined Cartesian SRS with unknown unit",
        EDATUM["Unknown engineering datum"],
        CS[Cartesian,2],
            AXIS["X",unspecified,
                ORDER[1],
                LENGTHUNIT["unknown",0]],
            AXIS["Y",unspecified,
                ORDER[2],
                LENGTHUNIT["unknown",0]]]
"""
crs_undefined_unknown_eng2 = CRS(crs_undefined_unknown_eng_wkt)

# Check differences
print(f"{crs_undefined_unknown.equals(crs_undefined_unknown_eng2)=}")
print(f"{crs_undefined_unknown.to_wkt() == crs_undefined_unknown_eng2.to_wkt()=}")

print(f"{crs_undefined_unknown.equals(crs_undefined)=}")
print(f"{crs_undefined_unknown.to_wkt() == crs_undefined.to_wkt()=}")

print(f"{crs_undefined_unknown_eng2.equals(crs_undefined)=}")
print(f"{crs_undefined_unknown_eng2.to_wkt() == crs_undefined.to_wkt()=}")

Output:

Exception raised: Invalid projection: ENGCRS["Undefined Cartesian SRS with unknown unit"]: (Internal Proj Error: proj_create: Missing EDATUM / ENGINEERINGDATUM node)
crs_undefined_unknown.equals(crs_undefined_unknown_eng2)=True
crs_undefined_unknown.to_wkt() == crs_undefined_unknown_eng2.to_wkt()=False
crs_undefined_unknown.equals(crs_undefined)=True
crs_undefined_unknown.to_wkt() == crs_undefined.to_wkt()=False
crs_undefined_unknown_eng2.equals(crs_undefined)=True
crs_undefined_unknown_eng2.to_wkt() == crs_undefined.to_wkt()=False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky. When using the WKT, it correctly roundtrips with all the fields Unknown but the srid is 100000. I am not sure which is better.

@edzer are you sure the way this is handled in {sf} is actually correct? Meaning it correctly roundtrips and follows the spec?

Copy link
Member Author

@theroggy theroggy Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference to GDAL code handling srs_id=-1, as looked for during the community meeting by @jorisvandenbossche :


# If there is geometry data, prepare it to be written
if geometry_column is not None:
Expand Down
17 changes: 17 additions & 0 deletions pyogrio/tests/test_geopandas_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -930,6 +930,23 @@ def test_write_dataframe_append(tmp_path, naturalearth_lowres, ext):
assert len(read_dataframe(output_path)) == 354


@pytest.mark.parametrize("ext", [".gpkg", ".shp"])
def test_write_dataframe_crs_None(tmp_path, ext):
input_gdf = gp.GeoDataFrame(geometry=[Point(0, 1)], crs=None)
output_path = tmp_path / f"test{ext}"

write_dataframe(input_gdf, output_path)

assert os.path.exists(output_path)
result_gdf = read_dataframe(output_path)

# In GPKG, None crs is replaced by "Undefined Cartesian SRS".
if ext == ".gpkg":
assert result_gdf.crs == 'LOCAL_CS["Undefined Cartesian SRS"]'
else:
assert result_gdf.crs is None


@pytest.mark.parametrize("spatial_index", [False, True])
def test_write_dataframe_gdal_options(tmp_path, naturalearth_lowres, spatial_index):
df = read_dataframe(naturalearth_lowres)
Expand Down
Loading