Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add from_wkt/to_wkt functions #50

Merged
merged 14 commits into from
Oct 25, 2024
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -75,12 +75,18 @@ add_library(spherely MODULE
src/accessors-geog.cpp
src/creation.cpp
src/geography.cpp
src/io.cpp
src/predicates.cpp
src/spherely.cpp
)

target_compile_definitions(spherely
PRIVATE VERSION_INFO=${PROJECT_VERSION})
target_compile_definitions(
spherely
PRIVATE
VERSION_INFO=${PROJECT_VERSION}
S2GEOGRAPHY_VERSION=${s2geography_VERSION}
S2GEOGRAPHY_VERSION_MAJOR=${s2geography_VERSION_MAJOR}
S2GEOGRAPHY_VERSION_MINOR=${s2geography_VERSION_MINOR})

target_link_libraries(spherely
PRIVATE pybind11::module pybind11::lto pybind11::windows_extras
Expand Down
10 changes: 10 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,3 +61,13 @@ Predicates
contains
within
disjoint


Input/Output
------------

.. autosummary::
:toctree: _api_generated/

from_wkt
to_wkt
3 changes: 1 addition & 2 deletions src/accessors-geog.cpp
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#include <s2geography.h>
#include <s2geography/geography.h>

#include "constants.hpp"
#include "creation.hpp"
#include "geography.hpp"
#include "pybind11.hpp"
Expand All @@ -9,8 +10,6 @@ namespace py = pybind11;
namespace s2geog = s2geography;
using namespace spherely;

const double EARTH_RADIUS_METERS = 6371.01 * 1000;

PyObjectGeography centroid(PyObjectGeography a) {
const auto& a_ptr = a.as_geog_ptr()->geog();
auto s2_point = s2geog::s2_centroid(a_ptr);
Expand Down
5 changes: 5 additions & 0 deletions src/constants.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
namespace spherely {

const double EARTH_RADIUS_METERS = 6371.01 * 1000;

}
97 changes: 97 additions & 0 deletions src/io.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
#include <s2/s1angle.h>
#include <s2geography.h>

#include "constants.hpp"
#include "creation.hpp"
#include "geography.hpp"
#include "pybind11.hpp"

namespace py = pybind11;
namespace s2geog = s2geography;
using namespace spherely;

class FromWKT {
public:
FromWKT(bool oriented, bool planar, float tessellate_tol_m = 100.0) {
#if defined(S2GEOGRAPHY_VERSION_MAJOR) && \
(S2GEOGRAPHY_VERSION_MAJOR >= 1 || S2GEOGRAPHY_VERSION_MINOR >= 2)
s2geog::geoarrow::ImportOptions options;
options.set_oriented(oriented);
if (planar) {
auto tol = S1Angle::Radians(tessellate_tol_m / EARTH_RADIUS_METERS);
options.set_tessellate_tolerance(tol);
}
m_reader = std::make_shared<s2geog::WKTReader>(options);
#else
if (planar || oriented) {
throw std::invalid_argument(
"planar and oriented options are only available with s2geography >= 0.2");
}
m_reader = std::make_shared<s2geog::WKTReader>();
#endif
}

PyObjectGeography operator()(py::str a) const {
return make_py_geography(m_reader->read_feature(a));
}

private:
std::shared_ptr<s2geog::WKTReader> m_reader;
};

py::str to_wkt(PyObjectGeography a) {
s2geog::WKTWriter writer;
auto res = writer.write_feature(a.as_geog_ptr()->geog());
return py::str(res);
}

void init_io(py::module& m) {
m.def(
"from_wkt",
[](py::array_t<py::str> a, bool oriented, bool planar, float tessellate_tol_m) {
return py::vectorize(FromWKT(oriented, planar, tessellate_tol_m))(std::move(a));
},
py::arg("a"),
py::arg("oriented") = false,
py::arg("planar") = false,
py::arg("tessellate_tol_m") = 100.0,
R"pbdoc(
Creates geographies from the Well-Known Text (WKT) representation.

Parameters
----------
a : str or array_like
WKT strings.
oriented : bool, default False
Set to True if polygon ring directions are known to be correct
(i.e., exterior rings are defined counter clockwise and interior
rings are defined clockwise).
By default (False), it will return the polygon with the smaller
area.
planar : bool, default False
If set to True, the edges of linestrings and polygons are assumed
to be linear on the plane. In that case, additional points will
be added to the line while creating the geography objects, to
ensure every point is within 100m of the original line.
By default (False), it is assumed that the edges are spherical
(i.e. represent the shortest path on the sphere between two points).
tessellate_tol_m : float, default 100.0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tessellate_tol_m : float, default 100.0
tessellate_tol : float, default 100.0

(nit suggestion: _m makes the parameter name slightly less readable IMHO and its meaning may not be obvious at a first glance without checking the docs anyway... _meters would be more meaningful but would also make the parameter name too long).

Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That keyword name was taken from R s2 (https://r-spatial.github.io/s2/reference/s2_geog_point.html), but I am certainly fine with dropping the _m as well. I agree that from just seeing the keyword name (without reading the explanation for what it stands), it is not necessarily clear that this stands for "meters", making it an unnecessary suffix.

(to be honest, I am also not convinced that "tessellation" is the best name. BigQuery uses it, but if you google for it, it is typically about dividing a plane into multiple shapes that nicely cover it without overlap, e.g. https://en.wikipedia.org/wiki/Edge_tessellation and https://en.wikipedia.org/wiki/Tessellation)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm more familiar with the latter meaning (dividing a surface into multiple shapes), but I'm fine with the name here.

Another name could be densify_pts_tol or densify_tol (pyproj uses densify_pts for similar problems).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in the pyproj case it is (I assume) a uniform/regular densification, while s2 does a non-uniform (only adding points where needed to preserve the tolerance)

I renamed the keyword to tessellate_tolerance (making it a bit longer, but also more readable I think than the "tol", and then it is also consistent with the C++ argument)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to merge this, so I can fixup conflicts with the geoarrow PR. But happy to change the keyword name again in a quick follow-up.

The maximum distance in meters that a point must be moved to
satisfy the planar edge constraint. This is only used if `planar`
is set to True.

)pbdoc");

m.def("to_wkt",
py::vectorize(&to_wkt),
py::arg("a"),
R"pbdoc(
Returns the WKT representation of each geography.

Parameters
----------
a : :py:class:`Geography` or array_like
Geography object(s)

)pbdoc");
}
37 changes: 33 additions & 4 deletions src/pybind11.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ class PyObjectGeography : public py::object {
namespace pybind11 {
namespace detail {

// Force pybind11 to allow PyObjectGeography as argument of vectorized
// Force pybind11 to allow PyObjectGeography and str as argument of vectorized
// functions.
//
// Pybind11 doesn't support non-POD types as arguments for vectorized
Expand All @@ -128,7 +128,19 @@ struct vectorize_arg<spherely::PyObjectGeography> {
using type = conditional_t<vectorize, array_t<remove_cv_t<call_type>, array::forcecast>, T>;
};

// Register PyObjectGeography as a valid numpy dtype (numpy.object alias)
template <>
struct vectorize_arg<py::str> {
using T = py::str;
// The wrapped function gets called with this type:
using call_type = T;
// Is this a vectorized argument?
static constexpr bool vectorize = true;
// Accept this type: an array for vectorized types, otherwise the type
// as-is:
using type = conditional_t<vectorize, array_t<remove_cv_t<call_type>, array::forcecast>, T>;
};

// Register PyObjectGeography and str as a valid numpy dtype (numpy.object alias)
// from: https://github.com/pybind/pybind11/pull/1152
template <>
struct npy_format_descriptor<spherely::PyObjectGeography> {
Expand All @@ -142,6 +154,18 @@ struct npy_format_descriptor<spherely::PyObjectGeography> {
}
};

template <>
struct npy_format_descriptor<py::str> {
static constexpr auto name = _("object");
enum { value = npy_api::NPY_OBJECT_ };
static pybind11::dtype dtype() {
if (auto ptr = npy_api::get().PyArray_DescrFromType_(value)) {
return reinterpret_borrow<pybind11::dtype>(ptr);
}
pybind11_fail("Unsupported buffer format!");
}
};

// Override signature type hint for vectorized Geography arguments
template <int Flags>
struct handle_type_name<array_t<spherely::PyObjectGeography, Flags>> {
Expand All @@ -150,10 +174,10 @@ struct handle_type_name<array_t<spherely::PyObjectGeography, Flags>> {

} // namespace detail

// Specialization of ``pybind11::cast`` for PyObjectGeography (just a pass
// Specialization of ``pybind11::cast`` for PyObjectGeography and str (just a pass
// through).
//
// Allows using PyObjectGeography as return type of vectorized functions.
// Allows using PyObjectGeography and str as return type of vectorized functions.
//
template <
typename T,
Expand All @@ -162,6 +186,11 @@ object cast(T &&value) {
return value;
}

template <typename T, typename detail::enable_if_t<std::is_same<T, py::str>::value, int> = 0>
object cast(T &&value) {
return value;
}

} // namespace pybind11

#endif // SPHERELY_PYBIND11_H_
6 changes: 6 additions & 0 deletions src/spherely.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ void init_geography(py::module&);
void init_creation(py::module&);
void init_predicates(py::module&);
void init_accessors(py::module&);
void init_io(py::module&);

PYBIND11_MODULE(spherely, m) {
m.doc() = R"pbdoc(
Expand All @@ -23,10 +24,15 @@ PYBIND11_MODULE(spherely, m) {
init_creation(m);
init_predicates(m);
init_accessors(m);
init_io(m);

#ifdef VERSION_INFO
m.attr("__version__") = MACRO_STRINGIFY(VERSION_INFO);
#else
m.attr("__version__") = "dev";
#endif

#ifdef S2GEOGRAPHY_VERSION
m.attr("__s2geography_version__") = MACRO_STRINGIFY(S2GEOGRAPHY_VERSION);
#endif
}
13 changes: 13 additions & 0 deletions src/spherely.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@ from typing import (
import numpy as np
import numpy.typing as npt

__version__: str = ...
__s2geography_version__: str = ...
EARTH_RADIUS_METERS: float = ...

class Geography:
Expand Down Expand Up @@ -183,3 +185,14 @@ convex_hull: _VFunc_Nin1_Nout1[
Literal["convex_hull"], PolygonGeography, PolygonGeography
]
distance: _VFunc_Nin2optradius_Nout1[Literal["distance"], float, float]

# io functions

to_wkt: _VFunc_Nin1_Nout1[Literal["to_wkt"], str, object]
benbovy marked this conversation as resolved.
Show resolved Hide resolved

def from_wkt(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is also a fully vectorized function, i.e., oriented and planar also accept array-like objects, right?

Or do we want to restrict oriented and planar to accepting each a single value? (and use a Functor in order to handle those arguments out of the vectorized call loop like for the predicate functions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used a lambda in the m.def so to only vectorize the main input data argument, and not oriented/planar.

a: Iterable[str],
oriented: bool = False,
planar: bool = False,
tessellate_tol_m: float = 100.0,
) -> npt.NDArray[Any]: ...
101 changes: 101 additions & 0 deletions tests/test_io.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
import numpy as np
import pytest
from packaging.version import Version

import spherely


def test_from_wkt():
result = spherely.from_wkt(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])
expected = spherely.points([1, 2, 3], [1, 2, 3])
# object equality does not yet work
# np.testing.assert_array_equal(result, expected)
assert spherely.equals(result, expected).all()

# from explicit object dtype
result = spherely.from_wkt(
np.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"], dtype=object)
)
assert spherely.equals(result, expected).all()

# from numpy string dtype
result = spherely.from_wkt(
np.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"], dtype="U")
)
assert spherely.equals(result, expected).all()


def test_from_wkt_invalid():
# TODO can we provide better error type?
with pytest.raises(RuntimeError):
spherely.from_wkt(["POINT (1)"])


def test_from_wkt_wrong_type():
with pytest.raises(TypeError, match="expected bytes, int found"):
spherely.from_wkt([1])

# TODO support missing values
with pytest.raises(TypeError, match="expected bytes, NoneType found"):
spherely.from_wkt(["POINT (1 1)", None])


polygon_with_bad_hole_wkt = (
"POLYGON "
"((20 35, 10 30, 10 10, 30 5, 45 20, 20 35),"
"(30 20, 20 25, 20 15, 30 20))"
)


@pytest.mark.skipif(
Version(spherely.__s2geography_version__) < Version("0.2.0"),
reason="Needs s2geography >= 0.2.0",
)
def test_from_wkt_oriented():
# by default re-orients the inner ring
result = spherely.from_wkt(polygon_with_bad_hole_wkt)
assert (
str(result)
== "POLYGON ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (20 15, 20 25, 30 20, 20 15))"
)

# if we force to not orient, we get an error
with pytest.raises(RuntimeError, match="Inconsistent loop orientations detected"):
spherely.from_wkt(polygon_with_bad_hole_wkt, oriented=True)
Comment on lines +62 to +64
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a great performance gain when setting oriented=True and when we know that all ring are consistently oriented? I haven't checked the implementation in s2geography (and not much in s2geometry), but I assume that the orientations are checked anyway if such case may trigger raising an error.

Maybe are there other reasons of setting oriented=True? I.e., when do we want to avoid implicit re-ordering of the rings?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also later uniformize across the library the type of error raised (in #51 a ValueError is raised when a polygon has invalid edges).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know (without trying out) if there is a performance difference, but I think that the oriented=True does allow you to create different geographies? (if you have one that is larger than half of the globe, the default constructor would always normalize it to represent the smaller polygon)

And I can also imagine that just checking that the orientation of all loops are the same is still cheaper than a full normalization (but that is indeed something to check)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also later uniformize across the library the type of error raised (in #51 a ValueError is raised when a polygon has invalid edges).

Yeah, in this case the error comes from s2geography (in theory I could catch the error and re-throw it with a different class, because those RuntimeErrors are a bit strange)



@pytest.mark.skipif(
Version(spherely.__s2geography_version__) < Version("0.2.0"),
reason="Needs s2geography >= 0.2.0",
)
def test_from_wkt_planar():
result = spherely.from_wkt("LINESTRING (-64 45, 0 45)")
assert spherely.distance(result, spherely.point(-30.1, 45)) > 10000

result = spherely.from_wkt("LINESTRING (-64 45, 0 45)", planar=True)
assert spherely.distance(result, spherely.point(-30.1, 45)) < 100

result = spherely.from_wkt(
"LINESTRING (-64 45, 0 45)", planar=True, tessellate_tol_m=10
)
assert spherely.distance(result, spherely.point(-30.1, 45)) < 10


@pytest.mark.skipif(
Version(spherely.__s2geography_version__) >= Version("0.2.0"),
reason="Needs s2geography < 0.2.0",
)
def test_from_wkt_unsupported_keywords():

with pytest.raises(ValueError):
spherely.from_wkt(polygon_with_bad_hole_wkt, oriented=True)

with pytest.raises(ValueError):
spherely.from_wkt("LINESTRING (-64 45, 0 45)", planar=True)


def test_to_wkt():
arr = spherely.points([1.1, 2, 3], [1.1, 2, 3])
result = spherely.to_wkt(arr)
expected = np.array(["POINT (1.1 1.1)", "POINT (2 2)", "POINT (3 3)"], dtype=object)
np.testing.assert_array_equal(result, expected)
Loading