Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add from_wkt/to_wkt functions #50

Merged
merged 14 commits into from
Oct 25, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -74,12 +74,18 @@ endif()
add_library(spherely MODULE
src/geography.cpp
src/accessors-geog.cpp
src/io.cpp
src/predicates.cpp
src/spherely.cpp
)

target_compile_definitions(spherely
PRIVATE VERSION_INFO=${PROJECT_VERSION})
target_compile_definitions(
spherely
PRIVATE
VERSION_INFO=${PROJECT_VERSION}
S2GEOGRAPHY_VERSION=${s2geography_VERSION}
S2GEOGRAPHY_VERSION_MAJOR=${s2geography_VERSION_MAJOR}
S2GEOGRAPHY_VERSION_MINOR=${s2geography_VERSION_MINOR})

target_link_libraries(spherely
PRIVATE pybind11::module pybind11::lto pybind11::windows_extras
Expand Down
10 changes: 10 additions & 0 deletions docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -67,3 +67,13 @@ Predicates
contains
within
disjoint


Input/Output
------------

.. autosummary::
:toctree: _api_generated/

from_wkt
to_wkt
3 changes: 1 addition & 2 deletions src/accessors-geog.cpp
Original file line number Diff line number Diff line change
@@ -1,14 +1,13 @@
#include <s2geography.h>

#include "constants.hpp"
#include "geography.hpp"
#include "pybind11.hpp"

namespace py = pybind11;
namespace s2geog = s2geography;
using namespace spherely;

const double EARTH_RADIUS_METERS = 6371.01 * 1000;

PyObjectGeography centroid(PyObjectGeography a) {
const auto& a_ptr = a.as_geog_ptr()->geog();
auto s2_point = s2geog::s2_centroid(a_ptr);
Expand Down
5 changes: 5 additions & 0 deletions src/constants.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
namespace spherely {

const double EARTH_RADIUS_METERS = 6371.01 * 1000;

}
81 changes: 81 additions & 0 deletions src/io.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
#include <s2/s1angle.h>
#include <s2geography.h>

#include "constants.hpp"
#include "geography.hpp"
#include "pybind11.hpp"

namespace py = pybind11;
namespace s2geog = s2geography;
using namespace spherely;

PyObjectGeography from_wkt(py::str a, bool oriented, bool planar) {
#if defined(S2GEOGRAPHY_VERSION_MAJOR) && \
(S2GEOGRAPHY_VERSION_MAJOR >= 1 || S2GEOGRAPHY_VERSION_MINOR >= 2)
s2geog::geoarrow::ImportOptions options;
options.set_oriented(oriented);
if (planar) {
auto tol = S1Angle::Radians(100.0 / EARTH_RADIUS_METERS);
options.set_tessellate_tolerance(tol);
}
s2geog::WKTReader reader(options);
#else
if (planar || oriented) {
throw std::invalid_argument(
"planar and oriented options are only available with s2geography >= 0.2");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"planar and oriented options are only available with s2geography >= 0.2");
"planar and oriented options are only available with s2geography >= 1.2");

(per above if?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have it correct now.. In any case, the s2geography version is certainly 0.2 (https://github.com/paleolimbot/s2geography/blob/9ebd7b106ba2ae30046cb4f3e9561a8bbdb18b9b/CMakeLists.txt#L6).

I initially did the #if version check wrong, see de19f0e for correcting it.

We want that both version of 0.2 or 1.1 result in True, with the current version being 0.1 returning False, so that means major >= 1 OR minor >= 2, I think?

}
s2geog::WKTReader reader;
#endif
std::unique_ptr<s2geog::Geography> s2geog = reader.read_feature(a);
auto geog_ptr = std::make_unique<spherely::Geography>(std::move(s2geog));
return PyObjectGeography::from_geog(std::move(geog_ptr));
jorisvandenbossche marked this conversation as resolved.
Show resolved Hide resolved
}

py::str to_wkt(PyObjectGeography a) {
s2geog::WKTWriter writer;
auto res = writer.write_feature(a.as_geog_ptr()->geog());
return py::str(res);
}

void init_io(py::module& m) {
m.def("from_wkt",
py::vectorize(&from_wkt),
py::arg("a"),
py::arg("oriented") = false,
py::arg("planar") = false,
R"pbdoc(
Creates geographies from the Well-Known Text (WKT) representation.

Parameters
----------
a : str or array_like
WKT strings.
oriented : bool, default False
Set to True if polygon ring directions are known to be correct
(i.e., exterior rings are defined counter clockwise and interior
rings are defined clockwise).
By default (False), it will return the polygon with the smaller
area.
planar : bool, default False
If set to True, the edges linestrings and polygons are assumed to
be planar. In that case, additional points will be added to the line
while creating the geography objects, to ensure every point is
within 100m of the original line.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to make this distance threshold configurable? (Note: this can be addressed later).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should (eg bigquery uses a default of 10m, I think), but was also thinking that this could be added later (although it is not actually difficult to add, because I already hardcode the 100m. I can maybe add this while refactoring to reuse the reader/writer object and to not vectorize those additional arguments)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a tessellate_tol_m keyword, mimicking R's s2 (https://r-spatial.github.io/s2/reference/s2_geog_point.html)

By default (False), it is assumed that the edges are spherical
(i.e. represent the shortest path on the sphere between two points).

)pbdoc");

m.def("to_wkt",
py::vectorize(&to_wkt),
py::arg("a"),
R"pbdoc(
Returns the WKT representation of each geography.

Parameters
----------
a : :py:class:`Geography` or array_like
Geography object(s)

)pbdoc");
}
37 changes: 33 additions & 4 deletions src/pybind11.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ class PyObjectGeography : public py::object {
namespace pybind11 {
namespace detail {

// Force pybind11 to allow PyObjectGeography as argument of vectorized
// Force pybind11 to allow PyObjectGeography and str as argument of vectorized
// functions.
//
// Pybind11 doesn't support non-POD types as arguments for vectorized
Expand All @@ -128,7 +128,19 @@ struct vectorize_arg<spherely::PyObjectGeography> {
using type = conditional_t<vectorize, array_t<remove_cv_t<call_type>, array::forcecast>, T>;
};

// Register PyObjectGeography as a valid numpy dtype (numpy.object alias)
template <>
struct vectorize_arg<py::str> {
using T = py::str;
// The wrapped function gets called with this type:
using call_type = T;
// Is this a vectorized argument?
static constexpr bool vectorize = true;
// Accept this type: an array for vectorized types, otherwise the type
// as-is:
using type = conditional_t<vectorize, array_t<remove_cv_t<call_type>, array::forcecast>, T>;
};

// Register PyObjectGeography and str as a valid numpy dtype (numpy.object alias)
// from: https://github.com/pybind/pybind11/pull/1152
template <>
struct npy_format_descriptor<spherely::PyObjectGeography> {
Expand All @@ -142,6 +154,18 @@ struct npy_format_descriptor<spherely::PyObjectGeography> {
}
};

template <>
struct npy_format_descriptor<py::str> {
static constexpr auto name = _("object");
enum { value = npy_api::NPY_OBJECT_ };
static pybind11::dtype dtype() {
if (auto ptr = npy_api::get().PyArray_DescrFromType_(value)) {
return reinterpret_borrow<pybind11::dtype>(ptr);
}
pybind11_fail("Unsupported buffer format!");
}
};

// Override signature type hint for vectorized Geography arguments
template <int Flags>
struct handle_type_name<array_t<spherely::PyObjectGeography, Flags>> {
Expand All @@ -150,10 +174,10 @@ struct handle_type_name<array_t<spherely::PyObjectGeography, Flags>> {

} // namespace detail

// Specialization of ``pybind11::cast`` for PyObjectGeography (just a pass
// Specialization of ``pybind11::cast`` for PyObjectGeography and str (just a pass
// through).
//
// Allows using PyObjectGeography as return type of vectorized functions.
// Allows using PyObjectGeography and str as return type of vectorized functions.
//
template <
typename T,
Expand All @@ -162,6 +186,11 @@ object cast(T &&value) {
return value;
}

template <typename T, typename detail::enable_if_t<std::is_same<T, py::str>::value, int> = 0>
object cast(T &&value) {
return value;
}

} // namespace pybind11

#endif // SPHERELY_PYBIND11_H_
6 changes: 6 additions & 0 deletions src/spherely.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ namespace py = pybind11;
void init_geography(py::module&);
void init_predicates(py::module&);
void init_accessors(py::module&);
void init_io(py::module&);

PYBIND11_MODULE(spherely, m) {
m.doc() = R"pbdoc(
Expand All @@ -21,10 +22,15 @@ PYBIND11_MODULE(spherely, m) {
init_geography(m);
init_predicates(m);
init_accessors(m);
init_io(m);

#ifdef VERSION_INFO
m.attr("__version__") = MACRO_STRINGIFY(VERSION_INFO);
#else
m.attr("__version__") = "dev";
#endif

#ifdef S2GEOGRAPHY_VERSION
m.attr("__s2geography_version__") = MACRO_STRINGIFY(S2GEOGRAPHY_VERSION);
#endif
}
10 changes: 10 additions & 0 deletions src/spherely.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@ from typing import (
import numpy as np
import numpy.typing as npt

__version__: str = ...
__s2geography_version__: str = ...
EARTH_RADIUS_METERS: float = ...

class Geography:
Expand Down Expand Up @@ -138,6 +140,14 @@ boundary: _VFunc_Nin1_Nout1[Literal["boundary"], Geography, Geography]
convex_hull: _VFunc_Nin1_Nout1[Literal["convex_hull"], Geography, Polygon]
distance: _VFunc_Nin2optradius_Nout1[Literal["distance"], float, float]

# io functions

to_wkt: _VFunc_Nin1_Nout1[Literal["to_wkt"], str, object]
benbovy marked this conversation as resolved.
Show resolved Hide resolved

def from_wkt(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is also a fully vectorized function, i.e., oriented and planar also accept array-like objects, right?

Or do we want to restrict oriented and planar to accepting each a single value? (and use a Functor in order to handle those arguments out of the vectorized call loop like for the predicate functions)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used a lambda in the m.def so to only vectorize the main input data argument, and not oriented/planar.

a: Iterable[str], oriented: bool = False, planar: bool = False
) -> npt.NDArray[Any]: ...

# temp (remove)

def create(arg0: Iterable[float], arg1: Iterable[float]) -> npt.NDArray[Any]: ...
Expand Down
96 changes: 96 additions & 0 deletions tests/test_io.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
import numpy as np
import pytest
from packaging.version import Version

import spherely


def test_from_wkt():
result = spherely.from_wkt(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"])
expected = spherely.create([1, 2, 3], [1, 2, 3])
jorisvandenbossche marked this conversation as resolved.
Show resolved Hide resolved
# object equality does not yet work
# np.testing.assert_array_equal(result, expected)
assert spherely.equals(result, expected).all()

# from explicit object dtype
result = spherely.from_wkt(
np.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"], dtype=object)
)
assert spherely.equals(result, expected).all()

# from numpy string dtype
result = spherely.from_wkt(
np.array(["POINT (1 1)", "POINT(2 2)", "POINT(3 3)"], dtype="U")
)
assert spherely.equals(result, expected).all()


def test_from_wkt_invalid():
# TODO can we provide better error type?
with pytest.raises(RuntimeError):
spherely.from_wkt(["POINT (1)"])


def test_from_wkt_wrong_type():
with pytest.raises(TypeError, match="expected bytes, int found"):
spherely.from_wkt([1])

# TODO support missing values
with pytest.raises(TypeError, match="expected bytes, NoneType found"):
spherely.from_wkt(["POINT (1 1)", None])


polygon_with_bad_hole_wkt = (
"POLYGON "
"((20 35, 10 30, 10 10, 30 5, 45 20, 20 35),"
"(30 20, 20 25, 20 15, 30 20))"
)


@pytest.mark.skipif(
Version(spherely.__s2geography_version__) < Version("0.2.0"),
reason="Needs s2geography >= 0.2.0",
)
def test_from_wkt_oriented():
# by default re-orients the inner ring
result = spherely.from_wkt(polygon_with_bad_hole_wkt)
assert (
str(result)
== "POLYGON ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (20 15, 20 25, 30 20, 20 15))"
)

# if we force to not orient, we get an error
with pytest.raises(RuntimeError, match="Inconsistent loop orientations detected"):
spherely.from_wkt(polygon_with_bad_hole_wkt, oriented=True)
Comment on lines +62 to +64
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a great performance gain when setting oriented=True and when we know that all ring are consistently oriented? I haven't checked the implementation in s2geography (and not much in s2geometry), but I assume that the orientations are checked anyway if such case may trigger raising an error.

Maybe are there other reasons of setting oriented=True? I.e., when do we want to avoid implicit re-ordering of the rings?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also later uniformize across the library the type of error raised (in #51 a ValueError is raised when a polygon has invalid edges).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know (without trying out) if there is a performance difference, but I think that the oriented=True does allow you to create different geographies? (if you have one that is larger than half of the globe, the default constructor would always normalize it to represent the smaller polygon)

And I can also imagine that just checking that the orientation of all loops are the same is still cheaper than a full normalization (but that is indeed something to check)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also later uniformize across the library the type of error raised (in #51 a ValueError is raised when a polygon has invalid edges).

Yeah, in this case the error comes from s2geography (in theory I could catch the error and re-throw it with a different class, because those RuntimeErrors are a bit strange)



@pytest.mark.skipif(
Version(spherely.__s2geography_version__) < Version("0.2.0"),
reason="Needs s2geography >= 0.2.0",
)
def test_from_wkt_planar():
result = spherely.from_wkt("LINESTRING (-64 45, 0 45)")
assert spherely.distance(result, spherely.Point(45, -30)) > 10000

result = spherely.from_wkt("LINESTRING (-64 45, 0 45)", planar=True)
assert spherely.distance(result, spherely.Point(45, -30)) < 100


@pytest.mark.skipif(
Version(spherely.__s2geography_version__) >= Version("0.2.0"),
reason="Needs s2geography >= 0.2.0",
jorisvandenbossche marked this conversation as resolved.
Show resolved Hide resolved
)
def test_from_wkt_unsupported_keywords():

with pytest.raises(ValueError):
spherely.from_wkt(polygon_with_bad_hole_wkt, oriented=True)

with pytest.raises(ValueError):
spherely.from_wkt("LINESTRING (-64 45, 0 45)", planar=True)


def test_to_wkt():
arr = spherely.create([1.1, 2, 3], [1.1, 2, 3])
result = spherely.to_wkt(arr)
expected = np.array(["POINT (1.1 1.1)", "POINT (2 2)", "POINT (3 3)"], dtype=object)
np.testing.assert_array_equal(result, expected)
Loading