-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-2471: Add geometry logical type #240
base: master
Are you sure you want to change the base?
Changes from all commits
f7b956b
a1472f0
1d583d5
3a23ba1
85acff9
4b8e1f8
69b5978
ea12bd2
82ac232
82be6d8
336a4f2
7de7d79
04d4f94
68f061d
3196377
79aedb0
89bfac7
d98cf61
2a3524f
3e54c7e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -237,6 +237,105 @@ struct SizeStatistics { | |||||||||||||
3: optional list<i64> definition_level_histogram; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* Interpretation for edges of elements of a GEOMETRY logical type. In other | ||||||||||||||
* words, whether a point between two vertices should be interpolated in | ||||||||||||||
* its XY dimensions as if it were a Cartesian line connecting the two | ||||||||||||||
* vertices (planar) or the shortest spherical arc between the longitude | ||||||||||||||
* and latitude represented by the two vertices (spherical). This value | ||||||||||||||
* applies to all non-point geometry objects and is independent of the | ||||||||||||||
* coordinate reference system. | ||||||||||||||
* | ||||||||||||||
* Because most systems currently assume planar edges and do not support | ||||||||||||||
* spherical edges, planar should be used as the default value. | ||||||||||||||
*/ | ||||||||||||||
enum Edges { | ||||||||||||||
PLANAR = 0; | ||||||||||||||
SPHERICAL = 1; | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does this really mean? That we only allow either Cartesian geometries, or geometries embedded on a sphere? How would one indicate geometries embedded on a spheroid (like WGS84)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If one has a column for cartesian geometries, use Edges.Planar. If one has a column for a geometry on sphere (aka Geography type), use Edges.Spherical. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about geodesic edges? Do we disallow them? How would you represent a geometry in WGS84? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't duplicating the information provided by the CRS? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If producing and/or storing data with geodesic edges becomes a widespread use-case, another enum value could be added. Until that is the case, spherical edges tesselated with enough vertices to minimize that error to an acceptable level are almost certainly sufficient for any practical use case.
The coordinate system (i.e., context for individual coordinate tuples) and how the dataset author intended a line to be drawn between any two coordinates are orthogonal concepts: GIS systems, for better or worse, frequently are used to "simplify" datasets and can do things like model the very long segment of the US--Canada border along the 49th parallel as a single line segment. Because there is quite a lot of data that was authored in this way (and few ways to produce data with spherical edges, even though this makes more sense in a global context), we need a way to express that intent here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Then the intend is to express the interpolation method between points? Should the enumeration have "interpolation" or something similar in its name for making that intend clearer? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the word There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also prefer adding documentation to |
||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* A custom binary-encoded polygon or multi-polygon to represent a covering of | ||||||||||||||
* geometries. For example, it may be a bounding box or an envelope of geometries | ||||||||||||||
* when a bounding box cannot be built (e.g. a geometry has spherical edges, or if | ||||||||||||||
* an edge of geographic coordinates crosses the antimeridian). It may be | ||||||||||||||
* extended in future versions to provide vendor-agnostic coverings like | ||||||||||||||
* vectors of cells on a discrete global grid (e.g., S2 or H3 cells). | ||||||||||||||
*/ | ||||||||||||||
struct Covering { | ||||||||||||||
/** | ||||||||||||||
* A type of covering. Currently accepted values: "WKB". | ||||||||||||||
*/ | ||||||||||||||
1: required string kind; | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is 'kind' a typical thing in Parquet? If not I'd call this 'type' - kind sounds weird to me, but not sure if 'type' as a very specific meaning for parquet. This is totally just a stylistic thing, so feel free to ignore. But note in the description we say 'a type of covering', not a 'the kind of covering'. Kind seems 'fuzzier', that it's wkb or close to it, while type seems to indicate just one to me. |
||||||||||||||
/** | ||||||||||||||
* A payload specific to kind. Below are the supported values: | ||||||||||||||
* - WKB: well-known binary of a POLYGON or MULTI-POLYGON that completely | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Not sure this formatting works, but this is the first mention of WKB. I originally just added it here |
||||||||||||||
* covers the contents. This will be interpreted according to the same CRS | ||||||||||||||
* and edges defined by the logical type. | ||||||||||||||
*/ | ||||||||||||||
2: required binary value; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* Bounding box of geometries in the representation of min/max value pair of | ||||||||||||||
* coordinates from each axis. Values of Z and M are omitted for 2D geometries. | ||||||||||||||
* Filter pushdown on geometries using this is only safe for planar spatial | ||||||||||||||
* filters. | ||||||||||||||
*/ | ||||||||||||||
struct BoundingBox { | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For spherical edges, are there any restrictions as to what is considered a valid bounding box? I would even ask the same for Cartesian edges, although it is less ambiguous here. For example, what happens if xmax is smaller than xmin? Is it up to the reader to decide what to do? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The bounding box statistics will be always calculated by getting the min/max of X and Y values. For cartesian geometries, this is not a problem at all because there is no antimeridian and orientation issue at all in this case. For spherical geometries, this is a problem so that's why the spec says that it is not safe to use it for spherical geometries. In this proposal, (1) there is another type of statistics called covering that attempts to solve this issue. (2) there is another field called However, in this proposal, the main focus is the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For geometries on a sphere or a spheroid this may no longer produce a bounding box if you restrict to just the vertices of the geometries. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If a GEOGRAPHY going to be a different logical type? Maybe this not such a bad idea, in which case the current proposal may want to focus just on the Cartesian space and nothing else. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While PostGIS uses a distinct type for geography, this is not necessarily a good idea that other formats should reproduce (I think it was due to historical constraints). Any geometry associated to a geographic CRS may be considered "geography". And as said by mkaravel, naive computation of min/max values will not work in the latter case. I suggest to remove the sentence saying "it is recommended that the writer always generates bounding box statistics, regardless of whether the geometries are planar or spherical". I suggest to even avoid saying that the values are min/max, or said that it is the case in Cartesian coordinate system but not necessarily in other coordinate system. In particular I suggest to keep open the possibility that "min" > "max". This is the only approach that I know which works with anti-meridian. Bounding boxes in the EPSG database are already expressed that way. Some standards such as GGXF also mandate it. It makes calculation of union and intersection more complicated, but libraries only need to define I'm not saying that the specification should support "min" > "max" now, just suggesting to keep this door open. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think consolidating GEOGRAPHY and GEOMETRY in the same type helps simplify things and more succinctly describe what the difference is between the two. Considering them as separate types is helpful in type systems where parameterized types are difficult or impossible to express (e.g., Postgres).
I don't think this is a good idea...dataset authors should split anything that crosses the antimeridian into two polygons to satisfy this constraint (which would allow Parquet implementations to do filter pushdown without ever having to inspect the
I agree with this...I think it's fine if they exist but saying that they should or must generate them is confusing (since they shouldn't be used). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not necessarily. This approach introduces other problems (e.g. creates an artificial edge), and "min" > "max" is a good alternative which avoids this problem. Some dataset authors already use the latter (e.g. EPSG) and may not agree that they should change. This approach is also already used by some OGC/ISO standards.
It is not necessary to inspect the CRS key if the specification said that every "min" > "max" cases are wraparound. It is not even necessary to know that the wraparound is 360° or any other value. It works in a truly generic way. The only requirement is to have more advanced implementations of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for the background! I still think this is a better fit for a dedicated (future) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a geometry-specific concept. The idea is to reinterpret the "min" and "max" values so that they are no longer min/max, but "start value" and "end value". Then add the following rules:
That's all. No geometric concept, no need to know that the wraparound happens at ±180°. It works if all coordinates are inside a consistent range of validity. It may be [−180 … +180]° of longitude, or [0 … 360]° or anything else at user's choice (actually defined by the CRS). As long as all coordinates are guaranteed inside the same range, above algorithm does not need to know that range. This proposal works for all kinds of wraparound: not only longitudes, but also climatological data (e.g. average temperatures of January, February … December, then back to January with no particular year associated to those months), radar, etc. By contrast, if we choose to add a Above needs are not hypothetical. By coincidence, I'm facing this week the problem of using climatological data in GIS system that do not understand temporal wraparound. Those data are common at WMO, NATO and other organizations. The "min" > "max" proposal addresses the general case. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
FYI @dmitrykoval . We were discussing earlier that some engine still have a way to use bounding box even for spherical edges (there's ways to convert?) |
||||||||||||||
1: required double xmin; | ||||||||||||||
wgtmac marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
2: required double xmax; | ||||||||||||||
3: required double ymin; | ||||||||||||||
4: required double ymax; | ||||||||||||||
5: optional double zmin; | ||||||||||||||
6: optional double zmax; | ||||||||||||||
7: optional double mmin; | ||||||||||||||
8: optional double mmax; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** Statistics specific to GEOMETRY logical type */ | ||||||||||||||
struct GeometryStatistics { | ||||||||||||||
/** A bounding box of geometries */ | ||||||||||||||
1: optional BoundingBox bbox; | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* A list of coverings of geometries. | ||||||||||||||
* Note that It is allowed to have more than one covering of the same kind and | ||||||||||||||
* implementation is free to use any of them. It is recommended to have at most | ||||||||||||||
* one covering for each kind. | ||||||||||||||
*/ | ||||||||||||||
2: optional list<Covering> coverings; | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* The geometry types of all geometries, or an empty array if they are not | ||||||||||||||
wgtmac marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
* known. This is borrowed from `geometry_types` column metadata of GeoParquet [1] | ||||||||||||||
* except that values in the list are WKB (ISO variant) integer codes [2]. Table | ||||||||||||||
* below shows the most common geometry types and their codes: | ||||||||||||||
* | ||||||||||||||
* | Type | XY | XYZ | XYM | XYZM | | ||||||||||||||
* | :----------------- | :--- | :--- | :--- | :--: | | ||||||||||||||
* | Point | 0001 | 1001 | 2001 | 3001 | | ||||||||||||||
* | LineString | 0002 | 1002 | 2002 | 3002 | | ||||||||||||||
* | Polygon | 0003 | 1003 | 2003 | 3003 | | ||||||||||||||
* | MultiPoint | 0004 | 1004 | 2004 | 3004 | | ||||||||||||||
* | MultiLineString | 0005 | 1005 | 2005 | 3005 | | ||||||||||||||
* | MultiPolygon | 0006 | 1006 | 2006 | 3006 | | ||||||||||||||
* | GeometryCollection | 0007 | 1007 | 2007 | 3007 | | ||||||||||||||
* | ||||||||||||||
* In addition, the following rules are used: | ||||||||||||||
* - A list of multiple values indicates that multiple geometry types are | ||||||||||||||
* present (e.g. `[0003, 0006]`). | ||||||||||||||
* - An empty array explicitly signals that the geometry types are not known. | ||||||||||||||
* - The geometry types in the list must be unique (e.g. `[0001, 0001]` | ||||||||||||||
* is not valid). | ||||||||||||||
* | ||||||||||||||
* Please refer to links below for more detail: | ||||||||||||||
* [1] https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just minor point, the origin of WKB and geo class hierarchy seem to be https://www.ogc.org/standard/sfa/ and these other ones are derivative. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I really wish OGC published these as html documents so we could link directly into the relevant section, instead of just saying go to https://portal.ogc.org/files/?artifact_id=25355 and look at chapter 8. |
||||||||||||||
* [2] https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L159 | ||||||||||||||
*/ | ||||||||||||||
3: optional list<i32> geometry_types; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* Statistics per row group and per page | ||||||||||||||
* All fields are optional. | ||||||||||||||
|
@@ -279,6 +378,9 @@ struct Statistics { | |||||||||||||
7: optional bool is_max_value_exact; | ||||||||||||||
/** If true, min_value is the actual minimum value for a column */ | ||||||||||||||
8: optional bool is_min_value_exact; | ||||||||||||||
|
||||||||||||||
/** statistics specific to geometry logical type */ | ||||||||||||||
9: optional GeometryStatistics geometry_stats; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** Empty structs to use as logical type annotations */ | ||||||||||||||
|
@@ -373,6 +475,105 @@ struct JsonType { | |||||||||||||
struct BsonType { | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* Physical type and encoding for the geometry type. | ||||||||||||||
*/ | ||||||||||||||
enum GeometryEncoding { | ||||||||||||||
/** | ||||||||||||||
* Allowed for physical type: BYTE_ARRAY. | ||||||||||||||
* | ||||||||||||||
* Well-known binary (WKB) representations of geometries. | ||||||||||||||
* | ||||||||||||||
* To be clear, we follow the same rule of WKB and coordinate axis order from | ||||||||||||||
* GeoParquet [1][2]. Geometries SHOULD be encoded as ISO WKB [3][4] | ||||||||||||||
* supporting XY, XYZ, XYM, XYZM and the standard geometry types | ||||||||||||||
* Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, | ||||||||||||||
* and GeometryCollection). Coordinate order is always (x, y) where x is | ||||||||||||||
* easting or longitude and y is northing or latitude. This ordering explicitly | ||||||||||||||
* overrides the axis order as specified in the CRS following the GeoPackage | ||||||||||||||
* specification [5]. | ||||||||||||||
* | ||||||||||||||
* This is the preferred encoding for maximum portability. It also supports | ||||||||||||||
* GeometryStatistics to be set in the column chunk and page index. | ||||||||||||||
* | ||||||||||||||
* [1] https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92 | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment as above, also not sure if we can reference the other one to this one to eliminate repetition. |
||||||||||||||
* [2] https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155 | ||||||||||||||
wgtmac marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
* [3] https://portal.ogc.org/files/?artifact_id=18241 | ||||||||||||||
* [4] https://www.iso.org/standard/60343.html | ||||||||||||||
* [5] https://www.geopackage.org/spec130/#gpb_spec | ||||||||||||||
*/ | ||||||||||||||
WKB = 0; | ||||||||||||||
wgtmac marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* Geometry logical type annotation (added in 2.11.0) | ||||||||||||||
*/ | ||||||||||||||
struct GeometryType { | ||||||||||||||
/** | ||||||||||||||
* Physical type and encoding for the geometry type. | ||||||||||||||
* Please refer to the definition of GeometryEncoding for more detail. | ||||||||||||||
*/ | ||||||||||||||
1: required GeometryEncoding encoding; | ||||||||||||||
/** | ||||||||||||||
* Edges of geometry type. | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
In the other comments there seemed to be desire to keep the 'edges' name here, but this documentation is so short that it seemed useful to at least give a short overview before referring people to a more in depth definition. |
||||||||||||||
* Please refer to the definition of Edges for more detail. | ||||||||||||||
*/ | ||||||||||||||
2: required Edges edges; | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The plural here is a bit confusing. My first hunch is that it's a list of elements... but, no, it's a single enum named |
||||||||||||||
/** | ||||||||||||||
* Coordinate Reference System, i.e. mapping of how coordinates refer to | ||||||||||||||
* precise locations on earth. Writers are not required to set this field. | ||||||||||||||
* Once crs is set, crs_encoding field below MUST be set together. | ||||||||||||||
* For example, "OGC:CRS84" can be set in the form of PROJJSON as below: | ||||||||||||||
pitrou marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
* { | ||||||||||||||
* "$schema": "https://proj.org/schemas/v0.5/projjson.schema.json", | ||||||||||||||
* "type": "GeographicCRS", | ||||||||||||||
* "name": "WGS 84 longitude-latitude", | ||||||||||||||
* "datum": { | ||||||||||||||
* "type": "GeodeticReferenceFrame", | ||||||||||||||
* "name": "World Geodetic System 1984", | ||||||||||||||
* "ellipsoid": { | ||||||||||||||
* "name": "WGS 84", | ||||||||||||||
* "semi_major_axis": 6378137, | ||||||||||||||
* "inverse_flattening": 298.257223563 | ||||||||||||||
* } | ||||||||||||||
* }, | ||||||||||||||
* "coordinate_system": { | ||||||||||||||
* "subtype": "ellipsoidal", | ||||||||||||||
* "axis": [ | ||||||||||||||
* { | ||||||||||||||
* "name": "Geodetic longitude", | ||||||||||||||
* "abbreviation": "Lon", | ||||||||||||||
* "direction": "east", | ||||||||||||||
* "unit": "degree" | ||||||||||||||
* }, | ||||||||||||||
* { | ||||||||||||||
* "name": "Geodetic latitude", | ||||||||||||||
* "abbreviation": "Lat", | ||||||||||||||
* "direction": "north", | ||||||||||||||
* "unit": "degree" | ||||||||||||||
* } | ||||||||||||||
* ] | ||||||||||||||
* }, | ||||||||||||||
* "id": { | ||||||||||||||
* "authority": "OGC", | ||||||||||||||
* "code": "CRS84" | ||||||||||||||
* } | ||||||||||||||
* } | ||||||||||||||
*/ | ||||||||||||||
3: optional string crs; | ||||||||||||||
wgtmac marked this conversation as resolved.
Show resolved
Hide resolved
pitrou marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
/** | ||||||||||||||
* Encoding used in the above crs field. It MUST be set if crs field is set. | ||||||||||||||
* Currently the only allowed value is "PROJJSON". | ||||||||||||||
*/ | ||||||||||||||
4: optional string crs_encoding; | ||||||||||||||
/** | ||||||||||||||
* Additional informative metadata. | ||||||||||||||
* GeoParquet could offload its column metadata in a JSON-encoded UTF-8 string: | ||||||||||||||
* https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L46 | ||||||||||||||
Comment on lines
+570
to
+572
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
...or some other clarification to specifically indicate what is or is not allowed here at the time of this writing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can this metadata be anything else than a simple key-value mapping? If just the latter, perhaps this can be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would be in favour of a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 for list as it sounds like that better matches how things work here. I believe used JSON because that was the way to specify in the user space that we have. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The original intention to use a |
||||||||||||||
*/ | ||||||||||||||
5: optional string metadata; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
* LogicalType annotations to replace ConvertedType. | ||||||||||||||
* | ||||||||||||||
|
@@ -403,6 +604,7 @@ union LogicalType { | |||||||||||||
13: BsonType BSON // use ConvertedType BSON | ||||||||||||||
14: UUIDType UUID // no compatible ConvertedType | ||||||||||||||
15: Float16Type FLOAT16 // no compatible ConvertedType | ||||||||||||||
16: GeometryType GEOMETRY // no compatible ConvertedType | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
/** | ||||||||||||||
|
@@ -954,6 +1156,7 @@ union ColumnOrder { | |||||||||||||
* ENUM - unsigned byte-wise comparison | ||||||||||||||
* LIST - undefined | ||||||||||||||
* MAP - undefined | ||||||||||||||
* GEOMETRY - undefined, use GeometryStatistics instead. | ||||||||||||||
* | ||||||||||||||
* In the absence of logical types, the sort order is determined by the physical type: | ||||||||||||||
* BOOLEAN - false, true | ||||||||||||||
|
@@ -1084,6 +1287,9 @@ struct ColumnIndex { | |||||||||||||
* Same as repetition_level_histograms except for definitions levels. | ||||||||||||||
**/ | ||||||||||||||
7: optional list<i64> definition_level_histograms; | ||||||||||||||
|
||||||||||||||
/** A list containing statistics of GEOMETRY logical type for each page */ | ||||||||||||||
8: optional list<GeometryStatistics> geometry_stats; | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
struct AesGcmV1 { | ||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can we avoid the plural? Perhaps make it
EdgeShape
orEdgeKind
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the same definition from GeoParquet (https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md) and we try to be consistent with GeoParquet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pity. If you want to store a collection of
Edges
, how do you name it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I understand it's probably not a practical concern :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm.. Why would someone want to store a collection of edges? For each geometry column, the
Edges
field has only 1 possible value.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be in favour of a more descriptive enum name (
EdgeKind
orEdgeInterpolation
?), but keeping the attribute of the type consistent with GeoParquet (e.g., it's stillgeom_type.edges
, the enum class just happens to beEdgeInterpolation
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say
EdgeInterpolation
is the better name thanEdgeKind
. +1 for keeping the attribute of the type the same as GeoParquet for consistency and simplicity, but makes sense to me that the enum name could be more descriptive.