v.dissolve: Compute attribute aggregate statistics #2388

wenzeslaus · 2022-05-20T14:22:11Z

In addition to geometry dissolving, compute aggregate statistics for the attribute values of dissolved features with v.db.univar and SQL.

v.db.select with group is used to obtain the unique values of the column the dissolving is based on. Adding a column and updating it now happens for every value, column, and statistic.

Originally implemented with v.db.univar only because it has a good set of functions, but direct SQL is faster and can potentially offer more functions (although default SQLite has fewer).
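
To illustrate the direct SQL approach, here is a minimal sketch using Python's standard sqlite3 module rather than the actual v.dissolve code. The table names (parcels, dissolved), the column names, and the helper aggregate_by_group are hypothetical and only stand in for the attribute tables of the input and the dissolved output.

```python
import sqlite3


def aggregate_by_group(conn, table, group_column, value_column, method):
    """Return {group value: aggregate} computed by a single GROUP BY query.

    Identifiers are interpolated directly, so this sketch assumes trusted names.
    """
    sql = (
        f"SELECT {group_column}, {method}({value_column}) "
        f"FROM {table} GROUP BY {group_column}"
    )
    return dict(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
# Hypothetical attribute table of the original features.
conn.execute("CREATE TABLE parcels (cat INTEGER, zone TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "A", 10.0), (2, "A", 30.0), (3, "B", 5.0)],
)
# Hypothetical attribute table of the dissolved features (one row per zone).
conn.execute("CREATE TABLE dissolved (cat INTEGER, zone TEXT)")
conn.executemany("INSERT INTO dissolved VALUES (?, ?)", [(1, "A"), (2, "B")])

sums = aggregate_by_group(conn, "parcels", "zone", "area", "sum")

# Add the result column once, then update it for every unique value
# inside one transaction.
conn.execute("ALTER TABLE dissolved ADD COLUMN area_sum REAL")
with conn:
    for value, stat in sums.items():
        conn.execute(
            "UPDATE dissolved SET area_sum = ? WHERE zone = ?", (stat, value)
        )

result = conn.execute(
    "SELECT zone, area_sum FROM dissolved ORDER BY zone"
).fetchall()
print(result)
```

The database computes all group statistics in one pass here, which is the reason a SQL backend can be faster than calling a statistics tool once per unique value.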

Auto-generates names and combinations of column-method for convenience, but when all needed parameters are provided, uses them as is.
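
The two behaviors can be sketched roughly as follows. The function plan_aggregation and the column_method naming pattern are assumptions made for illustration, not the module's actual code.

```python
from itertools import product


def plan_aggregation(columns, methods, result_columns=None):
    """Return a list of (column, method, result_column) triples.

    With explicit result columns, the three lists are paired one-to-one and
    used as is. Without them, every column-method combination is generated
    and the result column name is derived automatically.
    """
    if result_columns:
        if not len(columns) == len(methods) == len(result_columns):
            raise ValueError(
                "columns, methods, and result columns must have the same length"
            )
        return list(zip(columns, methods, result_columns))
    # Hypothetical naming scheme: <column>_<method>
    return [
        (column, method, f"{column}_{method}")
        for column, method in product(columns, methods)
    ]


automatic = plan_aggregation(["area", "length"], ["sum", "avg"])
explicit = plan_aggregation(["area", "length"], ["sum", "max"], ["total", "longest"])
print(automatic)
print(explicit)
```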

Has documentation, examples, image for original functionality, and test (image generated in notebook).

Uses plural for columns and methods.

Removes duplicate columns and methods for non-explicit automatic (interactive) result column handling.
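
One simple way to drop duplicates while keeping the order in which the user listed the values is shown below. This is a sketch of the idea with a hypothetical helper name, not necessarily how the module does it.

```python
def remove_duplicates(items):
    """Return items without duplicates, preserving first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


methods = remove_duplicates(["sum", "avg", "sum", "max", "avg"])
print(methods)
```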

Supports SQL expressions as columns (as in v.db.update query_column or v.db.select columns). Supports general SQL syntax just like v.db.select, at the price of fewer checks. Also supports text-returning aggregate functions and functions with multiple parameters, such as SQLite group_concat. Supports any layer, not just 1, for attributes.
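
As an illustration of what such queries look like at the SQL level, the following sketch runs a text-returning, two-parameter aggregate (group_concat) and an aggregate over an expression directly in SQLite through Python's sqlite3 module. The roads table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical attribute table.
conn.execute("CREATE TABLE roads (zone TEXT, name TEXT, length_m REAL)")
conn.executemany(
    "INSERT INTO roads VALUES (?, ?, ?)",
    [("A", "Main", 1200.0), ("A", "Oak", 800.0), ("B", "Elm", 500.0)],
)

rows = conn.execute(
    "SELECT zone, "
    "group_concat(name, ';'), "  # text result, two parameters
    "sum(length_m / 1000.0) "  # aggregate over an SQL expression
    "FROM roads GROUP BY zone ORDER BY zone"
).fetchall()

# SQLite does not guarantee the order of concatenated items, so sort them
# before comparing or printing.
summary = {
    zone: (sorted(names.split(";")), total_km) for zone, names, total_km in rows
}
print(summary)
```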

Uses a simple SQL escape function to double single quotes.
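
Doubling single quotes is the standard SQL way to embed a quote inside a string literal. A minimal sketch of such a function follows; the name sql_escape and the example table and column are chosen here for illustration.

```python
def sql_escape(text):
    """Escape a string for use inside a single-quoted SQL literal."""
    return text.replace("'", "''")


value = "O'Brien's farm"
# Hypothetical table and column names.
statement = f"UPDATE parcels SET owner_count = 2 WHERE owner = '{sql_escape(value)}'"
print(statement)
```

This only covers quoting of string values; it does not validate column names or SQL expressions supplied by the user.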

Requires v.db.univar JSON output and v.db.select column info in JSON output.

Handles cleanup from the main function. Removes global variables. Uses PID and node name for the temporary vector. Partially modernizes the existing code by using gs alias instead of grass alias. Improves author lists.
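
A temporary name built from the PID and node name stays unique across processes and across machines sharing one database. The sketch below shows one way to build such a name with the Python standard library only; the function name, prefix, and sanitizing rule are assumptions, not the code in the PR.

```python
import os
import platform
import re


def temporary_vector_name(prefix="tmp_v_dissolve"):
    """Build a map name that is unique per process and per machine."""
    # Node names may contain dots or dashes, which are replaced here so the
    # result contains only letters, digits, and underscores.
    node = re.sub(r"[^A-Za-z0-9_]", "_", platform.node())
    return f"{prefix}_{os.getpid()}_{node}"


name = temporary_vector_name()
print(name)
```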

wenzeslaus · 2022-06-07T21:58:00Z

This depends on JSON output in #2386.

wenzeslaus · 2022-06-10T04:13:31Z

Notebook to test: https://mybinder.org/v2/gh/wenzeslaus/grass/v_dissolve-attr-stats?urlpath=lab%2Ftree%2Fscripts%2Fv.dissolve%2Fv_dissolve.ipynb

HuidaeCho

@wenzeslaus Overall, it looks good to me. Just have a few minor comments.


wenzeslaus · 2022-06-13T22:45:55Z

I made some significant updates. The interface is now tailored for two different use cases: interactive use, where a lot of things happen automatically, and scripting, where the user is expected to be very explicit about what should be computed. This is now described in the documentation.

I resolved some of the comments after writing a comment with resolution, but some may still require some discussion.

Thank you for the feedback, @HuidaeCho.

wenzeslaus · 2022-06-14T17:56:17Z

Should the option names be in singular or plural? For this module, it is aggregate_column, aggregate_method, and result_column versus aggregate_columns, aggregate_methods, and result_columns.

Is it singular as in:

r.patch input=aaa,bbb,ccc output=xxx
v.patch input=aaa,bbb,ccc output=xxx
g.list type=raster,vector

or plural as in:

v.db.select map=aaa columns=xxx,yyy
v.db.addcolumn map=aaa columns="xxx double precision,yyy integer"
v.db.addtable map=aaa columns="xxx double precision,yyy integer"

?

Singular versus plural in option names

The following does not include more special options in terms of use of singular and plural such as cats, coordinates or GDAL options.

Standard options

Standard options with multiple set to yes.

id	name	s/p
G_OPT_V_INPUTS	input	singular
G_OPT_V_MAPS	map	singular
G_OPT_V_TYPE	type	singular
G_OPT_V3_TYPE	type	singular
G_OPT_DB_COLUMNS	columns	plural
G_OPT_R_INPUTS	input	singular
G_OPT_R_OUTPUTS	output	singular
G_OPT_R_MAPS	map	singular
G_OPT_R_ELEVS	elevation	singular
G_OPT_R3_INPUTS	input	singular
G_OPT_R3_MAPS	map	singular
G_OPT_M_DATATYPE	type	singular
G_OPT_STDS_INPUTS	inputs	plural
G_OPT_STRDS_INPUTS	inputs	plural
G_OPT_STRDS_OUTPUTS	outputs	plural
G_OPT_STVDS_INPUTS	inputs	plural
G_OPT_STR3DS_INPUTS	inputs	plural

Notes: G_OPT_V_OUTPUTS does not exist, but given the other ones, it would be output. There are many temporal options, but they are not used as much as the other ones: all combined are used 10 times while just G_OPT_V_INPUTS is used 5 times and G_OPT_R3_INPUTS 3 times.

Vector modules in C

C modules with multiple = YES in the vector directory. Other modules not listed (too many).

module	option	s/p
v.normal	tests	plural
v.clean	tool	singular
v.net.iso	costs	plural
v.what	layer	singular
v.distance	column, upload	singular
v.build	option	singular
v.in.pdal	class_filter	singular
v.in.ogr	layer	singular

Python modules

Python modules with multiple: yes.

module	option	s/p
v.db.select	columns	plural
v.db.univar, db.univar	percentile	singular
v.db.addtable	columns	plural
v.db.addcolumn	columns	plural
r.patch	input	singular
r.texture	method	singular
r.buffer.lowmem	distances	plural
r.semantic.label	semantic_label	singular
r.in.wms	layers, styles	plural
g.search.modules	keyword	singular
t.vect.db.select	columns	plural
t.merge	inputs	plural


wenzeslaus · 2023-07-18T15:36:46Z

While this could be faster or parallel, the aggregation works well for simple cases and can deal with some complex cases, too. From my perspective, this is ready to be merged right after #3090.


Issues addressed, code changed significantly, review is no longer relevant.


wenzeslaus added this to the 8.4.0 milestone May 20, 2022

wenzeslaus added the Python and enhancement labels May 20, 2022

wenzeslaus force-pushed the v_dissolve-attr-stats branch from 1c6fe46 to e440269 Compare June 9, 2022 20:20

HuidaeCho previously requested changes Jun 10, 2022


wenzeslaus marked this pull request as ready for review June 14, 2022 17:56

wenzeslaus added the C label Jul 26, 2022

wenzeslaus removed the C label Aug 28, 2022

wenzeslaus modified the milestones: 8.3.0, 8.4.0 Feb 10, 2023

wenzeslaus force-pushed the v_dissolve-attr-stats branch from 1f7058f to d2cd66f Compare May 4, 2023 16:42

wenzeslaus force-pushed the v_dissolve-attr-stats branch from d2cd66f to b646272 Compare July 17, 2023 13:28

wenzeslaus requested a review from HuidaeCho July 17, 2023 14:20

petrasovaa reviewed Jul 17, 2023


wenzeslaus force-pushed the v_dissolve-attr-stats branch from b646272 to d14d6f2 Compare July 18, 2023 13:48

marisn approved these changes Jul 18, 2023


wenzeslaus added 9 commits July 19, 2023 09:53

Create columns only for the first value, support null (not clear how …

65c833d

…supported in the rest of the code), test

More tests, fix optional aggregation and multiple column creation

0b34030

Create and update all columns at once

ea60dde

Create transaction in a function

8ba12f6

Support direct SQL as a backend besides v.db.univar. Handle cleanup f…

2b84604

…rom the main function. Remove global variables. Use PID and node name for the temporary vector.

Functions for common aggregation functionality; shorter code. Simplif…

504595f

…y dev null in cleanup code. Modernize and Pylint generic error message.

Test for dissolve/merge of areas without shared boundaries.

f7322ba

Support any layer, not just 1

4da6a56

wenzeslaus added 14 commits July 19, 2023 09:53

Check for valid methods more clear with ints than bools

d91a932

Notebook with examples and test

ec6ff98

Do not generate all combinations when all options are provided, add d…

1fee2b0

…ocumentation

Partially modernize the existing code by using gs alias instead of gr…

f3f0f84

…ass alias

Image to doc, generated in notebook

415e666

Result of in is bool, no need to convert

588df54

Remove duplicate columns and methods for non-explicit automatic (inte…

c7f26bd

…ractive) result column handling

Support SQL expressions as columns (as in v.db.update query_column or…

98a847d

… v.db.select columns)

Support also text-returning aggregate function such as SQLite group_c…

18786dd

…oncat.

Add examples to documentation, use plural for columns and methods

fbc0740

Add simple SQL escape function to double single quotes

c880175

Flip some conditions for better readability and less indentation

1d5dfee

Improve author lists

10818e2

Support general SQL syntax just like v.db.select for the price of les…

be56efd

…s checks. Now depends on v.db.select producing the list of column names OSGeo#3090. Test and example included.

wenzeslaus force-pushed the v_dissolve-attr-stats branch from 7cd9500 to be56efd Compare July 19, 2023 13:53

Add more description and see also

9dc9867

wenzeslaus merged commit 9d44603 into OSGeo:main Jul 22, 2023
19 checks passed

wenzeslaus deleted the v_dissolve-attr-stats branch July 22, 2023 01:53

wenzeslaus removed the request for review from HuidaeCho July 22, 2023 01:54

landam mentioned this pull request Feb 11, 2024

v.clip: do not fail when clip map has no table connected #3416

Merged
