Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgtype.ArrayCodec should support non-comma delimiter #2086

Open
wchargin opened this issue Jul 13, 2024 · 2 comments
Open

pgtype.ArrayCodec should support non-comma delimiter #2086

wchargin opened this issue Jul 13, 2024 · 2 comments

Comments

@wchargin
Copy link

Is your feature request related to a problem? Please describe.

For many Postgres types, array elements are separated by commas:

$ psql -c "SELECT array[array[g,g],array[g,g]] FROM (VALUES('g'::text)) g(g)"
     array     
---------------
 {{g,g},{g,g}}
(1 row)

However, some types use a different delimiter. In particular, PostGIS geometry and geography types use a colon as the delimiter:

$ psql -c "SELECT array[array[g,g],array[g,g]] FROM (VALUES('GEOMETRYCOLLECTION EMPTY'::geometry)) g(g)"
                                       array                                       
-----------------------------------------------------------------------------------
 {{010700000000000000:010700000000000000}:{010700000000000000:010700000000000000}}
(1 row)

(Note that, in this multidimensional array, the delimiter separates both the base elements and the constituent 1D arrays.)

This is controlled by the DELIMITER option to CREATE TYPE: see, e.g., CREATE TYPE geometry (..., delimiter = ':').

I am using twpayne/pgx-geom to encode and decode my PostGIS values with pgx. So far, this only supports scalars. I see that pgtype's ArrayCodec should make it nice and easy to support arrays, too, and indeed the obvious approach works out of the box for the binary format, but for the text format it fails with "can't scan into dest[0]: encoding/hex: invalid byte: U+003A ':'". This is because ArrayCodec gives the whole 0107...:0107... string to the element codec, even though that represents two values.

Describe the solution you'd like

Perhaps pgtype.Type could have a new public field like Delimiter string, such that the empty string (default) means "use comma as delimiter" but any other string would be used as the delimiter instead. Then, ArrayCodec could read this value from the ElementType *pgtype.Type to encode and decode. If I understand correctly, this should be a backward-compatible change.

Describe alternatives you've considered

Alternately, the Delimiter field could exist on the ArrayCodec itself. In some ways, that seems more natural, since it affects only the array parsing, but it would deviate from Postgres's ontology, where (quoting the same CREATE TYPE docs) "the delimiter is associated with the array element type, not the array type itself".

I have of course also considered hand-rolling the parsers and encoders. It's not too bad, but I'd probably end up with a weird half-and-half solution using the ArrayCodec implementation for the binary format but a custom implementation for the text format. Workable, but certainly a shame when all that's needed is a single character to be handled differently!

Additional context

None comes to mind.

@jackc
Copy link
Owner

jackc commented Jul 13, 2024

This seems like a good idea to me.

Not sure on whether this configuration should go on pgtype.Type or pgtype.ArrayCodec.

Either way, the pgx.Conn.LoadType and pgx.Conn.LoadTypes methods should also be updated to load this value.

@wchargin
Copy link
Author

Good point about LoadType. That made me think to check what other types have typdelim <> ',', and it seems that the standard type box exhibits this bug in pgx without any external dependency on PostGIS:

err := pool.AcquireFunc(ctx, func(conn *pgxpool.Conn) error {
	formats := []struct{code int16; name string}{
		{code: pgx.BinaryFormatCode, name: "binary"},
		{code: pgx.TextFormatCode, name: "text"},
	}
	for _, format := range formats {
		rows, err := conn.Query(ctx, "WITH g(g) AS (VALUES('((1,1),(2,2))'::box)) SELECT array[array[g,g],array[g,g]] FROM g", pgx.QueryResultFormats{format.code})
		if err != nil {
			return err
		}
		for rows.Next() {
			var dst [][]pgtype.Box
			if err := rows.Scan(&dst); err != nil {
				return fmt.Errorf("format=%v: failed to scan: %w", format.name, err)
			}
			fmt.Printf("format=%v: row: %T %v\n", format.name, dst, dst)
		}
	}
	return nil
})
fmt.Printf("AcquireFunc error: %v\n", err)
format=binary: row: [][]pgtype.Box [[{[{2 2} {1 1}] true} {[{2 2} {1 1}] true}] [{[{2 2} {1 1}] true} {[{2 2} {1 1}] true}]]
AcquireFunc error: format=text: failed to scan: can't scan into dest[0]: unexpected trailing data: }

(On my system, it is just box, _box, geometry, _geometry, geography, and _geography that have non-comma delimiters.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants