-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
materialize-elasticsearch: handle fields with multiple types, and array fields #2272
base: main
Are you sure you want to change the base?
Conversation
With array inference information available in the connector protocol, materializations like Elasticsearch can create appropriately typed "array" index mappings for schematized arrays where there is known to be a single type of item.
… non singly-typed arrays Expands handling for fields that have multiple types, and arrays that do not have a single type of item. This is done by materializing the values for these fields as a synthetic object and putting that in a "flattened" index mapping, the same way as other objects are materialized. The generated object has a key of "json" and its value is the JSON representation of the original value.
7cf8e02
to
4e6ecd3
Compare
@@ -148,13 +147,16 @@ func (constrainter) NewConstraints(p *pf.Projection, deltaUpdates bool) *pm.Resp | |||
switch { | |||
case p.IsPrimaryKey: | |||
constraint.Type = pm.Response_Validated_Constraint_LOCATION_REQUIRED | |||
constraint.Reason = "Primary key locations are required" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This NewConstraints
function is now exactly the same as the one used in materialize-sql
:
connectors/materialize-sql/type_mapping.go
Lines 414 to 453 in 7f596b0
func (constrainter) NewConstraints(p *pf.Projection, deltaUpdates bool) *pm.Response_Validated_Constraint { | |
_, isNumeric := boilerplate.AsFormattedNumeric(p) | |
var constraint = pm.Response_Validated_Constraint{} | |
switch { | |
case p.IsPrimaryKey: | |
constraint.Type = pm.Response_Validated_Constraint_LOCATION_REQUIRED | |
constraint.Reason = "All Locations that are part of the collections key are required" | |
case p.IsRootDocumentProjection() && deltaUpdates: | |
constraint.Type = pm.Response_Validated_Constraint_LOCATION_RECOMMENDED | |
constraint.Reason = "The root document should usually be materialized" | |
case p.IsRootDocumentProjection(): | |
constraint.Type = pm.Response_Validated_Constraint_LOCATION_REQUIRED | |
constraint.Reason = "The root document must be materialized" | |
case len(p.Inference.Types) == 0: | |
constraint.Type = pm.Response_Validated_Constraint_FIELD_FORBIDDEN | |
constraint.Reason = "Cannot materialize a field with no types" | |
case p.Field == "_meta/op": | |
constraint.Type = pm.Response_Validated_Constraint_LOCATION_RECOMMENDED | |
constraint.Reason = "The operation type should usually be materialized" | |
case strings.HasPrefix(p.Field, "_meta/"): | |
constraint.Type = pm.Response_Validated_Constraint_FIELD_OPTIONAL | |
constraint.Reason = "Metadata fields are able to be materialized" | |
case p.Inference.IsSingleScalarType() || isNumeric: | |
constraint.Type = pm.Response_Validated_Constraint_LOCATION_RECOMMENDED | |
constraint.Reason = "The projection has a single scalar type" | |
case slices.Equal(p.Inference.Types, []string{"null"}): | |
constraint.Type = pm.Response_Validated_Constraint_FIELD_FORBIDDEN | |
constraint.Reason = "Cannot materialize a field where the only possible type is 'null'" | |
case p.Inference.IsSingleType() && slices.Contains(p.Inference.Types, "object"): | |
constraint.Type = pm.Response_Validated_Constraint_FIELD_OPTIONAL | |
constraint.Reason = "Object fields may be materialized" | |
default: | |
// Any other case is one where the field is an array or has multiple types. | |
constraint.Type = pm.Response_Validated_Constraint_LOCATION_RECOMMENDED | |
constraint.Reason = "This field is able to be materialized" | |
} | |
return &constraint | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description:
Adds handling for fields that have multiple types & fields that are arrays so that they can be materialized to Elasticsearch.
Arrays where the array items are known to have a single type (not including
null
) can use an index mapping of that type. For example, a mapping of typetext
will work with values that are"some value"
,["some value", "another value"]
,["yet another value", null]
, etc. The only exception is an array field where the items are singly-typed as an array: Although you can actually send a value like[["first", "second"], ["third"]]
to atext
mapping like this and have it work, and the connector protocol does not provide "recursive" inference like this. All that can be known about such a field is that it is an array containing arrays, and so there is no "leaf" type to use for the mapping type.Support for a few other things - fields with multiple types, arrays with unknown or non single-typed items, and the aforementioned singly-typed arrays where the type is itself
array
- is added here by wrapping the values in an object. Anobject
type itself is already able to be materialized to aflattened
mapping type, which basically stores arbitrary JSON data (as long as it is an object) in the mapping. We will construct a synthetic object to wrap these values in, in the form of:...where the value of the "json" field is obviously the JSON representation of the value, which may itself be an object, and array, or anything else.
Closes #2271
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
N/A since we don't mention anything about fields that may or may not be allowed to be materialized in the docs. This may be an area for future documentation improvement.
Notes for reviewers:
flowctl
built on that branch. It adds the array inference information to builds.This change is