Skip to content

Commit

Permalink
AVRO-3833: [Spec] Clarify usage of names and aliases (#2448)
Browse files Browse the repository at this point in the history
Also includes a section explaining schema fixes
  • Loading branch information
opwvhk authored Sep 27, 2023
1 parent 3cdc34e commit 90710b7
Showing 1 changed file with 19 additions and 3 deletions.
22 changes: 19 additions & 3 deletions doc/content/en/docs/++version++/Specification/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -179,10 +179,10 @@ For example, 16-byte quantity may be declared with:
{"type": "fixed", "size": 16, "name": "md5"}
```

### Names {#names}
Record, enums and fixed are named types. Each has a fullname that is composed of two parts; a name and a namespace, separated by a dot. Equality of names is defined on the fullname.
### Names
Record, enums and fixed are named types. Each has a fullname that is composed of two parts: a name and a namespace, separated by a dot. Equality of names is defined on the fullname – it is an error to specify two different types with the same name.

Record fields and enum symbols have names as well (but no namespace). Equality of fields and enum symbols is defined on the name of the field/symbol within its scope (the record/enum that defines it). Fields and enum symbols across scopes are never equal.
Record fields and enum symbols have names as well (but no namespace). Equality of field names and enum symbols is defined within their scope (the record/enum that defines them). It is an error to define multiple fields or enum symbols with the same name in a single type. Fields and enum symbols across scopes are never equal, so field names and enum symbols can be reused in a different type.

The name portion of the fullname of named types, record field names, and enum symbols must:

Expand Down Expand Up @@ -266,6 +266,22 @@ Aliases function by re-writing the writer's schema using aliases from the reader

A type alias may be specified either as a fully namespace-qualified, or relative to the namespace of the name it is an alias for. For example, if a type named "a.b" has aliases of "c" and "x.y", then the fully qualified names of its aliases are "a.c" and "x.y".

Aliases are alternative names, and thus subject to the same uniqueness constraints as names. Aliases should be valid names, but this is not required: any string is accepted as an alias. When aliases are used "to map a writer's schema to the reader's" (see above), this allows schema evolution to correct illegal names in old schemata.

## Fixing an invalid, but previously accepted, schema
Over time, rules and validations on schemas have changed. It is therefore possible that a schema used to work with an older version of Avro, but now fails to parse.

This can have several reasons, as listed below. Each reason also describes a fix, which can be applied using [schema resolution]({{< ref "#schema-resolution" >}}): you fix the problems in the schema in a way that is compatible, and then you can use the new schema to read the old data.

### Invalid names
Invalid names of types and fields can be corrected by renaming (using an [alias]({{< ref "#aliases" >}})). This works for simple names, namespaces and fullnames.

This fix is twofold: first, you add the invalid name as an alias to the type/field. Then, you change the name to any valid name.

### Invalid defaults
Default values are only used to fill in missing data when reading. Invalid defaults create invalid values in these cases. The fix is to correct the default values.


## Data Serialization and Deserialization
Binary encoded Avro data does not include type information or field names. The benefit is that the serialized data is small, but as a result a schema must always be used in order to read Avro data correctly. The best way to ensure that the schema is structurally identical to the one used to write the data is to use the exact same schema.

Expand Down

0 comments on commit 90710b7

Please sign in to comment.