From 90710b71802168847defb4d1a82e1eab221c04d2 Mon Sep 17 00:00:00 2001 From: Oscar Westra van Holthe - Kind Date: Wed, 27 Sep 2023 21:24:14 +0200 Subject: [PATCH] AVRO-3833: [Spec] Clarify usage of names and aliases (#2448) Also includes a section explaining schema fixes --- .../docs/++version++/Specification/_index.md | 22 ++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/doc/content/en/docs/++version++/Specification/_index.md b/doc/content/en/docs/++version++/Specification/_index.md index 9761cdc2922..e43120c2e43 100755 --- a/doc/content/en/docs/++version++/Specification/_index.md +++ b/doc/content/en/docs/++version++/Specification/_index.md @@ -179,10 +179,10 @@ For example, 16-byte quantity may be declared with: {"type": "fixed", "size": 16, "name": "md5"} ``` -### Names {#names} -Record, enums and fixed are named types. Each has a fullname that is composed of two parts; a name and a namespace, separated by a dot. Equality of names is defined on the fullname. +### Names +Record, enums and fixed are named types. Each has a fullname that is composed of two parts: a name and a namespace, separated by a dot. Equality of names is defined on the fullname – it is an error to specify two different types with the same name. -Record fields and enum symbols have names as well (but no namespace). Equality of fields and enum symbols is defined on the name of the field/symbol within its scope (the record/enum that defines it). Fields and enum symbols across scopes are never equal. +Record fields and enum symbols have names as well (but no namespace). Equality of field names and enum symbols is defined within their scope (the record/enum that defines them). It is an error to define multiple fields or enum symbols with the same name in a single type. Fields and enum symbols across scopes are never equal, so field names and enum symbols can be reused in a different type. The name portion of the fullname of named types, record field names, and enum symbols must: @@ -266,6 +266,22 @@ Aliases function by re-writing the writer's schema using aliases from the reader A type alias may be specified either as a fully namespace-qualified, or relative to the namespace of the name it is an alias for. For example, if a type named "a.b" has aliases of "c" and "x.y", then the fully qualified names of its aliases are "a.c" and "x.y". +Aliases are alternative names, and thus subject to the same uniqueness constraints as names. Aliases should be valid names, but this is not required: any string is accepted as an alias. When aliases are used "to map a writer's schema to the reader's" (see above), this allows schema evolution to correct illegal names in old schemata. + +## Fixing an invalid, but previously accepted, schema +Over time, rules and validations on schemas have changed. It is therefore possible that a schema used to work with an older version of Avro, but now fails to parse. + +This can have several reasons, as listed below. Each reason also describes a fix, which can be applied using [schema resolution]({{< ref "#schema-resolution" >}}): you fix the problems in the schema in a way that is compatible, and then you can use the new schema to read the old data. + +### Invalid names +Invalid names of types and fields can be corrected by renaming (using an [alias]({{< ref "#aliases" >}})). This works for simple names, namespaces and fullnames. + +This fix is twofold: first, you add the invalid name as an alias to the type/field. Then, you change the name to any valid name. + +### Invalid defaults +Default values are only used to fill in missing data when reading. Invalid defaults create invalid values in these cases. The fix is to correct the default values. + + ## Data Serialization and Deserialization Binary encoded Avro data does not include type information or field names. The benefit is that the serialized data is small, but as a result a schema must always be used in order to read Avro data correctly. The best way to ensure that the schema is structurally identical to the one used to write the data is to use the exact same schema.