Type-safe data interchange for Python data classes

JSON is a popular message interchange format employed in API design for its simplicity, readability, flexibility and wide support. However, json.dump and json.load offer no direct support for Python data classes that use type annotations. This package fills that gap for strongly-typed Python classes: it serializes objects to JSON, deserializes JSON to objects, and produces a JSON schema that matches the data class, e.g. for use in an OpenAPI specification.

Specifically, the package offers the following services:

  • Generate a JSON object from a Python object (object_to_json)
  • Parse a JSON object into a Python object (json_to_object)
  • Generate a JSON schema from a Python type (classdef_to_schema and type_to_schema)
  • Validate a JSON object against a Python type (validate_object)

In the context of this package, a JSON object is the (intermediate) Python object representation produced by json.loads from a JSON string. In contrast, a JSON string is the string representation generated by json.dumps from the (intermediate) Python object representation.
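The distinction between a JSON string and a JSON object can be illustrated with the standard library alone. This is a minimal sketch; the sample payload is made up for illustration.

```python
import json

# A JSON string: plain text, as received over HTTP or read from a file.
json_string = '{"name": "example", "count": 3, "tags": ["a", "b"]}'

# A JSON object (in this package's terminology): the intermediate Python
# representation built from dict, list, str, int, float, bool and None.
json_object = json.loads(json_string)

# Converting back produces a JSON string again.
round_trip = json.dumps(json_object)

print(type(json_object).__name__)  # dict
print(type(round_trip).__name__)   # str
```

The services listed above operate on the intermediate Python representation, not on the text form.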

Use cases

  • Writing a cloud function (lambda) that communicates with JSON messages received as HTTP payload or websocket text messages
  • Verifying if an API endpoint receives well-formed input
  • Generating a type schema for an OpenAPI specification to impose constraints on what messages an API can receive
  • Parsing JSON configuration files into a Python object

Usage

Consider the following class definition:

from dataclasses import dataclass

@dataclass
class SimpleObjectExample:
    bool_value: bool = True
    int_value: int = 23
    float_value: float = 4.5
    str_value: str = "string"

First, we serialize the object to JSON with

source = SimpleObjectExample()
json_repr = object_to_json(source)

Here, the variable json_repr has the value:

{'bool_value': True, 'int_value': 23, 'float_value': 4.5, 'str_value': 'string'}

Next, we restore the object from JSON with

target = json_to_object(SimpleObjectExample, json_repr)

Here, target holds the restored data class object:

SimpleObjectExample(bool_value=True, int_value=23, float_value=4.5, str_value='string')

We can also produce the JSON schema corresponding to the Python class:

json_schema = json.dumps(classdef_to_schema(SimpleObjectExample), indent=4)

which yields

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "bool_value": {
            "type": "boolean",
            "default": true
        },
        "int_value": {
            "type": "integer",
            "default": 23
        },
        "float_value": {
            "type": "number",
            "default": 4.5
        },
        "str_value": {
            "type": "string",
            "default": "string"
        }
    },
    "additionalProperties": false,
    "required": [
        "bool_value",
        "int_value",
        "float_value",
        "str_value"
    ]
}

Conversion table

The following table shows the conversion types the package employs:

| Python type | JSON schema type | Behavior |
| --- | --- | --- |
| None | null | |
| bool | boolean | |
| int | integer | |
| float | number | |
| str | string | |
| bytes | string | represented with Base64 content encoding |
| datetime | string | constrained to match ISO 8601 format 2018-11-13T20:20:39+00:00 |
| date | string | constrained to match ISO 8601 format 2018-11-13 |
| time | string | constrained to match ISO 8601 format 20:20:39+00:00 |
| Enum | value type | stores the enumeration value type (typically integer or string) |
| List[T] | array | recursive in T |
| Dict[K, V] | object | recursive in V, keys are coerced into string |
| Dict[Enum, V] | object | recursive in V, keys are of enumeration value type |
| Set[T] | array | recursive in T, container has uniqueness constraint |
| Tuple[T1, T2, ...] | array | array has fixed length, each element has specific type |
| data class | object | iterates over fields of data class |
| named tuple | object | iterates over fields of named tuple |
| Any | object | iterates over dir(obj) |
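The string encodings named in the table can be reproduced with the standard library. The sketch below only illustrates the target formats; whether the package calls these exact functions internally is an assumption.

```python
import base64
from datetime import date, datetime, time, timezone

# bytes -> string with Base64 content encoding
encoded_bytes = base64.b64encode(b"hello").decode("ascii")

# datetime, date and time -> ISO 8601 strings
encoded_datetime = datetime(2018, 11, 13, 20, 20, 39, tzinfo=timezone.utc).isoformat()
encoded_date = date(2018, 11, 13).isoformat()
encoded_time = time(20, 20, 39, tzinfo=timezone.utc).isoformat()

print(encoded_bytes)     # aGVsbG8=
print(encoded_datetime)  # 2018-11-13T20:20:39+00:00
print(encoded_date)      # 2018-11-13
print(encoded_time)      # 20:20:39+00:00
```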

JSON schema examples

Simple types:

| Python type | JSON schema |
| --- | --- |
| bool | {"type": "boolean"} |
| int | {"type": "integer"} |
| float | {"type": "number"} |
| str | {"type": "string"} |
| bytes | {"type": "string", "contentEncoding": "base64"} |

Enumeration types:

class Side(enum.Enum):
    LEFT = "L"
    RIGHT = "R"

which yields

{"enum": ["L", "R"], "type": "string"}

Container types:

| Python type | JSON schema |
| --- | --- |
| List[int] | {"type": "array", "items": {"type": "integer"}} |
| Dict[str, int] | {"type": "object", "additionalProperties": {"type": "integer"}} |
| Set[int] | {"type": "array", "items": {"type": "integer"}, "uniqueItems": true} |
| Tuple[int, str] | {"type": "array", "minItems": 2, "maxItems": 2, "prefixItems": [{"type": "integer"}, {"type": "string"}]} |
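The recursive mapping for container types can be sketched with typing.get_origin and typing.get_args. The function sketch_schema is a hypothetical stand-in for illustration, not the package's type_to_schema; it covers only the types shown in the tables above and ignores variable-length tuples.

```python
from typing import Dict, List, Set, Tuple, get_args, get_origin

_SIMPLE = {bool: "boolean", int: "integer", float: "number", str: "string"}

def sketch_schema(tp):
    """Hypothetical helper: map a simple or container type to a JSON schema."""
    if tp in _SIMPLE:
        return {"type": _SIMPLE[tp]}
    origin, args = get_origin(tp), get_args(tp)
    if origin is list:
        return {"type": "array", "items": sketch_schema(args[0])}
    if origin is set:
        return {"type": "array", "items": sketch_schema(args[0]), "uniqueItems": True}
    if origin is dict:
        # Keys become JSON property names; only the value type is described.
        return {"type": "object", "additionalProperties": sketch_schema(args[1])}
    if origin is tuple:
        return {
            "type": "array",
            "minItems": len(args),
            "maxItems": len(args),
            "prefixItems": [sketch_schema(arg) for arg in args],
        }
    raise TypeError(f"unsupported type: {tp!r}")

print(sketch_schema(Dict[str, List[int]]))
```

Because the function calls itself on the type arguments, nested containers such as Dict[str, List[int]] need no special handling.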

Custom serialization and de-serialization

If a composite object (e.g. a dataclass or a plain Python class) has a to_json member function, then this function is invoked to produce a JSON object representation from an instance.

If a composite object has a from_json class function (a.k.a. @classmethod), then this function is invoked, passing the JSON object as an argument, to produce an instance of the corresponding type.
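As an illustration, a class might define both hooks to store itself as a compact string instead of a field-by-field object. The Version class is a made-up example; the hooks are called directly here, whereas within the package, according to the description above, serialization and deserialization would invoke them.

```python
from dataclasses import dataclass

@dataclass
class Version:
    major: int
    minor: int

    def to_json(self) -> str:
        # Custom representation: "1.4" instead of {"major": 1, "minor": 4}.
        return f"{self.major}.{self.minor}"

    @classmethod
    def from_json(cls, data: str) -> "Version":
        # Inverse of to_json: parse the compact string back into an instance.
        major, minor = data.split(".")
        return cls(int(major), int(minor))

encoded = Version(1, 4).to_json()
decoded = Version.from_json(encoded)
print(encoded)  # 1.4
print(decoded)  # Version(major=1, minor=4)
```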

Custom types

It is possible to declare custom types when generating a JSON schema. For example, the following class definition carries the decorator @json_schema_type, which registers a JSON schema subtype definition under the path #/definitions/AzureBlob. Schemas that use the type then refer to this definition with $ref:

import re

_regexp_azure_url = re.compile(
    r"^https?://([^.]+)\.blob\.core\.windows\.net/([^/]+)/(.*)$")

@dataclass
@json_schema_type(
    schema={
        "type": "object",
        "properties": {
            "mimeType": {"type": "string"},
            "blob": {
                "type": "string",
                "pattern": _regexp_azure_url.pattern,
            },
        },
        "required": ["mimeType", "blob"],
        "additionalProperties": False,
    }
)
class AzureBlob(Blob):
    ...

You can use @json_schema_type without the schema parameter to register the type name but have the schema definition automatically derived from the Python type. This is useful if the type is reused across the type hierarchy:

@json_schema_type
class Image:
    ...

class Study:
    left: Image
    right: Image

Here, the two properties of Study (left and right) will refer to the same subtype #/definitions/Image.

Name mangling

If a Python class has a property whose name ends in a trailing underscore (_), added as per PEP 8 to avoid a conflict with a Python keyword (e.g. for_ or in_), the underscore is removed when writing to JSON and restored when reading from JSON, so that the JSON property is named for or in.
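The renaming can be sketched with the standard keyword module. Both helper functions are hypothetical, and restricting the rule to names that collide with Python keywords is an assumption made for this sketch.

```python
import keyword

def python_to_json_name(name: str) -> str:
    """Hypothetical helper: drop the trailing underscore of a keyword-like name."""
    if name.endswith("_") and keyword.iskeyword(name[:-1]):
        return name[:-1]
    return name

def json_to_python_name(name: str) -> str:
    """Hypothetical helper: append an underscore if the name is a keyword."""
    return name + "_" if keyword.iskeyword(name) else name

print(python_to_json_name("in_"))  # in
print(json_to_python_name("for"))  # for_
```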