deriving protobuf is a ppx_deriving plugin that generates Google Protocol Buffers serializers and deserializes from an OCaml type definition.
Sponsored by Evil Martians. protoc export sponsored by MaxProfitLab.
deriving protobuf can be installed via OPAM:
$ opam install ppx_deriving_protobuf
In order to use deriving protobuf, require the package ppx_deriving_protobuf
.
deriving protobuf is not a replacement for protoc and it does not attempt to generate code based on protoc definitions. Instead, it generates code based on OCaml type definitions. It can also generate input files for protoc.
deriving protobuf-generated serializers are derived from the structure of the type
and several attributes: @key
, @encoding
, @bare
and @default
. Generation
of the serializer is triggered by a [@@deriving Protobuf]
attribute attached
to the type definition.
deriving protobuf generates two functions per type:
type t = ... [@@deriving protobuf]
val t_from_protobuf : Protobuf.Decoder.t -> t
val t_to_protobuf : t -> Protobuf.Encoder.t -> unit
In order to deserialize a value of type t
from bytes message
, use:
let output = Protobuf.Decoder.decode_exn t_from_protobuf message in
...
In order to serialize a value input
of type t
, use:
let message = Protobuf.Encoder.encode_exn t_to_protobuf input in
...
A record is the most obvious counterpart for a Protobuf message. In a record, every field must have an explicitly defined key. For example, consider this protoc definition and its deriving protobuf equivalent:
message SearchRequest {
required string query = 1;
optional int32 page_number = 2;
optional int32 result_per_page = 3;
}
type search_request = {
query : string [@key 1];
page_number : int option [@key 2];
result_per_page : int option [@key 3];
} [@@deriving protobuf]
deriving protobuf recognizes and maps option
to optional fields, and
list
and array
to repeated fields.
A [@default]
attribute attached to a required field converts it to an optional
field; if the field is not present, its value is assumed to be the default one,
and conversely, if the value of the field is same as the default value, it is
not serialized:
message Defaults {
optional int32 results = 1 [default = 10];
}
type defaults = {
results : int [@key 1] [@default 10];
}
Note that protoc's default behavior is to assign a type-specific default value
to optional fields missing from message, i.e. 0
to integer fields, ""
to
string fields, and so on. With deriving protobuf, optional fields are represented
with the option
type; it is possible to emulate protoc's behavior by explicitly
specifying int [@default 0]
, etc.
Unlike protoc, deriving protobuf allows a much more flexible mapping between
wire representations of integral types and their counterparts in OCaml.
Any combination of the known integral types (int
, int32
, int64
,
Int32.t
, Int64.t
, Uint32.t
and Uint64.t
) and wire representations
(varint
, zigzag
, bits32
and bits64
) is valid. The wire representation
is specified using the @encoding
attribute.
For example, consider this protoc definition and a compatible deriving protobuf one:
message Integers {
required int32 bar = 1;
required fixed64 baz = 2;
}
type integers = {
bar : Uint64.t [@key 1] [@encoding `varint];
baz : int [@key 2] [@encoding `bits64];
}
When parsing or serializing, the values will be appropriately extended or truncated.
If a value does not fit into the narrower type used for serialization or deserialization,
Decoder.Error Decoder.Overflow
or Encoder.Error Encoder.Overflow
is raised.
The following table summarizes equivalence between integral types of protoc and encodings of deriving protobuf:
Encoding | protoc type |
---|---|
varint | int32, int64, uint32, uint64 |
zigzag | sint32, sint64 |
bits32 | fixed32, sfixed32 |
bits64 | fixed64, sfixed64 |
By default, OCaml types use the following encoding:
OCaml type | Encoding | protoc type |
---|---|---|
int | varint | int32 or int64 |
int32 or Int32.t | bits32 | sfixed32 |
Uint32.t | bits32 | fixed32 |
int64 or Int64.t | bits64 | sfixed64 |
Uint64.t | bits64 | fixed64 |
Note that no OCaml type maps to zigzag-encoded sint32
or sint64
by default.
It is necessary to use [@encoding `zigzag]
explicitly.
Similarly to integers, float
maps to protoc's double
by default,
but it is possible to specify the encoding explicitly:
message Floats {
required float foo = 1;
required double bar = 2;
}
type floats = {
foo : float [@key 1] [@encoding `bits32];
bar : float [@key 2];
} [@@deriving protobuf]
bool
maps to protoc's bool
and is encoded on wire using varint
:
message Booleans {
required bool bar = 1;
}
type booleans = {
bar : bool [@key 1];
} [@@deriving protobuf]
All of string
, String.t
, bytes
and Bytes.t
map to protoc's string
or
bytes
and are encoded on wire using bytes
:
Note that unlike protoc, which has an additional invariant that the contents of
a string
must be valid UTF-8 text, deriving protobuf does not have this invariant.
However, you still should observe it in your programs.
message Strings {
required string bar = 1;
required bytes baz = 2;
}
type strings = {
bar : string [@key 1];
baz : bytes [@key 2];
} [@@deriving protobuf]
A tuple is treated in exactly same way as a record, except that keys are derived
automatically starting at 1. The definition of search_request
above could be
rewritten as:
type search_request' = string * int option * int option
[@@deriving protobuf]
Additionally, a tuple can be used in any context where a scalar value is expected; in this case, it is equivalent to an anonymous inner message:
message Nested {
message StringFloatPair {
required string str = 1;
required float flo = 2;
}
required int32 foo = 1;
optional StringFloatPair bar = 2;
}
type nested = {
foo : int [@key 1];
bar : (string * float) option [@key 2];
} [@@deriving protobuf]
An OCaml variant types is normally mapped to an entire Protobuf message by deriving protobuf,
as opposed to protoc, which maps an enum
to a simple varint
. This is done because
OCaml constructors can have arguments, but protoc's enum
s can not.
Note that even if a type doesn't have any constructor with arguments, it is still mapped to a message, because it would not be possible to extend the type later with a constructor with arguments otherwise.
Every constructor must have an explicitly specified key; if the constructor has one argument, it is mapped to an optional field with the key corresponding to the key of the constructor plus one. If there is more than one argument, they're treated like a tuple.
Consider this example:
message Variant {
enum T {
A = 1;
B = 2;
C = 3;
D = 4;
}
message C {
required string foo = 1;
required string bar = 2;
}
message D {
required string s1 = 1;
required string s2 = 2;
}
required T t = 1;
optional int32 b = 3; // (B = 2) + 1
optional C c = 4; // (C = 3) + 1
optional D d = 5; // (D = 4) + 1
}
type variant =
| A [@key 1]
| B of int [@key 2]
| C of string * string [@key 3]
| D of {s1: string ; s2: string} [@key 4]
[@@deriving protobuf]
Note that decoder considers messages which contain more than one optional field invalid and rejects them.
In order to achieve better compatibility with protoc, it is possible to embed a variant where no constructors have arguments without wrapping it in a message:
enum BareVariant {
A = 1;
B = 2;
}
message Container {
required T value = 1;
}
type bare_variant =
| A [@key 1]
| B [@key 2]
and container = {
value : bare_variant [@key 1] [@bare];
} [@@deriving protobuf]
In practice, if a variant has no constructors with arguments, additional two functions are generated with the following signatures:
type t = A | B | ... [@@deriving protobuf]
val t_from_protobuf_bare : Protobuf.Decoder.t -> t
val t_to_protobuf_bare : Protobuf.Encoder.t -> t -> unit
These functions do not expect additional framing; they just parse or serialize
a single varint
.
Polymorphic variants are handled in exactly same way as regular variants. However, you can also embed them directly, like tuples, in which case the semantics is the same as defining an alias for the variant and then using that type.
This feature can be combined with the [@bare]
annotation to create a useful
shorthand:
message Packet {
enum Type {
REQUEST = 1;
REPLY = 2;
}
required Type type = 1;
required int32 value = 2;
}
type packet = {
type : [ `Request [@key 1] | `Reply [@key 2] ] [@key 1] [@bare];
value : int [@key 2];
} [@@deriving protobuf]
A type alias (statement of form type a = b
) is treated by deriving protobuf as
a definition of a message with one field with key 1:
message Alias {
required int32 val = 1;
}
type alias = int [@@deriving protobuf]
When deriving protobuf encounters a non-scalar type, it generates a call to the serialization or deserialization function corresponding to the full path to the type.
Consider this definition:
type foo = bar * Baz.Quux.t [@@deriving protobuf]
The generated deserializer code will refer to bar_from_protobuf
and
Baz.Quux.t_from_protobuf
; the serializer code will call bar_to_protobuf
and Baz.Quux.t_to_protobuf
.
Types which are encoded as varint
, bits32
or bits64
, that is, numeric
fields or bare variants, can be declared as "packed" with the [@packed]
attribute,
in which case the serializer emits a more compact representation. Only protoc newer
than 2.3.0 will recognize this representation. Note that the deserializer
understands it regardless of the presence of [@packed]
attribute.
message Packed {
repeated int32 elem = 1 [packed=true];
}
type packed = int list [@key 1] [@packed] [@@deriving protobuf]
deriving protobuf is able to handle polymorphic type definitions. In this case, the serializing or deserializing function will accept one additional argument for every type variable; correspondingly, the value of this argument will be passed to serializer or deserializer of any nested parametric type.
Consider this example:
type 'a mylist =
| Nil [@key 1]
| Cons of 'a * 'a mylist [@key 2]
[@@deriving protobuf]
Here, the following functions will be generated:
val mylist_from_protobuf : (Protobuf.Decoder.t -> 'a) -> Protobuf.Decoder.t ->
'a mylist
val mylist_to_protobuf : (Protobuf.Decoder.t -> 'a -> unit) -> Protobuf.Decoder.t ->
'a mylist -> unit
An example usage would be:
type a = int [@@deriving protobuf]
let get_ints message =
let decoder = Protobuf.Decoder.of_bytes message in
mylist_from_protobuf a_from_protobuf decoder
It's also possible to specify concrete types as parameters; in this case, deriving protobuf will infer the serializer/deserializer functions automatically. For example:
(* Combining two samples above *)
type b = a mylist [@@deriving protobuf]
Both serializers and deserializers rigorously verify their input data. The only
possible exception that can be raised during serialization is
Protobuf.Encoder.Failure
, and during deserialization is Protobuf.Decoder.Failure
.
The decoder attempts to annotate its failures with useful location information, but only if that wouldn't cost too much in terms of performance and complexity.
In general, as long as you're using the same protocol on both sides, deserialization or should never raise. The errors would mainly arise when interoperating with code generated by protoc that doesn't observe OCaml-specific invariants, or when handling malicious input.
It discerns these types of failure (represented with Decoder.Failure
exception):
Incomplete
: the message was truncated or using invalid wire format. Frame overruns are likely to produce this as well.Overlong_varint
: avarint
greater than 2⁶⁴-1 was encountered.Malformed_field
: an invalid wire type was encountered.Overflow fld
: an integer field in the message contained a value outside the range of the corresponding type, e.g. avarint
field corresponding toint32
contained0xffffffff
.Unexpected_payload (fld, kind)
: a key corresponding to fieldfld
had a wire type incompatible with the specified encoding, e.g. avarint
wire type for a nested message.Missing_field fld
: a required fieldfld
was missing from the message.Malformed_variant fld
: a variantfld
contained a key not corresponding to any defined constructor.
The decoder errors refer to fields via so-called "paths"; a path corresponds
to the OCaml syntax for referring to a type, field or constructor, but can
contain additional /<number>
(e.g. /0
) component for an immediate tuple.
For example, the string
field will have the path Foo.r.ra/1
:
(* foo.ml *)
type r = {
ra: (int * string) option [@key 1];
} [@@deriving protobuf]
The encoder discerns these types of failure (represented with Encoder.Failure
exception):
Overflow fld
: an integer value was outside the range of its corresponding encoding, e.g. aint64
containing0xffffffffffff
was serialized tobits32
.
The encoder errors use the same "path" convention as decoder errors.
In real-world applications, implementations using multiple versions of the same protocol must coexist. Protocol Buffers offer an imperfect and sometimes complicated, but very powerful and practical solution to this problem.
The wire protocol is designed in a way that allows to safely extend it if one follows a set of constraints.
Any of the following changes may be applied to either the sender or receiver of the message without breaking protocol:
- Adding an optional field to a record, or an optional element to a tuple, or an optional argument to a constructor with multiple arguments.
- Converting an optional field, tuple element or constructor argument into a repeated one.
- Converting an optional field, tuple element or constructor argument into a required field with a default value, or vice versa.
- Converting a repeated field, tuple element or constructor argument into an optional one (this is not recommended, as it silently ignores some of input data).
- Turning an alias into a record that has a field marked
[@key 1]
. - Turning an alias into a tuple where the first element is the former type of the alias (this is not recommended for reasons of code clarity).
When communicating bidirectionally, violating any of the following constraints always results in exceptions or receiving garbage data:
- Never change
[@key]
or[@encoding]
annotations; never add or remove[@bare]
annotation. - Never change primitive (i.e. excluding
list
,option
orarray
qualifiers) types of existing fields, tuple elements or constructor arguments. - Never remove required fields, tuple elements or constructor arguments.
- Never replace a primitive type of a field, tuple element or constructor argument with a tuple, even if the first element of the replacing tuple is the former primitive type.
- Never add arguments to an argument-less variant constructor, or vice versa.
The following sections list some exceptions to this rule when the communication is unidirectional.
Any of the following changes may be applied exclusively to the sender without breaking the existing receivers:
- Adding a required field, tuple element, or argument to a constructor with multiple arguments.
- Converting an optional or repeated field, tuple element or constructor argument into a required one.
- Replacing an integer type with a narrower one while preserving
the encoding (it's a good idea to add the
[@encoding]
annotation explicitly). - Adding a variant constructor, but never actually sending it.
Any of the following changes may be applied exclusively to the receiver without losing the ability to decode messages from existing senders:
- Removing a required field, tuple element, or argument to a constructor with more than two arguments.
- Replacing an integer type with a wider one while preserving the encoding
(it's a good idea to add the
[@encoding]
annotation explicitly).
deriving protobuf can export message types in proto2 language, the format that protoc accepts; protoc version 2.6 or later is required.
To enable protoc export, pass a protoc
option to deriving protobuf:
(* foo.ml *)
type msg = ... [@@deriving protobuf { protoc }]
Compiling this file will create a file called Foo.protoc
(note the capitalization)
in a directory adjacent to foo.ml
; if you are using ocamlbuild and foo.ml
is located in directory src/
, the file will be generated at _build/src/Foo.protoc
.
This can be customized by providing a path explicitly, e.g.
[@@deriving protobuf { protoc = "Bar.protoc" }]
; the path is interpreted
relative to the source file.
The mapping between OCaml types and protoc messages is straightforward.
OCaml modules become protoc packages with the same name.
A nested module, e.g. module Bar
in our foo.ml
, becomes a nested package,
Foo.Bar
; it will be emitted in a file Foo.Bar.protoc
, placed adjacent to
Foo.protoc
, since protoc requires every package to reside in its own file.
OCaml records and their fields become protoc messages and fields with the same name:
type msg = {
name: string [@key 1];
value: int [@key 2];
} [@@deriving protobuf { protoc }]
message msg {
required string name = 1;
required int64 value = 2;
}
OCaml variants and their constructors become protoc messages and fields
with the same name; additionally generated are a nested enum called
_tag
whose constants have the same name as constructors with _tag
appended, and a field named tag
with the type _tag
:
type msg =
| A [@key 1]
| B of string [@key 2]
[@@deriving protobuf { protoc }]
message msg {
enum _tag {
A_tag = 1;
B_tag = 2;
}
required _tag tag = 1;
oneof value {
string B = 3;
}
}
OCaml tuples become protoc messages with the same name whose fields
are called _N
with N
being the field index:
type msg = int * string
[@@deriving protobuf { protoc }]
message msg {
required int64 _0 = 1;
required string _1 = 2;
}
OCaml aliases become protoc messages with one field called _
:
type msg = int
[@@deriving protobuf { protoc }]
message msg {
required int64 _ = 1;
}
Sometimes, a single toplevel OCaml type definition has to be translated
into several messages, e.g. when a field or a constructor contains a tuple
or a polymorphic variant. In this case, such messages become nested messages
whose name is the name of the field or constructor with _
prepended:
type msg = {
field: int * string [@key 1]
}
[@@deriving protobuf { protoc }]
message msg {
message _field {
required int64 _0 = 1;
required string _1 = 2;
}
required _field field = 1;
}
Normally, when a type from another module is referenced, deriving protobuf
automatically generates the corresponding protoc import
directive:
type imported = Other.msg
[@@deriving protobuf { protoc }]
import "Other.protoc";
message imported {
required Other.msg _ = 1;
}
However, when a type is referenced that was defined in a module defined earlier
in the same file, the produced import
directive is incorrect.
(deriving protobuf does not have an accurate model of OCaml's module scoping.)
In this case, the protoc_import
option can help:
(* foo.ml *)
module Bar = struct
type msg = int [@@deriving protobuf { protoc }]
end
type alias = Bar.msg
[@@deriving protobuf { protoc; protoc_import = ["Foo.Bar.protoc"] }]
// Foo.protoc
package Foo;
import "Foo.Bar.protoc";
message alias {
required Bar.msg _ = 1;
}
// Foo.Bar.protoc
package Foo.Bar;
message msg {
required int64 _ = 1;
}
Protocol Buffers specification suggests that if a message contains
multiple instances of a required
or optional
nested message, those nested
messages should be merged. However, there is no concept of "merging messages"
accessible to deriving protobuf, and this feature can be considered harmful anyway:
it is far too forgiving of invalid input. Thus, deriving protobuf doesn't implement
this merging.
deriving protobuf is more strict than protoc with numeric types; it raises
Failure (Overflow fld)
rather than silently truncate values. It is thought
that accidentally losing 32th or 64th bit with OCaml's int
type would be
a common error without this countermeasure.
Everything else should be entirely compatible with protoc.
The documentation for internal API is available at GitHub pages.