Helios-vmg/cppserialization

LAS[1] is a code generator that accepts a description of one or more data types
and generates C++ code that automatically handles the de/serialization. The
input syntax is reminiscent of C++'s class definition declarative sublanguage,
so it should be intuitive to any C++ programmer.

Advantages:
* Simple maintainability.
* Reference cycles of unlimited depth.
* Space-efficiency.
* Designed to operate with non-seekable streams.

Limitations:
* No built-in version support.
* The graph traversal algorithm can only understand pointer graphs where all the
  pointers point to the proper start of an object. If any pointers point to the
  middle of an object, the behavior is undefined.
* Objects must always be deserialized in full. It's not possible to lazily
  deserialize an object.
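The pointer limitation above can be illustrated with a small sketch. It assumes
(this is my illustration, not the actual LAS run-time) that objects are
identified by their start address in a lookup table. Outer, Inner, IdTable and
lookup are hypothetical names. In this sketch an interior pointer simply fails
the lookup; the real library makes no such promise, the behavior is undefined.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical types, used only for this illustration.
struct Inner { int value = 0; };
struct Outer { int header = 0; Inner inner; };

// Maps the start address of a registered object to its ID (IDs start at 1).
using IdTable = std::unordered_map<const void*, std::size_t>;

// Returns the ID registered for address p, or 0 if p is not the start
// address of any registered object.
std::size_t lookup(const IdTable& table, const void* p) {
    auto it = table.find(p);
    return it == table.end() ? 0 : it->second;
}
```

If only an Outer object `o` is registered, looking up `&o` finds it, but
looking up `&o.inner` does not, even though that address lies inside `o`.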

  
Comparison with other serialization codebases

boost::serialization[2]
boost::serialization is a helper library, not a code generator. The burden of
writing and maintaining the serialization code falls on the user of the
library. If the data structure to be serialized changes, the programmer must
change the serialization code accordingly.
LAS, conversely, automatically generates the serialization code, outright
eliminating a whole class of potential bugs.

Google Protobuf[3]
Like LAS, Protobuf generates the serialization code. However, Protobuf is
designed primarily for RPC and network protocols; its data model cannot
represent object graphs of any kind, only arbitrary-length collections inside
an object.

Figure 1:
Root = Object A
Object A -> [Object B, Object C]
Object B -> [Object D]
Object C -> [Object D]

For example, to serialize an object graph such as the one in figure 1 using
Protobuf while preserving the object relationships, the programmer would first
have to manually transform the in-memory graph into something like the
structure in figure 2.

Figure 2:
(JSON)
Root = {
	1: { "name": "Object A", "children": [2, 3] },
	2: { "name": "Object B", "children": [4] },
	3: { "name": "Object C", "children": [4] },
	4: { "name": "Object D", "children": [] }
}

That is, the root would become an associative array. The programmer would have
to write the code that maps a memory address to a position in the array. LAS
includes this code as part of the run-time library.
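As a rough idea of what that hand-written mapping code looks like, here is a
minimal sketch that turns the graph from figure 1 into the ID-based table from
figure 2. Node, FlatNode and flatten are hypothetical names made up for this
example; this is not the LAS run-time, only an illustration of the kind of code
a Protobuf user would have to write and maintain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory node: children are referenced by raw pointer.
struct Node {
    std::string name;
    std::vector<Node*> children;
};

// Flattened node: children are referenced by ID, as in figure 2.
struct FlatNode {
    std::string name;
    std::vector<std::size_t> children;
};

// Returns the flattened graph; element i of the result has ID i + 1.
// IDs are assigned in order of first discovery, so shared nodes and
// cycles are emitted exactly once.
std::vector<FlatNode> flatten(const Node* root) {
    std::unordered_map<const Node*, std::size_t> ids; // address -> ID
    std::vector<const Node*> order;                   // ID - 1 -> address

    auto id_of = [&](const Node* n) {
        assert(n != nullptr); // this sketch does not handle null children
        auto it = ids.find(n);
        if (it != ids.end())
            return it->second;
        order.push_back(n);
        std::size_t id = order.size();
        ids.emplace(n, id);
        return id;
    };

    std::vector<FlatNode> out;
    if (!root)
        return out;
    id_of(root);
    // 'order' grows while we iterate, so index it instead of using iterators.
    for (std::size_t i = 0; i < order.size(); ++i) {
        const Node* n = order[i];
        FlatNode flat{n->name, {}};
        for (const Node* child : n->children)
            flat.children.push_back(id_of(child));
        out.push_back(std::move(flat));
    }
    return out;
}
```

Deserialization needs the inverse step (allocate one object per ID, then patch
the IDs back into pointers), which is omitted here.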
Another characteristic derived from its design as a protocol definition language
is that Protobuf often requires that in-memory objects be temporarily converted
into the types generated by it prior to serialization, and then converted back
after the message is deserialized. LAS on the other hand is designed to generate
classes that may be used both during serialization and throughout the program.
Protobuf has built-in support for message versioning, allowing the design of
protocols with forwards and backwards compatibility. LAS allows designing class
hierarchies that support versioning with backwards compatibility, but the
programmer must write the versioning support themselves. Forwards compatibility
is not possible.
Protobuf is much more mature than LAS, and possibly faster, but also much larger
and more complex.

Apache Avro[4]
As far as I can tell, Avro differs from LAS in the same ways Protobuf does.

Cap'n Proto[5]
Cap'n Proto was designed by a former developer of Protobuf. Like Protobuf, it is
designed around RPC protocols, and therefore includes some things not directly
related to serialization.
Most of the things I've said about Protobuf can be said about Cap'n Proto.
Cap'n Proto defines not just the serialized format, but also the in-memory
representation of objects. Serialization basically involves dumping the memory
of the object to a stream. LAS must traverse objects and serialize each member
individually. Because the in-memory representation and the serialized
representation are the same, Cap'n Proto is subject to some artificial
restrictions to prevent vulnerabilities to certain kinds of attacks delivered
through maliciously-crafted messages. For example, object graphs are limited to
some arbitrary depth to prevent stack overflows. LAS uses a bounded number of
stack frames both when serializing and when deserializing, so it's able to
process any possible object graph.
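To show how a traversal can keep its call-stack usage constant regardless of
graph depth, here is a minimal sketch that walks an object graph using an
explicit worklist on the heap instead of recursion. Obj and count_reachable are
hypothetical names for this example, and this is not necessarily how LAS
implements its traversal; it only demonstrates the general technique.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical object type: each object holds raw pointers to other objects.
struct Obj {
    std::vector<Obj*> refs;
};

// Counts the objects reachable from root. The pending work lives in a
// heap-allocated vector, so the number of stack frames in use does not
// depend on how deep or how cyclic the graph is.
std::size_t count_reachable(const Obj* root) {
    std::unordered_set<const Obj*> seen; // objects already discovered
    std::vector<const Obj*> pending;     // objects still to be expanded
    if (root) {
        seen.insert(root);
        pending.push_back(root);
    }
    while (!pending.empty()) {
        const Obj* current = pending.back();
        pending.pop_back();
        for (const Obj* ref : current->refs) {
            // insert().second is true only the first time ref is seen,
            // which is what makes cycles terminate.
            if (ref && seen.insert(ref).second)
                pending.push_back(ref);
        }
    }
    return seen.size();
}
```

A recursive version of the same walk would use one stack frame per level of
nesting, so a long enough chain of objects could overflow the stack.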


[1] Less-Ambitious Serializer. I originally wanted to use libclang to parse C++
    and generate the serialization code directly from that, but I scaled back to
    just processing an input text file.
[2] http://www.boost.org/doc/libs/1_60_0/libs/serialization/
[3] https://developers.google.com/protocol-buffers/
[4] https://avro.apache.org/
[5] https://capnproto.org/

About

A C++ serializer generator with serialization of cycles.
