MinBin
With 2.x, fast-serialization adds a second layer that provides the foundation for creating structured binary streams.
=============================
Serialization Implementation
=============================
Codec
=============================
One of the drawbacks of serialization is limited interoperability. However, I'd like to be able to serialize complex object graphs to, for example, a JavaScript client. With inter-language serialization, a client could take an object graph directly from the server and use it without any parsing or manual translation code (such as data.set(message.get())).
Alternatives such as msgpack, BSON, and Google Protocol Buffers are inconvenient to use (at least from a Java perspective) and/or lack features such as cycle detection and reference resolution. I therefore defined MinBin, a kind of binary JSON (it contains meta information such as field names) that also supports and resolves references inside the serialized object graph. Since it has to be accessible from JavaScript, float and double values are represented as strings.
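The string representation of floating-point values can be illustrated with a tiny round trip. This is only a sketch using plain JDK calls; the class and method names are hypothetical, and the exact formatting fst uses is not specified here.

```java
/** Sketch: carrying a double as its decimal string (not the fst implementation). */
class DoubleAsStringSketch {
    /** Converts the value to a decimal string that parses back to the same double. */
    static String encode(double value) {
        return Double.toString(value);
    }

    /** Parses the decimal string back into a double. */
    static double decode(String text) {
        return Double.parseDouble(text);
    }
}
```

A JavaScript client could parse the same text with its own number parser, which is what makes a string a convenient carrier.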
The only source of information should be the class definitions, which avoids the need to explicitly define .iml or similar message description files.
This also opens up the possibility to create distributed actor networks where actors are written in different languages.
A MinBin stream consists of primitives and of tags that signal the start and end of more complex, composed data structures.
Primitives are:
- signed INT_8, INT_16, INT_32, INT_64
- unsigned INT_16
- arrays of the types above (so sequences of ints can be transmitted with low overhead)
- tags
The following markers signal the (primitive) type of the following byte(s) in a stream:

    public final static byte INT_8  = 0b0001; // 1 (17 = array)
    public final static byte INT_16 = 0b0010; // 2 (18 ..)
    public final static byte INT_32 = 0b0011; // 3 (19 ..)
    public final static byte INT_64 = 0b0100; // 4 (20 ..)
    public final static byte TAG    = 0b0101; // 5, top 5 bits contain tag id
    public final static byte END    = 0b0110; // 6, end marker
    public final static byte RESERV = 0b0111; // escape for future extension
    public final static byte UNSIGN_MASK = 0b01000; // int only
    public final static byte ARRAY_MASK  = 0b10000; // int only, next item expected to be length
    public final static byte CHAR = UNSIGN_MASK|INT_16;

For example, the byte sequence "0b0011,0b0,0b0,0b0,0b1" is the INT_32 marker followed by four value bytes and represents the integer value 1.
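The way the masks combine with the base types can be sketched as follows. This is a minimal illustration, not the fst implementation: the class and method names are made up for this example, and the byte order of the value bytes (most significant byte first) is an assumption taken from the example sequence in the text.

```java
import java.util.Arrays;

/** Sketch of composing MinBin type markers; not the fst implementation. */
class MinBinMarkerSketch {
    // Constants copied from the marker listing in the text.
    static final byte INT_8 = 0b0001;
    static final byte INT_16 = 0b0010;
    static final byte INT_32 = 0b0011;
    static final byte INT_64 = 0b0100;
    static final byte UNSIGN_MASK = 0b01000;
    static final byte ARRAY_MASK = 0b10000;

    /** Marker for an array of the given int type, e.g. INT_8 becomes 17. */
    static byte arrayOf(byte intType) {
        return (byte) (intType | ARRAY_MASK);
    }

    /** Marker for the unsigned variant, e.g. INT_16 becomes CHAR (10). */
    static byte unsigned(byte intType) {
        return (byte) (intType | UNSIGN_MASK);
    }

    /**
     * Encodes a value as INT_32: one marker byte followed by four value bytes.
     * Assumption: most significant byte first, as in the example in the text.
     */
    static byte[] encodeInt32(int value) {
        return new byte[] {
            INT_32,
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    public static void main(String[] args) {
        System.out.println(arrayOf(INT_8));
        System.out.println(unsigned(INT_16));
        System.out.println(Arrays.toString(encodeInt32(1)));
    }
}
```

Running main prints 17, 10 and [3, 0, 0, 0, 1], matching the "(17 = array)" comment, the CHAR definition and the example byte sequence.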
The following tags define the built-in "complex" object types:

    public static final byte NULL = 7;
    public static final byte STRING = 0;
    public static final byte OBJECT = 5;
    public static final byte SEQUENCE = 6;
    public static final byte DOUBLE = 2;
    public static final byte DOUBLE_ARR = 3;
    public static final byte FLOAT = 1;
    public static final byte FLOAT_ARR = 4;
    public static final byte BOOL = 8;
    public static final byte HANDLE = 9;

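Going by the comment on TAG ("top 5 bits contain tag id"), a tag id and the TAG marker could share one byte roughly like this. The helper names are hypothetical and the exact bit layout is an assumption derived from that comment, so treat it as a sketch rather than the codec's actual code.

```java
/** Sketch of packing a tag id into a single tag byte; layout assumed from the TAG comment. */
class MinBinTagSketch {
    static final byte TAG = 0b0101;   // low 3 bits mark the byte as a tag
    static final byte STRING = 0;     // tag ids copied from the tag listing in the text
    static final byte OBJECT = 5;
    static final byte SEQUENCE = 6;

    /** Places the tag id in the top 5 bits and the TAG marker in the low 3 bits. */
    static byte tagCode(byte tagId) {
        return (byte) ((tagId << 3) | TAG);
    }

    /** Recovers the tag id; masking with 0xff avoids sign extension for ids of 16 and above. */
    static int tagId(byte tagCode) {
        return (tagCode & 0xff) >>> 3;
    }

    /** True if the low 3 bits carry the TAG marker. */
    static boolean isTag(byte b) {
        return (b & 0b111) == TAG;
    }
}
```

With this layout, OBJECT (5) would be written as the byte 45 (5 shifted left by 3, OR-ed with 5), and plain int markers such as INT_32 are never mistaken for tags because their low three bits are in the range 1 to 4.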
fst provides a serialization codec that reads and writes arbitrary object graphs from and to the MinBin format. The advantage of adding meta information to the data stream is that such a stream can be decoded without access to the originating classes, which is an important feature for long-term archiving of data and for interoperability with other languages. Note that MinBin is "java-first": it is convenient to create MinBin from within Java, but it might be somewhat inconvenient to access the data from other languages.
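To illustrate what decoding without the originating classes can mean, a reader may hold each object as a type name plus a map of named fields. The container below is purely hypothetical (it is not the type fst uses for this purpose) and only shows the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical class-free view of a decoded object: a type name plus named fields. */
class GenericRecordSketch {
    final String typeName;
    final Map<String, Object> fields = new LinkedHashMap<>();

    GenericRecordSketch(String typeName) {
        this.typeName = typeName;
    }

    /** Stores one field and returns this record to allow chaining. */
    GenericRecordSketch put(String field, Object value) {
        fields.put(field, value);
        return this;
    }

    static GenericRecordSketch example() {
        // Field names travel in the stream, so no "Person" class is needed to hold the data.
        return new GenericRecordSketch("Person")
            .put("name", "Daphne")
            .put("postalCode", 43355);
    }
}
```

Because the field names are part of the stream, such a container can be filled by a generic reader in any language.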
###Example

    public static class ARecord implements Serializable {
        String name;
        String profession;
        int postalCode;

        public ARecord(String name, String profession, int postalCode) {
            this.name = name;
            this.profession = profession;
            this.postalCode = postalCode;
        }
    }

    public static class MinBinDemo implements Serializable {
        HashMap aMap;
        List aList;
        int i[] = {1,2,3};

        public MinBinDemo() {
            aMap = new HashMap();
            aMap.put("x", new ARecord("Daphne","unknwon",43355));
            aList = new ArrayList<>();
            aList.add(aMap);
            aList.add("Second Item");
        }
    }

    @Test
    public void demo() {
        FSTConfiguration conf = FSTConfiguration.createCrossPlatformConfiguration();
        conf.registerCrossPlatformClassMappingUseSimpleName( Arrays.asList(
            MinBinDemo.class,
            ARecord.class
        ));
        new MBPrinter().printMessage(conf.asByteArray(new MinBinDemo()));
    }

This results in (pretty-printed; the real representation is still binary):

    "MinBinDemo" {
      "aList" : "list" [
        2
        MBRef(25)
        "Second Item"
      ]
      "aMap" : "map" [
        1
        "x"
        "ARecord" {
          "profession" : "unknwon"
          "postalCode" : 43355
          "name" : "Daphne"
        }
      ]
      "i" : [ 1,2,3 ]
    }

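The MBRef(25) entry shows the same HashMap instance being written once and referenced from the list. The general idea behind such back-references can be sketched as below. This is an assumption-laden illustration with hypothetical names: it tracks objects by identity and uses the output index as the "position", whereas a real codec would use a stream offset.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of back-reference tracking; names are hypothetical, not fst API. */
class RefTrackingSketch {
    private final Map<Object, Integer> seen = new IdentityHashMap<>();
    private final List<String> out = new ArrayList<>();

    /** Emits the object on first sight and a reference to its position afterwards. */
    void write(Object o) {
        Integer pos = seen.get(o);
        if (pos != null) {
            out.add("Ref(" + pos + ")");
            return;
        }
        seen.put(o, out.size());   // position = index of the item in the output
        out.add(String.valueOf(o));
    }

    List<String> output() {
        return out;
    }

    static List<String> demo() {
        Object shared = new StringBuilder("shared map");
        RefTrackingSketch w = new RefTrackingSketch();
        w.write(shared);
        w.write("Second Item");
        w.write(shared);           // same instance again: only a reference is emitted
        return w.output();
    }
}
```

Tracking by identity rather than equality is what also allows cyclic graphs to be written without endless recursion.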
Note that List and HashMap are represented as sequences with the number of elements as the first entry. This is caused by their respective serializers, e.g. FSTArrayListSerializer (see '(*)'):

    public void writeObject(FSTObjectOutput out, Object toWrite, FSTClazzInfo clzInfo, FSTClazzInfo.FSTFieldInfo referencedBy, int streamPosition) throws IOException {
        ArrayList col = (ArrayList)toWrite;
        int size = col.size();
        out.writeInt(size); // (*)
        Class[] possibleClasses = referencedBy.getPossibleClasses();
        for (int i = 0; i < size; i++) {
            Object o = col.get(i);
            out.writeObjectInternal(o, possibleClasses);
        }
    }

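The size-first pattern used by the serializer is what lets a reader know how many elements follow. A minimal round trip of the same pattern, written against plain JDK streams instead of the fst API (class and method names are made up for this sketch), could look like this:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a size-prefixed sequence using plain JDK streams instead of fst. */
class SizePrefixedListSketch {
    static byte[] write(List<String> items) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(items.size());      // element count first, like (*)
            for (String item : items) {
                out.writeUTF(item);
            }
        }
        return bytes.toByteArray();
    }

    static List<String> read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();         // the count tells the reader when to stop
            List<String> items = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                items.add(in.readUTF());
            }
            return items;
        }
    }
}
```

In a pretty-printed MinBin stream this leading count is the bare number that appears as the first entry of each "list" or "map" sequence.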
Conclusion: by changing the underlying codec, binary serialization becomes accessible and structured. Note that this way even 'Externalizable' output becomes visible (although some guessing will probably be required to actually decode such output).