RuedigerMoeller edited this page Aug 19, 2014 · 12 revisions

With 2.x, fast-serialization adds a second layer that provides the foundation for creating structured binary streams.

=============================
Serialization Implementation
=============================
           Codec
=============================

One of the drawbacks of serialization is the lack of interoperability. However, I'd like to be able to serialize complex object graphs to, e.g., a JavaScript client. With inter-language serialization, a client could take an object graph directly from the server and use it without any parsing or manual translation code (data.set(message.get())).

As alternatives like msgpack/bson/google protocol buffers are inconvenient to use (at least from a Java perspective) and/or lack features like cycle detection and reference resolution, I defined MinBin, a kind of binary JSON (it contains meta information such as field names) that also supports and resolves references inside the serialized object graph. Since it has to be accessible from JavaScript, float/double is represented as a string, so floating point numbers are pretty slow compared to "native" serialization. The only source of information should be the class definitions, to avoid the necessity of explicitly defining .iml or similar message description files.

This also opens up the possibility to create distributed actor networks where actors are written in different languages.
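To make the cost of the string representation for float/double concrete, here is a minimal sketch in plain Java. The class `DoubleAsText` and its `encode`/`decode` helpers are hypothetical and not part of fst; the exact text format MinBin uses is not specified on this page. The sketch only shows that each value passes through a formatting step on write and a parsing step on read, which is where the overhead compared to writing raw bytes comes from.

```java
public class DoubleAsText {
    // Hypothetical helper: format the value as decimal text before it goes on the wire.
    static String encode(double value) {
        return Double.toString(value);
    }

    // Hypothetical helper: parse the text back into a double on the receiving side.
    static double decode(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double original = 0.1 + 0.2;
        String wire = encode(original);
        double restored = decode(wire);
        // Double.toString emits enough digits for parseDouble to restore the same value.
        System.out.println(wire + " restored exactly: " + (restored == original));
    }
}
```

A JavaScript client could read such text with its own number parser, which is the interoperability benefit the text representation buys.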

MinBin

A MinBin stream consists of primitives and of tags that signal the start and end of more complex, composed data structures.

Primitives are:

  • signed INT_8, INT_16, INT_32, INT_64
  • unsigned INT_16
  • arrays of the types above (so sequences of ints can be transmitted with low overhead)
  • tags

The following markers signal the (primitive) type of the byte(s) that follow in a stream:

    public final static byte INT_8  = 0b0001; // 1 (17 = array)
    public final static byte INT_16 = 0b0010; // 2 (18 ..)
    public final static byte INT_32 = 0b0011; // 3 (19 ..)
    public final static byte INT_64 = 0b0100; // 4 (20 ..)
    public final static byte TAG    = 0b0101; // 5, top 5 bits contain tag id
    public final static byte END    = 0b0110; // 6, end marker
    public final static byte RESERV = 0b0111; // escape for future extension

    public final static byte UNSIGN_MASK = 0b01000; // int only
    public final static byte ARRAY_MASK  = 0b10000; // int only, next item expected to be length
    public final static byte CHAR   = UNSIGN_MASK|INT_16;

So the byte sequence "0b0011,0b0,0b0,0b0,0b1" is the integer value "1": an INT_32 marker followed by four value bytes.
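The way the masks combine with the base types can be sketched as follows. The constants are copied from the listing above; the class `MinBinTypeByte`, the helper methods and the `BASE_MASK` constant are hypothetical. The sketch assumes that the base type occupies the low three bits of the marker byte, which is inferred from the constant values and not confirmed by the fst sources.

```java
public class MinBinTypeByte {
    // Constants copied from the listing above.
    static final byte INT_8  = 0b0001;
    static final byte INT_16 = 0b0010;
    static final byte INT_32 = 0b0011;
    static final byte INT_64 = 0b0100;
    static final byte UNSIGN_MASK = 0b01000;
    static final byte ARRAY_MASK  = 0b10000;
    static final byte CHAR = UNSIGN_MASK | INT_16;

    // Assumption: the base type lives in the low three bits of the marker.
    static final int BASE_MASK = 0b111;

    static int baseType(byte marker)       { return marker & BASE_MASK; }
    static boolean isArray(byte marker)    { return (marker & ARRAY_MASK) != 0; }
    static boolean isUnsigned(byte marker) { return (marker & UNSIGN_MASK) != 0; }

    // Marker for an array of the given integer base type, e.g. INT_8 -> 17.
    static byte arrayOf(byte base) { return (byte) (base | ARRAY_MASK); }

    public static void main(String[] args) {
        System.out.println("INT_8 array marker:  " + arrayOf(INT_8));
        System.out.println("INT_64 array marker: " + arrayOf(INT_64));
        System.out.println("CHAR marker:         " + CHAR);
        System.out.println("CHAR is unsigned:    " + isUnsigned(CHAR));
    }
}
```

The flag helpers only make sense for integer markers. For a TAG marker the upper bits carry the tag id instead, as noted in the comment on TAG.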

The following tags define the built-in "complex" object types:

    public static final byte NULL = 7;
    public static final byte STRING = 0;
    public static final byte OBJECT = 5;
    public static final byte SEQUENCE = 6;
    public static final byte DOUBLE = 2;
    public static final byte DOUBLE_ARR = 3;
    public static final byte FLOAT = 1;
    public static final byte FLOAT_ARR = 4;
    public static final byte BOOL = 8;
    public static final byte HANDLE = 9;
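Since the comment on TAG states that the top 5 bits of a tag marker contain the tag id, packing and unpacking a tag byte might look like the sketch below. The class `MinBinTagByte` and its methods are hypothetical, and the exact bit layout (id shifted left by three, TAG in the low three bits) is an assumption derived from that comment, not taken from the fst sources.

```java
public class MinBinTagByte {
    // Copied from the listings above.
    static final byte TAG    = 0b0101;
    static final byte STRING = 0;
    static final byte NULL   = 7;
    static final byte HANDLE = 9;

    // Assumption: tag id in the upper five bits, TAG marker in the lower three.
    static byte tagByte(int tagId) {
        if (tagId < 0 || tagId > 31) {
            throw new IllegalArgumentException("tag id must fit into 5 bits: " + tagId);
        }
        return (byte) ((tagId << 3) | TAG);
    }

    static boolean isTag(byte marker) {
        return (marker & 0b111) == TAG;
    }

    // Mask to an unsigned value first so ids >= 16 (negative as a Java byte) decode correctly.
    static int tagId(byte marker) {
        return (marker & 0xff) >>> 3;
    }

    public static void main(String[] args) {
        byte handle = tagByte(HANDLE);
        System.out.println("HANDLE marker byte: " + handle);
        System.out.println("decoded tag id:     " + tagId(handle));
    }
}
```

Five bits leave room for 32 tag ids, of which the ten listed above are taken by the built-in types.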

fst provides a serialization codec which reads and writes arbitrary object graphs from/to the MinBin format. The advantage of adding meta information to the data stream is that one can decode such a stream without having access to the originating classes, which is an important feature when archiving data long term or for interoperability with other languages. Note that MinBin is "java-first": it is convenient to create MinBin from within Java, but it might be somewhat inconvenient to access the data from other languages.

### Example

    public static class ARecord implements Serializable {
        String name;
        String profession;
        int postalCode;

        public ARecord(String name, String profession, int postalCode) {
            this.name = name;
            this.profession = profession;
            this.postalCode = postalCode;
        }

    }

    public static class MinBinDemo implements Serializable {
        HashMap aMap;
        List aList;
        int i[] = {1,2,3};

        public MinBinDemo() {
            aMap = new HashMap();
            aMap.put("x", new ARecord("Heinz","butcher",56555));
            // same key "x" again: this put replaces the entry above, so the map ends up with one element
            aMap.put("x", new ARecord("Daphne","unknwon",43355));
            aList = new ArrayList<>();
            aList.add(aMap);
            aList.add("Second Item");
        }
    }

    @Test
    public void demo() {
        FSTConfiguration conf = FSTConfiguration.createCrossPlatformConfiguration();
        conf.registerCrossPlatformClassMappingUseSimpleName( Arrays.asList(
                MinBinDemo.class,
                ARecord.class
        ));
        new MBPrinter().printMessage(conf.asByteArray(new MinBinDemo()));
    }

results in:

"MinBinDemo" {
    "aList" : "list" [
      2
      MBRef(25)
      "Second Item"
    ]
    "aMap" : "map" [
      1
      "x"
      "ARecord" {
        "profession" : "unknwon"
        "postalCode" : 43355
        "name" : "Daphne"
      }
    ]
    "i" : [ 1,2,3 ]
  }

Note that List and HashMap are represented as sequences with the number of elements as the first entry. This is caused by their respective serializers, e.g. FSTArrayListSerializer (see the line marked '(*)'):


    public void writeObject(FSTObjectOutput out, Object toWrite, FSTClazzInfo clzInfo,
                            FSTClazzInfo.FSTFieldInfo referencedBy, int streamPosition) throws IOException {
        ArrayList col = (ArrayList)toWrite;
        int size = col.size();
        out.writeInt(size);                        // (*)
        Class[] possibleClasses = referencedBy.getPossibleClasses();
        for (int i = 0; i < size; i++) {
            Object o = col.get(i);
            out.writeObjectInternal(o, possibleClasses);
        }
    }
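
The same "size first, then elements" layout can be illustrated without fst. The sketch below uses only java.io streams; the class `SizePrefixedList` and its methods are hypothetical and restricted to strings to keep the example short. It is not how fst reads or writes lists internally. It only shows why the element count shows up as the first entry of the sequence: the reader needs it to know how many elements to consume.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SizePrefixedList {
    // Write the element count first, then each element (mirrors the (*) line above).
    static byte[] write(List<String> items) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(items.size());
            for (String item : items) {
                out.writeUTF(item);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the count back first so the loop knows when the sequence ends.
    static List<String> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<String> items = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                items.add(in.readUTF());
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(Arrays.asList("first", "Second Item"));
        System.out.println(read(data));
    }
}
```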