This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

Byte arrays are represented as strings in generated avro schema? #39

Closed
asmaier opened this issue Mar 1, 2016 · 3 comments

Comments

@asmaier

asmaier commented Mar 1, 2016

It looks like byte arrays are represented as strings in the generated Avro schema. When I try to generate a schema from my File class below and use that schema to write an Avro message, I get a JsonMappingException: Not in union ["null",{"type":"array","items":"string"}]: java.nio.HeapByteBuffer[pos=0 lim=9 cap=9]:

    private static class File {
        public String filename = "TestFile.txt";
        public byte[] filedata = "Test Text".getBytes();
    }

    @Test
    public void testAvroSchemaGenerationWithJackson() throws JsonProcessingException {

        ObjectMapper mapper = new ObjectMapper(new AvroFactory());
        mapper.registerModule(new JavaTimeModule());
        mapper.configure(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, false);

        AvroSchemaGenerator visitor = new AvroSchemaGenerator();
        mapper.acceptJsonFormatVisitor(File.class, visitor);
        AvroSchema schema = visitor.getGeneratedSchema();

        ObjectMapper mapper2 = new ObjectMapper(new AvroFactory());
        System.out.println(mapper2.writer(schema).writeValueAsBytes(new File()));

    }

I believe the reason is that the generated schema represents the byte array as an array of strings:

{
  "type" : "record",
  "name" : "File",
  "namespace" : "xx.xxx.nss",
  "doc" : "Schema for xx.xxx.SchemaGenerationTest$File",
  "fields" : [ {
    "name" : "filedata",
    "type" : [ "null", {
      "type" : "array",
      "items" : "string"
    } ]
  }, {
    "name" : "filename",
    "type" : [ "null", "string" ]
  } ]
}

How can I make byte arrays be represented simply by the type "bytes"? Or is this a bug?

@cowtowncoder
Member

That does sound like a bug, and it is possibly related to the fact that in JSON a byte[] is serialized as a Base64-encoded String. But this is not the case for binary formats that have a native binary data type.

Thank you for reporting the problem.
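
To illustrate the Base64 point above: the following is a minimal sketch, using only the JDK's `java.util.Base64`, of how a byte[] becomes text when a format has no binary type. The class and method names (`Base64Demo`, `toJsonText`, `fromJsonText`) are made up for this example and are not part of Jackson; Jackson's own JSON encoding may differ in details such as the Base64 variant.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Demo {

    // Hypothetical helper: encode raw bytes as Base64 text, the way a
    // text-only format such as JSON has to carry binary data.
    static String toJsonText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Hypothetical helper: decode the Base64 text back into the original bytes.
    static byte[] fromJsonText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Same payload as the File class in the report, with an explicit charset.
        byte[] filedata = "Test Text".getBytes(StandardCharsets.UTF_8);

        String encoded = toJsonText(filedata);
        System.out.println(encoded); // prints VGVzdCBUZXh0

        byte[] decoded = fromJsonText(encoded);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // prints Test Text
    }
}
```

A binary format with a native bytes type can store the nine payload bytes directly, so a schema that describes the field as text (or as an array of strings) no longer matches what the generator actually writes.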

@cowtowncoder
Member

Hmmh. Ok, so the problem is definitely within ByteArraySerializer, which reports it as String... will need to see how to resolve.

@cowtowncoder
Member

Unfortunately it looks like there is no way to fix this reliably for 2.7: neither the array route nor String-with-format works. The former fails because the stupid JSON Schema inspired callback route does not have a qualifier for number types beyond integer/floating point (hence, it can't distinguish short/byte/int/long arrays!); the latter fails because the set of formats is fixed.

With 2.8 we can at least add the format, or, alternatively, allow specifying actual element type and not just limited set of pre-defined enums.
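
The limitation described above can be sketched in plain Java. This is only an illustration under stated assumptions: `CoarseType`, `coarseTypeOf` and `describeArray` are hypothetical names invented here, standing in for a callback that can report only broad JSON Schema style categories. It is not Jackson's actual visitor API.

```java
public class CoarseTypeDemo {

    // Hypothetical stand-in for the broad categories a JSON Schema style
    // callback can report; there is no separate entry for byte vs. long.
    enum CoarseType { STRING, INTEGER, NUMBER, BOOLEAN, OTHER }

    // Collapse a Java element type into one of the coarse categories.
    static CoarseType coarseTypeOf(Class<?> elementType) {
        if (elementType == byte.class || elementType == short.class
                || elementType == int.class || elementType == long.class) {
            return CoarseType.INTEGER;
        }
        if (elementType == float.class || elementType == double.class) {
            return CoarseType.NUMBER;
        }
        if (elementType == boolean.class) {
            return CoarseType.BOOLEAN;
        }
        if (elementType == String.class || elementType == char.class) {
            return CoarseType.STRING;
        }
        return CoarseType.OTHER;
    }

    // What a schema generator would learn about an array through such a callback.
    static String describeArray(Class<?> arrayType) {
        if (!arrayType.isArray()) {
            throw new IllegalArgumentException("not an array type: " + arrayType);
        }
        return "array of " + coarseTypeOf(arrayType.getComponentType());
    }

    public static void main(String[] args) {
        // All three lines print the same description, so the receiver
        // cannot tell that the first one should map to Avro "bytes".
        System.out.println(describeArray(byte[].class));
        System.out.println(describeArray(int[].class));
        System.out.println(describeArray(long[].class));
    }
}
```

Because `byte[]` and `long[]` produce the same description, a schema generator on the receiving end has nothing to key a "bytes" mapping on, which is why passing the actual element type (or an extra format) is suggested for 2.8.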

cowtowncoder added a commit that referenced this issue Mar 15, 2016
cowtowncoder changed the title from "Byte arrays are represented as strings in generated avro schema?" to "(2.8) Byte arrays are represented as strings in generated avro schema?" on Mar 15, 2016
cowtowncoder changed the title from "(2.8) Byte arrays are represented as strings in generated avro schema?" to "Byte arrays are represented as strings in generated avro schema?" on Mar 23, 2016
cowtowncoder added this to the 2.7.4 milestone on Mar 23, 2016