Unwanted type literal added to data in textual #167
Comments
Thanks for posting your question. Your schema uses a type union, which Avro uses whenever a value can be one of an array of type choices. Whenever a union is decoded, from either binary or text encoding, the Avro decoder needs to know unambiguously which type to use to decode the value. With binary, the Avro designers chose to encode the zero-based type index. With text, the Avro designers chose to encode the type information as a JSON object whose property name specifies the type of the value, and whose property value is the encoded value.
{
"name": "context",
"type": [
"null",
{
"type": "map",
"values": "string"
}
],
"doc": "context information",
"default": null
},
The snippet above shows a data type that is a union, which can be either a null value or a map of string values. When unions are binary encoded in Avro, the first value decoded is a long integer giving the zero-based index of the actual type within the union. For instance, if
In Avro text encoding, because Avro does not use the zero-based index of the actual type, it instead uses a JSON object to specify the type being encoded. In particular, it encodes a JSON object with a single property whose name is the type name of the value and whose value is the encoded value. When
It looks like in your case above, the encoder was given an empty map, which is distinctly different from a null value. A null value is encoded as |
If I'd like to get rid of the types, is there any way I can provide a map to the encoder? Any example would be great. Thanks |
I'm not quite sure what you mean. I understand you want to remove types from the JSON output. That would require encoding data with a different schema that does not have a union type. Is your desire to transcode a bunch of data from one schema to a different schema? Namely, you have binary data that was encoded with a schema that has union data types, and you want to encode that data using a schema that does not have union data types? Is your source of binary data some Avro files somewhere? Or is the only schema involved the one you create for this particular project, so that you are not working with data outside of this effort and just need to change your schema to not have union data types? |
OK... let me clarify. I am consuming messages from a Kafka stream, and my producer uses the same (unioned) schema, which I want to keep as it is. Also I cannot remove The problem is that once I have the data, I need to invoke my implementation, to which I want to forward a plain JSON object instead of an Avro object. This is to reduce complexity and the dependency on Avro. See the sample implementation below, which this problem forces on me;
I am looking for a way where my implementation should be able to simply do the following to retrieve values from JSON;
NOTE: I have implemented the same in Java using Confluent libs and there I am dealing with an implementation of "org.apache.avro.generic.GenericRecord" and I call |
I suppose what you are looking for is a new feature that converts data from Avro text encoding to JSON that does not encode the type names inside the JSON objects. |
Exactly, thanks for making it easy to explain as well as accepting it as an enhancement. This would add huge value! I will keep an eye on this issue for further updates. |
Please do not hold your breath on this enhancement. It is a bit outside the scope of this library, and I have a few other things that I'm working on. I do agree it's a useful feature, and I'm happy to do the work when I get some time. |
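In the meantime, the conversion described above can be approximated outside the library. The following is a minimal sketch, not this library's API: the function `unwrapUnionFields` is hypothetical, it handles only top-level fields, and it assumes the caller already knows from the schema which fields are unions. It uses only `encoding/json` from the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unwrapUnionFields (hypothetical helper) takes one record in Avro text
// encoding and replaces each listed top-level field whose value is a
// single-property object ({"typeName": value}) with the bare value.
// Fields holding JSON null, or not listed in unionFields, are left unchanged.
func unwrapUnionFields(avroJSON []byte, unionFields []string) ([]byte, error) {
	var record map[string]interface{}
	if err := json.Unmarshal(avroJSON, &record); err != nil {
		return nil, err
	}
	for _, name := range unionFields {
		wrapper, ok := record[name].(map[string]interface{})
		if !ok || len(wrapper) != 1 {
			continue // null branch, missing field, or not a union wrapper
		}
		for _, inner := range wrapper {
			record[name] = inner
		}
	}
	return json.Marshal(record)
}

func main() {
	// Assumed sample record; the field names follow the ones in this thread.
	in := []byte(`{"entityId":{"string":"abc"},"context":{"map":{}},"eventSize":42}`)
	out, err := unwrapUnionFields(in, []string{"entityId", "context"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"context":{},"entityId":"abc","eventSize":42}
}
```

The list of union fields is passed explicitly because the text alone is ambiguous: a non-union map field that happens to hold exactly one entry looks the same as a union wrapper. Nested records and arrays of unions would need a schema-driven walk, which this sketch does not attempt.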
Resolves: linkedin#167
Hi, I have the same problem, and I've implemented a change that avoids embedding type literals (#201). |
Thanks @shotat, will try |
For anyone interested in this feature, I believe #249 should address it once it's merged. |
I have looked at #106, but I am not using any unions. My schema is as follows;
The code to decode is as follows;
The above code produces the following output;
How can I get rid of string under entityId and entityName? Why is the type present in the textual output, and is there any way I can get rid of it? By contrast, eventSize and emittedTime have correct values.