Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add avro-ocf reader codec #1344

Merged
merged 4 commits into from
Jul 28, 2022
Merged

Add avro-ocf reader codec #1344

merged 4 commits into from
Jul 28, 2022

Conversation

loicalleyne
Copy link
Contributor

Resolves #762

Option to use native encoding/json or github.com/goccy/go-json for JSON marshaling (latter is slightly faster).

@mihaitodor
Copy link
Collaborator

Thanks for this contribution! I think this implementation will suffer from the logical type serialisation issues that I somewhat fixed via #1198 by forcing the call to codec.TextualFromNative here. We could also think of a way to let users opt out from this call if they don't use logical types if this operation proves slow.

Note, however, that introducing goccy/go-json as an extra dependency should be considered carefully, since it's doing some dark magic under the hood. What is the speedup that you observed? By the way, we also don't need to force the serialisation to a JSON string at this point, I think, because we can use the SetStructured method on a Message object to set in it an object instead of a JSON string and Benthos will serialise it to a string (or something else) when it needs to emit it somewhere. This way, if some transformation (i.e. bloblang) needs to be executed against this object in some subsequent processor, then Benthos won't have to deserialise the JSON again. BUT, all of this is moot if there are logical types which need to be serialised correctly, which is why I made the change in #1198... Unfortunately, some bugs / limitations in goavro are described in #1040 that I'm hoping will be addressed soon. Otherwise, we might have to maintain a custom implementation of their codec.TextualFromNative method somewhere, which I'm not really keen to do.

@loicalleyne
Copy link
Contributor Author

First of all, I should tell you that this is my first contribution to open source ever, so thanks for your patience and explanations.

I didn't fully understand the Message type and its methods, I'll remove goccy,

Here's what I'm thinking, changing the implementation to check if the avro codec string contains logicalType and store a bool in avroOCFReader.logicalTypes.
Then, if the schema only contains primitive or complex types avroOCFReader.Next would call SetStructured, and if any are found it will instead call codec.TextualFromNative and SetBytes like in processor_schema_registry_decode.

@mihaitodor
Copy link
Collaborator

No worries and happy to help out!

Here's what I'm thinking, changing the implementation to check if the avro codec string contains logicalType and store a bool in avroOCFReader.logicalTypes.
Then, if the schema only contains primitive or complex types avroOCFReader.Next would call SetStructured, and if any are found it will instead call codec.TextualFromNative and SetBytes like in processor_schema_registry_decode.

I guess it will work, but it feels hackish and the user can't control it. If it's really important to avoid calling codec.TextualFromNative in some cases (is it a lot slower than json.Marshal or what issues are you seeing with it?), then maybe we can do something like codec: avro-ocf:marshaler=json and codec: avro-ocf:marshaler=goavro, but I'll defer to @Jeffail for the syntax review.

@loicalleyne
Copy link
Contributor Author

The issue I was seeing with it was this one - linkedin/goavro#167 Unwanted type literal added to data in textual - the fix for this is pending a merge of a pull request in goavro (Added Two Way Handling for Standard Json - linkedin/goavro#249).

I implemented your configuration suggestions, and cleaned up the constructor code a little bit.

@mihaitodor
Copy link
Collaborator

Cool, thanks! I think linkedin/goavro#249 will be merged quite soon (please feel free to upvote it) and then we can call that API both here and in schema_registry_decode, so we'll get consistent behaviour. There are a few linting issues that you'll want to address first and then I'll let @Jeffail do a detailed review to see how he'd like to proceed here until that goavro PR gets merged.

@loicalleyne
Copy link
Contributor Author

I ran make fmt, make lint and make docs and committed to my fork.
make test seems to work until it gets to the cue section.

Test 'config/test/awk.yaml' succeeded Test 'config/test/bloblang/also_tests_boolean_operands.yaml' succeeded Test 'config/test/bloblang/boolean_operands.yaml' succeeded Test 'config/test/bloblang/cities_test.yaml' succeeded Test 'config/test/bloblang/csv.yaml' succeeded Test 'config/test/bloblang/csv_formatter_test.yaml' succeeded Test 'config/test/bloblang/env.yaml' succeeded Test 'config/test/bloblang/fans.yaml' succeeded Test 'config/test/bloblang/github_releases_test.yaml' succeeded Test 'config/test/bloblang/literals.yaml' succeeded Test 'config/test/bloblang/message_expansion.yaml' succeeded Test 'config/test/bloblang/walk_json.yaml' succeeded Test 'config/test/bloblang/windowed.yaml' succeeded Test 'config/test/cookbooks/filtering.yaml' succeeded Test 'config/test/deduplicate.yaml' succeeded Test 'config/test/deduplicate_by_batch.yaml' succeeded Test 'config/test/files_for_content.yaml' succeeded Test 'config/test/filters.yaml' succeeded Test 'config/test/mock_http_proc.yaml' succeeded Test 'config/test/mock_http_proc_path.yaml' succeeded Test 'config/test/protobuf/house.yaml' succeeded Test 'config/test/protobuf/people.yaml' succeeded Test 'config/test/resources/other_mappings.yaml' succeeded Test 'config/test/resources/some_mappings.yaml' succeeded Test 'config/test/unit_test_example.yaml' succeeded make: ./resources/cue/test/test.sh: Command not found make: *** [Makefile:95: test] Error 127

I tried running the script manually but ran into other errors.
: not found test.sh: 3: set: Illegal option -

@Jeffail
Copy link
Collaborator

Jeffail commented Jul 27, 2022

Hey @loicalleyne don't worry the CUE scripts won't change with your MR, this is looking great I just need to put time aside to think through the package layout but at a glance I think it's ready to go.

@Jeffail Jeffail merged commit 56727e2 into redpanda-data:main Jul 28, 2022
@Jeffail
Copy link
Collaborator

Jeffail commented Jul 28, 2022

Thanks @loicalleyne this looks great! We can take another look once linkedin/goavro#249 is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Processing Avro OCF
3 participants