Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TextualFromNative output deterministic #254

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

galion96
Copy link

@galion96 galion96 commented Aug 2, 2022

This is my take on resolving the #222.
I used a simple approach of extracting the keys and sorting them in a deterministic way.
Hopefully this is ok or can add some perspective to this issue

@galion96 galion96 changed the title [#222]: Make TextualFromNative output deterministic Make TextualFromNative output deterministic Aug 2, 2022
@xmcqueen
Copy link
Contributor

xmcqueen commented Aug 19, 2022

Hi and thanks! I'm worried about this having a perf impact. goavro has been very focused on perf. Go has a decent framework for running benchmarks. Would you please include a benchmark test with this PR as in this PR #223

@galion96
Copy link
Author

Understood, will do as soon as I find available time

@galion96
Copy link
Author

On my branch

goavro % go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16                          100             20088 ns/op
BenchmarkNativeFromAvroUsingV2-16                    100           3460976 ns/op
BenchmarkBinaryFromNativeUsingV2-16                  100           1127675 ns/op
BenchmarkNativeFromBinaryUsingV2-16                  100           3291675 ns/op
BenchmarkTextualFromNativeUsingV2-16                 100          10312088 ns/op
BenchmarkNativeFromTextualUsingV2-16                 100           7148802 ns/op
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16                    100               391.8 ns/op

On master

go test -bench=. -benchtime=100x
goos: darwin
goarch: amd64
pkg: github.com/linkedin/goavro/v2
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkNewCodecUsingV2-16                          100             19934 ns/op
BenchmarkNativeFromAvroUsingV2-16                    100           3365947 ns/op
BenchmarkBinaryFromNativeUsingV2-16                  100           1146937 ns/op
BenchmarkNativeFromBinaryUsingV2-16                  100           3351293 ns/op
BenchmarkTextualFromNativeUsingV2-16                 100           6815802 ns/op
BenchmarkNativeFromTextualUsingV2-16                 100           7874666 ns/op
{"f1":{"k1":3.5}}
BenchmarkMapDeterministic/TextualFromNative_deterministic_map_-16               {"f1":{"k1":3.5}}
     100               403.8 ns/op

@xmcqueen What did you had in mind, do you want me to add some really large map with 1M records and compare or?

@ckarmann
Copy link

ckarmann commented Oct 31, 2022

Another way to make it deterministic that would maybe be faster than sorting the keys would be to keep the order of the fields in the records in the schema. It would be consistent with what produce Avro library for Java and also with avro-tools.

Here I think the function should should use nameFromIndex from the Record codec to loop through the names instead of using "range destMap" and that would solve the problem for records.

For map, of course there is no definitive order. The Java implementation uses LinkedHashMap to preserve insertion order. Maybe this project could switch to something similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants