wip: integrate binary data into payloads #283
Conversation
Hi @Totodore, here's what I have so far. I hate PRs that have so many changed lines, but the changes to the library itself are actually not that large. The first commit is all added lines: the addition of …
socketioxide/src/handler/extract.rs
```diff
     _: Option<i64>,
 ) -> Result<Self, Infallible> {
-    Ok(Bin(bin))
+    Ok(Bin(v.extract_binary_payloads()))
```
I've left in the `Bin` extractor, and preserved its behavior. Is it worth keeping?

- There is an example in a doc comment where the `Bin` extractor is used, and then the binary payloads are emitted along with three different messages, separate from the data in two of them. Maybe that is a use-case that is useful to preserve?
- A downside of keeping it is that `Data` would be much more efficient if it implemented `FromMessage` and not `FromMessageParts`, since with the latter we have to clone the entire thing.
- If we do keep it, maybe `Bin` should instead implement `FromMessageParts` and clone the binary payloads rather than extracting them from the `PayloadValue`. Then `Data` could implement `FromMessage`, if we assume that the most common use case is that people will use `Data` and not `Bin`; that would make the common case more efficient, removing the clone of the data.

I'm leaning toward keeping it and implementing that last point above.
One wrinkle: if I convert `Data` to `FromMessage`, I'm thinking about `TryData`, and maybe leaving it as `FromMessageParts`. Maybe someone wants to do a `TryData`, but if it fails, still get the data in another form in order to do something else with it. Like:
```rust
socket.on("test", |TryData(data): TryData<MyData>, Data(value): Data<PayloadValue>| async {
    if let Ok(data) = data {
        println!("Got data as expected: {:?}", data);
    } else {
        println!("Data deser failed, payload received was: {:?}", value);
    }
})
```
Or even implement a handler that can have the data deserialize into multiple possible known types:

```rust
|TryData(my_data): TryData<MyData>, TryData(other_data): TryData<OtherData>, Data(last_chance): Data<LastChanceData>|
```
Thoughts?
```rust
fn on_handle(socket: SocketRef, ack: AckSender, Bin(bin): Bin, Data(data): Data<PayloadValue>) {
    info!("Received message: {:?}", data);
    ack.send(data.clone()).ok();
    socket.emit("message-back", data.clone()).ok();
}
```

The picture is the printed data, and the data received in `on_handle` is decoded from binary to string.
Yep, I know things don't really work yet ;) I've already fixed this locally (it seems `String`'s default serde impl handles bytes, sigh), but I'm working on some other fixes as well before I push it.
Thank you, you've worked hard.
> I've left in the `Bin` extractor, and preserved its behavior. Is it worth keeping? […] I'm leaning toward keeping it and implementing that last point above.
I think the best would be to remove the `Bin` extractor and the `bin()` operator, and have `Data` and `TryData` implement `FromMessage`, at least for `Data`. I still don't know what we should do for `TryData`; maybe we could start by only implementing `FromMessage`, and later add a custom implementation of `FromMessageParts` that clones the data. But we still need to think about this.
Ok, removed `Bin`, and made `Data` implement `FromMessage`. I left `TryData` as `FromMessageParts` for now, but we can revisit that.
Did some benchmarking and compared runs with critcmp:

The encoding numbers look really good, improved in a few cases by 75%-110%. But decoding is a bit more mixed; a few large improvements of around 50%, but quite a few 10-20% regressions. Since the decoding regressions are in the cases without binary payloads, my guess is that my …
This will act as a replacement for serde_json::Value that can also hold binary payloads.

Previously, the non-binary part of a message and the binary payloads in a message were represented separately: the non-binary portion was represented by a serde_json::Value, and could be converted to an arbitrary data structure. That data structure would not include the binary data or any indication that there is any binary data at all. The binary data would be provided in a Vec<Vec<u8>>.

There were a few problems with this:

1. The original code only supported cases where the payload was a flat array with some binary payloads in the root of the array, or a flat object where the root of the object was a binary payload. Objects with more complicated structure and binary data embedded in various places in the structure were not supported.
2. Adding support for the above turned out to not be possible in a useful way, because the ordering of the Vec<Vec<u8>> matters, and it could never be clear where exactly in the possibly-complex structure each binary payload belonged.
3. One of the useful features of the socket.io protocol is that it lets users efficiently transmit binary data in addition to textual/numeric data, and have that handled transparently by the protocol, with either end of the connection believing that they just sent or received a single mixed textual/numeric/binary payload. Separating the non-binary from the binary negates that benefit.

This introduces a new type, PayloadValue, that behaves similarly to serde_json::Value. The main difference is that it has a Binary variant, which holds a numeric index and a Vec<u8>. This allows us to include the binary data where the sender of that data intended it to be.

There is currently one wrinkle: serde_json does not appear to consistently handle binary data; when serializing a struct with a Vec<u8>, I believe it will serialize it as an array of numbers, rather than recognize that it's binary data.
For now, I've included a Binary struct that wraps a Vec<u8>, which can be included as the type of a binary member, instead of using a Vec<u8> directly. Hopefully I'll be able to figure out a better way to do this.

Unfinished tasks:

* Testing: I have no idea if this even works yet. All I've done is get it to compile.
* Benchmarking: I've tried to ensure that I don't copy data any more than the existing library does, but it's possible I've introduced some performance regressions, so I'll have to look into that.
* Documentation: the documentation still references the old way of doing things and needs to be updated.

Closes Totodore#276.
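A minimal sketch of the `PayloadValue` shape described above. This is an assumption about the rough structure only; the PR's actual type surely has more variants (e.g. separate integer/float numbers) and may use different field names.

```rust
// Sketch only: not the PR's actual PayloadValue definition. The key idea
// is the Binary variant, which can live at any depth in the value tree.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum PayloadValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    // Binary data stays in place in the structure, tagged with the
    // placeholder index used to match wire-level binary attachments.
    Binary { num: usize, data: Vec<u8> },
    Array(Vec<PayloadValue>),
    Object(HashMap<String, PayloadValue>),
}

fn main() {
    // Unlike the old Vec<Vec<u8>> side-channel, a binary payload can be
    // nested wherever the sender put it.
    let payload = PayloadValue::Array(vec![
        PayloadValue::String("event".to_string()),
        PayloadValue::Binary { num: 0, data: vec![0xde, 0xad, 0xbe, 0xef] },
    ]);
    println!("{:?}", payload);
}
```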
So far just the doctests that were failing.
Did some profiling, looked at flamegraphs, made a few changes, but the performance regressions are still there. This is very strange, though. Decoding of …

I'm getting really confused by this, actually. Different runs of the benchmark compare very differently. For example, I just did a run where the …
I have approximately the same weird results (but on a GitHub Codespaces container, so it is not stable). Either the benchmark code is not working properly (though as far as I can tell it is correct) or it is the machine. Globally the difference is about +10% for the connect packet decoding. But because the scale is really low, and the variance for connect packet decoding is quite high, I think it is still ok and that we can move on...
Ok, I quit X, ran a baseline against main, then ran it against this branch, comparing with main, 3 times. Numbers are the average change from main:

With one exception (Encode packet connect on /custom_nsp), it's consistent across all 3 runs. But there are still some oddities: "Decode packet connect on /" is somehow ~19% faster, even though that code hasn't changed at all from what's on main. And even stranger, "Decode packet connect on /custom_nsp" is 5% slower, even though a) it does almost the exact same thing as without the custom nsp, and b) that code also hasn't changed from what's on main. Unfortunately I'm at a loss as to what to do, though. One thing that's notable, maybe: …
Oh, it looks like … We're not going to get all the way down to 32, but I'm surprised it's not down to 40 bytes: the … At any rate, 48 bytes is still better than 56. I'll try to rerun the benchmarks later tonight and see if that changes anything. Feels like a bit of a long shot, though.
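Enum sizes like the 56/48/40/32-byte figures above can be checked directly with `std::mem::size_of`. The `Toy` enum here is only a guess at the rough shape of the variant being discussed, not the PR's actual type:

```rust
// Checking in-memory sizes while iterating on enum layout.
use std::mem::size_of;

// Toy stand-in for a Binary-style variant: an index plus a byte buffer.
#[allow(dead_code)]
enum Toy {
    Binary(usize, Vec<u8>),
    Text(String),
}

fn main() {
    // Vec<u8> is three usizes (pointer, capacity, length): 24 bytes on
    // 64-bit targets. The enum adds the variant payloads plus whatever
    // discriminant/padding the compiler's layout (including niche
    // optimizations) ends up needing.
    println!("Vec<u8>:  {} bytes", size_of::<Vec<u8>>());
    println!("Toy enum: {} bytes", size_of::<Toy>());
}
```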
Ok, here are the same numbers as before (against the baseline of the main branch), but now with three more runs with …

I'm so confused. Somehow the performance when encoding non-binary events is much much worse, despite now using the same map implementation …
I don't understand what you changed exactly for this benchmark. It would not surprise me that your custom implementation is slower than … Apart from that, I also discovered that …
I wonder if it would be possible to use a custom …
I changed …

Possibly... the downside there is that the public API would then be a little inconsistent. E.g. …

Oh neat, I didn't know about that either. I'm not sure that would work, though, since you'd need some kind of heterogeneous array/list implementation that is serde-compatible; that is, the example in the …

I assume you're talking about something that socketioxide would provide, but that users would add to their structs with a serde attribute? I don't think that would work, since the deserializer would also need access to the list of binary payloads, and there's no way to get extra data to a deserializer specified that way. Another thing I'm thinking about is sticking with …
That could be a good thing to explore. For the moment I'm not fond of the entire replacement of serde_json, because it is a lot of code and it will probably introduce a lot of complexity to the lib. That's why I won't merge all of this immediately. What we could do is either improve this PR by working on it, or if you find completely different possibilities, don't hesitate to create a new PR :). Thanks a lot for all of your work and your time :)!
Yes, I totally agree. This got pretty large and more complicated than I expected or hoped.
For now I'll probably continue to work here on this PR. Even if I do something completely different I may just squash it all down. I've mostly written ser/deser wrappers for … There is one major downside to this approach: since … But overall I think this is a better way to go than …

You're welcome! Got into this because I needed it for a personal project of mine, and even though it's been a diversion from that project, it's been fun and educational!
Decided to do a new PR after all; see #284.