Support for `normalized_payload` field #5429
Thanks for filing the issue. @adriansmares do you want to pick this up, or should we assign @nicholaspcr? For implementation, at a high level, there are two approaches:
I'm a bit undecided here. For CayenneLPP we can simply normalize, almost exactly like we do now but with different field names. For the Device Repository, option (2) suffices. Users with custom JavaScript codecs wouldn't be able to use this; for that we'd need (1). Should we verify the output from the normalize uplink codec script, like with a JSON schema? We should do it anyway as part of CI in the Device Repository. If we do JSON Schema, ideally we have a code-generated validator; otherwise we'd have a parallel validator written in pure Go. We should also be able to normalize the output of CayenneLPP, which is already normalized but in a different structure, to this new structure.
Another question that we should raise here is: 'do we enforce that the formatter exists only at application level, or should it also exist at end device level?'. The use case here being, of course, heterogeneous applications (i.e. with different end devices in them).
I think that it makes sense to speak of normalization only for 'uncontrollable' payload formatters such as the Device Repository or CayenneLPP. If you control the decoder script, I see no value in running the scripts separately, as you can always do one more normalization step before returning.
I think we should definitely have a set of type definitions, otherwise everyone will just have their own funky output. An untyped object really means 'anything goes', and we cannot statically validate whether all of the outputs are correct if the return type is unconstrained. The output of the normalization function should be a predefined type:

```js
return new NormalizedOutput({
  payload: new SinglePayload({
    temperatures: {
      "main_sensor": new Temperature(29.0, Celsius)
    },
  })
})
```

This looks rather verbose, but downstream it just means:

```json
{
  "@version": "1.0-rc1",
  "single": {
    "temperatures": {
      "main_sensor": {
        "value": 29.0,
        "unit": "celsius"
      }
    }
  }
}
```

And all of these building blocks (`NormalizedOutput`, `SinglePayload`, `Temperature`, and so on) would come predefined. Implementation-wise, we already can (and do) cache scripts, so having this long type definition preamble before the normalizer is not as expensive as it may seem. I think that the new field should be strongly typed (i.e. have a proper type definition).
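To make the idea concrete, here is a minimal sketch of how such predefined building blocks could be implemented so that they serialize to the versioned wire format shown above. The class and constant names (`NormalizedOutput`, `SinglePayload`, `Temperature`, `Celsius`) are taken from the example in this comment; the serialization details are assumptions, not a settled design:

```javascript
// Hypothetical predefined building blocks for the normalizer preamble.
// Each class knows how to serialize itself into the versioned wire format.
class Temperature {
  constructor(value, unit) {
    this.value = value;
    this.unit = unit;
  }
  toJSON() {
    return { value: this.value, unit: this.unit };
  }
}

class SinglePayload {
  constructor(fields) {
    this.fields = fields;
  }
  toJSON() {
    // JSON.stringify will recursively serialize the nested Temperature values.
    return this.fields;
  }
}

class NormalizedOutput {
  constructor({ payload }) {
    this.payload = payload;
  }
  toJSON() {
    return { "@version": "1.0-rc1", single: this.payload };
  }
}

const Celsius = "celsius";

const out = new NormalizedOutput({
  payload: new SinglePayload({
    temperatures: { main_sensor: new Temperature(29.0, Celsius) },
  }),
});
```

`JSON.stringify(out)` then yields exactly the downstream JSON structure from the example above.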
I think there is a lot of value in supporting this at the device level. I don't know how much complexity it adds to the implementation, but it's definitely desirable, since most real-world applications will have more than one type of device, and having a different application for each device type seems weird.
I agree that the extra normalization step is not needed in a custom decoder: if you (as an integrations developer) need to bother creating a script manually, you might as well output normalized data directly, since that's what your system will expect. This normalization step is more aimed towards automated onboarding (e.g. the user scans the QR code on the device and the system automatically detects the manufacturer, uses the right decoder, and provides the data normalized). Having strict static typing will be useful for sure.
You mean, it wouldn't make much sense to have an application-level normalizer if there are various device-level decoders that produce different results? That wouldn't work indeed; they come in pairs (decoder + normalizer, where the device level overrides the application level). Or do you mean that application integrations either expect normalized payload or they don't?
True, but I would add a check: for custom codecs, we can check whether the output complies with the normalized JSON schema, and if so, also set `normalized_payload`.
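A minimal sketch of that detection step, assuming hypothetical helper names (`isNormalized`, `buildUplink`) and a simplified structural check standing in for full JSON Schema validation:

```javascript
// Sketch: after running a custom decoder, check whether its output already
// complies with the normalized structure; if so, populate normalized_payload
// alongside decoded_payload. The shape check here is deliberately minimal.
function isNormalized(decoded) {
  return (
    decoded !== null &&
    typeof decoded === "object" &&
    typeof decoded["@version"] === "string" &&
    typeof decoded.single === "object" &&
    decoded.single !== null
  );
}

function buildUplink(decoded) {
  const uplink = { decoded_payload: decoded };
  if (isNormalized(decoded)) {
    // The decoder's output doubles as the normalized payload.
    uplink.normalized_payload = decoded;
  }
  return uplink;
}
```

A vendor-specific output like `{ temp: 23 }` would leave `normalized_payload` unset, while a compliant output would be mirrored into it.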
Yes, that is exactly the idea. See the referenced issue TheThingsNetwork/lorawan-devices#395. So what I meant is that we have a JSON schema that can validate the normalized output. If we only support the Device Repository and CayenneLPP, we don't need validation in The Things Stack. However, if we detect compliance of custom codec output with the normalized payload, we should run the output through the JSON schema validator. The thing is that JSON schema validation at runtime is pretty expensive, so we'd need a code-generated validator or a custom validator.
This is an interesting idea, but it requires bijectivity: we must be able to map JSON output from the normalizer codec to Protobuf, and that should marshal to the same JSON output again when sending it as JSON upstream.
I'm thinking from an implementation perspective: we have a lot of overrides in our hierarchy (between applications and end devices). Having end-device level overrides brings flexibility but also some (small) complexity in the implementation; keep in mind the whole override hierarchy. With that being said, I think that keeping it consistent (i.e. allowing end-device level overrides) is probably better right now than enforcing that these formatters exist only globally (at application level).
Given that below we mention that the schema check is expensive, I think we're better off requiring an explicit normalization script.
I don't follow how the fact that the normalizer comes from the Device Repository ensures that its output is valid. The normalizer is a JS function; it can do and return whatever it wants. If I accidentally return something malformed, that shouldn't pass silently. You would always need runtime validation for the output. In this context, if we enforce that the normalizer is always a separate script, we can put the type definitions (and validation checks) only in that script. At least that way we avoid extra overhead in the normal payload codecs. Or should the Device Repository normalizers be written in TypeScript, transpiled to JavaScript when we bundle the device repository?
I think it would be useful to consider this, as it would allow (Go-)native validation post-marshaling. The payload codecs can return objects. Should there be an option in the return object that says 'just pipe this decoded payload to the normalizer'?
Yeah. But let's first consider automatically detecting whether the output of a custom per-device (and per-application) codec matches the normalized schema, because if we do that, we magically set `normalized_payload` for existing codecs too. Regarding validation cost: it's not something we can fully eliminate, but on the other hand it's not much more than defining a Go struct with all the fields, trying to unmarshal the JSON (which will detect wrong types), and then checking field by field whether it falls within the ranges (min/max/patterns/enums, etc.).
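The validation approach described above (parse, then hand-written type and range checks instead of a generic JSON Schema engine) could look roughly like this. The field names and bounds are illustrative assumptions, not the actual schema:

```javascript
// Sketch of a hand-rolled validator: JSON.parse catches syntax errors,
// then each field's type and range is checked explicitly. This mirrors the
// "unmarshal, then check min/max/patterns/enums" approach described above.
function validateTemperature(t) {
  const errors = [];
  if (typeof t.value !== "number" || t.value < -273.15) {
    errors.push("value must be a number >= -273.15");
  }
  if (t.unit !== "celsius") {
    errors.push("unit must be 'celsius'");
  }
  return errors;
}

function validateNormalized(json) {
  const doc = JSON.parse(json); // malformed JSON throws here
  const errors = [];
  if (doc["@version"] !== "1.0-rc1") {
    errors.push("unsupported @version");
  }
  const temps = (doc.single || {}).temperatures || {};
  for (const [name, t] of Object.entries(temps)) {
    errors.push(...validateTemperature(t).map((e) => `${name}: ${e}`));
  }
  return errors; // empty array means the payload is valid
}
```

A code-generated validator would produce essentially this kind of straight-line code from the schema, avoiding the cost of interpreting JSON Schema at runtime.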
Indeed, only to the extent that input and output are tested in CI (see https://github.com/TheThingsNetwork/lorawan-devices/#payload-codecs). But there are no promises on test coverage, nor can we test all binary input exhaustively. So indeed, we can't rely on this (at all).
Okay, so how do we do this? Unmarshal JSON to a proto-generated struct, then validate the fields, and that generated proto struct would marshal back to the same JSON? That should work. Then we indeed have the representation in code (and proto); we'd have two places to maintain the validation (the JSON schema in the Device Repository and the TTS code), but OK.
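The round-trip property being discussed can be sketched like this, with a plain JavaScript object standing in for the proto-generated struct (the `fromJSON`/`toJSON` helpers and the field layout are assumptions for illustration):

```javascript
// Sketch of the bijectivity requirement: the normalizer's JSON is
// unmarshaled into a typed in-memory form (standing in for a generated
// proto struct), and marshaling it back must reproduce the same JSON.
function fromJSON(doc) {
  // Typed form: Map of sensor name -> { value, unit }.
  const temps = new Map(Object.entries(doc.single.temperatures));
  return { version: doc["@version"], temperatures: temps };
}

function toJSON(typed) {
  return {
    "@version": typed.version,
    single: { temperatures: Object.fromEntries(typed.temperatures) },
  };
}

const input = {
  "@version": "1.0-rc1",
  single: { temperatures: { main_sensor: { value: 29, unit: "celsius" } } },
};

// The round trip must be lossless for upstream JSON to stay stable.
const same =
  JSON.stringify(toJSON(fromJSON(input))) === JSON.stringify(input);
```

In Go, `protojson` marshaling of the generated struct would play the role of `toJSON` here.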
We could indeed add a new field for this.
TL;DR: Yes. Keep in mind that internally in the Application Server you have an in-memory representation 'in the pipes' anyway, as we carry around the decoded payload. Is it more work than passing that around?
👍
What exactly is blocking the implementation? I thought we agreed in TheThingsNetwork/lorawan-devices#395 that the schema definition of the normalized format is decoupled from implementing this feature in The Things Stack.
Yes, it is, but we need at least one field with one definition so we can start doing things back to back, including validation and testing.
Any updates on this?
Hi mate, I'm the one assigned to implement this. My apologies for the delay; there was a bit more focus on other internal issues on my end. While I don't think this should take a long time to do, I can't promise exactly when it will land.
Absolutely not, don't worry; I was just letting it be known that I'd be available to do the job if there was no one else. I'm happy that you've picked this up. Feel free to discuss anything related to this with me as you make progress.
@adriansmares summarizing:
ACK?
That happens only if you use
I would rather not normalize this one at all then. If the semantics are not consistent, we reduce the usefulness of having semantics in the first place. Imagine the documentation:
Run it only if
I disagree. We can have checks in the Device Repository that say 'hey, your output would be valid normalized payload, are you sure you didn't forget to add a normalizeUplink function?'
I'll benchmark this.
But who will see that message?
Good. It may indeed be the case that I'm exaggerating.
I meant to add it to the GitHub action that runs the validation. I see now that things aren't as well developed as I wished.
It is the case, see https://github.com/TheThingsNetwork/lorawan-devices/blob/master/bin/validate.js#L99. Or do you mean something else? We use the Go script so we have the same runtime as TTS. Another reason to be reluctant about changing the function signature, arguments, or return objects is that I managed to standardize what we have today in the Payload Codec spec that will be published by the LoRa Alliance. If we add fields that must be set in the payload codec implementation, we could run into issues in the future.
I didn't see that part, my bad.
Then I hope that benchmarks show that I'm paranoid. ACK for implementation as described in your proposal comment. |
Summary
The new field `normalized_payload` will allow integrations to rely on a standardized data model for all devices in the device repository that support this new normalization step, regardless of the manufacturer, the units, the wording they use, etc.
Why do we need this?
This feature request has already been largely discussed in TheThingsNetwork/lorawan-devices#395
However, getting the data model right is a different issue and does not block the implementation of this one (TheThingsNetwork/lorawan-devices#395 (comment)), so this can be done before getting "the perfect standardized format", which is a difficult task to get right, especially for less common quantities.
The implementation in `lorawan-stack` also does not depend on each specific device supporting the new `normalizeUplink(data)` function. This step should be completely optional (though encouraged for manufacturers).
What is already there? What do you see now?
There is no support for a standard data format.
What is missing? What do you want to see?
Support for a new field in uplinks with a normalized format.
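To illustrate the shape of that optional step, here is a hypothetical sketch of a vendor codec paired with a normalizer. The exact `normalizeUplink` signature, the byte layout, and the normalized field names are assumptions; the source only names the function, not its contract:

```javascript
// Hypothetical example: a vendor-specific decoder plus the optional
// normalization step that maps its output onto a standardized model.
function decodeUplink(input) {
  // Assumed vendor encoding: two bytes, big-endian, tenths of a degree.
  const raw = (input.bytes[0] << 8) | input.bytes[1];
  return { data: { temp_c10: raw } };
}

function normalizeUplink(data) {
  // Translate the vendor's field name and unit into the shared,
  // unit-annotated model, so integrations don't need per-device logic.
  return {
    temperatures: {
      main_sensor: { value: data.temp_c10 / 10, unit: "celsius" },
    },
  };
}
```

Devices without a `normalizeUplink` function would simply not populate the new field, keeping the step optional.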
How do you propose to implement this?
It has already been described in the previously referenced issue. I can't speak to the specifics of `lorawan-stack` itself, since I'm not familiar with the codebase.
How do you propose to test this?
I suppose unit testing is sufficient, but I'm not certain.
Can you do this yourself and submit a Pull Request?
Maybe. I'm willing to contribute to get this done as swiftly as I can, but I'll need guidance on how to proceed: since I don't know the internals of the system in detail, I may miss important things and waste a lot of time.
Let me bring @johanstokking into discussion to see if he can assign this to someone appropriate or give me assistance to achieve this.