generic topic validation and parsing #36

Tieske · 2021-11-03T21:08:48Z

see tests for parsing examples.

Tieske · 2021-11-03T21:25:31Z

not sure where this should go, it's now in init.lua. Should it go into client.lua maybe?

possibly every subscription could be parsed by the client, and the parsed results passed along in the event handler.

thoughts?

xHasKx · 2021-11-04T10:59:56Z

not sure where this should go, it's now in init.lua. Should it go into client.lua maybe?

init.lua is ok as you've added a whole-module functions.

possibly every subscription could be parsed by the client, and the parsed results passed along in the event handler.

thoughts?

Actually I'm not sure these validation functions is required at all.
I mean, for subscribe and publish packets the topic should be just a string, some broker validates it, some not, and some can even provide custom wildcards/filters in that topics. So there is no need for the client-side validation because good broker will do this for us and return appropriate SUBACK/PUBACK code.
And for custom brokers the client-side topic validation can do an obstacle for the library user.

IMO, only the last method to check topic and mask match can be frequently used in the client code - in the incoming message handler. There might be added a new client method like this:

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

Tieske · 2021-11-04T19:20:02Z

Actually I'm not sure these validation functions is required at all.

agreed they are not, and I would not implement them in the client itself. They are just convenience functions for user-code that wants to validate user-input.

Btw; the parser function needs to compile, and the compile function needs to validate, so 3 out of the 4 are needed. So I guess the last one (to validate non-wildcarded topics) is then just for completeness sake.

Tieske · 2021-11-04T19:24:12Z

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

yeah, I was thinking along the same lines. But in that case, who will be responsible to call the ACK on the received message?

xHasKx · 2021-11-04T19:26:31Z

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

yeah, I was thinking along the same lines. But in that case, who will be responsible to call the ACK on the received message?

This is not related. I think, like in usual message handler, ack should be done explicitly like this: https://github.com/xHasKx/luamqtt/blob/master/examples/simple.lua#L38

Tieske · 2021-11-04T19:33:30Z

exactly, but then you'd always need 2 handlers, a generic one that does the ACK, and another one matching a specific topic mask.

Still learning on the MQTT stuff; why doesn't the client send the ACK by default? why relying on user-code to do that?

xHasKx · 2021-11-04T20:00:52Z

exactly, but then you'd always need 2 handlers, a generic one that does the ACK, and another one matching a specific topic mask.

I think user can use subscribe handlers in 3 ways:

set handlers only topic-specific (new hypothetical method client:add_msg_handler(topic_mask, callback))
set only one generic handler (current method client:on("message", callback))
set both handlers

For now I think it will be better to leave user always responsible for sending ack. And if he will use last case with handlers set in both ways and send ack twice accidentally - this will not be a MQTT protocol violation (as I remember). Maybe some brokers will close connection in this case but I think it's uncommon.

Still learning on the MQTT stuff; why doesn't the client send the ACK by default? why relying on user-code to do that?

For example, MQTT can be used to distribute some tasks between workers, and if worker can't handle some task - it may not send ack to indicate failure (in the other words, worker will not send ack untill task is successfully done). Thus the task will stay in a queue and thus can be handled by some different worker. This scenario requires a persistent mqtt session (clean=false)

I suppose there might be a client option for __init() method enabling auto-ack for all messages, but I think it should be disabled by default.

xHasKx · 2021-11-04T20:05:07Z

For example, user can use topic-specific handler to execute it's logic and do ack, and at the same time also use generic client:on() message handler for logging without ack.

Tieske · 2021-11-04T20:47:57Z

For example, MQTT can be used to distribute some tasks between workers, and if worker can't handle some task - it may not send ack to indicate failure (in the other words, worker will not send ack untill task is successfully done). Thus the task will stay in a queue and thus can be handled by some different worker. This scenario requires a persistent mqtt session (clean=false)

That makes sense, thx for the explanation

Tieske · 2022-08-06T11:58:12Z

@xHasKx any chance we can move this forward? anything required from me?

I rebased the PR on master to it get up-to-date

.editorconfig

xHasKx · 2022-08-08T07:23:21Z

mqtt/init.lua

+-- @param t (string) wildcard topic to validate
+-- @return topic, or false+error
+function mqtt.validate_subscribe_topic(t)
+	if type(t) ~= "string" then


please follow such errors checking behavior:

luamqtt/mqtt/init.lua

Line 10 in e3fa62c

* passing invalid arguments (like number instead of string) to function in this library will raise exception

since it is validation, the output of a validation function should be usable as input for an assertion. Hence this function does not throw errors itself, and returns the input value if valid.

This is exactly what I implemented for the other functions, as they should throw errors, as documented.

mqtt/init.lua

xHasKx · 2022-08-08T07:27:16Z

mqtt/init.lua

+-- @param t (string) topic to validate
+-- @return topic, or false+error
+function mqtt.validate_publish_topic(t)
+	if type(t) ~= "string" then


please follow such errors checking behavior:

luamqtt/mqtt/init.lua

Line 10 in e3fa62c

* passing invalid arguments (like number instead of string) to function in this library will raise exception

same; result is input for assertions

xHasKx · 2022-08-08T07:39:39Z

mqtt/init.lua

+	local pattern = opts.pattern
+	if not pattern then
+		local ptopic = opts.topic
+		if not ptopic then


please follow such errors checking behavior:

luamqtt/mqtt/init.lua

Line 10 in e3fa62c

* passing invalid arguments (like number instead of string) to function in this library will raise exception

example:

luamqtt/mqtt/client.lua

Line 102 in e3fa62c

assert(value_type == "string", "expecting uri to be a string")

xHasKx · 2022-08-08T07:53:35Z

mqtt/init.lua

+			end
+		end
+	end
+	if not pattern:find("%(%.[%-%+]%)%$$") then -- pattern for "#" as last char


please move that pattern string (and its related strings from compile_topic_pattern function) into the local variables in the init.lua file to look like constants as they are related very implicitly.

Done. moved them into narrow scoped module locals.

xHasKx · 2022-08-08T08:33:37Z

@xHasKx any chance we can move this forward? anything required from me?

I rebased the PR on master to it get up-to-date

@Tieske, I posted some comments (change requests) on the commits in the PR, please take a look

xHasKx · 2022-08-08T08:49:52Z

And I'm still concerned about mqtt.topic_match function. It's arguments are complicated and it's doing two tasks at the same time:

check that the topic matches the subscribe-topic-filter string
parse part of the topic at wildcards places

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

What do you think?

xHasKx · 2022-08-08T08:57:42Z

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

Actually string.match will return the words at the wildcards of the topic, so your topic_match function is just adding keys to them. Does it actually required?

Tieske · 2022-08-08T22:56:26Z

Pushed a commit with the review comments (probably easiest to review that commit without whitespace changes).

the validators don't ever throw errors, see comments above. Let me know if you want me to change that behaviour anyway.
throw errors on bad input when matching or compiling patterns
updated ldoc comments
moved patterns into narrow scoped module locals
made the error messages more descriptive

Tieske · 2022-08-08T23:43:49Z

Actually string.match will return the words at the wildcards of the topic, so your topic_match function is just adding keys to them. Does it actually required?

No, it is not required. It is just convenience to get named arguments/parameters, so user code can be more readable. I have a number of 'id' type fields, which I hate to have in code as array indices, using names is so much more convenient for that. That's the whole purpose I guess (and caching the compiled pattern).

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

compile + string_match would not work, since there would be no structure where the compiled pattern can be stored (cached) for future use. So that would have to be implemented in user code, and then this function would be reduced to only a string.match call, so a function solely for that would not make sense to me.

That said, I do have doubts about the return values, or the entire interface. Currently I do not like the 2 return values, makes it cumbersome to handle.
Also the function call with the topic and options, since the options table stores the cached pattern. Maybe the options table should become a class, from which a "topic" can be instantiated, that would hold all the named values. That would make the usercode more natural to retain the "class" instead of an options table.

something like;

local topic_matcher = mqtt.create_topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
    "varargs",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

This would enable adding a new incoming message callback per topic-pattern, where the msg parameter would already contain the parsed fields from the topic.

xHasKx · 2022-08-09T20:31:24Z

@Tieske, I wrote some comments and continue review tomorrow

xHasKx · 2022-08-10T09:04:27Z

@Tieske, what do you think about that form:

local topic_matcher = mqtt.topic_matcher { -- without "create_" it looks like a class constructor calling
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields[3])           --> "mainlights"
print(fields[4])           --> "brightness"

It will simplify mqtt.topic_matcher constructor args validation so you don't have to check that amount of number-indexed keys are equals to the /+/ topic wildcards.

Another variant is to explicitly provide a key to store array of all unnamed topic words like that:

local topic_matcher = mqtt.topic_matcher { -- without "create_" it looks like a class constructor calling
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
    rest = "varargs",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

Tieske · 2022-08-10T21:50:42Z

It will simplify mqtt.topic_matcher constructor args validation so you don't have to check that amount of number-indexed keys are equals to the /+/ topic wildcards.

if we adjust compile_topic_pattern to return as second and third return values the number of + and # found, that check could actually be very simple.

This option:

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields[3])           --> "mainlights"
print(fields[4])           --> "brightness"

has as a drawback that you can't immediately tell where the varargs start in the array part. Hence I would prefer your second option;

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

But here the user needs an extra field name for the varargs, to prevent namecollisions between the field names and the varargs field.

Leading me to:

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "mainlights"
print(fields[2])           --> "brightness"

as the simplest version, clearly separating the known fields from the varargs, whilst making varargs size and traversal easy as the array part. And no extra user specified name required.

Tieske · 2022-08-10T22:01:02Z

mqtt/init.lua

+	function mqtt.compile_topic_pattern(t)
+		t = assert(mqtt.validate_subscribe_topic(t))
+		if t == "#" then
+			t = MATCH_ALL
+		else
+			t = t:gsub("#", MATCH_HASH)
+			t = t:gsub("%+", MATCH_PLUS)
+		end
+		return "^"..t.."$"
+	end


Updated to return the number of fields and presence of varargs;

Suggested change

function mqtt.compile_topic_pattern(t)

t = assert(mqtt.validate_subscribe_topic(t))

if t == "#" then

t = MATCH_ALL

else

t = t:gsub("#", MATCH_HASH)

t = t:gsub("%+", MATCH_PLUS)

end

return "^"..t.."$"

end

function mqtt.compile_topic_pattern(t)

t = assert(mqtt.validate_subscribe_topic(t))

if t == "#" then

return MATCH_ALL, 0, true

end

local fields, vararg

t, vararg = t:gsub("#", MATCH_HASH)

t, fields = t:gsub("%+", MATCH_PLUS)

return "^"..t.."$", fields, vararg ~= 0

end

xHasKx · 2022-08-11T16:51:05Z

Leading me to:

I suggested rest = "varargs" because of that possible case:

local topic_matcher = mqtt.topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    -- "roomid", -- note only one name for +
    ...
}

Where the second + wildcard are without given name.
In your case its value from the topic will be lost.

And we have to find a proper form for the case when user don't have names for +-es at all.

Leading me to:

I suggested rest = "varargs" because of that possible case:

local topic_matcher = mqtt.topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    -- "roomid", -- note only one name for +
    ...
}

Where the second + wildcard are without given name.
In your case its value from the topic will be lost.

And we have to find a proper form for the case when user don't have names for +-es at all.

We can define some default name for the rest = "varargs" to store all +-wildcards in the root returned object and all #-wildcards into it's subfield with that default name.

Or accept that it's ok to return two values from the matcher function - first table with +-es with all values as indexes and some of them with names, and the second table with #-s as an array.
What do you think?
We can define some default name for the rest = "varargs" to store all +-wildcards in the root returned object and all #-wildcards into it's subfield with that default name.

Or accept that it's ok to return two values from the matcher function - first table with +-es with all values as indexes and some of them with names, and the second table with #-s as an array.
What do you think?

Tieske · 2022-08-11T18:05:53Z

Where the second + wildcard are without given name.
In your case its value from the topic will be lost.

is that expected to be a common case? I’d expect "all fields named" to be the common case. We can just throw an error if there aren’t enough field names?

Or we could inject numbered names field 1 etc

xHasKx · 2022-08-15T14:35:46Z

I would say, we can't predict which case will be more common. The only thing I'm sure is that the best interface is the one that hard to use in the wrong way unintentionally.

We can do one step back and ask how that topic_matcher can be used?
I mean, users start their code from the client instance - creating it, then connecting and subscribing it to some topics.
Then somewhere has to be a callback function with arrived PUBLISH message in its argument. And that message's topic can be parsed using the topic_matcher instance.
But who should call the topic_matcher instance - the library before calling callback or the user itself inside the callback code?

I mean, user can call some new method client:add_msg_handler(topic_filter, matcher_options, callback) where matcher_options is an args for the topic_matcher instance creation. Then for the each new PUBLISH messages callback will be called in that way:

callback(publish_message, topic_fields, topic_varargs_fields)

In this case for the the topic_matcher instance will be better to return 2 values - +-wildcards with optional names and #-wildcards as two separate tables (by the way, the second table also can contain user-defined names).

In the case of two tables in resulf of the topic_matcher, it's better to always set array-keys into the table and optionally add user-defined names.

(and of course users can create and use the topic_matcher instances separately, in the only one publish messages callback)

What do you think?

Tieske force-pushed the topics branch 4 times, most recently from 5ba0510 to 525351a Compare November 4, 2021 07:54

Tieske force-pushed the topics branch from 525351a to 7054005 Compare November 9, 2021 14:04

Tieske added 2 commits August 6, 2022 13:56

chore(*) add config files (busted/editor)

b3ad989

feat(topics) add validation and matching

5a0e12c

Tieske force-pushed the topics branch from 7054005 to 5a0e12c Compare August 6, 2022 11:56

xHasKx reviewed Aug 8, 2022

View reviewed changes

.editorconfig Show resolved Hide resolved

xHasKx reviewed Aug 8, 2022

View reviewed changes

mqtt/init.lua Outdated Show resolved Hide resolved

xHasKx reviewed Aug 8, 2022

View reviewed changes

mqtt/init.lua Show resolved Hide resolved

xHasKx reviewed Aug 8, 2022

View reviewed changes

address review comments

837a73e

Tieske force-pushed the topics branch from 572724b to 837a73e Compare August 8, 2022 22:38

Tieske commented Aug 10, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

generic topic validation and parsing #36

generic topic validation and parsing #36

Tieske commented Nov 3, 2021

Tieske commented Nov 3, 2021

xHasKx commented Nov 4, 2021

Tieske commented Nov 4, 2021

Tieske commented Nov 4, 2021

xHasKx commented Nov 4, 2021

Tieske commented Nov 4, 2021

xHasKx commented Nov 4, 2021 •

edited

Loading

xHasKx commented Nov 4, 2021

Tieske commented Nov 4, 2021

Tieske commented Aug 6, 2022

xHasKx Aug 8, 2022 •

edited

Loading

Tieske Aug 8, 2022

xHasKx Aug 8, 2022 •

edited

Loading

Tieske Aug 8, 2022

xHasKx Aug 8, 2022 •

edited

Loading

xHasKx Aug 8, 2022

Tieske Aug 8, 2022

xHasKx commented Aug 8, 2022

xHasKx commented Aug 8, 2022

xHasKx commented Aug 8, 2022

Tieske commented Aug 8, 2022

Tieske commented Aug 8, 2022 •

edited

Loading

xHasKx commented Aug 9, 2022

xHasKx commented Aug 10, 2022

Tieske commented Aug 10, 2022

Tieske Aug 10, 2022

xHasKx commented Aug 11, 2022

Tieske commented Aug 11, 2022 •

edited

Loading

xHasKx commented Aug 15, 2022

generic topic validation and parsing #36

Are you sure you want to change the base?

generic topic validation and parsing #36

Conversation

Tieske commented Nov 3, 2021

Tieske commented Nov 3, 2021

xHasKx commented Nov 4, 2021

Tieske commented Nov 4, 2021

Tieske commented Nov 4, 2021

xHasKx commented Nov 4, 2021

Tieske commented Nov 4, 2021

xHasKx commented Nov 4, 2021 • edited Loading

xHasKx commented Nov 4, 2021

Tieske commented Nov 4, 2021

Tieske commented Aug 6, 2022

xHasKx Aug 8, 2022 • edited Loading

Choose a reason for hiding this comment

Tieske Aug 8, 2022

Choose a reason for hiding this comment

xHasKx Aug 8, 2022 • edited Loading

Choose a reason for hiding this comment

Tieske Aug 8, 2022

Choose a reason for hiding this comment

xHasKx Aug 8, 2022 • edited Loading

Choose a reason for hiding this comment

xHasKx Aug 8, 2022

Choose a reason for hiding this comment

Tieske Aug 8, 2022

Choose a reason for hiding this comment

xHasKx commented Aug 8, 2022

xHasKx commented Aug 8, 2022

xHasKx commented Aug 8, 2022

Tieske commented Aug 8, 2022

Tieske commented Aug 8, 2022 • edited Loading

xHasKx commented Aug 9, 2022

xHasKx commented Aug 10, 2022

Tieske commented Aug 10, 2022

Tieske Aug 10, 2022

Choose a reason for hiding this comment

xHasKx commented Aug 11, 2022

Tieske commented Aug 11, 2022 • edited Loading

xHasKx commented Aug 15, 2022

xHasKx commented Nov 4, 2021 •

edited

Loading

xHasKx Aug 8, 2022 •

edited

Loading

xHasKx Aug 8, 2022 •

edited

Loading

xHasKx Aug 8, 2022 •

edited

Loading

Tieske commented Aug 8, 2022 •

edited

Loading

Tieske commented Aug 11, 2022 •

edited

Loading