Normalizer mishandles "X%.", returns "X %." #196

ChanceNCounter · 2021-05-17T19:03:39Z

normalize("Set Volume to 50%.") -> "Set Volume to 50 %."

This is bad. It should probably, at worst, return "Set Volume to 50 % ."
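
For illustration, a minimal sketch of that "at worst" behavior, where every punctuation mark becomes its own space-separated token. `split_punctuation` is a hypothetical helper written for this example; it is not the library's normalizer and makes no claim about how the normalizer is implemented.

```python
import re


def split_punctuation(text: str) -> str:
    """Hypothetical helper: isolate every punctuation character as its own token."""
    # Put a space on both sides of each character that is neither a word
    # character nor whitespace, then collapse the resulting runs of spaces.
    spaced = re.sub(r"([^\w\s])", r" \1 ", text)
    return " ".join(spaced.split())


print(split_punctuation("Set Volume to 50%."))  # Set Volume to 50 % .
```

The point is consistency: "%" and "." are treated the same way, so no "%." token is left behind.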

Badboy-16 · 2021-06-04T15:53:58Z

Hi @ChanceNCounter
I would like to work on this issue. As this would be my first contribution to this project, I'll complete the steps required to become a contributor and submit a PR shortly. :)

ChanceNCounter · 2021-06-04T19:47:11Z

Sounds good! I think it should ideally maintain the percentage as such, meaning that when the normalized phrase is passed to a tokenizer, one of the tokens should be "50%". But that's my opinion.
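
A rough sketch of what that could look like downstream, assuming a simple regex tokenizer. `TOKEN_RE` and `tokenize` are hypothetical names for this example and do not come from the project.

```python
import re
from typing import List

# Assumed token pattern, tried left to right:
#   1. a number with an optional decimal part and an optional attached "%"
#   2. a run of word characters
#   3. any single punctuation character
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?%?|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer that keeps a percentage such as "50%" in one token."""
    return TOKEN_RE.findall(text)


print(tokenize("Set Volume to 50%."))  # ['Set', 'Volume', 'to', '50%', '.']
```

Here the trailing period still becomes its own token, while the percent sign stays attached to the number.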

In the long run, the oddness of the current behavior aside, there might be a design choice to be made here: @krisgesling, what are your thoughts on the extractors and percentages?

krisgesling · 2021-06-09T02:59:50Z

Yeah, agreed: the % is inherently tied to the number, e.g. it's not the same as "50 apples"; if anything it's closer to "0.5".

Thanks for digging into this @Badboy-16 :)

JarbasAl · 2021-06-11T13:16:12Z

Since the point of normalize was to make intent parsing etc. easier, this just makes it harder to detect numbers or percentages. E.g., a voc file containing "percent" and "%" will no longer match in adapt, and any downstream code that depends on tokens being number words might also suddenly fail.

this change was intentionally part of normalization process

ChanceNCounter · 2021-06-11T15:13:46Z

> this change was intentionally part of normalization process

Okay but the current state of affairs is unacceptable.

JarbasAl · 2021-06-11T16:20:46Z

then normalize the symbol into a word
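
A minimal sketch of that suggestion, assuming English only and assuming the sign always directly follows a digit. `percent_to_word` is a hypothetical helper, not an existing function in the library; a multi-language version would need a per-language word.

```python
import re


def percent_to_word(text: str, word: str = "percent") -> str:
    """Hypothetical helper: spell out a percent sign that directly follows a digit."""
    # The lookbehind keeps the digit in place; only the "%" itself is replaced.
    # A function is used as the replacement so `word` is inserted literally.
    return re.sub(r"(?<=\d)%", lambda match: " " + word, text)


print(percent_to_word("Set Volume to 50%."))  # Set Volume to 50 percent.
```

With the symbol spelled out, a vocabulary entry for "percent" can match, and the trailing period is left for whatever punctuation handling follows.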

ChanceNCounter · 2021-06-11T23:14:02Z

I think we might be talking about different things here. The periods in the issue title are literal.

The normalizer handles "5%" correctly. It mishandles "5%.", returning "5 %."

"%." is nothing.

ChanceNCounter added the labels bug ("Something isn't working") and multi_lang ("relates to several languages") on May 17, 2021.

Badboy-16 mentioned this issue on Jun 11, 2021 in "Implement function to tie percent sign to number" #201 (closed, 5 tasks).
