-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normalizer mishandles "X%.", returns "X %." #196
Comments
Hi @ChanceNCounter |
Sounds good! I think it should ideally maintain the percentage as such, meaning that when the normalized phrase is passed to a tokenizer, one of the tokens should be "50%". But that's my opinion. In the long run, the oddness of the current behavior aside, there might be a design choice to be made here: @krisgesling, what are your thoughts on the extractors and percentages? |
Yeah agreed - the % is inherently tied to the number eg it's not the same as "50 apples", if anything it's closer to "0.5". Thanks for digging into this @Badboy-16 :) |
since the point of normalize was making intent parsing etc easier, this just makes it harder to detect numbers or percentages, eg, a voc file containing "percent" and "%" will no longer match in adapt, any downstream that is depending on tokens being number words might also suddenly fail this change was intentionally part of normalization process |
Okay but the current state of affairs is unacceptable. |
then normalize the symbol into a word |
I think we might be talking about different things here. The periods in the issue title are literal. The normalizer handles
|
normalize("Set Volume to 50%.") -> "Set Volume to 50 %."
This is bad. It should probably, at worst, return
"Set Volume to 50 % ."
The text was updated successfully, but these errors were encountered: