The infix matching skips matches that start at index 0 of the token string. Could you also match the dot as a prefix (probably in addition to the infix pattern, not instead of it)?
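To see why an infix pattern alone leaves '.2014' intact, here is a simplified model of the splitting step. It is an assumption for illustration, not spaCy's actual source. It peels optional prefix matches off the front, then applies infix matches but ignores any infix match that starts at index 0, as described above. `split_chunk` is a hypothetical helper, and suffix handling and tokenizer exceptions are left out.

```python
import re


def split_chunk(text, infix_re, prefix_re=None):
    """Hypothetical, simplified model of splitting one whitespace-free chunk."""
    tokens = []
    # Peel prefix matches off the front of the chunk, one at a time.
    if prefix_re is not None:
        while text:
            m = prefix_re.match(text)
            if not m or m.end() == 0:
                break
            tokens.append(text[: m.end()])
            text = text[m.end():]
    start = 0
    for m in infix_re.finditer(text):
        # The key detail: an infix match at the very start is skipped,
        # so a leading "." is never split off by the infix rule alone.
        if m.start() == 0:
            continue
        if m.start() != start:
            tokens.append(text[start:m.start()])
        if m.end() != m.start():
            tokens.append(text[m.start():m.end()])
        start = m.end()
    if text[start:]:
        tokens.append(text[start:])
    return tokens


infix_re = re.compile(r"\.")
prefix_re = re.compile(r"^\.")
print(split_chunk(".2014", infix_re))             # ['.2014']
print(split_chunk(".2014", infix_re, prefix_re))  # ['.', '2014']
print(split_chunk("a.b", infix_re))               # ['a', '.', 'b']
```

In spaCy itself, the equivalent change would be to add the dot pattern to the prefix list, compile it with `spacy.util.compile_prefix_regex`, and assign the result's `.search` to `nlp.tokenizer.prefix_search`, while keeping the existing infix rule for dots in the middle of a token.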
How to reproduce the behaviour
I would like the tokenizer to split by nearly any punctuation symbol, and I am having issues in some weird cases.
I initialize the tokenizer this way:
But, although the dot is set as an infix, I get this:
I can't understand why '.2014' is output as a single token instead of being split into '.' and '2014'.
Is there something weird going on here, or am I missing something? Any help is appreciated.
Your Environment