consider adding a tokenizer/scanning routine #103
I think if
This gives me correct results.
Ah yeah, great catch!
There's still one error, because we prematurely cat the final output, but this "final" node may still be a prefix of another node that's valid. So with that corrected:
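The snippet below is not the code from this thread. It is a minimal, std-only sketch of the behaviour described: remember the most recent final node, but keep walking in case it is a prefix of a longer valid token. The `Trie` and `TrieNode` types and the `insert`, `longest_match`, and `scan` functions are hypothetical stand-ins for illustration, not part of the crate's API, and they track only match positions, not outputs.

```rust
use std::collections::HashMap;

/// Hypothetical byte trie standing in for the automaton (not the crate's types).
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    is_final: bool,
}

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    /// Add one token to the set.
    fn insert(&mut self, token: &str) {
        let mut node = &mut self.root;
        for &b in token.as_bytes() {
            node = node.children.entry(b).or_default();
        }
        node.is_final = true;
    }

    /// Length in bytes of the longest token that is a prefix of `text`.
    /// A final node is only recorded, never returned early, because it
    /// may itself be a prefix of a longer token.
    fn longest_match(&self, text: &[u8]) -> Option<usize> {
        let mut node = &self.root;
        let mut last = None;
        for (i, b) in text.iter().enumerate() {
            match node.children.get(b) {
                Some(next) => node = next,
                None => break,
            }
            if node.is_final {
                last = Some(i + 1);
            }
        }
        last
    }

    /// Scan `text` left to right and return the (start, end) byte range of
    /// each matched token; bytes that start no token are skipped.
    fn scan(&self, text: &str) -> Vec<(usize, usize)> {
        let bytes = text.as_bytes();
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < bytes.len() {
            match self.longest_match(&bytes[pos..]) {
                Some(len) => {
                    out.push((pos, pos + len));
                    pos += len;
                }
                None => pos += 1,
            }
        }
        out
    }
}
```

With the tokens `foo`, `foobar`, and `bar`, scanning `"foobar foo bar"` under this sketch yields the ranges `(0, 6)`, `(7, 10)`, and `(11, 14)`: the first match is `foobar` rather than the shorter `foo`, which is the case that returning at the first final node would get wrong.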
Regarding the name, I think
Having done exactly this, I agree that inclusion in the public API would be desirable. (Perhaps more generally, prefix matching would be a natural inclusion given the trie's internal structure, though expectations vary when it comes to APIs.)
@llogiq brought up this use case where one might have a really big set of tokens that one wants to use to scan some text. It turns out that this is pretty easy to support using existing public APIs, but it might be nice to actually make this a proper part of the API. Here is a candidate implementation:
And the output: