Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to build a transducer for longest suffix/prefix #133

Open
nitnelave opened this issue Dec 1, 2021 · 1 comment
Open

How to build a transducer for longest suffix/prefix #133

nitnelave opened this issue Dec 1, 2021 · 1 comment

Comments

@nitnelave
Copy link

Hi,

I'm working on autocompletion, and I'd like to find, given a known completion C, what is the longest suffix of the input that is a prefix of C.
Example: C = "banana"
input: "my name is alibaba"
output (overlap between the two): "ba" (or equivalently just the length 2 would be enough).

Intuitively, it seems like it would be possible to build such a transducer: feed the input and at every step it tells you what is the longest matching suffix, every state being accepting. I'm just not sure how to build it with this library, the interface to get an output from an evaluation is not clear, could you give me a hand there?

Essentially you would have a map of all the prefixes of C to their length, make that a repeating FST, and on evaluation take the max of the outputs (longest sequence).

@BurntSushi
Copy link
Owner

If you need to traverse the automaton directly, then you want a raw::Fst, and you'll likely want to make use of the raw::Node API.

I'm not sure if I quite understand what you're trying to do, but you might consider storing your completions in reverse. That way, you can traverse the automaton by starting at the end of the input simply until you cannot move any longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants