Similar subtree matching notes

Jump to bottom

Gunther Cox edited this page Nov 21, 2016 · 4 revisions

Goal: Given a list of strings, find the most similar matching sequence in a graph data structure.

The found sequence may have additional strings or strings that are not in the one being searched for.
An optimal solution will have the greatest number of strings that have a close match in the original sequence, and the fewest number of extraneous strings added.
The last statement in the best matching sequence that gets selected should have possible responses to choose from.

Ideas

Convert text to stemmed text
Compare stemmed strings

A "smarter" comparison algorithm could equate similar sentences by their actual meaning. For example, sentences such as:

Have a free sample of this frozen dairy beverage. Try a sample of ice cream at no charge.

These two statements have very different text but the same meaning.

Object models

Communication: A connection that represents the response relationship between two statemets and holds other meta data.
- speaker: F-key Persona
- date: datetime
- statement: F-key Statement
- in_response_to: F-key Statement - The parent node
Conversation
- statements: A ordered list of Statement objects representing a conversation

References

See Algorithms book, page 380
- Wildcard between each statement?
Think of a sequence of strings in the graph as one long string?
Tries data structure