Skip to content

Similar subtree matching notes

Gunther Cox edited this page Nov 21, 2016 · 4 revisions

Goal: Given a list of strings, find the most similar matching sequence in a graph data structure.

  • The found sequence may have additional strings or strings that are not in the one being searched for.
  • An optimal solution will have the greatest number of strings that have a close match in the original sequence, and the fewest number of extraneous strings added.
  • The last statement in the best matching sequence that gets selected should have possible responses to choose from.

Ideas

  1. Convert text to stemmed text
  2. Compare stemmed strings

A "smarter" comparison algorithm could equate similar sentences by their actual meaning. For example, sentences such as:

Have a free sample of this frozen dairy beverage. Try a sample of ice cream at no charge.

These two statements have very different text but the same meaning.

Object models

  • Communication: A connection that represents the response relationship between two statemets and holds other meta data.
    • speaker: F-key Persona
    • date: datetime
    • statement: F-key Statement
    • in_response_to: F-key Statement - The parent node
  • Conversation
    • statements: A ordered list of Statement objects representing a conversation

References

  • See Algorithms book, page 380
    • Wildcard between each statement?
  • Think of a sequence of strings in the graph as one long string?
  • Tries data structure