Optional argument to chatlight: a dictionary file of words, ordered by their counted occurrences in a specific corpus, in JSON format like so:
```json
[
  ["hello", 34],
  ["world", 56]
]
```
Each "tuple" is a word paired with its incidence count in a corpus of text that's considered representative. Words that are valid but did not appear in the corpus should be kept as records with an incidence of 0, not dropped.
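A minimal Python sketch of loading this file into a `word -> count` mapping, assuming every record is a two-element `[word, count]` list with a non-negative integer count. The name `load_dictionary` is hypothetical, not part of chatlight. Zero-count records are kept, as required above.

```python
import json
from typing import Dict


def load_dictionary(path: str) -> Dict[str, int]:
    """Load a [[word, count], ...] dictionary file into a word -> count mapping.

    Zero-count entries are kept: they mark words that are valid but did not
    appear in the reference corpus. (Hypothetical helper, not chatlight's API.)
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    counts: Dict[str, int] = {}
    for record in records:
        # Assumption: each record is exactly a two-element [word, count] list.
        if not (isinstance(record, list) and len(record) == 2):
            raise ValueError(f"malformed record: {record!r}")
        word, count = record
        if not isinstance(word, str) or not isinstance(count, int) or count < 0:
            raise ValueError(f"malformed record: {record!r}")
        counts[word] = count
    return counts
```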
Given this file and a second path for storing state (an absolute filename, whose directory already exists, used to read and write JSON), record how often non-categorized words occur in chat.
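A minimal sketch of the recording step, under several assumptions the issue does not state: "non-categorized" is taken to mean a dictionary word absent from a caller-supplied `categorized` set, the state file holds a JSON object mapping each word to its chat count, and tokens are lowercase runs of letters and apostrophes. `record_chat_message`, `load_state` and `TOKEN_RE` are hypothetical names.

```python
import json
import os
import re
from typing import Dict, Set

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def load_state(state_path: str) -> Dict[str, int]:
    """Read chat counts from state_path, or start empty if it does not exist yet."""
    if not os.path.exists(state_path):
        return {}
    with open(state_path, encoding="utf-8") as f:
        return json.load(f)


def record_chat_message(
    message: str,
    dictionary: Dict[str, int],
    categorized: Set[str],
    state_path: str,
) -> Dict[str, int]:
    """Count dictionary words in a chat message that are not categorized yet.

    Assumes state_path is absolute and its parent directory already exists.
    """
    if not os.path.isabs(state_path):
        raise ValueError("state_path must be absolute")
    state = load_state(state_path)
    for token in TOKEN_RE.findall(message.lower()):
        # Assumption: only valid (dictionary) words that lack a category count.
        if token in dictionary and token not in categorized:
            state[token] = state.get(token, 0) + 1
    # Write to a temporary file first, then replace the old state in one step.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, sort_keys=True)
    os.replace(tmp_path, state_path)
    return state
```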
A separate utility should take the same two files and list the non-categorized words seen in chat, ordered by some hybrid (TBD) of rarity in the corpus and commonality in chat.
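Since the hybrid ordering is TBD, the score below is only a placeholder: chat count divided by one plus corpus count, so words that are frequent in chat but rare in the corpus rank first. `rank_uncategorized` is a hypothetical name, and its inputs are assumed to be the `word -> count` mappings read from the dictionary and state files.

```python
from typing import Dict, List, Tuple


def rank_uncategorized(
    chat_counts: Dict[str, int],
    corpus_counts: Dict[str, int],
) -> List[Tuple[str, float]]:
    """Rank chat words by a placeholder score: chat count / (1 + corpus count).

    Words frequent in chat rise; words common in the corpus sink. Words missing
    from the corpus dictionary are treated as having a corpus count of 0.
    """
    scored = [
        (word, chat / (1 + corpus_counts.get(word, 0)))
        for word, chat in chat_counts.items()
    ]
    # Highest score first; break ties alphabetically so output is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```

With chat counts `{"hello": 4, "zebra": 2, "world": 10}` and corpus counts `{"hello": 34, "world": 56, "zebra": 0}`, the order is `zebra` (2.0), `world` (10/57), `hello` (4/35). A log-scaled rarity term, as in IDF, would be an equally reasonable choice for the TBD hybrid.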
Dump as TF-IDF scores, which could be used to guess conversation topics.
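One reading of this suggestion, sketched in Python: treat each conversation as a document, score its terms with TF-IDF (term frequency times log(N / document frequency)), and report the top terms per conversation as topic hints. The function name `tfidf_top_terms`, the tokenizer, and the choice of a conversation as the document unit are all assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tfidf_top_terms(documents: List[str], k: int = 5) -> List[List[Tuple[str, float]]]:
    """Return the k highest TF-IDF terms for each document.

    TF is the term's count divided by the document length; IDF is
    log(N / df), where N is the number of documents and df is how many
    documents contain the term.
    """
    tokenized = [TOKEN_RE.findall(doc.lower()) for doc in documents]
    n_docs = len(tokenized)
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically.
        top = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
        results.append(top)
    return results
```

Terms that occur in every conversation get an IDF of 0, which pushes filler words that appear everywhere to the bottom without an explicit stopword list.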
Use the top collocations, with stopwords removed, i.e. the most common pairs or triples of words. Give a human the top 10 and let them categorize.
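A minimal sketch of the collocation approach, ranking n-grams by raw frequency as the comment describes (not by an association measure such as PMI). The `STOPWORDS` set is a tiny hypothetical list, and n-grams containing any stopword are discarded instead of deleting stopwords before pairing, so only words that were actually adjacent get counted. `top_collocations` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "i", "you"}


def top_collocations(
    messages: Iterable[str], n: int = 2, k: int = 10
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent n-grams that contain no stopwords.

    N-grams are formed within each message only, so words from consecutive
    messages are never paired.
    """
    counts: Counter = Counter()
    for message in messages:
        tokens = TOKEN_RE.findall(message.lower())
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i : i + n])
            if not any(token in STOPWORDS for token in gram):
                counts[gram] += 1
    # Sort by descending count, then alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
```

The output of `top_collocations(messages, n=2, k=10)` is the list a human would then categorize; `n=3` gives triples instead of pairs.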
This is a paraphrase of what I think he told me, so take it with a grain of salt.