Feature request here. I would like wordcloud to be able to detect commonly occurring phrases. For instance, rather than "hole in one" being counted as three separate words, if it appears in that order multiple times then the phrase itself could carry the weight, as opposed to the individual words. Other examples would be "toll road" or "run of the mill".
This feature could be turned on and off with a --phrase flag on the command line.
As an added bonus, the maximum phrase length could be tuned on the command line. For instance, --phrase=2 would look for phrases of at most 2 words.
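To make the proposal concrete, here is a minimal sketch of how such a flag could be parsed with argparse. The --phrase option, the wordcloud_sketch program name, and the helper names are all hypothetical; none of this is wordcloud's actual command-line interface. It assumes a bare --phrase means a maximum of 2 words and that omitting the flag (value 0) turns phrase detection off.

```python
import argparse


def build_parser():
    # Hypothetical parser illustrating the requested flag only.
    parser = argparse.ArgumentParser(prog="wordcloud_sketch")
    parser.add_argument(
        "--phrase",
        nargs="?",       # the value is optional: "--phrase" or "--phrase=3"
        type=int,
        const=2,         # assumed default length when the flag has no value
        default=0,       # 0 means phrase detection is off
        metavar="MAX_WORDS",
        help="detect phrases of at most MAX_WORDS words",
    )
    return parser


def parse_phrase_length(argv):
    """Return the requested maximum phrase length (0 if disabled)."""
    return build_parser().parse_args(argv).phrase
```

With this sketch, parse_phrase_length(["--phrase=3"]) gives 3, a bare --phrase gives 2, and no flag at all gives 0.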
This is actually implemented and turned on by default for phrases of length
two. Run the "new hope" example and you'll see "death star" as a single
phrase, or "United States" for the Constitution.
This is controlled by the collocations parameter ("collocation" is the
technical term for phrases detected this way).
This could be expanded to longer phrases, but I haven't done that, mostly
because I didn't need it and didn't have the time.
Pull requests welcome. There's a reference to the paper in the code, or you
can check out nltk. I'm using a heuristic to discount the words that make
up the phrase; that would need to be extended to larger phrases, too, but
that shouldn't be a problem.
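As a rough illustration of what extending this to longer phrases might involve, here is a minimal pure-Python sketch. It is not wordcloud's implementation: it uses a plain count threshold rather than the scoring from the referenced paper, and the names merge_phrases, phrase_counts, max_len, and min_count are made up for this example. The "discounting" here is the crudest possible one: words absorbed into a phrase are simply no longer counted on their own.

```python
from collections import Counter


def merge_phrases(tokens, n, min_count):
    """Merge every n-gram seen at least min_count times into one token."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Skip n-grams that contain an already merged phrase (a token with a space).
    frequent = {
        gram for gram, count in grams.items()
        if count >= min_count and not any(" " in tok for tok in gram)
    }
    merged, i = [], 0
    while i < len(tokens):
        gram = tuple(tokens[i:i + n])
        if gram in frequent:
            merged.append(" ".join(gram))  # the phrase replaces its words
            i += n
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def phrase_counts(text, max_len=2, min_count=2):
    """Count words and phrases, trying the longest phrases first."""
    tokens = text.lower().split()
    for n in range(max_len, 1, -1):
        tokens = merge_phrases(tokens, n, min_count)
    return Counter(tokens)


text = ("hole in one today and another hole in one tomorrow "
        "via toll road or toll road")
counts = phrase_counts(text, max_len=3)
```

Here counts holds "hole in one" and "toll road" with a weight of 2 each, and "hole" no longer appears as a separate word. With max_len=2 the same text produces "hole in" instead, which shows why longer phrases need to be tried first.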