
Number of Topics vs Probability Threshold #1372

Closed · Answered by MaartenGr
noahberhe asked this question in Q&A

Am I right in thinking that the greater the number of clusters created by the model, the lower the probability needed for a document to be assigned to a cluster?

It depends on where the probability is retrieved from, namely the underlying cluster model. In general, though, with more topics the probability mass is spread across more of them, which results in lower probabilities per topic. That only holds from an absolute perspective, however, and you generally want to compare probabilities relatively.
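To illustrate the absolute-versus-relative point, here is a minimal sketch that assumes a document's topic probabilities come from a softmax over hypothetical per-topic scores. This is not how HDBSCAN or any particular cluster model computes its probabilities; `softmax` and `topic_probabilities` are illustrative helpers only. Adding more weakly matching topics lowers the best topic's absolute probability, while its ratio to the runner-up stays the same.

```python
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_probabilities(n_topics, best_score=3.0, other_score=0.0):
    """Hypothetical document: one strongly matching topic, n_topics - 1 weak ones."""
    scores = [best_score] + [other_score] * (n_topics - 1)
    return softmax(scores)


for k in (5, 20, 120):
    probs = topic_probabilities(k)
    best, runner_up = sorted(probs, reverse=True)[:2]
    # Absolute probability shrinks as k grows; the relative ratio does not.
    print(f"{k:>3} topics: best={best:.3f}, best/runner_up={best / runner_up:.1f}")
```

Under this toy model the best topic drops from roughly 0.83 with 5 topics to roughly 0.14 with 120 topics, yet it stays about 20 times more likely than any other topic.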

In my dataset of about 100,000 documents, 120 clusters are created, and documents assigned to a cluster can have probabilities as low as 0.01.

As mentioned above, it depends on the underlying cluster model. The probabilities w…
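A quick way to put the 0.01 from the question in context is to compare it with the uniform baseline 1/K. This assumes the probabilities for a document form a distribution over all K topics that sums to 1, which, as noted above, depends on the cluster model. `lift_over_uniform` is a hypothetical helper, not part of any library.

```python
def lift_over_uniform(prob, n_topics):
    """Ratio of a topic probability to the uniform baseline 1 / n_topics."""
    return prob * n_topics


# Numbers from the question: 120 topics and a probability of 0.01.
print(f"uniform baseline (120 topics): {1 / 120:.4f}")
print(f"lift of 0.01 with 120 topics: {lift_over_uniform(0.01, 120):.2f}")
# The same absolute probability with far fewer topics, for comparison.
print(f"lift of 0.01 with 5 topics: {lift_over_uniform(0.01, 5):.2f}")
```

With 120 topics the uniform baseline is about 0.0083, so 0.01 is roughly 1.2 times chance; the same 0.01 with only 5 topics would sit far below the 0.2 baseline.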

Replies: 1 comment 2 replies

2 replies: @noahberhe, @MaartenGr

Answer selected by noahberhe
Category: Q&A
2 participants