Replies: 1 comment
-
What you can do is calculate the document embeddings and compare them, through cosine similarity, with the topic embeddings. Then, for each document in the unwanted topic, you simply pick the topic with the second-highest similarity and re-assign the document to it. Lastly, you could use manual topic modeling to create a new model if you want to manually assign topics to documents yourself.
I believe you would have to use the similarity matrix as probabilities after updating the topics. |
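The reassignment step described above can be sketched in plain NumPy. This is an illustrative sketch, not BERTopic's own API: the array names, the shape conventions, and the `unwanted_topic` id are all assumptions, and in practice the embeddings would come from the fitted model (e.g. its document and topic embedding arrays).

```python
import numpy as np

def reassign_to_runner_up(doc_embeddings, topic_embeddings, topics, unwanted_topic):
    """For documents currently in `unwanted_topic`, re-assign each one to the
    topic whose embedding has the second-highest cosine similarity to the
    document embedding (i.e. the best topic other than the unwanted one)."""
    # Normalize rows so a plain dot product equals cosine similarity.
    docs_norm = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    tops_norm = topic_embeddings / np.linalg.norm(topic_embeddings, axis=1, keepdims=True)
    sims = docs_norm @ tops_norm.T            # shape: (n_docs, n_topics)

    # Topic ids ordered by similarity, best first, for each document.
    ranked = np.argsort(-sims, axis=1)

    new_topics = np.asarray(topics).copy()
    for i, topic in enumerate(new_topics):
        if topic == unwanted_topic:
            # Take the best-ranked topic that is not the unwanted one.
            new_topics[i] = next(t for t in ranked[i] if t != unwanted_topic)
    return new_topics
```

With the new assignments in hand, the model's topic representations would still need to be refreshed (BERTopic's `update_topics` is the natural place for that) so the topic words reflect the moved documents.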
-
Hi all,
While BERTopic does a fantastic job finding meaningful topics in my data, some of the topics it identifies are substantially uninteresting given my particular use case. I am therefore trying to find a way to reassign documents from these topics to their second highest probability topics, effectively performing outlier reduction on them.
To give some more context, I am analyzing open-ended survey responses containing respondents' arguments for their position on immigration. Among the topics generated is a fairly large one relating to immigration itself, containing arguments like "immigration boosts the economy" and "immigrants enrich our culture". Since the general topic of immigration is given by the context of the survey question, these responses would be better placed in topics related to the economy and culture respectively (which BERTopic does an excellent job identifying). Is there a straightforward way of reassigning the documents placed in the "immigration" topic to their second most likely topic?
A potentially complicating factor is that I'd eventually like to run covariate analyses on the topics as discussed here. It would therefore be ideal if I was able to estimate the probabilities of each document belonging to each of the new, updated topics. Any advice on how this could be incorporated into a solution to the above would be very much appreciated!
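Following the reply's suggestion to use the similarity matrix as probabilities, one simple convention is to turn the document-to-topic cosine similarities into a row-stochastic matrix. This is a hedged sketch under that assumption, not BERTopic's own probability calculation; other normalizations (e.g. a softmax) would also work.

```python
import numpy as np

def similarity_to_probabilities(doc_embeddings, topic_embeddings):
    """Convert cosine similarities between documents and topics into a
    row-stochastic matrix usable as soft topic probabilities."""
    docs_norm = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    tops_norm = topic_embeddings / np.linalg.norm(topic_embeddings, axis=1, keepdims=True)
    sims = docs_norm @ tops_norm.T            # cosine similarities in [-1, 1]

    # Shift into [0, 2] so negative similarities cannot produce negative mass,
    # then normalize each row to sum to 1.
    shifted = sims + 1.0
    return shifted / shifted.sum(axis=1, keepdims=True)
```

The resulting matrix has one row per document and one column per topic, which is the shape the covariate analysis would need.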