Getting errors 1) for a larger zeroshot_topic_list and 2) when trying to reduce outliers #2174
Replies: 2 comments 1 reply
-
It might have something to do with the version that you are using. Since then, there have been several fixes to zero-shot topic modeling. Fortunately, I just released 0.16.4. Could you try that version?
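To confirm which version the environment actually picks up before re-running, a quick check (assuming a standard pip install; upgrade with `pip install --upgrade bertopic` if it reports something older than 0.16.4):

```python
# Print the BERTopic version that this Python environment loads
import bertopic

print(bertopic.__version__)
```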
-
Thanks @MaartenGr, but I am getting the following error on `topic_model.save` when using `embedding_model=openai_embedder`.
Code: `topic_model.save(f'{destination_path}{filen}{middle_part_filename}{today}')`
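The save() traceback itself is not shown above, so this is only a guess, but the default pickle serialization often fails when the embedding model wraps an API client that cannot be pickled. A minimal sketch of the safetensors route, reusing the variables from the snippet (`openai_embedder` and the path pieces are the poster's own objects):

```python
from bertopic import BERTopic

path = f"{destination_path}{filen}{middle_part_filename}{today}"

# safetensors/pytorch serialization stores the topic representations without
# pickling the embedding model, sidestepping unpicklable API clients
topic_model.save(path, serialization="safetensors", save_ctfidf=True)

# With safetensors the embedding model is not stored in the file,
# so pass the embedder back in explicitly when loading
loaded_model = BERTopic.load(path, embedding_model=openai_embedder)
```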
-
Hi @MaartenGr,
I need your help regarding the following issues.
Code:
```python
from sklearn.feature_extraction.text import CountVectorizer
from bertopic import BERTopic

# zeroshot_topic_list, representation_model_chatgpt and docs are defined earlier
vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(1, 4))
topic_model = BERTopic(language="english",
                       # top_n_words=15,
                       min_topic_size=10,
                       zeroshot_topic_list=zeroshot_topic_list,
                       zeroshot_min_similarity=0.6,
                       verbose=True,
                       representation_model=representation_model_chatgpt,
                       calculate_probabilities=True,
                       vectorizer_model=vectorizer_model)
topics, probs = topic_model.fit_transform(docs)
```
Error with a large zeroshot_topic_list:
```
topics, probs = topic_model.fit_transform(docs)
  File ".\lib\site-packages\bertopic\_bertopic.py", line 448, in fit_transform
    predictions = self._combine_zeroshot_topics(documents, assigned_documents, assigned_embeddings)
  File ".\lib\site-packages\bertopic\_bertopic.py", line 3540, in _combine_zeroshot_topics
    merged_model = BERTopic.merge_models([zeroshot_model, self], min_similarity=1)
  File ".\lib\site-packages\bertopic\_bertopic.py", line 3153, in merge_models
    new_tensors = tensors[new_topic - selected_topics["_outliers"]]
IndexError: index -2 is out of bounds for axis 0 with size 1
```
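Without the data it is hard to pin this down, but the traceback points at the step that merges the zero-shot assignments back with the clustered remainder, so it is worth checking how the documents split across the 0.6 threshold. A hedged diagnostic sketch (it assumes a local sentence-transformers model purely for illustration; substitute whatever embedder the topic model actually uses):

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Rough check of how many documents the zero-shot pass would capture at
# zeroshot_min_similarity=0.6; docs and zeroshot_topic_list come from the
# snippet above.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative stand-in
doc_embeddings = embedder.encode(docs, show_progress_bar=True)
topic_embeddings = embedder.encode(zeroshot_topic_list)

similarities = cosine_similarity(doc_embeddings, topic_embeddings)
n_assigned = int((similarities.max(axis=1) >= 0.6).sum())
print(f"{n_assigned} of {len(docs)} documents clear the 0.6 threshold")
```

If nearly all (or almost none) of the documents clear the threshold, one of the two models being merged internally has very little in it, which could explain an index error of this kind.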
a. When using (a sanity check covering both a. and b. is sketched after the second traceback):
new_topics = topic_model.reduce_outliers(docs, topics, probabilities=probs, threshold=0.05, strategy="probabilities")
Error:
```
new_topics = topic_model.reduce_outliers(docs, topics, probabilities=probs, threshold=0.05, strategy="probabilities")
  File ".\lib\site-packages\bertopic\_bertopic.py", line 2142, in reduce_outliers
    raise ValueError("Make sure to pass in `probabilities` in order to use the probabilities strategy")
ValueError: Make sure to pass in `probabilities` in order to use the probabilities strategy
```
b. When using:
new_topics = topic_model.reduce_outliers(docs, topics)
Error:
```
new_topics = topic_model.reduce_outliers(docs, topics)
  File ".\lib\site-packages\bertopic\_bertopic.py", line 2153, in reduce_outliers
    topic_distr, _ = self.approximate_distribution(outlier_docs, min_similarity=threshold, **distributions_params)
  File ".\lib\site-packages\bertopic\_bertopic.py", line 1279, in approximate_distribution
    topic_distributions = np.vstack(topic_distributions)
  File "<__array_function__ internals>", line 180, in vstack
  File ".\lib\site-packages\numpy\core\shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: need at least one array to concatenate
```
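Both reduce_outliers errors look like they come from the inputs rather than the calls themselves, though without the data this is only a guess: the probabilities strategy raises as soon as `probabilities` is None, and the default distributions strategy ends up handing `approximate_distribution` an empty list when `topics` contains no -1 (outlier) entries. A hedged sanity check before either call:

```python
import numpy as np

# topics and probs are the outputs of fit_transform() above
print("probs is None:", probs is None)                # must be a non-None array for strategy="probabilities"
print("probs shape:", getattr(probs, "shape", None))  # expect (n_docs, n_topics) with calculate_probabilities=True

n_outliers = int(np.sum(np.array(topics) == -1))
print("outlier documents:", n_outliers)               # 0 outliers leaves nothing to reassign

# Only call reduce_outliers when there is actually something to reassign
if n_outliers > 0:
    new_topics = topic_model.reduce_outliers(docs, topics, strategy="c-tf-idf")
```

The guarded call uses the c-TF-IDF strategy only as an example of one that needs neither `probabilities` nor `approximate_distribution`.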
Thanks