pyLDAvis "__num_dist_rows__" cannot assure that all rows' sum is 1 #173

umusa · 2020-09-08T06:21:30Z

I am trying to use pyLDAvis to visualize LDA results on Databricks.

The env:

 Spark NLP version:  2.5.5
 Apache Spark version:  2.4.5

I got this error:

 ValidationError: 
 * Not all rows (distributions) in topic_term_dists sum to 1.

from this code:

pyLDAvis.prepare(**data)

The data has two arrays:

   data['doc_topic_dists'], data['doc_lengths']

def __num_dist_rows__(array, ndigits=2):
    return array.shape[0] - int((pd.DataFrame(array).sum(axis=1) < 0.999).sum())

This function is meant to make sure that every row sums to 1.

But I still get the error.
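To narrow this down, it may help to list which rows fail the check instead of only counting them. Below is a minimal sketch in plain Python. `find_bad_rows` is a hypothetical helper, not part of pyLDAvis; the 0.999 threshold is copied from the `__num_dist_rows__` function quoted above, and the explicit NaN check is my own addition.

```python
import math

def find_bad_rows(rows, threshold=0.999):
    """Return (row_index, row_sum) pairs for rows whose sum is below threshold.

    Hypothetical helper for debugging; `rows` can be a list of lists or any
    iterable of numeric rows (a 2-D NumPy array also iterates row by row).
    """
    bad = []
    for i, row in enumerate(rows):
        total = math.fsum(row)
        # NaN compares False against everything, so test for it explicitly.
        if math.isnan(total) or total < threshold:
            bad.append((i, total))
    return bad

# Toy data (made up for illustration): row 1 sums to 0.5, row 2 is all zeros.
example = [
    [0.5, 0.5],
    [0.2, 0.3],
    [0.0, 0.0],
]
for index, total in find_bad_rows(example):
    print(f"row {index} sums to {total}")
```

Running the same helper on `topic_term_dists` (the array named in the error message) should show whether the failing rows are only slightly off, all zero, or contain NaN.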

I found that the error only pops up when the input is large. Currently, it has 900+ rows.

With 300+ rows, there is no error.
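One possible explanation, which I have not confirmed, is that the larger input contains a few rows that are slightly off or all zero. If so, a common workaround is to renormalize every row before calling `pyLDAvis.prepare`. The sketch below is plain Python; `normalize_rows` is a hypothetical helper, and whether renormalizing is appropriate depends on why the rows are off in the first place.

```python
import math

def normalize_rows(rows):
    """Return a new list of rows, each scaled so that it sums to 1.

    Hypothetical helper. Rows whose sum is zero, negative, or NaN cannot be
    scaled into a distribution, so they raise ValueError instead.
    """
    result = []
    for i, row in enumerate(rows):
        values = [float(v) for v in row]
        total = math.fsum(values)
        if not total > 0:  # also catches NaN, since NaN > 0 is False
            raise ValueError(f"row {i} sums to {total}; cannot normalize")
        result.append([v / total for v in values])
    return result

# Toy data (made up for illustration).
fixed = normalize_rows([[2.0, 2.0], [0.1, 0.3]])
for row in fixed:
    print(row, math.fsum(row))
```

Raising on zero-sum rows is deliberate: silently replacing them would hide the rows that most likely trigger the validation error.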

Could anybody help me with this?

Thanks
