Inappropriate usage of gensim models in gensim_models.py #204

Open
jonaschn opened this issue Apr 21, 2021 · 1 comment

The code that supports gensim models looks quite old.
I am not sure whether gensim, at the time this code was written, already offered better ways to achieve what the code is trying to do.

Example:

# TODO: add the hyperparam to smooth it out? no beta in online LDA impl.. hmm..
# for now, I'll just make sure we don't ever get zeros...
beta = 0.01
fnames_argsort = np.asarray(list(dictionary.token2id.values()), dtype=np.int_)
term_freqs = corpus_csc.sum(axis=1).A.ravel()[fnames_argsort]
term_freqs[term_freqs == 0] = beta

gensim's LDA model has no beta parameter because gensim calls the same hyperparameter eta.
Furthermore, gensim's Dictionary already exposes the term frequency across the whole collection as model.id2word.cfs and the document frequency (the number of documents in which a term occurs) as model.id2word.dfs.
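
A minimal sketch of how the term frequencies could be read from the Dictionary instead of summing the sparse corpus matrix. The function name `term_frequencies` and the `floor` argument are hypothetical; the sketch assumes the object passed in exposes `token2id` and `cfs` the way a gensim Dictionary built from the training documents does, and that `cfs` still reflects the corpus the model was trained on.

```python
import numpy as np


def term_frequencies(dictionary, floor=0.01):
    """Return collection-wide term counts in token2id order.

    `dictionary` is assumed to behave like a gensim Dictionary:
    `token2id` maps token -> id and `cfs` maps id -> total count.
    `floor` is a hypothetical stand-in for the hard-coded 0.01 above.
    """
    # Same ordering as the quoted snippet: ids in token2id iteration order.
    token_ids = list(dictionary.token2id.values())
    # Look up each id's collection frequency; ids missing from cfs count as 0.
    freqs = np.array(
        [dictionary.cfs.get(token_id, 0) for token_id in token_ids],
        dtype=np.float64,
    )
    # Keep the original guard so that no term ends up with a zero frequency.
    freqs[freqs == 0] = floor
    return freqs
```

With a trained model this would presumably be called as `term_frequencies(model.id2word)`. If the Dictionary was filtered after the corpus was built, or the corpus differs from the documents the Dictionary saw, the counts may not match the matrix sum, so this is not guaranteed to be a drop-in replacement.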
