Quick fix for calculation of widths of red bars #41
This branch implements a quick fix for calculating the widths of the red bars. Here, we calculate the term frequencies internally within `createJSON()` rather than using the user-supplied `term.frequency`. The details are described in Issue #32.
The good news is that the red bar widths are now correct. The bad news is that the blue bar widths, which represent the overall frequencies of words, are not necessarily correct: they won't match the actual term frequencies provided by the user. Most of the differences are small, but this is a lingering issue. As mentioned in Issue #32, the solution may be to require the user to specify the priors they used in fitting the model, so that the red and blue bars can properly account for the influence of the priors as well as the raw data. This would require a few additional calculations, and so far it only works when I've fit the model using the collapsed Gibbs sampler. It failed to correctly visualize a model fit using gensim, which uses variational Bayes to fit the LDA model. I'm not sure whether the fitting method is the source of the error or just a coincidence.
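To make the trade-off concrete, here is a minimal sketch of the kind of internal calculation described above, written in plain Python rather than the package's own code. It assumes the model is summarized by a topic-term matrix `phi`, a document-topic matrix `theta`, and a vector of document lengths `doc_length`. The function name, the variable names, and the toy numbers are all hypothetical and are not taken from `createJSON()`.

```python
def implied_term_frequencies(phi, theta, doc_length):
    """Derive bar widths from the fitted model alone (illustrative sketch).

    phi:        K x W list of lists, P(term | topic)
    theta:      D x K list of lists, P(topic | document)
    doc_length: list of D token counts
    """
    n_topics = len(phi)
    n_terms = len(phi[0])
    n_docs = len(theta)

    # Expected number of tokens assigned to each topic.
    topic_freq = [
        sum(theta[d][k] * doc_length[d] for d in range(n_docs))
        for k in range(n_topics)
    ]
    # Expected count of each term within each topic (red bars).
    term_topic_freq = [
        [phi[k][w] * topic_freq[k] for w in range(n_terms)]
        for k in range(n_topics)
    ]
    # Overall term frequency implied by the model (blue bars).
    term_freq = [
        sum(term_topic_freq[k][w] for k in range(n_topics))
        for w in range(n_terms)
    ]
    return term_topic_freq, term_freq


# Toy inputs (made up for illustration): 2 topics, 3 terms, 2 documents.
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
theta = [[0.9, 0.1],
         [0.2, 0.8]]
doc_length = [10, 20]
user_term_frequency = [11, 8, 11]  # hypothetical user-supplied counts

red, blue = implied_term_frequencies(phi, theta, doc_length)
# Gap between the model-implied blue bars and the user's raw counts.
gaps = [b - u for b, u in zip(blue, user_term_frequency)]
```

Because each blue bar in this sketch is the sum of the corresponding red bars, no red bar can be wider than its blue bar. The cost is the one noted above: the implied totals (`blue`) need not equal the counts the user supplied, which is what `gaps` measures.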
Anyway, in this branch I've also updated the vignette called "details" with an explanation similar to the one written here.
I think this is an improvement on the previous version, so we should merge it, and maybe a bit down the line we can solve the problem for good.