Improve speed of KWIC results return time #60

Open
stephbuon opened this issue Feb 11, 2022 · 1 comment

Comments

@stephbuon
Owner

Go to Language/Word Context, select "similarity" from the measure drop-down, and search for a word in the corpus. The app returns a scatter plot of the words most associated with the search word (according to word2vec and cosine similarity; see line 102). If you click one of the scatter plot points and wait ~9 seconds, a data frame pops up showing that word's keyword in context (KWIC).

Taking ~9 seconds to return results is a problem. Can we optimize KWIC so that it returns results in a reasonable amount of time?

Here's the KWIC code:
https://github.com/stephbuon/hansard-shiny/tree/main/app/modules/kwic

It's called by: https://github.com/stephbuon/hansard-shiny/blob/main/app/modules/word-context/word_context.R

Caching the results (kwick_cache.R) lets us return results in real time, but I don't know whether we would generate too much cache.
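One possible way to keep the cache from growing without limit is to memoise the lookup behind a size-capped, least-recently-used cache. The following is a minimal sketch, not the code in kwick_cache.R: `slow_kwic()` is a hypothetical stand-in for the real KWIC lookup, the 50 MB cap is an arbitrary assumption, and it assumes the `memoise` (version 2.0 or later) and `cachem` packages are installed.

```r
library(memoise)
library(cachem)

# Hypothetical stand-in for the app's KWIC lookup (not the real module code).
# It counts its own calls so that cache hits are visible.
n_calls <- 0
slow_kwic <- function(word) {
  n_calls <<- n_calls + 1
  Sys.sleep(0.1)  # simulate the expensive search
  data.frame(keyword = word,
             context = paste("... text around", word, "..."))
}

# In-memory cache capped at roughly 50 MB (an assumed limit).
# When the cap is reached, the least recently used entries are evicted first.
bounded_cache <- cache_mem(max_size = 50 * 1024^2, evict = "lru")

# Memoised wrapper: repeated calls with the same word are served from the cache.
cached_kwic <- memoise(slow_kwic, cache = bounded_cache)

first  <- cached_kwic("corn")  # computed
second <- cached_kwic("corn")  # returned from the cache, slow_kwic() is not run again
```

With a cap like this, frequently clicked words stay fast while memory use stays bounded. Whether 50 MB is a sensible limit depends on how large the real KWIC data frames are.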

You'll see that I am borrowing a function from quanteda (this one: https://quanteda.io/reference/kwic.html).
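Another direction that may be worth checking, on the assumption that the corpus is currently tokenized on every click (not verified against the module), is to tokenize once at startup and call `kwic()` on the stored tokens object. The sketch below uses a two-sentence toy corpus. `debates`, `toks`, and `kwic_for()` are hypothetical names, and the window size is arbitrary.

```r
library(quanteda)

# Toy stand-in for the debate text (the real app would load the Hansard data).
debates <- c(d1 = "The price of corn rose sharply this year.",
             d2 = "Members debated the corn laws at length.")

# Tokenize once, e.g. when the app starts, rather than inside the click handler.
toks <- tokens(debates, remove_punct = TRUE)

# Per-click work is then only the keyword search over the stored tokens.
kwic_for <- function(word, window = 3) {
  as.data.frame(kwic(toks, pattern = word, window = window))
}

res <- kwic_for("corn")
res[, c("docname", "pre", "keyword", "post")]
```

If the module already works from a pre-built tokens object, this changes nothing, and profiling the click handler would be the next step to find where the ~9 seconds go.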

@stephbuon
Owner Author

@EliasLMann here is another first problem you can work on if you do not want to work on Log Likelihood.
