Wordcloud #25
Do not add Supervisor at this stage, as we are still determining feasibility based on the data; this may be future work. @danydvd can you please investigate the process for filtering terms (e.g., removing stop words)? |
James's estimate: I'm not sure what is meant by 'activate results' in "make the items responsive: when click on a word activate results from that category". An example would help, such as: "When clicking 'alberta' in the word cloud, that should invoke a brand new query that replaces whatever term is in the given 'category' (e.g., subject) with the new term and otherwise runs the same query as had just been run." Other word cloud changes depend on what the word cloud software allows. Estimate (aside from removing stop words, which Danoosh would do in SOLR): 2-5 days. |
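To make the 'alberta' example concrete, here is a minimal sketch of that click behaviour. It assumes the current query can be modelled as one term per category; the `Query` type and the `replaceCategoryTerm` name are hypothetical and not taken from the project code.

```typescript
// Hypothetical query shape: one term per category (e.g. subject, institution).
type Query = Record<string, string>;

// Return a new query in which `category` holds the clicked word;
// every other category is carried over unchanged.
function replaceCategoryTerm(query: Query, category: string, word: string): Query {
  return { ...query, [category]: word };
}

// The query that had just been run, and the query issued after clicking 'alberta'.
const previousQuery: Query = { subject: "biology", institution: "U of A" };
const nextQuery = replaceCategoryTerm(previousQuery, "subject", "alberta");
console.log(nextQuery);
```

Returning a new object rather than mutating the old one keeps the previous query available, which may be useful if a "back" action is wanted later.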
@CarlsoFiorention can you briefly clarify further for James? |
This is also illustrated in one of the scenarios. "Responsive" means that the user can click a word in the word cloud to see the list of all documents in the collection under that category (e.g. "biology"). This may invoke a new query that funnels down to a smaller number of results: for example, selecting "institution" from the top menu may show a word cloud of all institutions that include "biology", and the list of results gets shorter when you click a word (e.g. U of A). |
@sfarnel @jchartrand I can try to implement SOLR's stopwords_ca.txt and reindex all the data, but I don't think that will solve our problem here: we have many multi-word subjects (e.g. "personality and academic achievement"), and removing stop words (e.g. "and") might make those subjects seem weird. I think the word cloud is currently tokenizing the subjects into single words! |
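A small sketch of the tokenization effect described above, assuming the cloud is fed term counts. It compares counting whole subjects with counting individual words; the `countTerms` helper and the sample subjects are illustrative only and say nothing about how SOLR or the word cloud library actually does it.

```typescript
// Count how often each term occurs, either per whole subject or per single word.
function countTerms(subjects: string[], splitIntoWords: boolean): Map<string, number> {
  const counts = new Map<string, number>();
  for (const subject of subjects) {
    const normalized = subject.trim().toLowerCase();
    const terms = splitIntoWords
      ? normalized.split(/\s+/).filter((t) => t.length > 0)
      : [normalized];
    for (const term of terms) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
  }
  return counts;
}

// Illustrative sample data, not taken from the real index.
const sampleSubjects = [
  "Personality and academic achievement",
  "Academic achievement",
  "Biology",
];

// Word-level counts surface "and" as a cloud entry; phrase-level counts do not.
const countsByWord = countTerms(sampleSubjects, true);
const countsByPhrase = countTerms(sampleSubjects, false);
console.log(countsByWord, countsByPhrase);
```

With word-level counting, "and" becomes its own entry and the original subject is no longer recoverable from the cloud, which is the behaviour suspected here.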
Thanks @danydvd It seems this needs more investigation before we try something. Would you be able to dig into this a bit more? (if @jchartrand can point you to the library he's using; wordcloud.js?) |
I’m using https://www.npmjs.com/package/react-wordcloud
I do see what Danoosh means about the tokenization. There may well be a setting to stop react-wordcloud from doing that.
And quickly looking at the npm page, the documentation does say that it can handle stop words, although as Danoosh points out that might make the subjects (that rely on the stop words for meaning) confusing.
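One possible middle ground, sketched here outside of any library setting: drop a stop word only when it stands alone, and leave multi-word subjects untouched. The `{ text, value }` shape is what I understand react-wordcloud to take as input, so treat that as an assumption; the stop word list and the `dropStandaloneStopWords` name are made up for illustration.

```typescript
// Assumed input shape for the word cloud: a label and a weight.
interface CloudWord {
  text: string;
  value: number;
}

// Illustrative stop word list only; a real list would come from SOLR or a file.
const STOP_WORDS = new Set(["and", "of", "the", "in"]);

// Remove entries that consist solely of a stop word; keep multi-word subjects as-is.
function dropStandaloneStopWords(words: CloudWord[]): CloudWord[] {
  return words.filter((w) => {
    const text = w.text.trim().toLowerCase();
    const isSingleWord = !/\s/.test(text);
    return !(isSingleWord && STOP_WORDS.has(text));
  });
}

const cloudWords = dropStandaloneStopWords([
  { text: "and", value: 12 },
  { text: "personality and academic achievement", value: 3 },
  { text: "biology", value: 7 },
]);
console.log(cloudWords);
```

This only helps if the subjects reach the cloud as whole phrases in the first place, so it does not replace fixing the tokenization.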
(In reply to Sharon Farnel's comment of Aug 24, 2020, 4:20 PM, above.)
|
Thanks James. @danydvd if you can poke around the package James mentions and see if you can find anything promising that would be great; thanks! |
@jchartrand I created two more SOLR cores (CanLink and CanLink-1) and re-indexed the data (same data source as before) with different tokenizer options. I was not sure how to change the React part (and did not want to make a mess). Can we try the word cloud with these? |
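For trying the two cores side by side, a rough sketch of building the same facet request against each one. Only the core names come from this thread; the base URL, the `subject` field name, and the `facetUrl` helper are placeholders, and the parameters are the standard SOLR ones (`q`, `rows`, `facet`, `facet.field`, `facet.limit`, `wt`).

```typescript
// Build a SOLR facet request for one core. The base URL and field name are
// placeholders and would need to match the actual deployment and schema.
function facetUrl(baseUrl: string, core: string, field: string, limit: number): string {
  const params: [string, string][] = [
    ["q", "*:*"],
    ["rows", "0"], // only the facet counts are needed, not the documents
    ["facet", "true"],
    ["facet.field", field],
    ["facet.limit", String(limit)],
    ["wt", "json"],
  ];
  const queryString = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${baseUrl}/${encodeURIComponent(core)}/select?${queryString}`;
}

// Same request against both new cores, so the resulting term lists can be compared.
const coreUrls = ["CanLink", "CanLink-1"].map((core) =>
  facetUrl("http://localhost:8983/solr", core, "subject", 100)
);
console.log(coreUrls);
```

Comparing the facet values returned by each core should show quickly whether a given tokenizer option keeps multi-word subjects whole.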
Yes, will do. Thanks Danoosh.
(In reply to Danoosh Davoodi's comment of Aug 25, 2020, 4:03 PM, above.)
|