Wordcloud #25

Open
CarlsoFiorention opened this issue Aug 21, 2020 · 10 comments

Comments

@CarlsoFiorention
Collaborator

CarlsoFiorention commented Aug 21, 2020

  • Choose a font consistent with the fonts used in menus and labels
  • arrange the cloud horizontally
  • make the cloud one neutral color (grey)
  • filter irrelevant terms (of, the, title cases, etc.)
  • make the items responsive: when click on a word activate results from that category
  • make the word cloud not only available for subjects, but also institutions and supervisors

03 wordcloud

@sfarnel
Member

sfarnel commented Aug 24, 2020

Do not add Supervisor at this stage as we are determining feasibility based on the data. This may be for future work.

@danydvd can you please investigate the process for filtering terms (e.g., removing stop words)?

@jchartrand
Collaborator

James estimate:

Not sure what is meant here by 'activate results':

"make the items responsive: when click on a word activate results from that category"

An example would help, such as: "Clicking 'alberta' in the word cloud should invoke a brand-new query that replaces whatever term is in the given 'category' (e.g., subject) with the new term and otherwise runs the same query that had just been run."

The other word cloud changes depend on what the word cloud software allows. That said, my estimate (aside from removing stop words, which Danoosh would do in SOLR) is 2-5 days.
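To make the "replace the term in the given category" reading concrete, here is a minimal TypeScript sketch. The `SearchState` shape, the `Category` names and the function name are all assumptions for illustration; they are not taken from the actual React code.

```typescript
// Hypothetical categories; the real app may use different names.
type Category = "subject" | "institution";

// Hypothetical search state: free text plus at most one term per category.
interface SearchState {
  text: string;
  terms: Partial<Record<Category, string>>;
}

// Return a new state in which the clicked word replaces the current term
// for `category`. All other parts of the query are left unchanged, so the
// caller can simply re-run the search with the returned state.
function replaceCategoryTerm(
  state: SearchState,
  category: Category,
  clickedWord: string
): SearchState {
  const terms: Partial<Record<Category, string>> = { ...state.terms };
  terms[category] = clickedWord;
  return { ...state, terms };
}

// Usage: the previous query had subject "biology"; the user clicks "alberta".
const before: SearchState = {
  text: "climate",
  terms: { subject: "biology", institution: "University of Alberta" },
};
const after = replaceCategoryTerm(before, "subject", "alberta");
// after.terms.subject is "alberta"; text and institution are untouched.
```

The function returns a new object rather than mutating the old one, which tends to fit React state updates, but that is a design choice rather than a requirement.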

@sfarnel
Member

sfarnel commented Aug 24, 2020

@CarlsoFiorention can you briefly clarify this for James?

@CarlsoFiorention
Collaborator Author

This is also illustrated in one of the scenarios. By "responsive" I mean that the user can click a word in the word cloud to see the list of all documents available in the collection under that category (e.g. "biology"). This may invoke a new query that funnels down to a smaller number of results. For example, selecting "institution" from the top menu may show a word cloud of all institutions that include "biology", and the list of results gets shorter when you click on one word (e.g. U of A).
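As a rough sketch of the funnel-down behaviour described above, each clicked word could become one Solr filter query (`fq`), so every click narrows the result list further. The field names `subject` and `institution` and the helper names are assumptions; only the `field:"value"` phrase syntax is standard Solr.

```typescript
// One active filter per clicked word. Field names are hypothetical.
interface Filter {
  field: string;
  value: string;
}

// Render a filter as a Solr phrase query, escaping backslashes and
// double quotes inside the value.
function toFilterQuery(f: Filter): string {
  const escaped = f.value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `${f.field}:"${escaped}"`;
}

// Add the clicked word as a new filter. Clicking a word that is already
// active leaves the filter list unchanged.
function addFilter(filters: Filter[], clicked: Filter): Filter[] {
  const exists = filters.some(
    (f) => f.field === clicked.field && f.value === clicked.value
  );
  return exists ? filters : [...filters, clicked];
}

// Usage: click "biology" in the subject cloud, then an institution.
let filters: Filter[] = [];
filters = addFilter(filters, { field: "subject", value: "biology" });
filters = addFilter(filters, { field: "institution", value: "University of Alberta" });
const fq = filters.map(toFilterQuery);
// fq is ['subject:"biology"', 'institution:"University of Alberta"']
```

Each entry in `fq` would be sent as a separate `fq` parameter alongside the unchanged main query.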

@danydvd
Contributor

danydvd commented Aug 24, 2020

@sfarnel @jchartrand I can try to implement SOLR's stopwords_ca.txt and reindex all the data, but I don't think that will solve our problem here: we have many multi-word subjects (e.g. "personality and academic achievement"), and removing stop words (e.g. "and") might make those subjects look odd. I think the word cloud is currently tokenizing the subjects into single words!
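The difference between the two tokenization options can be illustrated with a small TypeScript sketch. It only counts terms in memory; the stop word list is a three-word stand-in, not the contents of any SOLR file, and the function names are hypothetical.

```typescript
// Tiny illustrative stop word list (an assumption, not a SOLR file).
const STOP_WORDS: string[] = ["and", "of", "the"];

// Count how often each key occurs.
function count(keys: string[]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const k of keys) {
    counts[k] = (counts[k] || 0) + 1;
  }
  return counts;
}

// Word-level: split each subject on whitespace and drop stop words.
// Multi-word subjects are broken apart.
function wordCounts(subjects: string[]): Record<string, number> {
  const words: string[] = [];
  for (const s of subjects) {
    for (const w of s.toLowerCase().split(/\s+/)) {
      if (w.length > 0 && STOP_WORDS.indexOf(w) === -1) {
        words.push(w);
      }
    }
  }
  return count(words);
}

// Phrase-level: treat each whole subject as one term, so stop words
// inside a subject are kept.
function phraseCounts(subjects: string[]): Record<string, number> {
  return count(subjects.map((s) => s.trim().toLowerCase()));
}

const subjects = [
  "personality and academic achievement",
  "academic achievement",
  "Personality",
];
const byWord = wordCounts(subjects);
// byWord: personality 2, academic 2, achievement 2
const byPhrase = phraseCounts(subjects);
// byPhrase: each of the three subjects appears once, "and" intact
```

With word-level counting the cloud shows single words, with or without stop words; with phrase-level counting the subjects stay readable but each distinct phrase is its own cloud item.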

@sfarnel
Member

sfarnel commented Aug 24, 2020

Thanks @danydvd. It seems this needs more investigation before we try something. Would you be able to dig into this a bit more, if @jchartrand can point you to the library he's using (wordcloud.js?)?

@jchartrand
Collaborator

jchartrand commented Aug 24, 2020 via email

@sfarnel
Member

sfarnel commented Aug 24, 2020

Thanks James. @danydvd if you can poke around the package James mentions and see whether there is anything promising, that would be great; thanks!

@danydvd
Contributor

danydvd commented Aug 25, 2020

@jchartrand I created two more SOLR cores (CanLink and CanLink-1) and re-indexed the data (same data source as before) with different tokenizer options. I was not sure how to change the React part (and did not want to make a mess). Can we try the word cloud with these?

@jchartrand
Collaborator

jchartrand commented Aug 26, 2020 via email

4 participants