Glove method - lemmatization #8

MarekKeskyll · 2021-05-29T08:21:36Z

Hi!
Questions:

Why didn't you use lemmatization when processing your documents? Is there a reason behind that?
Why did you choose this particular pre-trained GloVe model (and number of dimensions)?
Can you validate the results somehow?

4OH4 · 2021-06-01T09:35:33Z

Hi there,

Good question on the use of lemmatization - I did use it for the TF-IDF model, but not for GloVe. I think it's more important for TF-IDF, in order to get accurate word counts. I don't remember why I did not use it with the GloVe model (the example is based on some project work that I did, but borrows heavily from the documentation) - I expect that I tried it and found that, for the particular use case I was looking at, it did not offer a performance benefit.
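To illustrate the word-count point, here is a minimal sketch of how lemmatization merges inflected forms before counting. The `TOY_LEMMAS` map and the `term_counts` helper are hypothetical and only for illustration; this is not the preprocessing used in the repository, and a real pipeline would use a proper lemmatizer.

```python
from collections import Counter

# Hypothetical toy lemma map, for illustration only. A real pipeline would
# use a proper lemmatizer (for example from NLTK or spaCy).
TOY_LEMMAS = {
    "documents": "document",
    "ran": "run",
    "running": "run",
    "runs": "run",
}


def term_counts(text, lemmatize=False):
    """Count whitespace-separated tokens, optionally mapping each to its lemma."""
    tokens = text.lower().split()
    if lemmatize:
        tokens = [TOY_LEMMAS.get(tok, tok) for tok in tokens]
    return Counter(tokens)


text = "run runs running ran document documents"
raw = term_counts(text)
lemmatized = term_counts(text, lemmatize=True)

print(len(raw), "distinct terms without lemmatization")
print(len(lemmatized), "distinct terms with lemmatization:", dict(lemmatized))
```

Without lemmatization each surface form is counted as a separate term, which dilutes the term frequencies that TF-IDF relies on. Pre-trained word vectors are less affected, since related forms of a word tend to have similar vectors already.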

The glove-wiki-gigaword-50 model is the smallest of the Gensim models trained on Wikipedia, in terms of model complexity (50-dimensional vectors). I was originally looking at near real-time processing of high volumes of data, so compute requirements and latency were an issue. This model is the fastest to run. You would expect accuracy benefits from moving to a more complex model, although they may be small and would come at the cost of significant additional memory requirements.
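As a rough back-of-the-envelope illustration of the memory trade-off, the sketch below estimates the size of the raw vector matrix for different dimensions. It assumes a vocabulary of 400,000 words and 4-byte float32 values; both figures are assumptions for this estimate, and actual memory use once a model is loaded will be higher because of the vocabulary index and other overhead.

```python
def embedding_matrix_mb(vocab_size, dimensions, bytes_per_value=4):
    """Approximate size in megabytes of a dense vocab_size x dimensions matrix."""
    return vocab_size * dimensions * bytes_per_value / 1e6


# Assumption: a 400,000-word vocabulary stored as float32 (4 bytes per value).
VOCAB_SIZE = 400_000

for dims in (50, 100, 200, 300):
    size_mb = embedding_matrix_mb(VOCAB_SIZE, dims)
    print(f"{dims:>3} dimensions: about {size_mb:.0f} MB")
```

Under these assumptions the raw matrix grows linearly with the number of dimensions, so the largest vectors need about six times the memory of the 50-dimensional ones.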

For my application, I was comparing against human operators who were carrying out information retrieval tasks. Our metric was (something like) how often the most similar document appears in the top-1, top-3, or top-10 positions. That is quite a hard route for validation though, and can be quite expensive. Perhaps validating against an equivalent, established automated technique might be better? TF-IDF is a well-established and widely used technique, so it is a reasonable baseline against which you could compare more advanced techniques such as GloVe, to see if they offer an improvement.
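A minimal sketch of that kind of top-k metric is shown below. The `top_k_hit_rate` function, the query and document IDs, and the data are all hypothetical; this is not the evaluation code from the original project. The `relevant` mapping could come from human judgements or, as a cheaper alternative, from the top result of a baseline such as TF-IDF.

```python
def top_k_hit_rate(rankings, relevant, k):
    """Fraction of queries whose reference document appears in the top k results.

    rankings: dict mapping query ID -> list of document IDs, best match first.
    relevant: dict mapping query ID -> the single reference document ID.
    """
    hits = sum(
        1 for query, ranked in rankings.items() if relevant[query] in ranked[:k]
    )
    return hits / len(rankings)


# Hypothetical rankings produced by the model under test.
rankings = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d5", "d9"],
    "q3": ["d8", "d6", "d4"],
    "q4": ["d1", "d2", "d3"],
}
# Hypothetical reference answers (human-judged or taken from a baseline).
relevant = {"q1": "d3", "q2": "d5", "q3": "d4", "q4": "d9"}

for k in (1, 2, 3):
    print(f"top-{k} hit rate: {top_k_hit_rate(rankings, relevant, k):.2f}")
```

Running the same function over the rankings from two models (for example TF-IDF and GloVe) on the same queries gives a like-for-like comparison.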
