Lemmatizing with Mallet #203

Open
Glorifier85 opened this issue Jun 29, 2021 · 1 comment

Comments

@Glorifier85

Hi there,

First off: great solution that has helped me a lot in the past. I am currently preparing to do topic modeling with Mallet and have finished pulling the raw datasets. Before I import the texts and start modeling, I need to clean and standardize them. What I am a little fuzzy about is stemming and lemmatizing: not the concepts themselves, but what the best approach would be.

To be specific, here is what I need to do:

  • standardize inconsistencies in spelling, e.g. topicmodeling -> topic modeling
  • remove extra whitespace, e.g. two spaces in a row
  • stem and lemmatize
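
For illustration, the three steps above might be chained roughly like this in Python before the texts are handed to Mallet. This is only a sketch under stated assumptions: `SPELLING_FIXES`, `toy_stem`, and `clean` are hypothetical names, the mapping holds just the one example from the list, and the suffix stripper is a deliberately crude stand-in for a real stemmer or lemmatizer (for example NLTK's `PorterStemmer` or `WordNetLemmatizer`).

```python
import re

# Hypothetical mapping of spelling variants to a canonical form.
SPELLING_FIXES = {
    "topicmodeling": "topic modeling",
}

# Toy suffix list; a stand-in for a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "s")


def normalize_spelling(text):
    """Replace known variants, matching whole words case-insensitively."""
    for variant, canonical in SPELLING_FIXES.items():
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, canonical, text, flags=re.IGNORECASE)
    return text


def collapse_whitespace(text):
    """Turn any run of whitespace into a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Lowercase, fix spelling, collapse whitespace, then stem each token."""
    text = normalize_spelling(text.lower())
    text = collapse_whitespace(text)
    return " ".join(toy_stem(tok) for tok in text.split(" "))
```

The spelling fix runs before stemming so that a split variant such as "topic modeling" is stemmed as two ordinary tokens. Swapping `toy_stem` for a library stemmer or lemmatizer would leave the rest of the pipeline unchanged.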

I realize that this is not exactly an issue with Mallet, but I was hoping that someone could recommend, based on experience, how best to tackle this.

Many thanks in advance!

@sdedeo

sdedeo commented Jun 29, 2021 via email
