Update for quanteda v4.0
dfm.character() is deprecated in v4.0
koheiw authored Apr 10, 2024
1 parent dd316a7 commit 7d68d9e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion vignettes/overview.Rmd
@@ -162,7 +162,7 @@ The test makes more sense if more than one coder is involved. A suggested workfl
Preprocess and create a document-feature matrix

```{r, eval = FALSE}
- dfm(abstracts$text, tolower = TRUE, stem = TRUE, remove = stopwords('english'), remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE, remove_hyphens = TRUE) %>% dfm_trim(min_docfreq = 3, max_docfreq = 500) %>% dfm_select(min_nchar = 3, pattern = "^[a-zA-Z]+$", valuetype = "regex") -> abstracts_dfm
+ tokens(abstracts$text, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE, remove_url = TRUE, split_hyphens = TRUE) %>% tokens_wordstem %>% tokens_remove(stopwords("en")) %>% dfm(tolower = TRUE) %>% dfm_trim(min_docfreq = 3, max_docfreq = 500) %>% dfm_select(min_nchar = 3, pattern = "^[a-zA-Z]+$", valuetype = "regex") -> abstracts_dfm
```

Train a topic model.
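
The `tokens()`-based line above moves tokenization, stemming, and stopword removal out of `dfm()` and into explicit steps. A minimal sketch of the same sequence on a toy corpus follows. The `txt` vector and the name `toy_dfm` are hypothetical stand-ins for `abstracts$text` and `abstracts_dfm`, the native pipe `|>` (R 4.1 or later) is used in place of `%>%`, and `min_docfreq` is lowered to 2 because the toy corpus has only three documents.

```r
library(quanteda)

# Hypothetical toy data standing in for abstracts$text
txt <- c(
  d1 = "State-of-the-art topic models are trained on 2,000 abstracts.",
  d2 = "Topic models need careful validation; see https://example.org.",
  d3 = "Human coders validate the topics of trained models."
)

# Step 1: tokenize and clean at the token level
toks <- tokens(txt,
               remove_punct = TRUE, remove_symbols = TRUE,
               remove_numbers = TRUE, remove_url = TRUE,
               split_hyphens = TRUE)

# Step 2: stem, then drop stopwords (same order as the diff)
toks <- toks |>
  tokens_wordstem() |>
  tokens_remove(stopwords("en"))

# Step 3: build the document-feature matrix, then trim and filter features
toy_dfm <- dfm(toks, tolower = TRUE) |>
  dfm_trim(min_docfreq = 2) |>  # assumption: lowered from 3 for the toy corpus
  dfm_select(pattern = "^[a-zA-Z]+$", valuetype = "regex", min_nchar = 3)

print(toy_dfm)
```

Keeping each stage as a separate `tokens_*()` or `dfm_*()` call makes the order of operations explicit, which the single `dfm()` call on raw text did not.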