This is a minor update that mainly made dictionary search much faster (enabling the use of huge dictionary resources, such as named entity lists).
This update makes some changes to the API.
- Many of the functionalities that were before provided as R6 methods of the tCorpus are now replaced with regular functions. The R6 methods are now only used in cases where the tCorpus is modified by reference.
- Working with the R6 methods is now optional for operations that modify the tCorpus, such as subsetting. Before, the default approach was to modify the tCorpus by reference, but with the option to make copies. Now, this behavior is split up into functions and R6 methods. For instance, instead of the R6 method 'subset' that changes the tCorpus by reference, the generic subset (S3) function returns the subset without modifying the input tCorpus. Using the R6 methods is still recommended, but unless large corpora are used it's not that important.
Several new features have been added, and there is (finally) a vignette.
- A versatile browse_texts function has been added that makes it easy to create full-text browsers from a tCorpus, using token features or annotations (e.g., search results, topics from a topic model, sentiment scores) to highlight and navigate the texts.
- The udpipe wrapper in create_tcorpus now makes persistent caches for the last three calls, that store results for every batch (100 docs) that has been processed. This makes it easy to parse many documents across sessions (and continue if stuff freezes), and is overall more user friendly.
- A vignette has been added that demonstrates the main features, and discusses the reason for and use of the R6 methods.