Pre-submission inquiry for {kgrams}: Classical k-gram Language Models #450

Closed
1 of 12 tasks
vgherard opened this issue Jul 2, 2021 · 1 comment


vgherard commented Jul 2, 2021

Submitting Author: Valerio Gherardi (@vgherard)
Repository: https://github.com/vgherard/kgrams
Submission type: Pre-submission


  • Paste the full DESCRIPTION file inside a code block below:
Package: kgrams
Title: Classical k-gram Language Models
Version: 0.1.0.9000
Authors@R: 
    person(given = "Valerio",
           family = "Gherardi",
           role = c("aut", "cre"),
           email = "[email protected]",
           comment = c(ORCID = "0000-0002-8215-3013"))
Description: 
        Tools for training and evaluating k-gram language models in R, 
        supporting several probability smoothing techniques, 
        perplexity computations, random text generation and more.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
SystemRequirements: C++11
LinkingTo: 
    Rcpp, RcppProgress
Imports: 
    Rcpp, rlang, methods, utils,  RcppProgress (>= 0.1), Rdpack
Depends: 
    R (>= 3.5)
Suggests: 
    testthat (>= 3.0.0),
    covr,
    knitr,
    rmarkdown
Config/testthat/edition: 3
RdMacros: Rdpack
VignetteBuilder: knitr
URL: https://vgherard.github.io/kgrams/,
    https://github.com/vgherard/kgrams
BugReports: https://github.com/vgherard/kgrams/issues

Scope

  • Please indicate which category or categories from our package fit policies this package falls under (please check an appropriate box below):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

This package implements classical k-gram language model algorithms, including utilities for training, evaluation and text prediction. Language models are a cornerstone of Natural Language Processing applications, and the conceptual simplicity of k-gram models makes them a good baseline as well as a useful teaching tool.
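
To make "training" and "evaluation" concrete, here is a minimal, language-agnostic sketch of a bigram (k = 2) model with add-k smoothing and a perplexity computation. It is written in plain Python for illustration only and does not use or reproduce the kgrams API; all function names (`tokenize`, `train_bigram_counts`, `prob`, `perplexity`) and the toy corpus are hypothetical, and it assumes every evaluated word already appears in the training vocabulary.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (illustrative choice)


def tokenize(sentence):
    """Lowercase, split on whitespace, and pad with boundary markers."""
    return [BOS] + sentence.lower().split() + [EOS]


def train_bigram_counts(corpus):
    """Count contexts and bigrams; return them with the prediction vocabulary."""
    contexts, bigrams = Counter(), Counter()
    vocab = set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # (context, word) pairs
        vocab.update(tokens[1:])                 # BOS is never predicted
    return contexts, bigrams, vocab


def prob(word, context, contexts, bigrams, vocab, k=1.0):
    """Add-k smoothed estimate of P(word | context)."""
    return (bigrams[(context, word)] + k) / (contexts[context] + k * len(vocab))


def perplexity(sentences, contexts, bigrams, vocab, k=1.0):
    """Exponential of the average negative log-probability per predicted token."""
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        for context, word in zip(tokens, tokens[1:]):
            log_prob += math.log(prob(word, context, contexts, bigrams, vocab, k))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Toy usage: train on two sentences, then score one of them.
corpus = ["a b", "a c"]
contexts, bigrams, vocab = train_bigram_counts(corpus)
p_b_given_a = prob("b", "a", contexts, bigrams, vocab)
ppl = perplexity(["a b"], contexts, bigrams, vocab)
```

Add-k is used here only because it is the simplest smoother to verify by hand: for any context seen in training, the smoothed probabilities over the vocabulary sum to one.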

  • Who is the target audience and what are scientific applications of this package?

The package can be useful to students and researchers for small-scale experiments in Natural Language Processing. It may also help when building more complex language models, as a quick way to obtain a baseline.

I am not aware of any R package with the same purpose and functionality as kgrams. The CRAN package ngram overlaps somewhat in scope, in that it provides k-gram tokenization algorithms, but it offers no support for language model algorithms.

Not applicable

  • Any other questions or issues we should be aware of?:
  1. The package was accepted some months ago by CRAN.
  2. Despite the "lifecycle:experimental" badge and the development version number, I am not currently planning any important API changes or additional features for this package (except, of course, for changes prompted by feedback or suggestions from an rOpenSci review).

vgherard (Author) commented

I opened a new presubmission inquiry in #452
