From 0e620a72657c4fb2fcba8f67a98eb1c1b168dd06 Mon Sep 17 00:00:00 2001
From: Emil Hvitfeldt
Date: Tue, 19 Mar 2024 22:08:47 -0700
Subject: [PATCH] devtools::build_readme()

---
 README.md | 108 +++++++++++++++++++++++++++---------------------------
 1 file changed, 55 insertions(+), 53 deletions(-)

diff --git a/README.md b/README.md
index b03f229a..4e61162e 100644
--- a/README.md
+++ b/README.md
@@ -25,55 +25,57 @@ dependencies, [`rstanarm`](https://CRAN.r-project.org/package=rstanarm),
 
 Some steps handle categorical predictors:
 
-- `step_lencode_glm()`, `step_lencode_bayes()`, and
-  `step_lencode_mixed()` estimate the effect of each of the factor
-  levels on the outcome and these estimates are used as the new
-  encoding. The estimates are estimated by a generalized linear model.
-  This step can be executed without pooling (via `glm`) or with partial
-  pooling (`stan_glm` or `lmer`). Currently implemented for numeric and
-  two-class outcomes.
-
-- `step_embed()` uses `keras::layer_embedding` to translate the original
-  *C* factor levels into a set of *D* new variables (\< *C*). The model
-  fitting routine optimizes which factor levels are mapped to each of
-  the new variables as well as the corresponding regression coefficients
-  (i.e., neural network weights) that will be used as the new encodings.
-
-- `step_woe()` creates new variables based on weight of evidence
-  encodings.
-
-- `step_feature_hash()` can create indicator variables using feature
-  hashing.
+-   `step_lencode_glm()`, `step_lencode_bayes()`, and
+    `step_lencode_mixed()` estimate the effect of each of the factor
+    levels on the outcome and these estimates are used as the new
+    encoding. The estimates are estimated by a generalized linear model.
+    This step can be executed without pooling (via `glm`) or with
+    partial pooling (`stan_glm` or `lmer`). Currently implemented for
+    numeric and two-class outcomes.
+
+-   `step_embed()` uses `keras::layer_embedding` to translate the
+    original *C* factor levels into a set of *D* new variables (\< *C*).
+    The model fitting routine optimizes which factor levels are mapped
+    to each of the new variables as well as the corresponding regression
+    coefficients (i.e., neural network weights) that will be used as the
+    new encodings.
+
+-   `step_woe()` creates new variables based on weight of evidence
+    encodings.
+
+-   `step_feature_hash()` can create indicator variables using feature
+    hashing.
 
 For numeric predictors:
 
-- `step_umap()` uses a nonlinear transformation similar to t-SNE but can
-  be used to project the transformation on new data. Both supervised and
-  unsupervised methods can be used.
+-   `step_umap()` uses a nonlinear transformation similar to t-SNE but
+    can be used to project the transformation on new data. Both
+    supervised and unsupervised methods can be used.
 
-- `step_discretize_xgb()` and `step_discretize_cart()` can make binned
-  versions of numeric predictors using supervised tree-based models.
+-   `step_discretize_xgb()` and `step_discretize_cart()` can make binned
+    versions of numeric predictors using supervised tree-based models.
 
-- `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
-  extraction with sparsity of the component loadings.
+-   `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
+    extraction with sparsity of the component loadings.
 
 Some references for these methods are:
 
-- Francois C and Allaire JJ (2018) [*Deep Learning with
-  R*](https://www.manning.com/books/deep-learning-with-r), Manning
-- Guo, C and Berkhahn F (2016) “[Entity Embeddings of Categorical
-  Variables](https://arxiv.org/abs/1604.06737)”
-- Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
-  categorical attributes in classification and prediction
-  problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
-  ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
-- Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
-  Predictive Modeling](https://arxiv.org/abs/1611.09477)”
-- McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation and
-  Projection for Dimension Reduction](https://arxiv.org/abs/1802.03426)
-- Good, I. J. (1985), “[Weight of evidence: A brief
-  survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)”,
-  Bayesian Statistics, 2, pp.249-270.
+-   Francois C and Allaire JJ (2018) [*Deep Learning with
+    R*](https://www.manning.com/books/deep-learning-with-r), Manning
+-   Guo, C and Berkhahn F (2016) “[Entity Embeddings of Categorical
+    Variables](https://arxiv.org/abs/1604.06737)”
+-   Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
+    categorical attributes in classification and prediction
+    problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
+    ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
+-   Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
+    Predictive Modeling](https://arxiv.org/abs/1611.09477)”
+-   McInnes L and Healy J (2018) [UMAP: Uniform Manifold Approximation
+    and Projection for Dimension
+    Reduction](https://arxiv.org/abs/1802.03426)
+-   Good, I. J. (1985), “[Weight of evidence: A brief
+    survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=)”,
+    Bayesian Statistics, 2, pp.249-270.
 
 ## Getting Started
 
@@ -113,18 +115,18 @@ This project is released with a [Contributor Code of
 Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
 By contributing to this project, you agree to abide by its terms.
 
-- For questions and discussions about tidymodels packages, modeling, and
-  machine learning, please [post on RStudio
-  Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
+-   For questions and discussions about tidymodels packages, modeling,
+    and machine learning, please [post on RStudio
+    Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).
 
-- If you think you have encountered a bug, please [submit an
-  issue](https://github.com/tidymodels/embed/issues).
+-   If you think you have encountered a bug, please [submit an
+    issue](https://github.com/tidymodels/embed/issues).
 
-- Either way, learn how to create and share a
-  [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
-  (a minimal, reproducible example), to clearly communicate about your
-  code.
+-   Either way, learn how to create and share a
+    [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
+    (a minimal, reproducible example), to clearly communicate about your
+    code.
 
-- Check out further details on [contributing guidelines for tidymodels
-  packages](https://www.tidymodels.org/contribute/) and [how to get
-  help](https://www.tidymodels.org/help/).
+-   Check out further details on [contributing guidelines for tidymodels
+    packages](https://www.tidymodels.org/contribute/) and [how to get
+    help](https://www.tidymodels.org/help/).