devtools::build_readme()
EmilHvitfeldt committed Mar 20, 2024
1 parent 8196995 commit 0e620a7
Showing 1 changed file with 55 additions and 53 deletions.
108 changes: 55 additions & 53 deletions README.md
@@ -25,55 +25,57 @@ dependencies, [`rstanarm`](https://CRAN.r-project.org/package=rstanarm),

Some steps handle categorical predictors:

- `step_lencode_glm()`, `step_lencode_bayes()`, and
`step_lencode_mixed()` estimate the effect of each factor level on
the outcome and use these estimates as the new encoding. The
estimates come from a generalized linear model, fit either without
pooling (via `glm`) or with partial pooling (`stan_glm` or `lmer`).
Currently implemented for numeric and two-class outcomes.

- `step_embed()` uses `keras::layer_embedding` to translate the
original *C* factor levels into a set of *D* new variables (\< *C*).
The model fitting routine optimizes which factor levels are mapped
to each of the new variables as well as the corresponding regression
coefficients (i.e., neural network weights) that will be used as the
new encodings.

- `step_woe()` creates new variables based on weight of evidence
encodings.

- `step_feature_hash()` can create indicator variables using feature
hashing.
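
To make the no-pooling case concrete, here is a minimal base R sketch of effect encoding with `glm`. It only illustrates the idea and is not the package's implementation; the data frame `dat`, the factor `city`, and the outcome `y` are hypothetical names invented for this example.

```r
set.seed(42)

# Hypothetical data: a three-level factor and a two-class (0/1) outcome.
dat <- data.frame(
  city = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
  y    = rbinom(200, 1, 0.5)
)

# No pooling: drop the intercept so each factor level gets its own
# coefficient, estimated only from the rows belonging to that level.
fit <- glm(y ~ 0 + city, data = dat, family = binomial)

# One log-odds estimate per level, named by level.
encoding <- setNames(coef(fit), levels(dat$city))

# Replace each level with its estimate to obtain a numeric predictor.
dat$city_encoded <- unname(encoding[as.character(dat$city)])

head(dat)
```

With a binomial family and no intercept, each estimate equals the logit of that level's observed event rate. Inside a recipe, the package documents the corresponding step as `step_lencode_glm(city, outcome = vars(y))`.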

For numeric predictors:

- `step_umap()` applies a nonlinear transformation similar to t-SNE
that, unlike t-SNE, can also be applied to new data. Both supervised
and unsupervised methods can be used.

- `step_discretize_xgb()` and `step_discretize_cart()` can make binned
versions of numeric predictors using supervised tree-based models.

- `step_pca_sparse()` and `step_pca_sparse_bayes()` conduct feature
extraction with sparsity of the component loadings.
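
As a rough illustration of supervised, tree-based binning, the base R sketch below fits a shallow regression tree with `rpart` and reuses its split points as bin boundaries. It is a sketch of the general technique under simplifying assumptions, not the code behind `step_discretize_cart()`; the simulated `x` and `y` and the tuning values are made up for the example.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(123)
n <- 300
x <- runif(n, 0, 10)
# Hypothetical outcome whose mean jumps once x exceeds 5.
y <- ifelse(x > 5, 3, 0) + rnorm(n, sd = 0.5)
sim <- data.frame(x = x, y = y)

# Fit a shallow tree on the single predictor. Competitor and surrogate
# splits are switched off so `fit$splits` lists only the splits in the tree.
fit <- rpart(
  y ~ x,
  data = sim,
  control = rpart.control(maxdepth = 2, cp = 0.01,
                          maxcompete = 0, maxsurrogate = 0)
)

# The split points chosen by the tree become the interior bin boundaries.
cuts <- sort(unique(fit$splits[, "index"]))
breaks <- c(-Inf, cuts, Inf)

# Binned version of the numeric predictor.
sim$x_binned <- cut(sim$x, breaks = breaks)

table(sim$x_binned)
```

Because the boundaries come from a model fit to the outcome, the bins follow changes in `y` rather than quantiles of `x`. The same `breaks` can be passed to `cut()` on new data.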

Some references for these methods are:

- Chollet F and Allaire JJ (2018) [*Deep Learning with
R*](https://www.manning.com/books/deep-learning-with-r), Manning.
- Guo C and Berkhahn F (2016) “[Entity Embeddings of Categorical
Variables](https://arxiv.org/abs/1604.06737).”
- Micci-Barreca D (2001) “[A preprocessing scheme for high-cardinality
categorical attributes in classification and prediction
problems](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=A+preprocessing+scheme+for+high-cardinality+categorical+attributes+in+classification+and+prediction+problems&btnG=),”
ACM SIGKDD Explorations Newsletter, 3(1), 27-32.
- Zumel N and Mount J (2017) “[`vtreat`: a `data.frame` Processor for
Predictive Modeling](https://arxiv.org/abs/1611.09477).”
- McInnes L and Healy J (2018) “[UMAP: Uniform Manifold Approximation
and Projection for Dimension
Reduction](https://arxiv.org/abs/1802.03426).”
- Good IJ (1985) “[Weight of evidence: A brief
survey](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Weight+of+evidence%3A+A+brief+survey&btnG=),”
Bayesian Statistics, 2, 249-270.

## Getting Started

@@ -113,18 +115,18 @@ This project is released with a [Contributor Code of
Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.

- For questions and discussions about tidymodels packages, modeling,
and machine learning, please [post on RStudio
Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question).

- If you think you have encountered a bug, please [submit an
issue](https://github.com/tidymodels/embed/issues).

- Either way, learn how to create and share a
[reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html)
(a minimal, reproducible example), to clearly communicate about your
code.

- Check out further details on [contributing guidelines for tidymodels
packages](https://www.tidymodels.org/contribute/) and [how to get
help](https://www.tidymodels.org/help/).
