Some questions/comments on Categorical Predictors #39

bmreiniger opened this issue Apr 30, 2024 · 0 comments


  1. The example introduced in 5.2, where each agent works with a single customer type:
    1. I think the comment about the row-wise sum could use some clarification: is it the sum of the indicator columns for the agents with a given customer type, which then matches that customer type's single indicator column?
    2. Later, in 5.4.3, the example is reused, but the language is stronger. To me, "agent was aliased with the customer type" means a one-to-one correspondence, rather than the many-to-one relationship I think the original implied. In a one-to-one relationship, the effect encodings for agent and customer type will end up being identical, so the argument fails. Separately, can we add a cross-reference link back to the original example?
  2. Figure 5.1 has a typo: "distirbution" should be "distribution".
  3. In 5.4, I would expect to see some mention of coarsening the categories according to domain knowledge (e.g. states into regions). Maybe also model-based coarsening that uses other predictors?
  4. The Cerda & Varoquaux citation seems to deal more with encodings that take the string nature of the predictor into account, which gives it a bit of a natural language processing flavor.
  5. In 5.4.2, I'm not sure that allowing -1 as well as 1 for the hashed values leads to "fewer collisions"; it depends on exactly what you mean by a collision, and I'm not familiar enough with the cryptography literature to say. Either way, in a parametric model it still imposes an arbitrary constraint: levels that share a hashed column are forced to have effects that are equal, or equal and opposite.
  6. The intro to 5.3.2 refers to a "different" supervised tool, but it is the only supervised tool in the chapter.
  7. In 5.5, I'd like a short note that integer-encoding the values is reasonable for certain models (e.g., tree-based models). Again, the text says this "will be discussed more later", but a preview would be nice.
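For item 1.1, here is a minimal sketch of the many-to-one aliasing I have in mind. The agents, customer types, and rows are hypothetical toy data, and plain Python lists stand in for the indicator columns. For each customer type, the row-wise sum of the indicator columns of that type's agents reproduces the customer type's own indicator column, so the two sets of columns are linearly dependent.

```python
# Hypothetical toy data: each agent serves exactly one customer type
# (a many-to-one mapping from agent to customer type).
agent_to_type = {"a1": "retail", "a2": "retail", "a3": "business"}
rows = ["a1", "a2", "a3", "a1", "a3", "a2"]  # agent for each observation

agents = sorted(agent_to_type)
types = sorted(set(agent_to_type.values()))


def indicators(values, levels):
    """Build indicator (dummy) columns: one 0/1 list per level."""
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


agent_cols = indicators(rows, agents)
type_cols = indicators([agent_to_type[a] for a in rows], types)


def aliased(ctype):
    """True if the row-wise sum of the agent columns for `ctype`
    equals the indicator column for `ctype` itself."""
    members = [a for a in agents if agent_to_type[a] == ctype]
    row_sums = [sum(agent_cols[a][i] for a in members) for i in range(len(rows))]
    return row_sums == type_cols[ctype]


for ctype in types:
    print(ctype, aliased(ctype))  # prints True for both customer types
```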
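For item 1.2, a small sketch of why the one-to-one reading breaks the argument. It assumes a naive effect encoding (each level replaced by the mean outcome of its rows, with no shrinkage); the function name `effect_encode` and the data are hypothetical. With a one-to-one mapping, agent and customer type partition the rows identically, so their encodings coincide; with a many-to-one mapping they differ.

```python
from statistics import mean


def effect_encode(levels, y):
    """Naive effect encoding: replace each level with the mean outcome
    of the rows that share that level (no regularization)."""
    groups = {}
    for lvl, val in zip(levels, y):
        groups.setdefault(lvl, []).append(val)
    means = {lvl: mean(vals) for lvl, vals in groups.items()}
    return [means[lvl] for lvl in levels]


# Hypothetical outcomes and the agent for each row.
y = [3.0, 5.0, 10.0, 4.0, 12.0, 7.0]
agents = ["a1", "a2", "a3", "a1", "a3", "a2"]

# One-to-one: each customer type has exactly one agent.
one_to_one = {"a1": "t1", "a2": "t2", "a3": "t3"}
# Many-to-one: a1 and a2 share a customer type.
many_to_one = {"a1": "retail", "a2": "retail", "a3": "business"}

enc_agent = effect_encode(agents, y)
enc_1to1 = effect_encode([one_to_one[a] for a in agents], y)
enc_many = effect_encode([many_to_one[a] for a in agents], y)

print(enc_agent == enc_1to1)  # True: identical encodings
print(enc_agent == enc_many)  # False: the customer-type encoding pools agents
```

A shrunken (partially pooled) estimate would not change the one-to-one conclusion, since the groups being estimated are still identical.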
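For item 5, assuming the -1 refers to signed feature hashing (the idea behind scikit-learn's `FeatureHasher` with `alternate_sign=True`), here is a sketch showing that the sign does not change which levels share a bucket. The helpers `_hash` and `hash_encode`, the salted MD5 hashes, and the bucket count are all hypothetical choices for illustration. If "collision" means two levels landing in the same column, the count is unchanged; the sign only changes the value placed in that column.

```python
import hashlib


def _hash(value: str, salt: str) -> int:
    """Deterministic integer hash of a string (salted MD5, chosen arbitrarily)."""
    digest = hashlib.md5((salt + value).encode("utf-8")).hexdigest()
    return int(digest, 16)


def hash_encode(value, n_buckets, signed=True):
    """Feature hashing for one category: a length-n_buckets vector with a
    single nonzero entry; its sign comes from a second, independent hash."""
    vec = [0] * n_buckets
    idx = _hash(value, "index") % n_buckets
    sign = 1
    if signed:
        sign = 1 if _hash(value, "sign") % 2 == 0 else -1
    vec[idx] = sign
    return vec


def bucket(vec):
    """Index of the single nonzero entry."""
    return next(i for i, v in enumerate(vec) if v != 0)


categories = [f"level_{i}" for i in range(50)]
n_buckets = 16

unsigned = [hash_encode(c, n_buckets, signed=False) for c in categories]
signed = [hash_encode(c, n_buckets, signed=True) for c in categories]

# The sign never changes which bucket a level lands in...
same_buckets = [bucket(u) for u in unsigned] == [bucket(s) for s in signed]
# ...so the number of levels forced to share a bucket is the same either way.
n_collisions = len(categories) - len({bucket(u) for u in unsigned})
print(same_buckets, n_collisions)
```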