new questions or data set in "intro to data" lab #30

Open
beanumber opened this issue Oct 2, 2015 · 6 comments

@beanumber
Contributor

Could we replace the body weight data set from the Intro to Data lab with something else? Or at least change the questions? As it stands, it strikes me as one of those surreptitious things that makes women feel bad about their bodies for no reason.

@mine-cetinkaya-rundel
Collaborator

Agreed. Should we use the nycflights13 data? It's a good one for a lab that doesn't involve inference.

@norcalbiostat

You'd have to change the Normal distribution lab as well. And I feel the data set is fine, just choose a different outcome variable perhaps.

@mine-cetinkaya-rundel
Collaborator

@norcalbiostat we could use different datasets for the two labs though, so I don't think we need to feel limited to variables that are normally distributed for the intro to data lab.

@beanumber
Contributor Author

But I think @norcalbiostat's point is that the body dimensions data in the Normal Distribution lab has the same problem.

@mine-cetinkaya-rundel
Collaborator

I should admit I haven't used the normal distribution lab in a while, so I should first correct myself - the two labs don't use the same dataset anyway.

I feel like the issue with the intro to data lab is the wdiff variable, which we then compare between men and women. The normal distribution lab compares heights, briefly, but beyond that doesn't go into comparing people's desired weights, so perhaps it's a bit more factual and a bit less about body image?

I'm completely on board with changing the dataset for the intro to data lab, as I think that lab can be enhanced to be more about data wrangling skills (in addition to resolving the issue @beanumber raised). And I'm also on board with changing the data in the normal distribution lab because it's not that exciting (likely the reason why I haven't been doing that lab lately...). But if we're prioritizing, it seems like the intro to data lab has the more urgent issue to address.

@andrewpbray
Owner

I'm all for refreshing data sets, but the challenge is always finding a replacement that is better. And there's often that unfortunate trade-off between data that clearly illustrate a statistical principle and data that are most interesting (please oh please, let us find a population-level data set so we can replace the ames data).

I think a data wrangling lab based on nycflights13 would be terrific. It has heterogeneous data types and is interesting enough to naturally motivate several different questions and analyses. It also has that nice opportunity to define on-time performance in multiple ways, so it's an improvement on wdiff that way. If this lab were to replace lab 1, it's important that it cover some of the key points of chapter 1. It could also be cool to have it go off in its own data sciency direction, but then it would probably work best as an extra lab.
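
As a rough illustration of what "defining on-time performance in multiple ways" could look like, here is a minimal Python sketch. The column names `dep_delay` and `arr_delay` (delays in minutes) follow the flights table in nycflights13, but the rows, the `on_time_rate` helper, and the tolerance thresholds are all made up for this example and are not taken from the lab.

```python
# Hypothetical sample rows. The column names mirror nycflights13's flights
# table (delays in minutes; negative means early), but the values are invented.
flights = [
    {"flight": "A", "dep_delay": -3, "arr_delay": -10},
    {"flight": "B", "dep_delay": 4, "arr_delay": 20},
    {"flight": "C", "dep_delay": 30, "arr_delay": 12},
    {"flight": "D", "dep_delay": 0, "arr_delay": 0},
]


def on_time_rate(rows, column, tolerance=0):
    """Return the share of rows whose delay in `column` is at most `tolerance` minutes."""
    on_time = [row for row in rows if row[column] <= tolerance]
    return len(on_time) / len(rows)


# Three competing definitions of "on time" (thresholds are illustrative only).
strict_departure = on_time_rate(flights, "dep_delay")                # left at or before schedule
lenient_departure = on_time_rate(flights, "dep_delay", tolerance=5)  # left within 5 minutes
lenient_arrival = on_time_rate(flights, "arr_delay", tolerance=15)   # arrived within 15 minutes

print(strict_departure, lenient_departure, lenient_arrival)
```

The same flights give different on-time rates depending on which delay is measured and how much slack is allowed, which is the kind of definitional choice students could be asked to defend.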

If I remember correctly, the main thing in favor of the bdims data set is that it's a collection of continuous variables that exhibit a mix of symmetric and skewed distributions. I think we should keep our eyes out for a more interesting replacement, but I have nothing on hand right now.
