Group name: Group Gamon
To make life easier, run the setup.py script to download all the datasets and set up the directories the way our code expects them. The script should work in an environment similar to the one we used for the homeworks.
- Python version 3.11.5
- The requests, os, zipfile, and shutil libraries are required (requests is third-party; the other three ship with Python)
- Make sure to run the script from this directory
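For readers who want to see roughly what the download-and-unpack step involves, here is a minimal sketch using the libraries listed above. It is not the contents of our setup.py: the function names `download_file` and `extract_archive`, the timeout, and the chunk size are assumptions made for illustration.

```python
import os
import shutil
import zipfile


def download_file(url, dest_path):
    """Stream the file at `url` to `dest_path` using requests."""
    # Imported here so extract_archive can be used without requests installed.
    import requests

    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # stop early on HTTP errors
        with open(dest_path, "wb") as out_file:
            for chunk in response.iter_content(chunk_size=8192):
                out_file.write(chunk)
    return dest_path


def extract_archive(zip_path, dest_dir):
    """Unpack `zip_path` into a fresh `dest_dir` and return the archive's file names."""
    # Start from a clean directory so reruns do not mix old and new files.
    if os.path.isdir(dest_dir):
        shutil.rmtree(dest_dir)
    os.makedirs(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())
```

A call might look like `extract_archive("downloads/example.zip", "data/example")`, where both paths are hypothetical. Clearing the target directory first is a design choice that keeps reruns reproducible, at the cost of deleting anything already in that directory.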
For the Core Trends datasets, you will need to sign up with an email address to download the data. We've created a dummy throwaway email for this.
- Email: [email protected]
- Password: e28s"Fe-eDu;nS9
For the NSDUH datasets, no email is required. However, make sure to download the 'delimited' files, since those are what we used.
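As a quick sanity check that a downloaded file really is a delimited text file, something like the sketch below can preview its first rows. The helper name `preview_delimited` is hypothetical, and the tab delimiter and UTF-8 encoding are assumptions; adjust both if your download differs.

```python
import csv


def preview_delimited(path, delimiter="\t", max_rows=5):
    """Return the header row and up to `max_rows` data rows of a delimited text file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])  # empty list if the file has no rows
        # zip stops once range is exhausted, so only max_rows rows are read.
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows
```

If the header comes back as a single long string, the delimiter guess is probably wrong.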
Dataset 1: Core Trends 2018 -
https://www.pewresearch.org/internet/dataset/jan-3-10-2018-core-trends-survey/
Data included:
- Age demographics (18+)
- Social/Personal demographics
- Educational demographics
- Household demographics
- Political demographics
- Amount/type of internet/technological use
- Opinions on the impact of the internet
- Amount of reading
Overall, this dataset gives a surface-level picture of who uses the internet and their general opinions on it.
Dataset 2: Core Trends 2019 -
https://www.pewresearch.org/internet/dataset/core-trends-survey/
Same as above
Dataset 3: Core Trends 2021 -
https://www.pewresearch.org/internet/dataset/2021-core-trends-survey/
Same as above
Data included:
There is too much data to cover in detail, but as the name suggests, the NSDUH (National Survey on Drug Use and Health) data includes demographic data on health and drug use for 2022 for the US civilian population ages 12+.
Dataset 4: NSDUH 2021 -
https://www.datafiles.samhsa.gov/dataset/national-survey-drug-use-and-health-2021-nsduh-2021-ds0001
Same as above
Dataset 5: NSDUH 2019 -
https://www.datafiles.samhsa.gov/dataset/national-survey-drug-use-and-health-2019-nsduh-2019-ds0001
Same as above
Dataset 6: NSDUH 2018 -
https://www.datafiles.samhsa.gov/dataset/national-survey-drug-use-and-health-2018-nsduh-2018-ds0001
Same as above