Merge pull request #110 from UBC-MDS/109-touch-up-readme
Touched up read me
atabak-alishiri authored Dec 9, 2023
2 parents 94ce8de + 86f98ec commit f25604a
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions README.md
@@ -6,15 +6,19 @@

## 📔 About

-Welcome to our project, "English Language Learning Ability Prediction", an innovative venture by a team of dedicated data scientists from the Master of Data Science program at the University of British Columbia. The goal of our project was to forecast an individual's aptitude for learning by predicting their performance in an English speaking test, which is part of a quiz to assess English proficiency. This prediction is based on demographic details and their linguistic background, including factors such as the duration of their stay in English-speaking countries, their primary and native languages, among others. We aimed to develop a regression model utilizing both Ridge and Lasso models. The effectiveness of these models was evaluated using metrics like R-squared, Root Mean Squared Error (RMSE), and Negative Mean Squared Error (NMSE).
-We utilized a subset of a dataset originally compiled from 680,333 participants, ranging in age from 7 to 89, for our study. These individuals completed an English grammar quiz and provided demographic information, as well as details about their language backgrounds and the countries they have lived in. This dataset was central to the research paper "A critical period for second language acquisition: Evidence from 2/3 million English speakers" by Joshua K. Hartshorne, Joshua B. Tenenbaum, and Steven Pinker. The study focused on understanding the peak age for grammar-learning ability, observing its maintenance during childhood, and its notable decline in late adolescence. The authors performed various analyses and compared multiple models to identify age-related patterns in language acquisition. The complete dataset is accessible at [https://osf.io/pyb8s/](https://osf.io/pyb8s/). For the initial phase of our project (Milestone 1), we only used 30% of the total dataset. The script for sampling this subset is available in 'src/Sampling from dataset.ipynb'. In future project stages, we plan to consider incorporating the entire dataset. This rich dataset allows for a multi-faceted analysis of language learning patterns.
+Welcome to our project, **"English Language Learning Ability Prediction"**, an innovative venture by some students from the Master of Data Science program at the University of British Columbia.
+
+The goal of our project was to forecast an individual's aptitude for learning by **predicting their performance in an English speaking test**, which is part of a quiz to assess English proficiency. This prediction is based on demographic details and their linguistic background, including factors such as the duration of their stay in English-speaking countries, their primary and native languages, among others. We aimed to develop a regression model utilizing both Ridge and Lasso models. The effectiveness of these models was evaluated using metrics like R-squared, Root Mean Squared Error (RMSE), and Negative Mean Squared Error (NMSE).
+
+We utilized a subset of a dataset originally compiled from 680,333 participants, ranging in age from 7 to 89, for our study. These individuals completed an English grammar quiz and provided demographic information, as well as details about their language backgrounds and the countries they have lived in.
+
+The complete dataset is accessible at [https://osf.io/pyb8s/](https://osf.io/pyb8s/). For the purpose of this analysis, we only used 30% of the total dataset. The script for sampling this subset is available in `src/scripts/english_score_get_data.py`. In future project stages, we plan to consider incorporating the entire dataset. This rich dataset allows for a multi-faceted analysis of language learning patterns.

---

## Report

-The final report can be found
-[here](https://ubc-mds.github.io/522-workflows-group-18/docs/english_language_learning_ability_prediction_analysis.html)
+The final report can be found [here](https://ubc-mds.github.io/522-workflows-group-18/docs/english_language_learning_ability_prediction_analysis.html).

---
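
The About text in this hunk names three evaluation metrics: R-squared, RMSE, and negative mean squared error. As a rough illustration of how they relate to one another, here is a minimal plain-Python sketch. It is not the project's code: the function name `regression_metrics` and the sample values are hypothetical, and the sign flip on MSE simply follows the common "higher is better" scoring convention (for example scikit-learn's `neg_mean_squared_error` scorer).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, RMSE, and negative MSE for two equal-length sequences.

    Hypothetical helper for illustration only; not part of the repository.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    # Residual sum of squares: how far predictions are from the observed values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how far observed values are from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all y_true values are equal")

    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(mse),
        "neg_mse": -mse,  # negated so that larger values mean a better fit
    }


# Made-up scores, not taken from the project's dataset.
actual = [0.80, 0.90, 0.70, 1.00]
predicted = [0.75, 0.95, 0.70, 0.90]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and negative MSE carry the same information (one is the square root of the other's magnitude), so reporting both mainly helps readers who prefer errors in the units of the target score.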

@@ -157,10 +161,10 @@ Our analysis, based on a dataset of approximately 200,000 individuals, shows tha

## 👥 Team Members

-- Salva
-- Atabak
-- Nando
-- Rachel
+- [Atabak Alishiri](https://github.com/atabak-alishiri)
+- [Rachel Bouwer](https://github.com/rbouwer)
+- [Salva Umar](https://github.com/salva-u)
+- [Farrandi Hernando (Nando)](https://github.com/farrandi)

---

@@ -185,3 +189,5 @@ Special thanks to our course instructors and UBC for providing the resources and
## ✉️ Contact

For queries or collaborations, feel free to contact any of our team members.

+You can find our contact details in the [Team Members](#-team-members) section.
