[inequality] Update exercise 3 (#498)
* [inequality] Update exercise 3

Hi Matt @mmcky ,

I have updated exercise 3 of the inequality lecture using your code in #410 and added the simulation part below your solution.

What do you think about this version of the solution?

Best,
Longye

* Update inequality.md

Hi Matt,
I have updated the solution and the main text by adding `%%time`.

What do you think about this comparison?

* Update inequality.md

Add labels to the Gini coefficient code in the main text.

* Update inequality.md

* add data.ipynb and delete to csv

Hi Matt,

I have added data.ipynb to the folder, and I think it contains sufficient code to save the data.

I have also modified the content to deal with the saving and loading issues related to the CSV.

What do you think about these changes?

Best,
Longye

* remove skip-execution code as it is not compatible with Google Colab

* test the problem

This commit is to test whether the problem is due to this code.

* Revert "test the problem"

This reverts commit 395657e.

* test google colab RAM

this commit is to test whether the crash is led by the

* change link to notebook on github

* update_inequality_exercise

Hi Matt,

This commit selects a random sample of 3000 observations from the original dataset.

Best,
Longye

* update year in the text

---------

Co-authored-by: mmcky <[email protected]>
longye-tian and mmcky authored Jul 5, 2024
1 parent fa99488 commit 2b7dd96
Showing 3 changed files with 247 additions and 68 deletions.
133 changes: 133 additions & 0 deletions lectures/_static/lecture_specific/inequality/data.ipynb
@@ -0,0 +1,133 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "258b4bc9-2964-470a-8010-05c2162f5e05",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: wbgapi in /Users/longye/anaconda3/lib/python3.10/site-packages (1.0.12)\n",
"Requirement already satisfied: plotly in /Users/longye/anaconda3/lib/python3.10/site-packages (5.22.0)\n",
"Requirement already satisfied: requests in /Users/longye/anaconda3/lib/python3.10/site-packages (from wbgapi) (2.31.0)\n",
"Requirement already satisfied: tabulate in /Users/longye/anaconda3/lib/python3.10/site-packages (from wbgapi) (0.9.0)\n",
"Requirement already satisfied: PyYAML in /Users/longye/anaconda3/lib/python3.10/site-packages (from wbgapi) (6.0)\n",
"Requirement already satisfied: tenacity>=6.2.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from plotly) (8.4.1)\n",
"Requirement already satisfied: packaging in /Users/longye/anaconda3/lib/python3.10/site-packages (from plotly) (23.1)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (1.26.16)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (2.0.4)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (3.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->wbgapi) (2024.6.2)\n"
]
}
],
"source": [
"!pip install wbgapi plotly\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import random as rd\n",
"import wbgapi as wb\n",
"import plotly.express as px\n",
"\n",
"url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv'\n",
"df = pd.read_csv(url)\n",
"df_income_wealth = df.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9630a07a-fce5-474e-92af-104e67e82be5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: quantecon in /Users/longye/anaconda3/lib/python3.10/site-packages (0.7.1)\n",
"Requirement already satisfied: requests in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (2.31.0)\n",
"Requirement already satisfied: numpy>=1.17.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (1.26.3)\n",
"Requirement already satisfied: numba>=0.49.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (0.59.1)\n",
"Requirement already satisfied: sympy in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (1.12)\n",
"Requirement already satisfied: scipy>=1.5.0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from quantecon) (1.12.0)\n",
"Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /Users/longye/anaconda3/lib/python3.10/site-packages (from numba>=0.49.0->quantecon) (0.42.0)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (2024.6.2)\n",
"Requirement already satisfied: idna<4,>=2.5 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (3.4)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (2.0.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/longye/anaconda3/lib/python3.10/site-packages (from requests->quantecon) (1.26.16)\n",
"Requirement already satisfied: mpmath>=0.19 in /Users/longye/anaconda3/lib/python3.10/site-packages (from sympy->quantecon) (1.3.0)\n"
]
}
],
"source": [
"!pip install quantecon\n",
"import quantecon as qe\n",
"\n",
"varlist = ['n_wealth',   # net wealth\n",
"           't_income',   # total income\n",
"           'l_income']   # labor income\n",
"\n",
"df = df_income_wealth\n",
"years = df.year.unique()\n",
"\n",
"# create lists to store Gini for each inequality measure\n",
"results = {}\n",
"\n",
"for var in varlist:\n",
"    # create lists to store Gini\n",
"    gini_yr = []\n",
"    for year in years:\n",
"        # repeat the observations according to their weights\n",
"        counts = list(round(df[df['year'] == year]['weights']))\n",
"        y = df[df['year'] == year][var].repeat(counts)\n",
"        y = np.asarray(y)\n",
"\n",
"        rd.shuffle(y)    # shuffle the sequence\n",
"\n",
"        # calculate and store Gini\n",
"        gini = qe.gini_coefficient(y)\n",
"        gini_yr.append(gini)\n",
"\n",
"    results[var] = gini_yr\n",
"\n",
"# Convert to DataFrame\n",
"results = pd.DataFrame(results, index=years)\n",
"results.to_csv(\"usa-gini-nwealth-tincome-lincome.csv\", index_label='year')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d59e876b-2f77-4fa7-b79a-8e455ad82d43",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
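
The notebook above expands each observation according to its rounded survey weight before computing the Gini coefficient. The following is a minimal sketch of that step on toy data, assuming only `numpy`. The values and weights are hypothetical (not taken from `SCF_plus`), `np.repeat` stands in for `Series.repeat`, and `gini_pairwise` and `gini_weighted` are illustrative helpers rather than the `quantecon` implementation. It also checks that a weighted pairwise formula gives the same number without materializing the repeated rows.

```python
import numpy as np

def gini_pairwise(y):
    """Gini coefficient from all pairwise absolute differences."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    diffs = np.abs(y[:, None] - y[None, :])
    return diffs.sum() / (2 * n * y.sum())

def gini_weighted(y, w):
    """Same quantity from integer weights, without repeating observations."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    diffs = np.abs(y[:, None] - y[None, :])
    weighted_sum = (w[:, None] * w[None, :] * diffs).sum()
    return weighted_sum / (2 * w.sum() * (w * y).sum())

# Toy data: hypothetical values and survey weights
values = np.array([10.0, 20.0, 50.0])
weights = np.array([2.4, 1.2, 2.6])

counts = np.round(weights).astype(int)   # mirrors round(df[...]['weights'])
expanded = np.repeat(values, counts)     # mirrors Series.repeat(counts)

g_repeat = gini_pairwise(expanded)
g_weight = gini_weighted(values, counts)
print(g_repeat, g_weight)
```

The weighted form is only an alternative worth considering if the expanded sample becomes too large to hold in memory; the notebook itself uses the repeat-then-compute approach.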
40 changes: 20 additions & 20 deletions lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv
@@ -1,21 +1,21 @@
year,n_wealth,t_income,l_income
1950,0.8257332034366338,0.44248654139458626,0.5342948198773412
1953,0.8059487586599329,0.4264544060935945,0.5158978980963702
1956,0.8121790488050616,0.44426942873399283,0.5349293526208142
1959,0.795206874163792,0.43749348077061573,0.5213985948309416
1962,0.8086945076579359,0.4435843103853645,0.5345127915054341
1965,0.7904149225687935,0.43763715466663444,0.7487860020887753
1968,0.7982885066993497,0.4208620794438902,0.5242396427381545
1971,0.7911574835420259,0.4233344246090255,0.5576454812313466
1977,0.7571418922185215,0.46187678800902543,0.5704448110072049
1983,0.7494335400643013,0.439345618464469,0.5662220844385915
1989,0.7715705301674302,0.5115249581654197,0.601399568747142
1992,0.7508126614055308,0.4740650672076798,0.5983592657979563
1995,0.7569492388110265,0.48965523558400603,0.5969779516716903
1998,0.7603291991801185,0.49117441585168614,0.5774462841723305
2001,0.7816118750507056,0.5239092994681135,0.6042739644967272
2004,0.7700355469522361,0.4884350383903255,0.5981432201792727
2007,0.7821413776486978,0.5197156312086187,0.626345219575322
2010,0.8250825295193438,0.5195972120145615,0.6453653328291903
2013,0.8227698931835303,0.531400174984336,0.6498682917772644
2016,0.8342975903562234,0.5541400068900825,0.6706846793375284
1950,0.8257332034366366,0.44248654139458743,0.534294819877344
1953,0.805948758659935,0.4264544060935942,0.5158978980963682
1956,0.8121790488050612,0.44426942873399367,0.5349293526208106
1959,0.7952068741637912,0.43749348077061534,0.5213985948309414
1962,0.8086945076579386,0.44358431038536356,0.5345127915054446
1965,0.7904149225687949,0.4376371546666344,0.7487860020887701
1968,0.7982885066993503,0.4208620794438885,0.5242396427381534
1971,0.7911574835420282,0.4233344246090255,0.5576454812313462
1977,0.7571418922185215,0.46187678800902554,0.57044481100722
1983,0.749433540064301,0.4393456184644682,0.5662220844385925
1989,0.7715705301674285,0.5115249581654115,0.6013995687471289
1992,0.7508126614055305,0.4740650672076754,0.5983592657979544
1995,0.7569492388110274,0.4896552355840001,0.5969779516717039
1998,0.7603291991801172,0.49117441585168525,0.5774462841723346
2001,0.781611875050703,0.523909299468113,0.6042739644967232
2004,0.7700355469522372,0.48843503839032354,0.5981432201792916
2007,0.782141377648698,0.5197156312086207,0.6263452195753227
2010,0.825082529519342,0.5195972120145641,0.6453653328291843
2013,0.8227698931835299,0.5314001749843426,0.6498682917772886
2016,0.8342975903562537,0.55414000689009,0.6706846793375292
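
The old and new CSV rows above appear to differ only in the last few digits. A plausible explanation, offered here as an assumption, is that the notebook shuffles each sample without a fixed seed, so the floating-point summation order changes between runs. The sketch below compares three rows copied from the diff (1950, 1977 and 2016) and reports the largest absolute difference; the tolerance `1e-12` is an arbitrary choice for illustration.

```python
import numpy as np

# Rows copied from the diff: columns are n_wealth, t_income, l_income
old = np.array([
    [0.8257332034366338, 0.44248654139458626, 0.5342948198773412],  # 1950
    [0.7571418922185215, 0.46187678800902543, 0.5704448110072049],  # 1977
    [0.8342975903562234, 0.5541400068900825, 0.6706846793375284],   # 2016
])
new = np.array([
    [0.8257332034366366, 0.44248654139458743, 0.534294819877344],   # 1950
    [0.7571418922185215, 0.46187678800902554, 0.57044481100722],    # 1977
    [0.8342975903562537, 0.55414000689009, 0.6706846793375292],     # 2016
])

# Largest elementwise change between the two versions of these rows
max_abs_diff = np.max(np.abs(new - old))

# True when every entry agrees to within the chosen absolute tolerance
same = np.allclose(old, new, rtol=0.0, atol=1e-12)

print(max_abs_diff, same)
```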
142 changes: 94 additions & 48 deletions lectures/inequality.md
@@ -247,7 +247,7 @@ The following code block imports a subset of the dataset `SCF_plus` for 2016,
which is derived from the [Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF).

```{code-cell} ipython3
url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/SCF_plus/SCF_plus_mini.csv'
url = 'https://github.com/QuantEcon/high_dim_data/raw/main/SCF_plus/SCF_plus_mini.csv'
df = pd.read_csv(url)
df_income_wealth = df.dropna()
```
@@ -435,6 +435,8 @@ Let's examine the Gini coefficient in some simulations.

The code below computes the Gini coefficient from a sample.

(code:gini-coefficient)=

```{code-cell} ipython3
def gini_coefficient(y):
@@ -481,6 +483,7 @@ You can check this by looking up the expression for the mean of a lognormal
distribution.

```{code-cell} ipython3
%%time
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
@@ -616,51 +619,11 @@ We will use US data from the {ref}`Survey of Consumer Finances<data:survey-consu
df_income_wealth.year.describe()
```

This code can be used to compute this information over the full dataset.
[This notebook](https://github.com/QuantEcon/lecture-python-intro/tree/main/lectures/_static/lecture_specific/inequality/data.ipynb) can be used to compute this information over the full dataset.

```{code-cell} ipython3
:tags: [skip-execution, hide-input, hide-output]

!pip install quantecon
import quantecon as qe

varlist = ['n_wealth',   # net wealth
           't_income',   # total income
           'l_income']   # labor income

df = df_income_wealth

# create lists to store Gini for each inequality measure
results = {}

for var in varlist:
    # create lists to store Gini
    gini_yr = []
    for year in years:
        # repeat the observations according to their weights
        counts = list(round(df[df['year'] == year]['weights']))
        y = df[df['year'] == year][var].repeat(counts)
        y = np.asarray(y)

        rd.shuffle(y)    # shuffle the sequence

        # calculate and store Gini
        gini = qe.gini_coefficient(y)
        gini_yr.append(gini)

    results[var] = gini_yr

# Convert to DataFrame
results = pd.DataFrame(results, index=years)
results.to_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_label='year')
```

However, to speed up execution we will import a pre-computed dataset from the lecture repository.

<!-- TODO: update from csv to github location -->

```{code-cell} ipython3
ginis = pd.read_csv("_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv", index_col='year')
data_url = 'https://github.com/QuantEcon/lecture-python-intro/raw/main/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv'
ginis = pd.read_csv(data_url, index_col='year')
ginis.head(n=5)
```

@@ -687,10 +650,6 @@ One possibility is that this change is mainly driven by technology.

However, we will see below that not all advanced economies experienced similar growth of inequality.





### Cross-country comparisons of income inequality

Earlier in this lecture we used `wbgapi` to get Gini data across many countries
@@ -1093,3 +1052,90 @@ plt.show()

```{solution-end}
```

```{exercise}
:label: inequality_ex3
The {ref}`code to compute the Gini coefficient is listed in the lecture above <code:gini-coefficient>`.
This code uses loops to calculate the coefficient based on income or wealth data.
This function can be rewritten using vectorization, which greatly improves computational efficiency in Python.
Rewrite the function `gini_coefficient` using `numpy` and vectorized code.
You can compare the output of this new function with the one above, and note the speed differences.
```

```{solution-start} inequality_ex3
:class: dropdown
```

Let's take a look at some raw data for the US that is stored in `df_income_wealth`.

```{code-cell} ipython3
df_income_wealth.describe()
```

```{code-cell} ipython3
df_income_wealth.head(n=4)
```

We will focus on the wealth variable `n_wealth` to compute a Gini coefficient for the year 2016.

```{code-cell} ipython3
data = df_income_wealth[df_income_wealth.year == 2016].sample(3000, random_state=1)
```

```{code-cell} ipython3
data.head(n=2)
```

We can first compute the Gini coefficient using the function defined in the lecture above.

```{code-cell} ipython3
gini_coefficient(data.n_wealth.values)
```

Now we can write a vectorized version using `numpy`.

```{code-cell} ipython3
def gini(y):
    n = len(y)
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    g_sum = np.sum(np.abs(y_1 - y_2))
    return g_sum / (2 * n * np.sum(y))
```

```{code-cell} ipython3
gini(data.n_wealth.values)
```

Let's simulate five populations by drawing from a lognormal distribution as before.

```{code-cell} ipython3
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
σ_vals = σ_vals.reshape((k,1))
μ_vals = -σ_vals**2/2
y_vals = np.exp(μ_vals + σ_vals*np.random.randn(n))
```

We can compute the Gini coefficient for these five populations using the vectorized function. The computation time is shown below.

```{code-cell} ipython3
%%time
gini_coefficients = []
for i in range(k):
    gini_coefficients.append(gini(y_vals[i]))
```

The vectorized function is much faster than the loop-based version timed in the main text.

This gives us the Gini coefficients for these five populations.

```{code-cell} ipython3
gini_coefficients
```
```{solution-end}
```
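
The exercise 3 solution can be sanity-checked outside the lecture with a sketch like the one below. It assumes only `numpy`; `gini_loop` is a hypothetical stand-in for the loop-based `gini_coefficient` in the main text (whose body is not shown in this diff), `gini` repeats the vectorized version from the solution, and `pairwise_bytes` is an illustrative helper. The sample size of 300 is chosen only to keep the loop quick.

```python
import time
import numpy as np

def gini_loop(y):
    """Loop-based Gini: a stand-in for the lecture's gini_coefficient."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(y[i] - y[j])
    return total / (2 * n * np.sum(y))

def gini(y):
    """Vectorized Gini from the exercise 3 solution."""
    n = len(y)
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    g_sum = np.sum(np.abs(y_1 - y_2))
    return g_sum / (2 * n * np.sum(y))

def pairwise_bytes(n, itemsize=8):
    """Bytes needed for one n x n float64 array of pairwise differences."""
    return n * n * itemsize

rng = np.random.default_rng(0)
y = np.exp(rng.standard_normal(300))  # lognormal draws, as in the lecture

t0 = time.perf_counter()
g_loop = gini_loop(y)
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
g_vec = gini(y)
t_vec = time.perf_counter() - t0

print(f"loop:       {g_loop:.6f} in {t_loop:.4f}s")
print(f"vectorized: {g_vec:.6f} in {t_vec:.4f}s")
print(f"n = 3000 needs at least {pairwise_bytes(3000) / 1e6:.0f} MB per pairwise array")
```

The vectorized version trades memory for speed, since it builds an n × n array of differences. That may be one reason the solution works with a 3000-observation sample rather than the full 2016 data, although the commit message does not say so explicitly.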


