Commit de9b5d2

committed
@mmcky final review
1 parent 25414fc commit de9b5d2

2 files changed

+65
-93
lines changed

Lines changed: 20 additions & 20 deletions
@@ -1,21 +1,21 @@
11
year,n_wealth,t_income,l_income
2-
1950,0.8257332034366353,0.4424865413945867,0.5342948198773424
3-
1953,0.8059487586599343,0.42645440609359475,0.5158978980963699
4-
1956,0.8121790488050622,0.4442694287339929,0.5349293526208142
5-
1959,0.7952068741637924,0.43749348077061573,0.5213985948309421
6-
1962,0.8086945076579368,0.4435843103853639,0.5345127915054342
7-
1965,0.790414922568795,0.43763715466663367,0.7487860020887751
8-
1968,0.7982885066993514,0.42086207944388976,0.5242396427381543
9-
1971,0.7911574835420238,0.4233344246090258,0.5576454812313485
10-
1977,0.7571418922185226,0.461876788009026,0.5704448110072052
11-
1983,0.7494335400643025,0.4393456184644705,0.5662220844385915
12-
1989,0.7715705301674318,0.511524958165423,0.6013995687471408
13-
1992,0.7508126614055309,0.4740650672076755,0.5983592657979545
14-
1995,0.7569492388110265,0.4896552355840044,0.5969779516716882
15-
1998,0.760329199180118,0.4911744158516898,0.5774462841723345
16-
2001,0.7816118750507034,0.5239092994681134,0.6042739644967283
17-
2004,0.7700355469522369,0.4884350383903255,0.5981432201792665
18-
2007,0.7821413776486987,0.5197156312086179,0.6263452195753251
19-
2010,0.825082529519343,0.5195972120145644,0.6453653328291921
20-
2013,0.8227698931835268,0.5314001749843339,0.6498682917772639
21-
2016,0.8342975903562239,0.5541400068900838,0.6706846793375301
2+
1950,0.8257332034366338,0.44248654139458626,0.5342948198773412
3+
1953,0.8059487586599329,0.4264544060935945,0.5158978980963702
4+
1956,0.8121790488050616,0.44426942873399283,0.5349293526208142
5+
1959,0.795206874163792,0.43749348077061573,0.5213985948309416
6+
1962,0.8086945076579359,0.4435843103853645,0.5345127915054341
7+
1965,0.7904149225687935,0.43763715466663444,0.7487860020887753
8+
1968,0.7982885066993497,0.4208620794438902,0.5242396427381545
9+
1971,0.7911574835420259,0.4233344246090255,0.5576454812313466
10+
1977,0.7571418922185215,0.46187678800902543,0.5704448110072049
11+
1983,0.7494335400643013,0.439345618464469,0.5662220844385915
12+
1989,0.7715705301674302,0.5115249581654197,0.601399568747142
13+
1992,0.7508126614055308,0.4740650672076798,0.5983592657979563
14+
1995,0.7569492388110265,0.48965523558400603,0.5969779516716903
15+
1998,0.7603291991801185,0.49117441585168614,0.5774462841723305
16+
2001,0.7816118750507056,0.5239092994681135,0.6042739644967272
17+
2004,0.7700355469522361,0.4884350383903255,0.5981432201792727
18+
2007,0.7821413776486978,0.5197156312086187,0.626345219575322
19+
2010,0.8250825295193438,0.5195972120145615,0.6453653328291903
20+
2013,0.8227698931835303,0.531400174984336,0.6498682917772644
21+
2016,0.8342975903562234,0.5541400068900825,0.6706846793375284

lectures/inequality.md

Lines changed: 45 additions & 73 deletions
@@ -139,7 +139,6 @@ income or wealth data into the cumulative share
139139
of individuals (or households) and the cumulative share of income (or wealth).
140140

141141
```{code-cell} ipython3
142-
143142
def lorenz_curve(y):
144143
"""
145144
Calculates the Lorenz Curve, a graphical representation of
@@ -216,9 +215,9 @@ ax.plot(f_vals, f_vals, label='equality', lw=2)
216215
ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--')
217216
ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--')
218217
ax.set_xlim((0, 1))
219-
ax.set_xlabel("Cumulative share of households (%)")
218+
ax.set_xlabel("share of households (%)")
220219
ax.set_ylim((0, 1))
221-
ax.set_ylabel("Cumulative share of income (%)")
220+
ax.set_ylabel("share of income (%)")
222221
ax.legend()
223222
plt.show()
224223
```
@@ -304,8 +303,8 @@ ax.plot(f_vals_nw[-1], l_vals_nw[-1], label=f'net wealth')
304303
ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income')
305304
ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income')
306305
ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality')
307-
ax.set_xlabel("household percentile")
308-
ax.set_ylabel("income/wealth percentile")
306+
ax.set_xlabel("share of households (%)")
307+
ax.set_ylabel("share of income/wealth (%)")
309308
ax.legend()
310309
plt.show()
311310
```
@@ -372,18 +371,14 @@ ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06)
372371
ax.set_ylim((0, 1))
373372
ax.set_xlim((0, 1))
374373
ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area')
375-
ax.set_xlabel("household percentile")
376-
ax.set_ylabel("income/wealth percentile")
374+
ax.set_xlabel("share of households (%)")
375+
ax.set_ylabel("share of income/wealth (%)")
377376
ax.legend()
378377
plt.show()
379378
```
380379

381380
Another way to think of the Gini coefficient is as a ratio of the area between the 45-degree line of
382-
perfect equality and the Lorenz curve (A) divided by the total area below the 45-degree line (A+B).
383-
384-
```{seealso}
385-
The Our World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient)
386-
```
381+
perfect equality and the Lorenz curve (A) to the total area below the 45-degree line (A+B), as shown in {numref}`lorenz_gini2`.
387382

388383
```{code-cell} ipython3
389384
---
@@ -402,8 +397,8 @@ ax.set_ylim((0, 1))
402397
ax.set_xlim((0, 1))
403398
ax.text(0.55, 0.4, 'A')
404399
ax.text(0.75, 0.15, 'B')
405-
ax.set_xlabel("household percentile")
406-
ax.set_ylabel("income/wealth percentile")
400+
ax.set_xlabel("share of households (%)")
401+
ax.set_ylabel("share of income/wealth (%)")
407402
ax.legend()
408403
plt.show()
409404
```
@@ -414,6 +409,10 @@ $$
414409

415410
It is an average measure of deviation from the line of equality.
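
As a rough illustration of the "average deviation" reading, the sketch below computes a Gini coefficient from a sample using the standard mean-absolute-difference formula $G = \frac{\sum_i \sum_j |w_j - w_i|}{2 n \sum_i w_i}$. The lecture's own formula and implementation are not visible in this hunk, so both the formula choice and the helper name `gini_sketch` are assumptions made here for illustration.

```python
def gini_sketch(w):
    """Gini coefficient of a sample (hypothetical helper, not the lecture's code)."""
    w = list(w)
    n = len(w)
    total = sum(w)
    # Sum of absolute differences over all ordered pairs (i, j)
    abs_diffs = sum(abs(wi - wj) for wi in w for wj in w)
    return abs_diffs / (2 * n * total)

# Everyone holds the same amount: no inequality
equal = gini_sketch([1, 1, 1, 1])

# One household holds everything: close to maximal inequality for n = 4
concentrated = gini_sketch([0, 0, 0, 1])

print(equal, concentrated)
```

The double loop is quadratic in the sample size, which is fine for a small example but would likely be too slow for large samples.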
416411

412+
```{seealso}
413+
The Our World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient)
414+
```
415+
417416
### Gini coefficient of simulated data
418417

419418
Let's examine the Gini coefficient in some simulations.
@@ -463,10 +462,8 @@ In each case we set $\mu = - \sigma^2 / 2$.
463462

464463
This implies that the mean of the distribution does not change with $\sigma$.
465464

466-
```{note}
467465
You can check this by looking up the expression for the mean of a lognormal
468466
distribution.
469-
```
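
One way to check the claim about the mean is sketched below. It relies on the known expression $\exp(\mu + \sigma^2/2)$ for the mean of a lognormal distribution, which equals 1 when $\mu = -\sigma^2/2$, and compares it with a simulated sample mean. The helper name `lognormal_sample_mean`, the seed, the sample size and the values of $\sigma$ are arbitrary choices for this illustration, and it uses only the Python standard library rather than the lecture's own tools.

```python
import math
import random

random.seed(1234)  # arbitrary seed, only for reproducibility

def lognormal_sample_mean(sigma, n=100_000):
    """Sample mean of n lognormal draws with mu = -sigma^2 / 2 (hypothetical helper)."""
    mu = -sigma**2 / 2
    draws = (random.lognormvariate(mu, sigma) for _ in range(n))
    return sum(draws) / n

sigmas = [0.2, 0.5, 1.0]

# Analytic mean exp(mu + sigma^2 / 2) with mu = -sigma^2 / 2
analytic_means = [math.exp(-s**2 / 2 + s**2 / 2) for s in sigmas]

# Simulated means should be close to 1 for every sigma
sample_means = [lognormal_sample_mean(s) for s in sigmas]

print(analytic_means)
print(sample_means)
```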
470467

471468
```{code-cell} ipython3
472469
k = 5
@@ -504,18 +501,18 @@ fix, ax = plot_inequality_measures(σ_vals,
504501
ginis,
505502
'simulated',
506503
'$\sigma$',
507-
'gini coefficients')
504+
'Gini coefficients')
508505
plt.show()
509506
```
510507

511508
The plots show that inequality rises with $\sigma$, according to the Gini
512509
coefficient.
513510

514-
### Gini coefficient dynamics for US data (income)
511+
### Gini coefficient for US data (income)
515512

516513
Now let's look at the Gini coefficient using US data.
517514

518-
We will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data).
515+
We will get pre-computed Gini coefficients (based on income) from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data) package.
519516

520517
Let's use the `wbgapi` package we imported earlier to search the World Bank data for Gini to find the Series ID.
521518

@@ -578,63 +575,28 @@ gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # rem
578575
gini_all = gini_all.unstack(level='economy').dropna()
579576
580577
# Build a histogram
581-
gini_all.plot(kind="hist",
582-
bins=20,
583-
title="Gini coefficient"
584-
)
578+
ax = gini_all.plot(kind="hist", bins=20)
579+
ax.set_xlabel("Gini coefficient")
580+
ax.set_ylabel("frequency")
585581
plt.show()
586582
```
587583

588-
We can see that across 50 years of data and all countries (including low and high income countries) the measure varies between 20 and 65.
589-
590-
Let us zoom in a little on the US data and add some trendlines.
591-
592-
{numref}`gini_usa1` suggests there is a change in trend around the year 1981
584+
We can see that across 50 years of data and all countries (including low- and high-income countries), the measure only varies between 20 and 65.
593585

594-
```{code-cell} ipython3
595-
# Use pandas filters to find data before 1981
596-
pre_1981 = data_usa[data_usa.index <= 1981]
597-
# Use pandas filters to find data after 1981
598-
post_1981 = data_usa[data_usa.index > 1981]
599-
```
600-
601-
We can use `numpy` to compute a linear line of best fit.
602-
603-
```{code-cell} ipython3
604-
# Pre 1981 Data Trend
605-
x1 = pre_1981.dropna().index.values
606-
y1 = pre_1981.dropna().values
607-
a1, b1 = np.polyfit(x1, y1, 1)
608-
609-
# Post 1981 Data Trend
610-
x2 = post_1981.dropna().index.values
611-
y2 = post_1981.dropna().values
612-
a2, b2 = np.polyfit(x2, y2, 1)
613-
```
586+
{numref}`gini_usa1` suggests there is a change in trend around the year 1980.
614587

615-
We can now built a plot that includes trend and a range that offers a closer
616-
look at the dynamics over time in the Gini coefficient for the USA.
588+
Let us zoom in on the US data so we can more clearly observe trends.
617589

618590
```{code-cell} ipython3
619-
---
620-
mystnb:
621-
figure:
622-
caption: Gini coefficients (USA) with trend
623-
name: gini_usa_trend
624-
---
625-
x = data_usa.dropna().index.values
626-
y = data_usa.dropna().values
627-
plt.scatter(x,y)
628-
plt.plot(x1, a1*x1+b1)
629-
plt.plot(x2, a2*x2+b2)
630-
plt.title("US Gini coefficient dynamics")
631-
plt.legend(['Gini coefficient', 'trend (before 1981)', 'trend (after 1981)'])
632-
plt.ylabel("Gini coefficient")
633-
plt.xlabel("year")
591+
fig, ax = plt.subplots()
592+
ax = data_usa.plot(ax=ax)
593+
ax.set_ylim(data_usa.min()-1, data_usa.max()+1)
594+
ax.set_ylabel("Gini coefficient")
595+
ax.set_xlabel("year")
634596
plt.show()
635597
```
636598

637-
{numref}`gini_usa_trend` shows inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time.
599+
{numref}`gini_usa_trend` shows inequality was falling in the USA until 1980, when it appears to have changed course and started to rise steadily.
638600

639601
(compare-income-wealth-usa-over-time)=
640602
### Comparing income and wealth inequality (the US case)
@@ -766,7 +728,7 @@ The wealth time series exhibits a strong U-shape.
766728

767729
Earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all`.
768730

769-
In this section we will compare a few countries and the evolution in their respective Gini coefficients
731+
In this section we will compare a few western economies and look at the evolution in their respective Gini coefficients.
770732

771733
```{code-cell} ipython3
772734
data = gini_all.unstack() # Obtain data for all countries as a table
@@ -778,7 +740,11 @@ There are 167 countries represented in this dataset.
778740
Let us compare three western economies: USA, United Kingdom, and Norway.
779741

780742
```{code-cell} ipython3
781-
data[['USA','GBR', 'NOR']].plot(ylabel='Gini coefficient')
743+
ax = data[['USA','GBR', 'NOR']].plot()
744+
ax.set_xlabel('year')
745+
ax.set_ylabel('Gini coefficient')
746+
ax.legend(title="")
747+
plt.show()
782748
```
783749

784750
We see that Norway has a shorter time series, so let us take a closer look at the underlying data.
@@ -796,12 +762,13 @@ data['NOR'] = data['NOR'].ffill()
796762
ax = data[['USA','GBR', 'NOR']].plot()
797763
ax.set_xlabel('year')
798764
ax.set_ylabel('Gini coefficient')
765+
ax.legend(title="")
799766
plt.show()
800767
```
801768

802769
From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway.
803770

804-
Norway has the lowest Gini coefficient over the three economies it is substantially lower than the US.
771+
Norway has the lowest Gini coefficient of the three economies, and it is substantially lower than that of the US.
805772

806773
### Gini Coefficient and GDP per capita (over time)
807774

@@ -841,10 +808,9 @@ min_year = plot_data.year.min()
841808
max_year = plot_data.year.max()
842809
```
843810

844-
```{note}
845-
The time series for all three countries start and stop in different years. We will add a year mask to the data to
811+
812+
**Note:** The time series for all three countries start and stop in different years. We will add a year mask to the data to
846813
improve clarity in the chart, including the different end years associated with each country's time series.
847-
```
848814

849815
```{code-cell} ipython3
850816
labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + list(range(min_year,max_year,5))
@@ -871,7 +837,9 @@ This figure is built using `plotly` and is {ref}` available on the website <fig:
871837
```
872838

873839
This plot shows that GDP per capita in all three western economies has grown over time, with some fluctuations
874-
in the Gini coefficient. From the early 80's the United Kingdom and the US economies both saw increases in income
840+
in the Gini coefficient.
841+
842+
From the early 1980s, the United Kingdom and the US economies both saw increases in income
875843
inequality.
876844

877845
Interestingly, since the year 2000, the United Kingdom saw a decline in income inequality while
@@ -893,10 +861,14 @@ As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from small
893861
Given the Lorenz curve $y = L(x)$ defined above, the top $100 \times p \%$
894862
share is defined as
895863

864+
```{prf:definition}
865+
:label: top-shares
866+
896867
$$
897868
T(p) = 1 - L (1-p)
898869
\approx \frac{\sum_{j\geq i} w_j}{ \sum_{j \leq n} w_j}, \quad i = \lfloor n (1-p)\rfloor
899870
$$ (topshares)
871+
```
900872

901873
Here $\lfloor \cdot \rfloor$ is the floor function, which rounds any
902874
number down to the integer less than or equal to that number.
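
The sample version of the top-share formula above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the lecture's implementation; the helper name `top_share_sketch` is hypothetical. It also assumes one particular reading of the index convention: with Python's 0-based slicing, `w[i:]` keeps the largest $n - i$ observations.

```python
import math

def top_share_sketch(w, p):
    """Share of the total held by the top 100*p percent (hypothetical helper)."""
    w = sorted(w)                 # sort from smallest to largest
    n = len(w)
    i = math.floor(n * (1 - p))   # i = floor(n (1 - p))
    # Assumed convention: 0-based slice keeps the largest n - i observations
    return sum(w[i:]) / sum(w)

sample = [4, 1, 3, 2]
share_top_quarter = top_share_sketch(sample, 0.25)  # top 25%: largest value only
share_top_half = top_share_sketch(sample, 0.5)      # top 50%: two largest values

print(share_top_quarter, share_top_half)
```

For other values of $p$ and $n$, floating-point rounding in `n * (1 - p)` can shift the floor by one observation, so an exact integer computation may be preferable in careful work.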
