@mmcky final review
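
The CSV changes in this commit appear to be floating-point noise: each old and new value differs only in its last few digits. Below is a minimal sketch of how that could be checked, using three rows copied from the diff. The helper names (`max_abs_diff`, `all_close`) and the `1e-12` tolerance are illustrative assumptions, not part of the lecture code.

```python
import math

# Three sample rows copied from the CSV diff: (year, old values, new values).
# Column order is n_wealth, t_income, l_income.
rows = [
    (1950, (0.8257332034366353, 0.4424865413945867, 0.5342948198773424),
           (0.8257332034366338, 0.44248654139458626, 0.5342948198773412)),
    (1989, (0.7715705301674318, 0.511524958165423, 0.6013995687471408),
           (0.7715705301674302, 0.5115249581654197, 0.601399568747142)),
    (2016, (0.8342975903562239, 0.5541400068900838, 0.6706846793375301),
           (0.8342975903562234, 0.5541400068900825, 0.6706846793375284)),
]


def max_abs_diff(rows):
    """Largest absolute difference between old and new values (hypothetical helper)."""
    return max(abs(o - n) for _, old, new in rows for o, n in zip(old, new))


def all_close(rows, abs_tol=1e-12):
    """True if every old/new pair agrees within abs_tol (tolerance is an assumption)."""
    return all(
        math.isclose(o, n, rel_tol=0.0, abs_tol=abs_tol)
        for _, old, new in rows
        for o, n in zip(old, new)
    )


print(max_abs_diff(rows))
print(all_close(rows))
```

If the same check holds for all 20 rows, the regenerated file should leave the plotted Gini series visually unchanged.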

mmcky · mmcky · commit de9b5d22fac4 · 2024-03-21T12:36:22.000+11:00
diff --git a/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv b/lectures/_static/lecture_specific/inequality/usa-gini-nwealth-tincome-lincome.csv
@@ -1,21 +1,21 @@
 year,n_wealth,t_income,l_income
-1950,0.8257332034366353,0.4424865413945867,0.5342948198773424
-1953,0.8059487586599343,0.42645440609359475,0.5158978980963699
-1956,0.8121790488050622,0.4442694287339929,0.5349293526208142
-1959,0.7952068741637924,0.43749348077061573,0.5213985948309421
-1962,0.8086945076579368,0.4435843103853639,0.5345127915054342
-1965,0.790414922568795,0.43763715466663367,0.7487860020887751
-1968,0.7982885066993514,0.42086207944388976,0.5242396427381543
-1971,0.7911574835420238,0.4233344246090258,0.5576454812313485
-1977,0.7571418922185226,0.461876788009026,0.5704448110072052
-1983,0.7494335400643025,0.4393456184644705,0.5662220844385915
-1989,0.7715705301674318,0.511524958165423,0.6013995687471408
-1992,0.7508126614055309,0.4740650672076755,0.5983592657979545
-1995,0.7569492388110265,0.4896552355840044,0.5969779516716882
-1998,0.760329199180118,0.4911744158516898,0.5774462841723345
-2001,0.7816118750507034,0.5239092994681134,0.6042739644967283
-2004,0.7700355469522369,0.4884350383903255,0.5981432201792665
-2007,0.7821413776486987,0.5197156312086179,0.6263452195753251
-2010,0.825082529519343,0.5195972120145644,0.6453653328291921
-2013,0.8227698931835268,0.5314001749843339,0.6498682917772639
-2016,0.8342975903562239,0.5541400068900838,0.6706846793375301
+1950,0.8257332034366338,0.44248654139458626,0.5342948198773412
+1953,0.8059487586599329,0.4264544060935945,0.5158978980963702
+1956,0.8121790488050616,0.44426942873399283,0.5349293526208142
+1959,0.795206874163792,0.43749348077061573,0.5213985948309416
+1962,0.8086945076579359,0.4435843103853645,0.5345127915054341
+1965,0.7904149225687935,0.43763715466663444,0.7487860020887753
+1968,0.7982885066993497,0.4208620794438902,0.5242396427381545
+1971,0.7911574835420259,0.4233344246090255,0.5576454812313466
+1977,0.7571418922185215,0.46187678800902543,0.5704448110072049
+1983,0.7494335400643013,0.439345618464469,0.5662220844385915
+1989,0.7715705301674302,0.5115249581654197,0.601399568747142
+1992,0.7508126614055308,0.4740650672076798,0.5983592657979563
+1995,0.7569492388110265,0.48965523558400603,0.5969779516716903
+1998,0.7603291991801185,0.49117441585168614,0.5774462841723305
+2001,0.7816118750507056,0.5239092994681135,0.6042739644967272
+2004,0.7700355469522361,0.4884350383903255,0.5981432201792727
+2007,0.7821413776486978,0.5197156312086187,0.626345219575322
+2010,0.8250825295193438,0.5195972120145615,0.6453653328291903
+2013,0.8227698931835303,0.531400174984336,0.6498682917772644
+2016,0.8342975903562234,0.5541400068900825,0.6706846793375284
diff --git a/lectures/inequality.md b/lectures/inequality.md
@@ -139,7 +139,6 @@ income or wealth data into the cumulative share
 of individuals (or households) and the cumulative share of income (or wealth).
 
 ```{code-cell} ipython3
-
 def lorenz_curve(y):
     """
     Calculates the Lorenz Curve, a graphical representation of
@@ -216,9 +215,9 @@ ax.plot(f_vals, f_vals, label='equality', lw=2)
 ax.vlines([0.8], [0.0], [0.43], alpha=0.5, colors='k', ls='--')
 ax.hlines([0.43], [0], [0.8], alpha=0.5, colors='k', ls='--')
 ax.set_xlim((0, 1))
-ax.set_xlabel("Cumulative share of households (%)")
+ax.set_xlabel("share of households (%)")
 ax.set_ylim((0, 1))
-ax.set_ylabel("Cumulative share of income (%)")
+ax.set_ylabel("share of income (%)")
 ax.legend()
 plt.show()
 ```
@@ -304,8 +303,8 @@ ax.plot(f_vals_nw[-1], l_vals_nw[-1], label=f'net wealth')
 ax.plot(f_vals_ti[-1], l_vals_ti[-1], label=f'total income')
 ax.plot(f_vals_li[-1], l_vals_li[-1], label=f'labor income')
 ax.plot(f_vals_nw[-1], f_vals_nw[-1], label=f'equality')
-ax.set_xlabel("household percentile")
-ax.set_ylabel("income/wealth percentile")
+ax.set_xlabel("share of households (%)")
+ax.set_ylabel("share of income/wealth (%)")
 ax.legend()
 plt.show()
 ```
@@ -372,18 +371,14 @@ ax.fill_between(f_vals, l_vals, f_vals, alpha=0.06)
 ax.set_ylim((0, 1))
 ax.set_xlim((0, 1))
 ax.text(0.04, 0.5, r'$G = 2 \times$ shaded area')
-ax.set_xlabel("household percentile")
-ax.set_ylabel("income/wealth percentile")
+ax.set_xlabel("share of households (%)")
+ax.set_ylabel("share of income/wealth (%)")
 ax.legend()
 plt.show()
 ```
 
 Another way to think of the Gini coefficient is as a ratio of the area between the 45-degree line of 
-perfect equality and the Lorenz curve (A) divided by the total area below the 45-degree line (A+B). 
-
-```{seealso}
-The World in Data project has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient])
-```
+perfect equality and the Lorenz curve (A) divided by the total area below the 45-degree line (A+B), as shown in {numref}`lorenz_gini2`.
 
 ```{code-cell} ipython3
 ---
@@ -402,8 +397,8 @@ ax.set_ylim((0, 1))
 ax.set_xlim((0, 1))
 ax.text(0.55, 0.4, 'A')
 ax.text(0.75, 0.15, 'B')
-ax.set_xlabel("household percentile")
-ax.set_ylabel("income/wealth percentile")
+ax.set_xlabel("share of households (%)")
+ax.set_ylabel("share of income/wealth (%)")
 ax.legend()
 plt.show()
 ```
@@ -414,6 +409,10 @@ $$
 
 It is an average measure of deviation from the line of equality.
 
+```{seealso}
+Our World in Data has a [nice graphical exploration of the Lorenz curve and the Gini coefficient](https://ourworldindata.org/what-is-the-gini-coefficient)
+```
+
 ### Gini coefficient of simulated data
 
 Let's examine the Gini coefficient in some simulations.
@@ -463,10 +462,8 @@ In each case we set $\mu = - \sigma^2 / 2$.
 
 This implies that the mean of the distribution does not change with $\sigma$. 
 
-```{note}
 You can check this by looking up the expression for the mean of a lognormal
 distribution.
-```
 
 ```{code-cell} ipython3
 k = 5
@@ -504,18 +501,18 @@ fix, ax = plot_inequality_measures(σ_vals,
                                   ginis, 
                                   'simulated', 
                                   '$\sigma$', 
-                                  'gini coefficients')
+                                  'Gini coefficients')
 plt.show()
 ```
 
 The plots show that inequality rises with $\sigma$, according to the Gini
 coefficient.
 
-### Gini coefficient dynamics for US data (income)
+### Gini coefficient for US data (income)
 
 Now let's look at the Gini coefficient using US data.
 
-We will get pre-computed Gini coefficients from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data).
+We will get pre-computed Gini coefficients (based on income) from the World Bank using the [wbgapi](https://blogs.worldbank.org/opendata/introducing-wbgapi-new-python-package-accessing-world-bank-data) package.
 
 Let's use the `wbgapi` package we imported earlier to search the world bank data for Gini to find the Series ID.
 
@@ -578,63 +575,28 @@ gini_all.columns = gini_all.columns.map(lambda x: int(x.replace('YR',''))) # rem
 gini_all = gini_all.unstack(level='economy').dropna()
 
 # Build a histogram
-gini_all.plot(kind="hist", 
-              bins=20,
-              title="Gini coefficient"
-             )
+ax = gini_all.plot(kind="hist", bins=20)
+ax.set_xlabel("Gini coefficient")
+ax.set_ylabel("frequency")
 plt.show()
 ```
 
-We can see that across 50 years of data and all countries (including low and high income countries) the measure varies between 20 and 65.
-
-Let us zoom in a little on the US data and add some trendlines.
-
-{numref}`gini_usa1` suggests there is a change in trend around the year 1981
+We can see that across 50 years of data and all countries (including low- and high-income countries) the measure varies only between 20 and 65.
 
-```{code-cell} ipython3
-# Use pandas filters to find data before 1981
-pre_1981 = data_usa[data_usa.index <= 1981]
-# Use pandas filters to find data after 1981
-post_1981 = data_usa[data_usa.index > 1981]
-```
-
-We can use `numpy` to compute a linear line of best fit.
-
-```{code-cell} ipython3
-# Pre 1981 Data Trend
-x1 = pre_1981.dropna().index.values
-y1 = pre_1981.dropna().values
-a1, b1 = np.polyfit(x1, y1, 1)
-
-# Post 1981 Data Trend
-x2 = post_1981.dropna().index.values
-y2 = post_1981.dropna().values
-a2, b2 = np.polyfit(x2, y2, 1)
-```
+{numref}`gini_usa1` suggests there is a change in trend around the year 1980.
 
-We can now built a plot that includes trend and a range that offers a closer 
-look at the dynamics over time in the Gini coefficient for the USA.
+Let us zoom in on the US data so we can observe trends more clearly.
 
 ```{code-cell} ipython3
----
-mystnb:
-  figure:
-    caption: Gini coefficients (USA) with trend
-    name: gini_usa_trend
----
-x = data_usa.dropna().index.values
-y = data_usa.dropna().values
-plt.scatter(x,y)
-plt.plot(x1, a1*x1+b1)
-plt.plot(x2, a2*x2+b2)
-plt.title("US Gini coefficient dynamics")
-plt.legend(['Gini coefficient', 'trend (before 1981)', 'trend (after 1981)'])
-plt.ylabel("Gini coefficient")
-plt.xlabel("year")
+fig, ax = plt.subplots()
+ax = data_usa.plot(ax=ax)
+ax.set_ylim(data_usa.min()-1, data_usa.max()+1)
+ax.set_ylabel("Gini coefficient")
+ax.set_xlabel("year")
 plt.show()
 ```
 
-{numref}`gini_usa_trend` shows inequality was falling in the USA until 1981 when it appears to have started to change course and steadily rise over time. 
+The plot above shows that inequality in the USA was falling until around 1980, when it appears to have changed course and begun to rise steadily.
 
 (compare-income-wealth-usa-over-time)=
 ### Comparing income and wealth inequality (the US case)
@@ -766,7 +728,7 @@ The wealth time series exhibits a strong U-shape.
 
 As we saw earlier in this lecture we used `wbgapi` to get Gini data across many countries and saved it in a variable called `gini_all`
 
-In this section we will compare a few countries and the evolution in their respective Gini coefficients
+In this section we will compare a few western economies and look at the evolution of their respective Gini coefficients.
 
 ```{code-cell} ipython3
 data = gini_all.unstack() # Obtain data for all countries as a table
@@ -778,7 +740,11 @@ There are 167 countries represented in this dataset.
 Let us compare three western economies: USA, United Kingdom, and Norway
 
 ```{code-cell} ipython3
-data[['USA','GBR', 'NOR']].plot(ylabel='Gini coefficient')
+ax = data[['USA','GBR', 'NOR']].plot()
+ax.set_xlabel('year')
+ax.set_ylabel('Gini coefficient')
+ax.legend(title="")
+plt.show()
 ```
 
 We see that Norway has a shorter time series so let us take a closer look at the underlying data
@@ -796,12 +762,13 @@ data['NOR'] = data['NOR'].ffill()
 ax = data[['USA','GBR', 'NOR']].plot()
 ax.set_xlabel('year')
 ax.set_ylabel('Gini coefficient')
+ax.legend(title="")
 plt.show()
 ```
 
 From this plot we can observe that the USA has a higher Gini coefficient (i.e. higher income inequality) when compared to the UK and Norway. 
 
-Norway has the lowest Gini coefficient over the three economies it is substantially lower than the US.
+Norway has the lowest Gini coefficient of the three economies, substantially lower than that of the US.
 
 ### Gini Coefficient and GDP per capita (over time)
 
@@ -841,10 +808,9 @@ min_year = plot_data.year.min()
 max_year = plot_data.year.max()
 ```
 
-```{note}
-The time series for all three countries start and stop in different years. We will add a year mask to the data to
+
+**Note:** The time series for all three countries start and stop in different years. We will add a year mask to the data to
 improve clarity in the chart including the different end years associated with each countries time series.
-```
 
 ```{code-cell} ipython3
 labels = [1979, 1986, 1991, 1995, 2000, 2020, 2021, 2022] + list(range(min_year,max_year,5))
@@ -871,7 +837,9 @@ This figure is built using `plotly` and is {ref}` available on the website <fig:
 ```
 
 This plot shows that all three western economies GDP per capita has grown over time with some fluctuations
-in the Gini coefficient. From the early 80's the United Kingdom and the US economies both saw increases in income 
+in the Gini coefficient. 
+
+From the early 1980s, the United Kingdom and the US economies both saw increases in income 
 inequality. 
 
 Interestingly, since the year 2000, the United Kingdom saw a decline in income inequality while
@@ -893,10 +861,14 @@ As before, suppose that the sample $w_1, \ldots, w_n$ has been sorted from small
 Given the Lorenz curve $y = L(x)$ defined above, the top $100 \times p \%$
 share is defined as
 
+```{prf:definition}
+:label: top-shares
+
 $$
 T(p) = 1 - L (1-p) 
     \approx \frac{\sum_{j\geq i} w_j}{ \sum_{j \leq n} w_j}, \quad i = \lfloor n (1-p)\rfloor
 $$ (topshares)
+```
 
 Here $\lfloor \cdot \rfloor$ is the floor function, which rounds any
 number down to the integer less than or equal to that number.