Skip to content

Commit 959e37d

Browse files
committed
reorganise code and add q-q plot
1 parent b607dd1 commit 959e37d

File tree

1 file changed

+43
-12
lines changed

1 file changed

+43
-12
lines changed

lectures/heavy_tails.md

Lines changed: 43 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -582,6 +582,12 @@ $$
582582

583583
Thus, $\hat G(x)$ shows the fraction of the sample that exceeds $x$.
584584

585+
```{code-cell} ipython3
586+
def eccdf(x, data):
587+
"Simple empirical CCDF function."
588+
return np.mean(data > x)
589+
```
590+
585591
Here's a figure containing some empirical CCDFs from simulated data.
586592

587593
```{code-cell} ipython3
@@ -591,21 +597,20 @@ mystnb:
591597
caption: Empirical CCDFs
592598
name: ccdf-empirics
593599
---
594-
def eccdf(x, data):
595-
"Simple empirical CCDF function."
596-
return np.mean(data > x)
597-
600+
# Parameters and grid
598601
x_grid = np.linspace(1, 1000, 1000)
599602
sample_size = 1000
600603
np.random.seed(13)
601604
z = np.random.randn(sample_size)
602605
603-
data_1 = np.random.exponential(size=sample_size)
604-
data_2 = np.exp(z)
605-
data_3 = np.exp(np.random.exponential(size=sample_size))
606+
# Draws
607+
data_exp = np.random.exponential(size=sample_size)
608+
data_logn = np.exp(z)
609+
data_pareto = np.exp(np.random.exponential(size=sample_size))
606610
607-
data_list = [data_1, data_2, data_3]
611+
data_list = [data_exp, data_logn, data_pareto]
608612
613+
# Build figure
609614
fig, axes = plt.subplots(3, 1, figsize=(6, 8))
610615
axes = axes.flatten()
611616
labels = ['exponential', 'lognormal', 'Pareto']
@@ -630,6 +635,36 @@ approximately linear in a log-log plot.
630635

631636
We will use this idea [below](https://intro.quantecon.org/heavy_tails.html#heavy-tails-in-economic-cross-sections) when we look at real data.
632637

638+
+++
639+
640+
#### Q-Q Plots
641+
642+
We can also use a [qq plot](https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot) to do a visual comparison between two probability distributions.
643+
644+
The [statsmodels](https://www.statsmodels.org/stable/index.html) package provides a convenient [qqplot](https://www.statsmodels.org/stable/generated/statsmodels.graphics.gofplots.qqplot.html) function that, by default, compares sample data to the quintiles of the normal distribution.
645+
646+
If the data is drawn from a Normal distribution, the plot would look like:
647+
648+
```{code-cell} ipython3
649+
data_normal = np.random.normal(size=sample_size)
650+
sm.qqplot(data_normal, line='45')
651+
plt.show()
652+
```
653+
654+
We can now compare this with the exponential, log-normal, and pareto distributions
655+
656+
```{code-cell} ipython3
657+
# Build figure
658+
fig, axes = plt.subplots(3, 1, figsize=(6, 8))
659+
axes = axes.flatten()
660+
labels = ['exponential', 'lognormal', 'Pareto']
661+
for data, label, ax in zip(data_list, labels, axes):
662+
sm.qqplot(data, line='45', ax=ax, )
663+
ax.set_title(label)
664+
plt.tight_layout()
665+
plt.show()
666+
```
667+
633668

634669
### Power laws
635670

@@ -776,7 +811,6 @@ mystnb:
776811
name: firm-size-dist
777812
tags: [hide-input]
778813
---
779-
780814
df_fs = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-global2000.csv')
781815
df_fs = df_fs[['Country', 'Sales', 'Profits', 'Assets', 'Market Value']]
782816
fig, ax = plt.subplots(figsize=(6.4, 3.5))
@@ -803,7 +837,6 @@ mystnb:
803837
name: city-size-dist
804838
tags: [hide-input]
805839
---
806-
807840
# import population data of cities in 2023 United States and 2023 Brazil from world population review
808841
df_cs_us = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/cities_us.csv')
809842
df_cs_br = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/cities_brazil.csv')
@@ -830,7 +863,6 @@ mystnb:
830863
name: wealth-dist
831864
tags: [hide-input]
832865
---
833-
834866
df_w = pd.read_csv('https://media.githubusercontent.com/media/QuantEcon/high_dim_data/main/cross_section/forbes-billionaires.csv')
835867
df_w = df_w[['country', 'realTimeWorth', 'realTimeRank']].dropna()
836868
df_w = df_w.astype({'realTimeRank': int})
@@ -886,7 +918,6 @@ mystnb:
886918
name: gdppc-dist
887919
tags: [hide-input]
888920
---
889-
890921
fig, axes = plt.subplots(1, 2, figsize=(8.8, 3.6))
891922
892923
for name, ax in zip(variable_names, axes):

0 commit comments

Comments
 (0)