Closed
Description
We can write a new exercise in the inequality
lecture to teach the difference between Python loops and vectorization.
Here is a starting point for the exercise.
```{exercise}
:label: inequality_ex3
The {ref}`code to compute the Gini coefficient is listed in the lecture above <code:gini-coefficient>`.
This code uses loops to calculate the coefficient based on income or wealth data.
This function can be re-written using vectorization, which will greatly improve computational efficiency in Python.
Re-write the function `gini_coefficient` using `numpy` and vectorized code.
You can compare the output of this new function with the one above, and note the speed differences.
```
```{solution-start} inequality_ex3
:class: dropdown
```
Let's take a look at some raw data for the US that is stored in `df_income_wealth`.
```{code-cell} ipython3
df_income_wealth.describe()
```
```{code-cell} ipython3
df_income_wealth.head(n=4)
```
We will focus on the wealth variable `n_wealth` to compute a Gini coefficient for the year 2016.
```{code-cell} ipython3
data = df_income_wealth[df_income_wealth.year == 2016]
```
```{code-cell} ipython3
data.head(n=2)
```
We can first compute the Gini coefficient using the function defined in the lecture above.
```{code-cell} ipython3
gini_coefficient(data.n_wealth.values)
```
Now we can write a vectorized version using `numpy`:
```{code-cell} ipython3
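def gini(y):
    n = len(y)
    # Column and row views of the data, so broadcasting forms every pair
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    # Sum of |y_i - y_j| over all pairs (i, j)
    g_sum = np.sum(np.abs(y_1 - y_2))
    return g_sum / (2 * n * np.sum(y))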
```
```{code-cell} ipython3
gini(data.n_wealth.values)
```
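One caveat with the broadcasting approach above is that `y_1 - y_2` materializes an `n` by `n` array, so memory use grows quadratically with the sample size. As a possible alternative (not the lecture's method), the same quantity can be computed from the sorted data with a rank-weighted sum. The names `gini_sorted` and `gini_pairwise` below are hypothetical and used only for illustration.

```python
import numpy as np


def gini_sorted(y):
    """Gini coefficient from sorted data, using O(n) memory."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    ranks = np.arange(1, n + 1)  # 1-based rank of each sorted observation
    return 2 * np.sum(ranks * y) / (n * np.sum(y)) - (n + 1) / n


def gini_pairwise(y):
    """Same formula as the broadcasting version, kept here for comparison."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    return np.sum(np.abs(y[:, None] - y[None, :])) / (2 * n * np.sum(y))


# Small illustrative check on made-up data
rng = np.random.default_rng(0)
sample = rng.lognormal(size=200)
print(gini_sorted(sample), gini_pairwise(sample))
```

The two functions should agree up to floating-point error, so the sorted version could serve as a cross-check when the sample is too large for the pairwise array.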
However, the solution above relies on a long-run time series, so it would be better to migrate it to simulated data that we can generate in the lecture and whose size we can control.
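A minimal sketch of how the simulated-data version might look, assuming lognormally distributed wealth draws. The sample size `n`, the seed, and the distribution parameters are illustrative choices, and `gini_loop` is a hypothetical stand-in for the lecture's loop-based `gini_coefficient`, whose exact code is not reproduced here.

```python
import time

import numpy as np


def gini_loop(y):
    """Loop-based Gini coefficient (hypothetical stand-in for the lecture's function)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(y[i] - y[j])
    return total / (2 * n * sum(y))


def gini(y):
    """Vectorized Gini coefficient using broadcasting."""
    n = len(y)
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    g_sum = np.sum(np.abs(y_1 - y_2))
    return g_sum / (2 * n * np.sum(y))


# Simulated data: the size is controlled by n (an assumed, illustrative value)
rng = np.random.default_rng(1234)
n = 500
sim_wealth = rng.lognormal(mean=0.0, sigma=1.0, size=n)

start = time.perf_counter()
g_loop = gini_loop(sim_wealth)
t_loop = time.perf_counter() - start

start = time.perf_counter()
g_vec = gini(sim_wealth)
t_vec = time.perf_counter() - start

print(f"loop:       {g_loop:.6f} in {t_loop:.4f}s")
print(f"vectorized: {g_vec:.6f} in {t_vec:.4f}s")
```

Changing `n` makes it easy to show how the gap between the two run times grows with the sample size; in a notebook, `%timeit` would give more stable timings than a single `perf_counter` measurement.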