
Commit a9e5345

Update simple_linear_regression.md
1 parent e7f0cc5 commit a9e5345


1 file changed, 15 additions, 15 deletions


lectures/simple_linear_regression.md

@@ -297,7 +297,7 @@ Calculating $\beta$
 ```{code-cell} ipython3
 df = df[['X','Y']].copy() # Original Data
 
-# Calcuate the sample means
+# Calculate the sample means
 x_bar = df['X'].mean()
 y_bar = df['Y'].mean()
 ```
@@ -393,7 +393,7 @@ df
 Sometimes it can be useful to rename your columns to make it easier to work with in the DataFrame
 
 ```{code-cell} ipython3
-df.columns = ["cntry", "year", "life_expectency", "gdppc"]
+df.columns = ["cntry", "year", "life_expectancy", "gdppc"]
 df
 ```
 
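Assigning a full list to `df.columns`, as in this hunk, assumes the columns always arrive in the same order. A minimal sketch of an order-independent alternative uses `DataFrame.rename` with an explicit mapping. The source column names below (`Country`, `Year`, `LifeExp`, `GDPpc`) are hypothetical placeholders, not the names in the lecture's dataset. The final assertion guards against the old `life_expectency` spelling reappearing.

```python
import pandas as pd

# Toy data; the original column names here are hypothetical placeholders,
# not the names used in the lecture's dataset.
df = pd.DataFrame({
    "Country": ["A", "B"],
    "Year": [2018, 2018],
    "LifeExp": [72.5, 80.1],
    "GDPpc": [5000.0, 40000.0],
})

# Map old names to new ones explicitly, so the result does not depend
# on the order in which the columns appear.
df = df.rename(columns={
    "Country": "cntry",
    "Year": "year",
    "LifeExp": "life_expectancy",
    "GDPpc": "gdppc",
})

# Guard against the old misspelling creeping back in.
assert "life_expectency" not in df.columns
print(list(df.columns))
```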
@@ -415,10 +415,10 @@ It is always a good idea to spend a bit of time understanding what data you actu
 
 For example, you may want to explore this data to see if there is consistent reporting for all countries across years
 
-Let's first look at the Life Expectency Data
+Let's first look at the Life Expectancy Data
 
 ```{code-cell} ipython3
-le_years = df[['cntry', 'year', 'life_expectency']].set_index(['cntry', 'year']).unstack()['life_expectency']
+le_years = df[['cntry', 'year', 'life_expectancy']].set_index(['cntry', 'year']).unstack()['life_expectancy']
 le_years
 ```
 
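The reshaped `le_years` table has one row per country and one column per year, so gaps in reporting appear as `NaN`. A minimal sketch on hypothetical data counts the missing years per country. It selects the `life_expectancy` Series before calling `unstack()`, which produces the same country-by-year table as selecting the column afterwards, as the hunk does.

```python
import pandas as pd

# Hypothetical long-format data: country "B" has no 2019 observation.
df = pd.DataFrame({
    "cntry": ["A", "A", "B"],
    "year": [2018, 2019, 2018],
    "life_expectancy": [72.5, 72.9, 80.1],
})

# One row per country, one column per year; missing reports become NaN.
le_years = (
    df.set_index(["cntry", "year"])["life_expectancy"]
    .unstack()
)

# Count how many years each country is missing.
missing_per_country = le_years.isna().sum(axis=1)
print(missing_per_country)
```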
@@ -453,13 +453,13 @@ df = df[df.year == 2018].reset_index(drop=True).copy()
 ```
 
 ```{code-cell} ipython3
-df.plot(x='gdppc', y='life_expectency', kind='scatter', xlabel="GDP per capita", ylabel="Life Expectency (Years)",);
+df.plot(x='gdppc', y='life_expectancy', kind='scatter', xlabel="GDP per capita", ylabel="Life Expectancy (Years)",);
 ```
 
 This data shows a couple of interesting relationships.
 
 1. there are a number of countries with similar GDP per capita levels but a wide range in Life Expectancy
-2. there appears to be a positive relationship between GDP per capita and life expectancy. Countries with higher GDP per capita tend to have higher life expectency outcomes
+2. there appears to be a positive relationship between GDP per capita and life expectancy. Countries with higher GDP per capita tend to have higher life expectancy outcomes
 
 Even though OLS is solving linear equations -- one option we have is to transform the variables, such as through a log transform, and then use OLS to estimate the transformed variables
 
@@ -470,7 +470,7 @@ ln -> ln == elasticities
 By specifying `logx` you can plot the GDP per Capita data on a log scale
 
 ```{code-cell} ipython3
-df.plot(x='gdppc', y='life_expectency', kind='scatter', xlabel="GDP per capita", ylabel="Life Expectancy (Years)", logx=True);
+df.plot(x='gdppc', y='life_expectancy', kind='scatter', xlabel="GDP per capita", ylabel="Life Expectancy (Years)", logx=True);
 ```
 
 As you can see from this transformation -- a linear model fits the shape of the data more closely.
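The diff does not show how the `log_gdppc` column is constructed. The minimal sketch below assumes it is the natural log of `gdppc`, computed with `np.log`, and uses hypothetical values. It shows why the log scale helps: equal *ratios* of GDP per capita become equal *steps* on the transformed axis, which is the spacing that `logx=True` displays.

```python
import numpy as np
import pandas as pd

# Hypothetical GDP per capita values. How the lecture builds `log_gdppc`
# is not shown in this diff; using np.log (natural log) is an assumption.
df = pd.DataFrame({"gdppc": [1_000.0, 10_000.0, 100_000.0]})
df["log_gdppc"] = np.log(df["gdppc"])

# Each tenfold increase in gdppc adds the same amount, ln(10), to log_gdppc.
steps = df["log_gdppc"].diff().dropna()
print(steps)
```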
@@ -486,11 +486,11 @@ df
 **Q4:** Use {eq}`eq:optimal-alpha` and {eq}`eq:optimal-beta` to compute optimal values for $\alpha$ and $\beta$
 
 ```{code-cell} ipython3
-data = df[['log_gdppc', 'life_expectency']].copy() # Get Data from DataFrame
+data = df[['log_gdppc', 'life_expectancy']].copy() # Get Data from DataFrame
 
 # Calculate the sample means
 x_bar = data['log_gdppc'].mean()
-y_bar = data['life_expectency'].mean()
+y_bar = data['life_expectancy'].mean()
 ```
 
 ```{code-cell} ipython3
@@ -499,7 +499,7 @@ data
 
 ```{code-cell} ipython3
 # Compute the Sums
-data['num'] = data['log_gdppc'] * data['life_expectency'] - y_bar * data['log_gdppc']
+data['num'] = data['log_gdppc'] * data['life_expectancy'] - y_bar * data['log_gdppc']
 data['den'] = pow(data['log_gdppc'],2) - x_bar * data['log_gdppc']
 β = data['num'].sum() / data['den'].sum()
 print(β)
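The `num` and `den` columns compute $\sum_i x_i(y_i - \bar y)$ and $\sum_i x_i(x_i - \bar x)$. These equal the textbook centered sums $\sum_i (x_i - \bar x)(y_i - \bar y)$ and $\sum_i (x_i - \bar x)^2$, because $\sum_i \bar x (y_i - \bar y) = 0$ and $\sum_i \bar x (x_i - \bar x) = 0$. The minimal sketch below checks this on hypothetical toy data and cross-checks the result against `np.polyfit`. It assumes the standard intercept $\alpha = \bar y - \beta \bar x$.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for (log_gdppc, life_expectancy).
data = pd.DataFrame({
    "x": [8.0, 9.0, 10.0, 11.0],
    "y": [60.0, 66.0, 71.0, 79.0],
})
x_bar, y_bar = data["x"].mean(), data["y"].mean()

# Form used in the diff: sum x(y - y_bar) / sum x(x - x_bar).
beta_diff = (data["x"] * data["y"] - y_bar * data["x"]).sum() / (
    data["x"] ** 2 - x_bar * data["x"]
).sum()

# Centered form: sum (x - x_bar)(y - y_bar) / sum (x - x_bar)^2.
dx, dy = data["x"] - x_bar, data["y"] - y_bar
beta_centered = (dx * dy).sum() / (dx ** 2).sum()

# Standard OLS intercept.
alpha = y_bar - beta_centered * x_bar

# Cross-check with NumPy's degree-1 least-squares fit (returns slope first).
slope, intercept = np.polyfit(data["x"], data["y"], 1)
print(beta_diff, beta_centered, slope)
print(alpha, intercept)
```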
@@ -513,13 +513,13 @@ print(α)
 **Q5:** Plot the line of best fit found using OLS
 
 ```{code-cell} ipython3
-data['life_expectency_hat'] = α + β * df['log_gdppc']
-data['error'] = data['life_expectency_hat'] - data['life_expectency']
+data['life_expectancy_hat'] = α + β * df['log_gdppc']
+data['error'] = data['life_expectancy_hat'] - data['life_expectancy']
 
 fig, ax = plt.subplots()
-data.plot(x='log_gdppc',y='life_expectency', kind='scatter', ax=ax)
-data.plot(x='log_gdppc',y='life_expectency_hat', kind='line', ax=ax, color='g')
-plt.vlines(data['log_gdppc'], data['life_expectency_hat'], data['life_expectency'], color='r')
+data.plot(x='log_gdppc',y='life_expectancy', kind='scatter', ax=ax)
+data.plot(x='log_gdppc',y='life_expectancy_hat', kind='line', ax=ax, color='g')
+plt.vlines(data['log_gdppc'], data['life_expectancy_hat'], data['life_expectancy'], color='r')
 ```
 
 :::{solution-end}
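The last hunk computes fitted values and the vertical gaps that `plt.vlines` draws. The minimal sketch below is a quick sanity check on hypothetical data and is not part of the lecture. With an intercept, OLS errors should sum to zero and be orthogonal to the regressor, whichever sign convention is used for `error`. The sketch builds the fitted values from `data['log_gdppc']` rather than `df['log_gdppc']`. In the lecture the two share an index because `data` is a copy of columns of `df`, so this only makes the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the lecture's cross-section.
data = pd.DataFrame({
    "log_gdppc": [7.5, 8.2, 9.1, 9.8, 10.6, 11.2],
    "life_expectancy": [61.0, 66.5, 70.2, 74.8, 78.1, 82.3],
})

x_bar = data["log_gdppc"].mean()
y_bar = data["life_expectancy"].mean()

# OLS slope (centered form) and intercept.
dx = data["log_gdppc"] - x_bar
dy = data["life_expectancy"] - y_bar
β = (dx * dy).sum() / (dx ** 2).sum()
α = y_bar - β * x_bar

# Fitted values and errors, using the diff's sign convention
# (fitted minus observed).
data["life_expectancy_hat"] = α + β * data["log_gdppc"]
data["error"] = data["life_expectancy_hat"] - data["life_expectancy"]

# Both quantities should be zero up to floating-point rounding.
print(data["error"].sum())
print((data["error"] * data["log_gdppc"]).sum())
```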
