@@ -61,7 +61,7 @@ We can use a scatter plot of the data to see the relationship between $y_i$ (ice
mystnb:
  figure:
    caption: "Scatter plot"
-    name: sales-v-temp
+    name: sales-v-temp1
---
ax = df.plot(
    x='X',
@@ -97,7 +97,8 @@ mystnb:
---
fig, ax = plt.subplots()
ax = df.plot(x='X',y='Y', kind='scatter', ax=ax)
-df.plot(x='X',y='Y_hat', kind='line', ax=ax)
+ax = df.plot(x='X',y='Y_hat', kind='line', ax=ax)
+plt.show()
```

We can see that this model does a poor job of estimating the relationship.
@@ -113,12 +114,13 @@ df['Y_hat'] = α + β * df['X']
---
mystnb:
  figure:
-    caption: "Scatter plot with a line of fit"
+    caption: "Scatter plot with a line of fit #2"
    name: sales-v-temp3
---
fig, ax = plt.subplots()
ax = df.plot(x='X',y='Y', kind='scatter', ax=ax)
-df.plot(x='X',y='Y_hat', kind='line', ax=ax)
+ax = df.plot(x='X',y='Y_hat', kind='line', ax=ax)
+plt.show()
```

```{code-cell} ipython3
@@ -130,12 +132,13 @@ df['Y_hat'] = α + β * df['X']
---
mystnb:
  figure:
-    caption: "Scatter plot with a line of fit"
+    caption: "Scatter plot with a line of fit #3"
    name: sales-v-temp4
---
fig, ax = plt.subplots()
ax = df.plot(x='X',y='Y', kind='scatter', ax=ax)
-df.plot(x='X',y='Y_hat', kind='line', ax=ax, color='g')
+ax = df.plot(x='X',y='Y_hat', kind='line', ax=ax, color='g')
+plt.show()
```

However, we need to formalize this guessing process by framing it as an optimization problem.
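To make that guess-and-check idea concrete, here is a minimal sketch that treats the intercept as the choice variable: it scans candidate values of $\alpha$ (with $\beta$ held fixed) and keeps the one with the smallest error. The data, the fixed slope, and the grid bounds are illustrative assumptions; only the column names `X`/`Y` and the `errors`/`α_optimal` names used in the later plotting cell come from the surrounding diff.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the lecture's DataFrame: just columns X and Y
rng = np.random.default_rng(0)
df = pd.DataFrame({'X': rng.uniform(0, 35, 20)})
df['Y'] = 20 + 10 * df['X'] + rng.normal(0, 30, 20)

β = 10.0                              # slope held fixed (assumed value)
α_grid = np.linspace(-50, 50, 201)    # candidate intercepts (illustrative range)

# Sum of squared errors for each candidate α, keyed by α so it plots easily
errors = {}
for α in α_grid:
    errors[α] = ((df['Y'] - (α + β * df['X']))**2).sum()

# The α with the smallest error is the best guess on this grid
α_optimal = min(errors, key=errors.get)
print(α_optimal)
```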
@@ -167,7 +170,8 @@ mystnb:
fig, ax = plt.subplots()
ax = df.plot(x='X',y='Y', kind='scatter', ax=ax)
ax = df.plot(x='X',y='Y_hat', kind='line', ax=ax, color='g')
-plt.vlines(df['X'], df['Y_hat'], df['Y'], color='r');
+plt.vlines(df['X'], df['Y_hat'], df['Y'], color='r')
+plt.show()
```

The Ordinary Least Squares (OLS) method chooses $\alpha$ and $\beta$ in such a way that **minimizes** the sum of the squared residuals (SSR).
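For reference, the quantity being minimized is $SSR(\alpha, \beta) = \sum_{i} (y_i - \alpha - \beta x_i)^2$, and the textbook first-order conditions give the familiar closed-form minimizers. A minimal sketch, assuming only a DataFrame with columns `X` and `Y` as in the cells above (the data below is illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data with columns X and Y
rng = np.random.default_rng(1)
df = pd.DataFrame({'X': rng.uniform(0, 35, 30)})
df['Y'] = 20 + 10 * df['X'] + rng.normal(0, 30, 30)

def ssr(α, β, data):
    """Sum of squared residuals for the candidate line y = α + β·x."""
    return ((data['Y'] - (α + β * data['X']))**2).sum()

# Closed-form OLS estimates:
#   beta_hat  = sum((x - x_bar)*(y - y_bar)) / sum((x - x_bar)**2)
#   alpha_hat = y_bar - beta_hat * x_bar
x_bar, y_bar = df['X'].mean(), df['Y'].mean()
β_hat = ((df['X'] - x_bar) * (df['Y'] - y_bar)).sum() / ((df['X'] - x_bar)**2).sum()
α_hat = y_bar - β_hat * x_bar

print(α_hat, β_hat, ssr(α_hat, β_hat, df))
```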
@@ -231,7 +235,7 @@ Plotting the error
mystnb:
  figure:
    caption: "Plotting the error (2)"
-    name: plt-errors2
+    name: plt-errors-2
---
ax = pd.Series(errors).plot(xlabel='α', ylabel='error')
plt.axvline(α_optimal, color='r');