
Commit 4ea40a0

ENH: Use numpy.polynomial for linear fit instead of statsmodels (#182)
* Initial removal of statsmodels from mooreslaw md
* Update wording for polynomial.fit.
* Update wording, rm std. err.
* MAINT: rm statsmodels from requirements/testing.
1 parent 89b0464 commit 4ea40a0

File tree

4 files changed: +18 additions, -63 deletions


content/mooreslaw-tutorial.md

Lines changed: 18 additions & 59 deletions
@@ -44,27 +44,24 @@ the 53 years following his prediction. You will determine the best-fit constants
 
 * NumPy
 * [Matplotlib](https://matplotlib.org/)
-* [statsmodels](https://www.statsmodels.org) ordinary linear regression
 
 imported with the following commands
 
 ```{code-cell}
 import matplotlib.pyplot as plt
 import numpy as np
-import statsmodels.api as sm
 ```
 
 **2.** Since this is an exponential growth law you need a little background in doing math with [natural logs](https://en.wikipedia.org/wiki/Natural_logarithm) and [exponentials](https://en.wikipedia.org/wiki/Exponential_function).
 
-You'll use these NumPy, Matplotlib, and statsmodels functions:
+You'll use these NumPy and Matplotlib functions:
 
 * [`np.loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html): this function loads text into a NumPy array
 * [`np.log`](https://numpy.org/doc/stable/reference/generated/numpy.log.html): this function takes the natural log of all elements in a NumPy array
 * [`np.exp`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html): this function takes the exponential of all elements in a NumPy array
 * [`lambda`](https://docs.python.org/3/library/ast.html?highlight=lambda#ast.Lambda): this is a minimal function definition for creating a function model
 * [`plt.semilogy`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.semilogy.html): this function will plot x-y data onto a figure with a linear x-axis and $\log_{10}$ y-axis
 * [`plt.plot`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html): this function will plot x-y data on linear axes
-* [`sm.OLS`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html): find fitting parameters and standard errors using the statsmodels ordinary least squares model
 * slicing arrays: view parts of the data loaded into the workspace, slice the arrays e.g. `x[:10]` for the first 10 values in the array, `x`
 * boolean array indexing: to view parts of the data that match a given condition use boolean operations to index an array
 * [`np.block`](https://numpy.org/doc/stable/reference/generated/numpy.block.html): to combine arrays into 2D arrays
@@ -215,59 +212,31 @@ where $\mathbf{y}$ are the observations of the log of the number of
 transistors in a 1D array and $\mathbf{Z}=[\text{year}_i^1,~\text{year}_i^0]$ are the
 polynomial terms for $\text{year}_i$ in the first and second columns. By
 creating this set of regressors in the $\mathbf{Z}-$matrix you set
-up an ordinary least squares statistical model. Some clever
-NumPy array features will build $\mathbf{Z}$
+up an ordinary least squares statistical model.
 
-1. `year[:,np.newaxis]` : takes the 1D array with shape `(179,)` and turns it into a 2D column vector with shape `(179,1)`
-2. `**[1, 0]` : stacks two columns, in the first column is `year**1` and the second column is `year**0 == 1`
+`Z` is a linear model with two parameters, i.e. a polynomial with degree `1`.
+Therefore we can represent the model with `numpy.polynomial.Polynomial` and
+use the fitting functionality to determine the model parameters:
 
 ```{code-cell}
-Z = year[:, np.newaxis] ** [1, 0]
+model = np.polynomial.Polynomial.fit(year, yi, deg=1)
 ```
 
-Now that you have the created a matrix of regressors, $\mathbf{Z},$ and
-the observations are in vector, $\mathbf{y},$ you can use these
-variables to build the an ordinary least squares model with
-[`sm.OLS`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html).
+By default, `Polynomial.fit` performs the fit in the domain determined by the
+independent variable (`year` in this case).
+The coefficients for the unscaled and unshifted model can be recovered with the
+`convert` method:
 
-```{code-cell}
-model = sm.OLS(yi, Z)
-```
-
-Now, you can view the fitting constants, $A$ and $B$, and their standard
-errors. Run the
-[`fit`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html) and print the
-[`summary`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.RegressionResults.summary.html) to view results as such,
 
 ```{code-cell}
-results = model.fit()
-print(results.summary())
+model = model.convert()
+model
 ```
 
-The __OLS Regression Results__ summary gives a lot of information about
-the regressors, $\mathbf{Z},$ and observations, $\mathbf{y}.$ The most
-important outputs for your current analysis are
-
-```
-=================================
-                 coef    std err
----------------------------------
-x1             0.3416      0.006
-const       -666.3264     11.890
-=================================
-```
-where `x1` is slope, $A=0.3416$, `const` is the intercept,
-$B=-666.364$, and `std error` gives the precision of constants
-$A=0.342\pm 0.006~\dfrac{\log(\text{transistors}/\text{chip})}{\text{years}}$ and $B=-666\pm
-12~\log(\text{transistors}/\text{chip}),$ where the units are in
-$\log(\text{transistors}/\text{chip})$. You created an exponential growth model.
-To get the constants, save them to an array `AB` with
-`results.params` and assign $A$ and $B$ to `x1` and `constant`.
+The individual parameters $A$ and $B$ are the coefficients of our linear model:
 
 ```{code-cell}
-AB = results.params
-A = AB[0]
-B = AB[1]
+B, A = model
 ```
 
 Did manufacturers double the transistor count every two years? You have
@@ -277,24 +246,14 @@ $\dfrac{\text{transistor_count}(\text{year} +2)}{\text{transistor_count}(\text{y
 \dfrac{e^{B}e^{A( \text{year} + 2)}}{e^{B}e^{A \text{year}}} = e^{2A}$
 
 where increase in number of transistors is $xFactor,$ number of years is
-2, and $A$ is the best fit slope on the semilog function. The error in
-your
-prediction, $\Delta(xFactor),$ comes from the precision of your constant
-$A,$ which you calculated as the standard error $\Delta A= 0.006$.
-
-$\Delta (xFactor) = \frac{\partial}{\partial A}(e^{2A})\Delta A = 2Ae^{2A}\Delta A$
+2, and $A$ is the best fit slope on the semilog function.
 
 ```{code-cell}
-print("Rate of semiconductors added on a chip every 2 years:")
-print(
-    "\tx{:.2f} +/- {:.2f} semiconductors per chip".format(
-        np.exp((A) * 2), 2 * A * np.exp(2 * A) * 0.006
-    )
-)
+print(f"Rate of semiconductors added on a chip every 2 years: {np.exp(2 * A):.2f}")
 ```
 
 Based upon your least-squares regression model, the number of
-semiconductors per chip increased by a factor of $1.98\pm 0.01$ every two
+semiconductors per chip increased by a factor of $1.98$ every two
 years. You have a model that predicts the number of semiconductors each
 year. Now compare your model to the actual manufacturing reports. Plot
 the linear regression results and all of the transistor counts.
@@ -455,7 +414,7 @@ np.savez(
     transistor_count=transistor_count,
     transistor_count_predicted=transistor_count_predicted,
     transistor_Moores_law=transistor_Moores_law,
-    regression_csts=AB,
+    regression_csts=(A, B),
 )
 ```

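The replacement workflow in the diff above (`Polynomial.fit`, then `convert`, then unpacking the coefficients) can be exercised end to end on synthetic data. A minimal sketch: the slope `A_true = 0.34` and intercept `B_true = -660.0` are illustrative stand-ins, not the tutorial's fitted constants, and `year`/`yi` mirror the tutorial's variable names:

```python
import numpy as np

# Synthetic semilog data standing in for the tutorial's dataset:
# log(transistor count) grows linearly with year. The values of
# A_true and B_true are illustrative, not the tutorial's results.
A_true, B_true = 0.34, -660.0
year = np.arange(1971.0, 2020.0)
yi = A_true * year + B_true  # observations y = A * year + B

# Degree-1 polynomial fit, as in the updated tutorial.
model = np.polynomial.Polynomial.fit(year, yi, deg=1)

# fit() works in a scaled/shifted domain for numerical stability;
# convert() maps back to the coefficients of B + A * year.
# Polynomial iterates its coefficients from degree 0 upward,
# so B comes first when unpacking.
B, A = model.convert()

# Growth factor over two years: e^(2A).
x_factor = np.exp(2 * A)
print(f"Rate of semiconductors added on a chip every 2 years: {x_factor:.2f}")
```

Because the synthetic data are exactly linear, `convert` recovers `A_true` and `B_true` up to floating-point error, which makes this a quick regression test for the new code path.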
environment.yml

Lines changed: 0 additions & 1 deletion
@@ -7,7 +7,6 @@ dependencies:
   - scipy
   - matplotlib
   - pandas
-  - statsmodels
   - imageio
   # For building the site
   - sphinx

requirements.txt

Lines changed: 0 additions & 1 deletion
@@ -3,7 +3,6 @@ numpy
 scipy
 matplotlib
 pandas
-statsmodels
 imageio
 # For supporting .md-based notebooks
 jupytext

tox.ini

Lines changed: 0 additions & 2 deletions
@@ -19,7 +19,6 @@ deps =
     oldestdeps: matplotlib==3.4
     oldestdeps: scipy==1.6
     oldestdeps: pandas==1.2
-    oldestdeps: statsmodels==0.13
 
 allowlist_externals = bash
 
@@ -28,7 +27,6 @@ commands =
     devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple scipy
     devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple matplotlib
     devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple pandas
-    devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple statsmodels
 
     pip freeze
0 commit comments
