From f2316e81849a13a7ed2a34c0fd5aed889a76a16f Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Sun, 30 Apr 2023 09:43:41 -0700
Subject: [PATCH 1/4] Initial removal of statsmodels from mooreslaw md

---
 content/mooreslaw-tutorial.md | 47 +++++++----------------------------
 1 file changed, 9 insertions(+), 38 deletions(-)

diff --git a/content/mooreslaw-tutorial.md b/content/mooreslaw-tutorial.md
index 18a13dcc..5718ef0b 100644
--- a/content/mooreslaw-tutorial.md
+++ b/content/mooreslaw-tutorial.md
@@ -44,19 +44,17 @@ the 53 years following his prediction. You will determine the best-fit constants
* NumPy
* [Matplotlib](https://matplotlib.org/)
-* [statsmodels](https://www.statsmodels.org) ordinary linear regression

imported with the following commands

```{code-cell}
import matplotlib.pyplot as plt
import numpy as np
-import statsmodels.api as sm
```

**2.** Since this is an exponential growth law you need a little background in doing
math with [natural logs](https://en.wikipedia.org/wiki/Natural_logarithm) and
[exponentials](https://en.wikipedia.org/wiki/Exponential_function).

-You'll use these NumPy, Matplotlib, and statsmodels functions:
+You'll use these NumPy and Matplotlib functions:

* [`np.loadtxt`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html): this function loads text into a NumPy array
* [`np.log`](https://numpy.org/doc/stable/reference/generated/numpy.log.html): this function takes the natural log of all elements in a NumPy array
@@ -64,7 +62,6 @@ You'll use these NumPy and Matplotlib functions:
* [`lambda`](https://docs.python.org/3/library/ast.html?highlight=lambda#ast.Lambda): this is a minimal function definition for creating a function model
* [`plt.semilogy`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.semilogy.html): this function will plot x-y data onto a figure with a linear x-axis and $\log_{10}$ y-axis
[`plt.plot`](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html): this function will plot x-y data on linear axes
-* [`sm.OLS`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html): find fitting parameters and standard errors using the statsmodels ordinary least squares model
* slicing arrays: view parts of the data loaded into the workspace, slice the arrays e.g. `x[:10]` for the first 10 values in the array, `x`
* boolean array indexing: to view parts of the data that match a given condition use boolean operations to index an array
* [`np.block`](https://numpy.org/doc/stable/reference/generated/numpy.block.html): to combine arrays into 2D arrays
@@ -231,43 +228,17 @@ variables to build the an ordinary least squares model with
[`sm.OLS`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html).

```{code-cell}
-model = sm.OLS(yi, Z)
+model = np.polynomial.Polynomial.fit(year, yi, deg=1)
```

-Now, you can view the fitting constants, $A$ and $B$, and their standard
-errors. Run the
-[`fit`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html) and print the
-[`summary`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.RegressionResults.summary.html) to view results as such,
+%% TODO: make mention of model.convert()

-```{code-cell}
-results = model.fit()
-print(results.summary())
-```
-
-The __OLS Regression Results__ summary gives a lot of information about
-the regressors, $\mathbf{Z},$ and observations, $\mathbf{y}.$ The most
-important outputs for your current analysis are
-
-```
-=================================
-                 coef    std err
----------------------------------
-x1             0.3416      0.006
-const       -666.3264     11.890
-=================================
-```
-where `x1` is slope, $A=0.3416$, `const` is the intercept,
-$B=-666.364$, and `std error` gives the precision of constants
-$A=0.342\pm 0.006~\dfrac{\log(\text{transistors}/\text{chip})}{\text{years}}$ and $B=-666\pm
-12~\log(\text{transistors}/\text{chip}),$ where the units are in
-$\log(\text{transistors}/\text{chip})$. You created an exponential growth model.
-To get the constants, save them to an array `AB` with
-`results.params` and assign $A$ and $B$ to `x1` and `constant`.
+Now, you can view the fitting constants, $A$ and $B$:

```{code-cell}
-AB = results.params
-A = AB[0]
-B = AB[1]
+model = model.convert()
+print(model)
+B, A = model
```

Did manufacturers double the transistor count every two years? You have
@@ -294,7 +265,7 @@ print(
```

Based upon your least-squares regression model, the number of
-semiconductors per chip increased by a factor of $1.98\pm 0.01$ every two
+semiconductors per chip increased by a factor of $1.98$ every two
years. You have a model that predicts the number of semiconductors each
year. Now compare your model to the actual manufacturing reports. Plot
the linear regression results and all of the transistor counts.
@@ -455,7 +426,7 @@ np.savez(
    transistor_count=transistor_count,
    transistor_count_predicted=transistor_count_predicted,
    transistor_Moores_law=transistor_Moores_law,
-    regression_csts=AB,
+    regression_csts=(A, B),
)
```
From cf84cb3df71242a8dc5202a7197030c6134b30b8 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Sun, 30 Apr 2023 11:29:32 -0700
Subject: [PATCH 2/4] Update wording for polynomial.fit.

---
 content/mooreslaw-tutorial.md | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/content/mooreslaw-tutorial.md b/content/mooreslaw-tutorial.md
index 5718ef0b..a6e09066 100644
--- a/content/mooreslaw-tutorial.md
+++ b/content/mooreslaw-tutorial.md
@@ -212,32 +212,30 @@ where $\mathbf{y}$ are the observations of the log of the number of
transistors in a 1D array and $\mathbf{Z}=[\text{year}_i^1,~\text{year}_i^0]$ are the
polynomial terms for $\text{year}_i$ in the first and second columns. By
creating this set of regressors in the $\mathbf{Z}-$matrix you set
-up an ordinary least squares statistical model. Some clever
-NumPy array features will build $\mathbf{Z}$
+up an ordinary least squares statistical model.

-1. `year[:,np.newaxis]` : takes the 1D array with shape `(179,)` and turns it into a 2D column vector with shape `(179,1)`
-2. `**[1, 0]` : stacks two columns, in the first column is `year**1` and the second column is `year**0 == 1`
+`Z` is a linear model with two parameters, i.e. a polynomial with degree `1`.
+Therefore we can represent the model with `numpy.polynomial.Polynomial` and
+use the fitting functionality to determine the model parameters:

```{code-cell}
-Z = year[:, np.newaxis] ** [1, 0]
+model = np.polynomial.Polynomial.fit(year, yi, deg=1)
```

-Now that you have the created a matrix of regressors, $\mathbf{Z},$ and
-the observations are in vector, $\mathbf{y},$ you can use these
-variables to build the an ordinary least squares model with
-[`sm.OLS`](https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html).
+By default, `Polynomial.fit` performs the fit in the domain determined by the
+independent variable (`year` in this case).
+The coefficients for the unscaled and unshifted model can be recovered with the
+`convert` method:
+
```{code-cell}
-model = np.polynomial.Polynomial.fit(year, yi, deg=1)
+model = model.convert()
+model
```

-%% TODO: make mention of model.convert()
-
-Now, you can view the fitting constants, $A$ and $B$:
+The individual parameters $A$ and $B$ are the coefficients of our linear model:

```{code-cell}
-model = model.convert()
-print(model)
B, A = model
```
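Note: the sketch below is illustrative only and is not part of the patch series. It exercises the `Polynomial.fit` / `convert` workflow introduced in PATCH 2/4 on its own, with made-up `year` and `transistor_count` values standing in for the tutorial's `transistor_data.csv`.

```python
import numpy as np

# Made-up stand-in data for the tutorial's transistor_data.csv.
year = np.array([1971, 1973, 1975, 1977, 1979, 1981, 1983, 1985])
transistor_count = np.array([2.3e3, 4.1e3, 8.0e3, 1.9e4, 4.8e4, 9.5e4, 2.2e5, 4.5e5])

# Fit log(transistor_count) = A * year + B with a degree-1 polynomial.
yi = np.log(transistor_count)
model = np.polynomial.Polynomial.fit(year, yi, deg=1)

# fit() works in a scaled/shifted window; convert() maps the coefficients
# back to the unscaled "year" domain so A and B can be read off directly.
model = model.convert()
B, A = model  # coefficients are ordered from degree 0 upward

print(f"A = {A:.4f}, B = {B:.2f}")
```

Unpacking `B, A = model` works because iterating a `Polynomial` yields its coefficients in order of increasing degree, so the intercept comes first and the slope second.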
From c49fbf2799082bceaf720ed3a61a2bcc7f571730 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Sun, 30 Apr 2023 11:33:31 -0700
Subject: [PATCH 3/4] Update wording, rm std. err.

---
 content/mooreslaw-tutorial.md | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/content/mooreslaw-tutorial.md b/content/mooreslaw-tutorial.md
index a6e09066..b9fefc8d 100644
--- a/content/mooreslaw-tutorial.md
+++ b/content/mooreslaw-tutorial.md
@@ -246,20 +246,10 @@ $\dfrac{\text{transistor_count}(\text{year} +2)}{\text{transistor_count}(\text{y
\dfrac{e^{B}e^{A( \text{year} + 2)}}{e^{B}e^{A \text{year}}} = e^{2A}$

where increase in number of transistors is $xFactor,$ number of years is
-2, and $A$ is the best fit slope on the semilog function. The error in
-your
-prediction, $\Delta(xFactor),$ comes from the precision of your constant
-$A,$ which you calculated as the standard error $\Delta A= 0.006$.
-
-$\Delta (xFactor) = \frac{\partial}{\partial A}(e^{2A})\Delta A = 2Ae^{2A}\Delta A$
+2, and $A$ is the best fit slope on the semilog function.

```{code-cell}
-print("Rate of semiconductors added on a chip every 2 years:")
-print(
-    "\tx{:.2f} +/- {:.2f} semiconductors per chip".format(
-        np.exp((A) * 2), 2 * A * np.exp(2 * A) * 0.006
-    )
-)
+print(f"Rate of semiconductors added on a chip every 2 years: {np.exp(2 * A):.2f}")
```

Based upon your least-squares regression model, the number of
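Note: another illustrative sketch, not part of the patch series. The simplified print statement in PATCH 3/4 relies on the identity $xFactor = e^{2A}$; plugging in the slope $A \approx 0.3416$ quoted in the regression summary that PATCH 1/4 removes reproduces the factor of $1.98$ per two years cited in the tutorial.

```python
import numpy as np

A = 0.3416  # slope quoted in the tutorial's original regression summary
print(f"Rate of semiconductors added on a chip every 2 years: {np.exp(2 * A):.2f}")
# prints: Rate of semiconductors added on a chip every 2 years: 1.98
```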
From 7cc9d89ea3739c1cd9e3d9780ab8b120037dd260 Mon Sep 17 00:00:00 2001
From: Ross Barnowski
Date: Sun, 30 Apr 2023 11:44:21 -0700
Subject: [PATCH 4/4] MAINT: rm statsmodels from requirements/testing.

---
 environment.yml  | 1 -
 requirements.txt | 1 -
 tox.ini          | 2 --
 3 files changed, 4 deletions(-)

diff --git a/environment.yml b/environment.yml
index 8421695f..137b04ca 100644
--- a/environment.yml
+++ b/environment.yml
@@ -7,7 +7,6 @@ dependencies:
  - scipy
  - matplotlib
  - pandas
-  - statsmodels
  - imageio
  # For building the site
  - sphinx

diff --git a/requirements.txt b/requirements.txt
index 63b2eefa..0ec30181 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,7 +3,6 @@ numpy
scipy
matplotlib
pandas
-statsmodels
imageio
# For supporting .md-based notebooks
jupytext

diff --git a/tox.ini b/tox.ini
index b14c817d..e6f26ac8 100644
--- a/tox.ini
+++ b/tox.ini
@@ -19,7 +19,6 @@ deps =
    oldestdeps: matplotlib==3.4
    oldestdeps: scipy==1.6
    oldestdeps: pandas==1.2
-    oldestdeps: statsmodels==0.13

allowlist_externals =
    bash
@@ -28,7 +27,6 @@ commands =
    devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple scipy
    devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple matplotlib
    devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple pandas
-    devdeps: pip install -U --pre --only-binary :all: -i https://pypi.anaconda.org/scipy-wheels-nightly/simple statsmodels

    pip freeze
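Note: a final illustrative sketch, not part of the patch series. After PATCH 1/4 the regression constants are saved as a plain tuple, `regression_csts=(A, B)`, so they round-trip through `np.savez` / `np.load` as a small array with no statsmodels objects involved. The file name and constant values below are hypothetical.

```python
import numpy as np

A, B = 0.3416, -666.33  # hypothetical fit constants, log(transistors/chip) units
np.savez("mooreslaw_regression_constants.npz", regression_csts=(A, B))

with np.load("mooreslaw_regression_constants.npz") as data:
    regression_csts = data["regression_csts"]

print(regression_csts)  # a length-2 array holding A and B
```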