diff --git a/_posts/ipython-notebooks/2015-06-30-principal_component_analysis.html b/_posts/ipython-notebooks/2015-06-30-principal_component_analysis.html index 0bc5594ffa3e..969aa44549f8 100755 --- a/_posts/ipython-notebooks/2015-06-30-principal_component_analysis.html +++ b/_posts/ipython-notebooks/2015-06-30-principal_component_analysis.html @@ -1,17 +1,17 @@ --- permalink: ipython-notebooks/principal-component-analysis/ description: A step by step tutorial to Principal Component Analysis, a simple yet powerful transformation technique. -title: Principal Component Analysis in 3 Simple Steps +name: Principal Component Analysis in 3 Simple Steps has_thumbnail: false thumbnail: /images/static-image -layout: user-guide name: Principal Component Analysis -language: python +ipynb: ~notebook_demo/264 +layout: user-guide page_type: u-guide +language: python --- {% raw %} -
Often, the desired goal is to reduce the dimensions of a $d$-dimensional dataset by projecting it onto a $(k)$-dimensional subspace (where $k\;<\;d$) in order to increase the computational efficiency while retaining most of the information. An important question is "what is the size of $k$ that represents the data 'well'?"
+Often, the desired goal is to reduce the dimensions of a $d$-dimensional dataset by projecting it onto a $(k)$-dimensional subspace (where $k\;<\;d$) in order to increase the computational efficiency while retaining most of the information. An important question is "what is the size of $k$ that represents the data 'well'?"
Later, we will compute eigenvectors (the principal components) of a dataset and collect them in a projection matrix. Each of those eigenvectors is associated with an eigenvalue, which can be interpreted as the "length" or "magnitude" of the corresponding eigenvector. If some eigenvalues have a significantly larger magnitude than others, then reducing the dataset via PCA onto a lower-dimensional subspace by dropping the "less informative" eigenpairs is reasonable.
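As a rough illustration of how eigenvalue magnitudes can guide the choice of $k$, here is a minimal sketch. It assumes NumPy and uses a hypothetical toy matrix `X` (not the Iris data); the 95% variance threshold is an arbitrary choice for illustration, not something prescribed by the tutorial.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 4 features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 0.5, 0.1])

# Covariance matrix of the features (columns are the variables).
cov = np.cov(X, rowvar=False)

# eigh is intended for symmetric matrices; it returns eigenvalues in ascending order.
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Sort the eigenpairs from largest to smallest eigenvalue.
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Fraction of the total variance carried by each component, and the running total.
explained = eig_vals / eig_vals.sum()
cumulative = np.cumsum(explained)

# Smallest k whose components together retain at least 95% of the variance
# (assumed threshold, chosen only for this sketch).
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Projection matrix W (d x k) and the centered data projected onto it (n x k).
W = eig_vecs[:, :k]
X_proj = (X - X.mean(axis=0)) @ W
```

Here the eigenpairs with the smallest eigenvalues contribute little to `cumulative`, so dropping them loses little of the variance while shrinking the data from 4 columns to `k`.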
import pandas as pd
+import pandas as pd
df = pd.read_csv(
filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
- header=None,
+ header=None,
sep=',')
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
-df.dropna(how="all", inplace=True) # drops the empty line at file-end
+df.dropna(how="all", inplace=True) # drops the empty line at file-end
df.tail()
@@ -187,11 +175,26 @@ Loading the Dataset
+
Out[1]:
+