diff --git a/notebooks/7_reshape_table_layout.ipynb b/notebooks/7_reshape_table_layout.ipynb
new file mode 100644
index 0000000..0e103ab
--- /dev/null
+++ b/notebooks/7_reshape_table_layout.ipynb
@@ -0,0 +1,1654 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Objectives\n",
+ "\n",
+ "- Order the rows of a table using a chosen column\n",
+ "- Convert to long format to plot multiple columns at the same time\n",
+ "- Switch between short/long table format\n",
+ "\n",
+ "Content to cover\n",
+ "\n",
+ "- sort_values\n",
+ "- pivot, pivot_table\n",
+ "- melt\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
PassengerId
\n",
+ "
Survived
\n",
+ "
Pclass
\n",
+ "
Name
\n",
+ "
Sex
\n",
+ "
Age
\n",
+ "
SibSp
\n",
+ "
Parch
\n",
+ "
Ticket
\n",
+ "
Fare
\n",
+ "
Cabin
\n",
+ "
Embarked
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
0
\n",
+ "
1
\n",
+ "
0
\n",
+ "
3
\n",
+ "
Braund, Mr. Owen Harris
\n",
+ "
male
\n",
+ "
22.0
\n",
+ "
1
\n",
+ "
0
\n",
+ "
A/5 21171
\n",
+ "
7.2500
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ "
\n",
+ "
1
\n",
+ "
2
\n",
+ "
1
\n",
+ "
1
\n",
+ "
Cumings, Mrs. John Bradley (Florence Briggs Th...
\n",
+ "
female
\n",
+ "
38.0
\n",
+ "
1
\n",
+ "
0
\n",
+ "
PC 17599
\n",
+ "
71.2833
\n",
+ "
C85
\n",
+ "
C
\n",
+ "
\n",
+ "
\n",
+ "
2
\n",
+ "
3
\n",
+ "
1
\n",
+ "
3
\n",
+ "
Heikkinen, Miss. Laina
\n",
+ "
female
\n",
+ "
26.0
\n",
+ "
0
\n",
+ "
0
\n",
+ "
STON/O2. 3101282
\n",
+ "
7.9250
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ "
\n",
+ "
3
\n",
+ "
4
\n",
+ "
1
\n",
+ "
1
\n",
+ "
Futrelle, Mrs. Jacques Heath (Lily May Peel)
\n",
+ "
female
\n",
+ "
35.0
\n",
+ "
1
\n",
+ "
0
\n",
+ "
113803
\n",
+ "
53.1000
\n",
+ "
C123
\n",
+ "
S
\n",
+ "
\n",
+ "
\n",
+ "
4
\n",
+ "
5
\n",
+ "
0
\n",
+ "
3
\n",
+ "
Allen, Mr. William Henry
\n",
+ "
male
\n",
+ "
35.0
\n",
+ "
0
\n",
+ "
0
\n",
+ "
373450
\n",
+ "
8.0500
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " PassengerId Survived Pclass \\\n",
+ "0 1 0 3 \n",
+ "1 2 1 1 \n",
+ "2 3 1 3 \n",
+ "3 4 1 1 \n",
+ "4 5 0 3 \n",
+ "\n",
+ " Name Sex Age SibSp \\\n",
+ "0 Braund, Mr. Owen Harris male 22.0 1 \n",
+ "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
+ "2 Heikkinen, Miss. Laina female 26.0 0 \n",
+ "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
+ "4 Allen, Mr. William Henry male 35.0 0 \n",
+ "\n",
+ " Parch Ticket Fare Cabin Embarked \n",
+ "0 0 A/5 21171 7.2500 NaN S \n",
+ "1 0 PC 17599 71.2833 C85 C \n",
+ "2 0 STON/O2. 3101282 7.9250 NaN S \n",
+ "3 0 113803 53.1000 C123 S \n",
+ "4 0 373450 8.0500 NaN S "
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic = pd.read_csv(\"../data/titanic.csv\")\n",
+ "titanic.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Air quality data about $NO_2$ and Particulate matter less than 2.5 micrometers is used, made available by [openaq](https://openaq.org) and using the [py-openaq](http://dhhagan.github.io/py-openaq/index.html) package. The `air_quality_long.csv` data set provides $NO_2$ and $pm25$ values for the measurement stations _FR04014_, _BETR801_ and _London Westminster_ in respectively Paris, Antwerp and London. In this case, the data set is provided in a so-called long data format representation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
city
\n",
+ "
country
\n",
+ "
location
\n",
+ "
parameter
\n",
+ "
value
\n",
+ "
unit
\n",
+ "
\n",
+ "
\n",
+ "
date.utc
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
2019-06-18 06:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
pm25
\n",
+ "
18.0
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-06-17 08:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
pm25
\n",
+ "
6.5
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-06-17 07:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
pm25
\n",
+ "
18.5
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-06-17 06:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
pm25
\n",
+ "
16.0
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-06-17 05:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
pm25
\n",
+ "
7.5
\n",
+ "
µg/m³
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " city country location parameter value unit\n",
+ "date.utc \n",
+ "2019-06-18 06:00:00+00:00 Antwerpen BE BETR801 pm25 18.0 µg/m³\n",
+ "2019-06-17 08:00:00+00:00 Antwerpen BE BETR801 pm25 6.5 µg/m³\n",
+ "2019-06-17 07:00:00+00:00 Antwerpen BE BETR801 pm25 18.5 µg/m³\n",
+ "2019-06-17 06:00:00+00:00 Antwerpen BE BETR801 pm25 16.0 µg/m³\n",
+ "2019-06-17 05:00:00+00:00 Antwerpen BE BETR801 pm25 7.5 µg/m³"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "air_quality = pd.read_csv(\"../data/air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
+ "air_quality.head()"
+ ]
+ },
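+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick check of the long layout (a minimal sketch using the table just loaded): each row holds a single measurement, so the measured quantities appear as values in the `parameter` column rather than as separate columns:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# in long format the measured quantity is stored as data, not as column names\n",
+ "air_quality[\"parameter\"].unique()"
+ ]
+ },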
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `no2` data set contains only the measurements of $NO_2$:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "no2 = air_quality[air_quality[\"parameter\"] == \"no2\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Reshape the layout of tables"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Sort table rows"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I want to arrange the titanic date according to the age of the passengers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
PassengerId
\n",
+ "
Survived
\n",
+ "
Pclass
\n",
+ "
Name
\n",
+ "
Sex
\n",
+ "
Age
\n",
+ "
SibSp
\n",
+ "
Parch
\n",
+ "
Ticket
\n",
+ "
Fare
\n",
+ "
Cabin
\n",
+ "
Embarked
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
803
\n",
+ "
804
\n",
+ "
1
\n",
+ "
3
\n",
+ "
Thomas, Master. Assad Alexander
\n",
+ "
male
\n",
+ "
0.42
\n",
+ "
0
\n",
+ "
1
\n",
+ "
2625
\n",
+ "
8.5167
\n",
+ "
NaN
\n",
+ "
C
\n",
+ "
\n",
+ "
\n",
+ "
755
\n",
+ "
756
\n",
+ "
1
\n",
+ "
2
\n",
+ "
Hamalainen, Master. Viljo
\n",
+ "
male
\n",
+ "
0.67
\n",
+ "
1
\n",
+ "
1
\n",
+ "
250649
\n",
+ "
14.5000
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ "
\n",
+ "
644
\n",
+ "
645
\n",
+ "
1
\n",
+ "
3
\n",
+ "
Baclini, Miss. Eugenie
\n",
+ "
female
\n",
+ "
0.75
\n",
+ "
2
\n",
+ "
1
\n",
+ "
2666
\n",
+ "
19.2583
\n",
+ "
NaN
\n",
+ "
C
\n",
+ "
\n",
+ "
\n",
+ "
469
\n",
+ "
470
\n",
+ "
1
\n",
+ "
3
\n",
+ "
Baclini, Miss. Helene Barbara
\n",
+ "
female
\n",
+ "
0.75
\n",
+ "
2
\n",
+ "
1
\n",
+ "
2666
\n",
+ "
19.2583
\n",
+ "
NaN
\n",
+ "
C
\n",
+ "
\n",
+ "
\n",
+ "
78
\n",
+ "
79
\n",
+ "
1
\n",
+ "
2
\n",
+ "
Caldwell, Master. Alden Gates
\n",
+ "
male
\n",
+ "
0.83
\n",
+ "
0
\n",
+ "
2
\n",
+ "
248738
\n",
+ "
29.0000
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " PassengerId Survived Pclass Name Sex \\\n",
+ "803 804 1 3 Thomas, Master. Assad Alexander male \n",
+ "755 756 1 2 Hamalainen, Master. Viljo male \n",
+ "644 645 1 3 Baclini, Miss. Eugenie female \n",
+ "469 470 1 3 Baclini, Miss. Helene Barbara female \n",
+ "78 79 1 2 Caldwell, Master. Alden Gates male \n",
+ "\n",
+ " Age SibSp Parch Ticket Fare Cabin Embarked \n",
+ "803 0.42 0 1 2625 8.5167 NaN C \n",
+ "755 0.67 1 1 250649 14.5000 NaN S \n",
+ "644 0.75 2 1 2666 19.2583 NaN C \n",
+ "469 0.75 2 1 2666 19.2583 NaN C \n",
+ "78 0.83 0 2 248738 29.0000 NaN S "
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic.sort_values(by=\"Age\").head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I want to arrange the titanic date according to the cabin class and age in descending order."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
PassengerId
\n",
+ "
Survived
\n",
+ "
Pclass
\n",
+ "
Name
\n",
+ "
Sex
\n",
+ "
Age
\n",
+ "
SibSp
\n",
+ "
Parch
\n",
+ "
Ticket
\n",
+ "
Fare
\n",
+ "
Cabin
\n",
+ "
Embarked
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
851
\n",
+ "
852
\n",
+ "
0
\n",
+ "
3
\n",
+ "
Svensson, Mr. Johan
\n",
+ "
male
\n",
+ "
74.0
\n",
+ "
0
\n",
+ "
0
\n",
+ "
347060
\n",
+ "
7.7750
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ "
\n",
+ "
116
\n",
+ "
117
\n",
+ "
0
\n",
+ "
3
\n",
+ "
Connors, Mr. Patrick
\n",
+ "
male
\n",
+ "
70.5
\n",
+ "
0
\n",
+ "
0
\n",
+ "
370369
\n",
+ "
7.7500
\n",
+ "
NaN
\n",
+ "
Q
\n",
+ "
\n",
+ "
\n",
+ "
280
\n",
+ "
281
\n",
+ "
0
\n",
+ "
3
\n",
+ "
Duane, Mr. Frank
\n",
+ "
male
\n",
+ "
65.0
\n",
+ "
0
\n",
+ "
0
\n",
+ "
336439
\n",
+ "
7.7500
\n",
+ "
NaN
\n",
+ "
Q
\n",
+ "
\n",
+ "
\n",
+ "
483
\n",
+ "
484
\n",
+ "
1
\n",
+ "
3
\n",
+ "
Turkula, Mrs. (Hedwig)
\n",
+ "
female
\n",
+ "
63.0
\n",
+ "
0
\n",
+ "
0
\n",
+ "
4134
\n",
+ "
9.5875
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ "
\n",
+ "
326
\n",
+ "
327
\n",
+ "
0
\n",
+ "
3
\n",
+ "
Nysveen, Mr. Johan Hansen
\n",
+ "
male
\n",
+ "
61.0
\n",
+ "
0
\n",
+ "
0
\n",
+ "
345364
\n",
+ "
6.2375
\n",
+ "
NaN
\n",
+ "
S
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " PassengerId Survived Pclass Name Sex Age \\\n",
+ "851 852 0 3 Svensson, Mr. Johan male 74.0 \n",
+ "116 117 0 3 Connors, Mr. Patrick male 70.5 \n",
+ "280 281 0 3 Duane, Mr. Frank male 65.0 \n",
+ "483 484 1 3 Turkula, Mrs. (Hedwig) female 63.0 \n",
+ "326 327 0 3 Nysveen, Mr. Johan Hansen male 61.0 \n",
+ "\n",
+ " SibSp Parch Ticket Fare Cabin Embarked \n",
+ "851 0 0 347060 7.7750 NaN S \n",
+ "116 0 0 370369 7.7500 NaN Q \n",
+ "280 0 0 336439 7.7500 NaN Q \n",
+ "483 0 0 4134 9.5875 NaN S \n",
+ "326 0 0 345364 6.2375 NaN S "
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With `sort_values`, the rows in the table are sorted according to the defined column(s). The index will follow the row order. Sorting is also possible acccording to the index labels or a combination of the values and index."
+ ]
+ },
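+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a small illustration of sorting on the index labels (a sketch, reusing the `titanic` table loaded above):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# sort on the row index labels instead of on a column\n",
+ "titanic.sort_index(ascending=False).head()"
+ ]
+ },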
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ More details about sorting of tables is provided in :ref:`basics.sorting`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Long to wide table format"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's use a small subset of the air quality data set, for each location the first two measurements (i.e. the head of each group):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
city
\n",
+ "
country
\n",
+ "
location
\n",
+ "
parameter
\n",
+ "
value
\n",
+ "
unit
\n",
+ "
\n",
+ "
\n",
+ "
date.utc
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
2019-04-09 01:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
no2
\n",
+ "
22.5
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 01:00:00+00:00
\n",
+ "
Paris
\n",
+ "
FR
\n",
+ "
FR04014
\n",
+ "
no2
\n",
+ "
24.4
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 02:00:00+00:00
\n",
+ "
London
\n",
+ "
GB
\n",
+ "
London Westminster
\n",
+ "
no2
\n",
+ "
67.0
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 02:00:00+00:00
\n",
+ "
Antwerpen
\n",
+ "
BE
\n",
+ "
BETR801
\n",
+ "
no2
\n",
+ "
53.5
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 02:00:00+00:00
\n",
+ "
Paris
\n",
+ "
FR
\n",
+ "
FR04014
\n",
+ "
no2
\n",
+ "
27.4
\n",
+ "
µg/m³
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 03:00:00+00:00
\n",
+ "
London
\n",
+ "
GB
\n",
+ "
London Westminster
\n",
+ "
no2
\n",
+ "
67.0
\n",
+ "
µg/m³
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " city country location parameter \\\n",
+ "date.utc \n",
+ "2019-04-09 01:00:00+00:00 Antwerpen BE BETR801 no2 \n",
+ "2019-04-09 01:00:00+00:00 Paris FR FR04014 no2 \n",
+ "2019-04-09 02:00:00+00:00 London GB London Westminster no2 \n",
+ "2019-04-09 02:00:00+00:00 Antwerpen BE BETR801 no2 \n",
+ "2019-04-09 02:00:00+00:00 Paris FR FR04014 no2 \n",
+ "2019-04-09 03:00:00+00:00 London GB London Westminster no2 \n",
+ "\n",
+ " value unit \n",
+ "date.utc \n",
+ "2019-04-09 01:00:00+00:00 22.5 µg/m³ \n",
+ "2019-04-09 01:00:00+00:00 24.4 µg/m³ \n",
+ "2019-04-09 02:00:00+00:00 67.0 µg/m³ \n",
+ "2019-04-09 02:00:00+00:00 53.5 µg/m³ \n",
+ "2019-04-09 02:00:00+00:00 27.4 µg/m³ \n",
+ "2019-04-09 03:00:00+00:00 67.0 µg/m³ "
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "no2_subset = no2.sort_index().groupby([\"location\"]).head(2)\n",
+ "no2_subset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I want the values for the three stations as separate columns next to each other to plot them together"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
location
\n",
+ "
BETR801
\n",
+ "
FR04014
\n",
+ "
London Westminster
\n",
+ "
\n",
+ "
\n",
+ "
date.utc
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
2019-04-09 01:00:00+00:00
\n",
+ "
22.5
\n",
+ "
24.4
\n",
+ "
NaN
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 02:00:00+00:00
\n",
+ "
53.5
\n",
+ "
27.4
\n",
+ "
67.0
\n",
+ "
\n",
+ "
\n",
+ "
2019-04-09 03:00:00+00:00
\n",
+ "
NaN
\n",
+ "
NaN
\n",
+ "
67.0
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ "location BETR801 FR04014 London Westminster\n",
+ "date.utc \n",
+ "2019-04-09 01:00:00+00:00 22.5 24.4 NaN\n",
+ "2019-04-09 02:00:00+00:00 53.5 27.4 67.0\n",
+ "2019-04-09 03:00:00+00:00 NaN NaN 67.0"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "no2_subset.pivot(columns=\"location\", values=\"value\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `pivot` function is purely restructering of the data: a single value for each index/column combination is required. \n",
+ "\n",
+ "As Pandas support plotting of multiple columns (see [plotting tutorial](./4_plotting.ipynb)) out of the box, the conversion from long to wide format enables the plotting of the different time series at the same time:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ " \n",
+ "__Note__: When the `index` parameter is not defined, the existing index (row labels) is used.\n",
+ "\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ For more information about `pivot`, see :ref:`reshaping.reshaping`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Pivot table"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I want the mean concentrations for $NO_2$ and $PM_{2.5}$ in each of the stations in table form"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
parameter
\n",
+ "
no2
\n",
+ "
pm25
\n",
+ "
\n",
+ "
\n",
+ "
location
\n",
+ "
\n",
+ "
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
BETR801
\n",
+ "
26.950920
\n",
+ "
23.169492
\n",
+ "
\n",
+ "
\n",
+ "
FR04014
\n",
+ "
29.374284
\n",
+ "
NaN
\n",
+ "
\n",
+ "
\n",
+ "
London Westminster
\n",
+ "
29.740050
\n",
+ "
13.443568
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ "parameter no2 pm25\n",
+ "location \n",
+ "BETR801 26.950920 23.169492\n",
+ "FR04014 29.374284 NaN\n",
+ "London Westminster 29.740050 13.443568"
+ ]
+ },
+ "execution_count": 38,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "air_quality.pivot_table(values=\"value\", index=\"location\", \n",
+ " columns=\"parameter\", aggfunc=\"mean\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the case of `pivot`, the data is only rearranged. When multiple values need to be aggregated (in this specific case, the values on different time steps) `pivot_table` can to be used, providing an aggregation function (e.g. mean) on how to combine these values."
+ ]
+ },
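+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Other aggregation functions can be passed as `aggfunc` as well; a minimal sketch using the maximum instead of the mean:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# same reshaping, but aggregating with the maximum per station/parameter\n",
+ "air_quality.pivot_table(values=\"value\", index=\"location\",\n",
+ "                        columns=\"parameter\", aggfunc=\"max\")"
+ ]
+ },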
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Pivot table is a well known concept in spreadsheet software. When interested in summary columns for each variable separately as well, put the `margin` parameter to `True`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ " \n",
+ "__Note__: If you're wondering, `pivot_table` is indeed directly linked to `groupby`. The same values can be calculated by grouping on both `parameter` and `location`: \n",
+ "\n",
+ " air_quality.groupby([\"parameter\", \"location\"]).mean()\n",
+ " \n",
+ "__To user guide:__ Have a look at `groupby` in combination with `unstack` at [:ref:`TODO LABEL`](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#combining-with-stats-and-groupby)\n",
+ "\n",
+ "
"
+ ]
+ },
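+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of that `groupby` route (using the same `air_quality` table): grouping on both columns and then unstacking one level gives the same wide table as `pivot_table`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# group on parameter and location, average the values, then move 'parameter' back to the columns\n",
+ "air_quality.groupby([\"parameter\", \"location\"])[\"value\"].mean().unstack(\"parameter\")"
+ ]
+ },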
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Wide to long format"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Starting again from the wide format table created in the previous section:"
+ ]
+ },
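+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The wide table is stored as `no2_pivoted` for the `melt` examples below; a minimal sketch of that step (the `reset_index()` turns `date.utc` back into a regular column, so it can be used as `id_vars`):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# wide table: one column per station, with date.utc as a regular column\n",
+ "no2_pivoted = no2.pivot(columns=\"location\", values=\"value\").reset_index()\n",
+ "no2_pivoted.head()"
+ ]
+ },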
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
"
+ ],
+ "text/plain": [
+ " date.utc location value\n",
+ "0 2019-04-09 01:00:00+00:00 BETR801 22.5\n",
+ "1 2019-04-09 02:00:00+00:00 BETR801 53.5\n",
+ "2 2019-04-09 03:00:00+00:00 BETR801 54.5\n",
+ "3 2019-04-09 04:00:00+00:00 BETR801 34.5\n",
+ "4 2019-04-09 05:00:00+00:00 BETR801 46.5"
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "no_2 = no2_pivoted.melt(id_vars=\"date.utc\")\n",
+ "no_2.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The solution is the short version on how to apply `melt`. The method will _melt_ all columns NOT mentioned in `id_vars` together into two columns: A columns with the column header names and a column with the values itself. The latter column gets by default the name `value`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `melt` method can be defined in more detail:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
date.utc
\n",
+ "
id_location
\n",
+ "
NO_2
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
0
\n",
+ "
2019-04-09 01:00:00+00:00
\n",
+ "
BETR801
\n",
+ "
22.5
\n",
+ "
\n",
+ "
\n",
+ "
1
\n",
+ "
2019-04-09 02:00:00+00:00
\n",
+ "
BETR801
\n",
+ "
53.5
\n",
+ "
\n",
+ "
\n",
+ "
2
\n",
+ "
2019-04-09 03:00:00+00:00
\n",
+ "
BETR801
\n",
+ "
54.5
\n",
+ "
\n",
+ "
\n",
+ "
3
\n",
+ "
2019-04-09 04:00:00+00:00
\n",
+ "
BETR801
\n",
+ "
34.5
\n",
+ "
\n",
+ "
\n",
+ "
4
\n",
+ "
2019-04-09 05:00:00+00:00
\n",
+ "
BETR801
\n",
+ "
46.5
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " date.utc id_location NO_2\n",
+ "0 2019-04-09 01:00:00+00:00 BETR801 22.5\n",
+ "1 2019-04-09 02:00:00+00:00 BETR801 53.5\n",
+ "2 2019-04-09 03:00:00+00:00 BETR801 54.5\n",
+ "3 2019-04-09 04:00:00+00:00 BETR801 34.5\n",
+ "4 2019-04-09 05:00:00+00:00 BETR801 46.5"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "no_2 = no2_pivoted.melt(id_vars=\"date.utc\", \n",
+ " value_vars=[\"BETR801\", \"FR04014\", \"London Westminster\"],\n",
+ " value_name=\"NO_2\",\n",
+ " var_name=\"id_location\")\n",
+ "no_2.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The result in the same, but in more detail defined:\n",
+ "\n",
+ "- `value_vars` defines explicitly which columns to _melt_ together\n",
+ "- `value_name` provides a custom column name for the values column instead of the default columns name `value`\n",
+ "- `var_name` provides a custom olumn name for the columns collecting the column header names. Otherwise it takes the index name or a default `variable`\n",
+ "\n",
+ "Hence, the arguments `value_name` and `var_name` are just user-defined names for the two generated columns. The columns to melt are defined by `id_vars` and `value_vars`."
+ ]
+ },
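+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see that `melt` and `pivot` are each other's reverse, a minimal sketch (using the `no_2` table just created) pivots the long table back into the wide format:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# back from long to wide: one column per station again\n",
+ "no_2.pivot(index=\"date.utc\", columns=\"id_location\", values=\"NO_2\").head()"
+ ]
+ },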
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "
\n",
+ " \n",
+ "__Note__: The long format is also referred to as [_tidy_ data format](https://www.jstatsoft.org/article/view/v059i10). The representation defines that each observation is on a separate line and each variable a separate column. \n",
+ "\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ Conversion from wide to long format with `melt` is explained in :ref:`reshaping.melt`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## REMEMBER\n",
+ "\n",
+ "- Sorting by one or more columns is supported by `sort_values`\n",
+ "- The `pivot` function is purely restructering of the data, `pivot_table` supports aggregations\n",
+ "- The reverse of `pivot` (long to wide format) is `melt` (wide to long format)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ More information on reshaping and pivoting is provided in :ref:`reshaping`."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}