diff --git a/notebooks/3_subset_data.ipynb b/notebooks/3_subset_data.ipynb
new file mode 100644
index 0000000..aaa1880
--- /dev/null
+++ b/notebooks/3_subset_data.ipynb
@@ -0,0 +1,1693 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This tutorial uses the Titanic data set, stored as CSV. The data consists of the following columns:\n",
+ "\n",
+ "- PassengerId: Id of every passenger.\n",
+ "- Survived: Whether the passenger survived: 0 for not survived, 1 for survived.\n",
+ "- Pclass: Passenger class. There are 3 classes: Class 1, Class 2 and Class 3.\n",
+ "- Name: Name of the passenger.\n",
+ "- Sex: Gender of the passenger.\n",
+ "- Age: Age of the passenger.\n",
+ "- SibSp: Number of siblings and spouses of the passenger aboard.\n",
+ "- Parch: Number of parents and children of the passenger aboard.\n",
+ "- Ticket: Ticket number of the passenger.\n",
+ "- Fare: The fare paid by the passenger.\n",
+ "- Cabin: The cabin of the passenger.\n",
+ "- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).\n",
+ "\n",
+ "Reading in a data set is explained in the [tutorial on read/write operations](./2_read_write.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " PassengerId Survived Pclass \\\n",
+ "0 1 0 3 \n",
+ "1 2 1 1 \n",
+ "2 3 1 3 \n",
+ "3 4 1 1 \n",
+ "4 5 0 3 \n",
+ "\n",
+ " Name Sex Age SibSp \\\n",
+ "0 Braund, Mr. Owen Harris male 22.0 1 \n",
+ "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
+ "2 Heikkinen, Miss. Laina female 26.0 0 \n",
+ "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
+ "4 Allen, Mr. William Henry male 35.0 0 \n",
+ "\n",
+ " Parch Ticket Fare Cabin Embarked \n",
+ "0 0 A/5 21171 7.2500 NaN S \n",
+ "1 0 PC 17599 71.2833 C85 C \n",
+ "2 0 STON/O2. 3101282 7.9250 NaN S \n",
+ "3 0 113803 53.1000 C123 S \n",
+ "4 0 373450 8.0500 NaN S "
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic = pd.read_csv(\"../data/titanic.csv\")\n",
+ "titanic.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# How do I select a subset of data in a `DataFrame`? "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### How do I select specific columns from a `DataFrame`?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " > I'm interested in the age of the Titanic passengers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 22.0\n",
+ "1 38.0\n",
+ "2 26.0\n",
+ "3 35.0\n",
+ "4 35.0\n",
+ "Name: Age, dtype: float64"
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "ages = titanic[\"Age\"]\n",
+ "ages.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To select a single column, use square brackets `[]` with the name of the column of interest."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Each column in a `DataFrame` is a `Series`. As a single column is selected, the returned object is a pandas `Series`. We can verify this by checking the type of the output:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.series.Series"
+ ]
+ },
+ "execution_count": 65,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(titanic[\"Age\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "And have a look at the `shape` of the output:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(891,)"
+ ]
+ },
+ "execution_count": 64,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic[\"Age\"].shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`shape` is an attribute (remember from the [previous tutorial](./2_read_write.ipynb): no parentheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas `Series` is 1-dimensional, so only the number of rows is returned."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " > I'm interested in the age and sex of the Titanic passengers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Age Sex\n",
+ "0 22.0 male\n",
+ "1 38.0 female\n",
+ "2 26.0 female\n",
+ "3 35.0 female\n",
+ "4 35.0 male"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "age_sex = titanic[[\"Age\", \"Sex\"]]\n",
+ "age_sex.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To select multiple columns, use a list of column names within the selection brackets `[]`.\n",
+ "\n",
+ "__Note:__ The inner square brackets define a Python list with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame` as seen in the previous example."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The returned data type is a pandas `DataFrame`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.frame.DataFrame"
+ ]
+ },
+ "execution_count": 67,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(titanic[[\"Age\", \"Sex\"]])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(891, 2)"
+ ]
+ },
+ "execution_count": 68,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic[[\"Age\", \"Sex\"]].shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The selection returned a `DataFrame` with 891 rows and 2 columns. A `DataFrame` is 2-dimensional with both a row and column dimension."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ For basic information on indexing, see :ref:`indexing.basics`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### How do I filter specific rows from a `DataFrame`?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I'm interested in the passengers older than 35 years."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 73,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " PassengerId Survived Pclass \\\n",
+ "1 2 1 1 \n",
+ "6 7 0 1 \n",
+ "11 12 1 1 \n",
+ "13 14 0 3 \n",
+ "15 16 1 2 \n",
+ "\n",
+ " Name Sex Age SibSp \\\n",
+ "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
+ "6 McCarthy, Mr. Timothy J male 54.0 0 \n",
+ "11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
+ "13 Andersson, Mr. Anders Johan male 39.0 1 \n",
+ "15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n",
+ "\n",
+ " Parch Ticket Fare Cabin Embarked \n",
+ "1 0 PC 17599 71.2833 C85 C \n",
+ "6 0 17463 51.8625 E46 S \n",
+ "11 0 113783 26.5500 C103 S \n",
+ "13 5 347082 31.2750 NaN S \n",
+ "15 0 248706 16.0000 NaN S "
+ ]
+ },
+ "execution_count": 73,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "above_35 = titanic[titanic[\"Age\"] > 35]\n",
+ "above_35.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To select rows based on a conditional expression, use a condition inside the selection brackets `[]`. The condition inside the selection brackets `titanic[\"Age\"] > 35` checks for which rows the `Age` column has a value larger than 35:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 False\n",
+ "1 True\n",
+ "2 False\n",
+ "3 False\n",
+ "4 False\n",
+ " ... \n",
+ "886 False\n",
+ "887 False\n",
+ "888 False\n",
+ "889 False\n",
+ "890 False\n",
+ "Name: Age, Length: 891, dtype: bool"
+ ]
+ },
+ "execution_count": 70,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic[\"Age\"] > 35"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The output of the conditional expression (`>`, but also `==`, `!=`, `<`, `<=`,... would work) is actually a pandas `Series` of boolean values (either `True` or `False`) with the same number of rows as the original `DataFrame`. Such a `Series` of boolean values can be used to filter the `DataFrame` by putting it in between the selection brackets `[]`. Only rows for which the value is `True` will be selected."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We know from before that the original Titanic `DataFrame` consists of 891 rows. Let's have a look at the number of rows which satisfy the condition by checking the `shape` attribute of the resulting `DataFrame` `above_35`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 75,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(217, 12)"
+ ]
+ },
+ "execution_count": 75,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "above_35.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I'm interested in the Titanic passengers from cabin classes 2 and 3."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " PassengerId Survived Pclass Name Sex \\\n",
+ "0 1 0 3 Braund, Mr. Owen Harris male \n",
+ "2 3 1 3 Heikkinen, Miss. Laina female \n",
+ "4 5 0 3 Allen, Mr. William Henry male \n",
+ "5 6 0 3 Moran, Mr. James male \n",
+ "7 8 0 3 Palsson, Master. Gosta Leonard male \n",
+ "\n",
+ " Age SibSp Parch Ticket Fare Cabin Embarked \n",
+ "0 22.0 1 0 A/5 21171 7.2500 NaN S \n",
+ "2 26.0 0 0 STON/O2. 3101282 7.9250 NaN S \n",
+ "4 35.0 0 0 373450 8.0500 NaN S \n",
+ "5 NaN 0 0 330877 8.4583 NaN Q \n",
+ "7 2.0 3 1 349909 21.0750 NaN S "
+ ]
+ },
+ "execution_count": 76,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "class_23 = titanic[titanic[\"Pclass\"].isin([2, 3])]\n",
+ "class_23.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Similar to the conditional expression, the `isin` conditional function returns `True` for each row whose value is in the provided list. To filter the rows based on such a function, use the conditional function inside the selection brackets `[]`. In this case, the condition inside the selection brackets `titanic[\"Pclass\"].isin([2, 3])` checks for which rows the `Pclass` column is either 2 or 3."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The above is equivalent to writing one condition for class 2 and one for class 3, and combining the two with the `|` (or) operator:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " PassengerId Survived Pclass Name Sex \\\n",
+ "0 1 0 3 Braund, Mr. Owen Harris male \n",
+ "2 3 1 3 Heikkinen, Miss. Laina female \n",
+ "4 5 0 3 Allen, Mr. William Henry male \n",
+ "5 6 0 3 Moran, Mr. James male \n",
+ "7 8 0 3 Palsson, Master. Gosta Leonard male \n",
+ "\n",
+ " Age SibSp Parch Ticket Fare Cabin Embarked \n",
+ "0 22.0 1 0 A/5 21171 7.2500 NaN S \n",
+ "2 26.0 0 0 STON/O2. 3101282 7.9250 NaN S \n",
+ "4 35.0 0 0 373450 8.0500 NaN S \n",
+ "5 NaN 0 0 330877 8.4583 NaN Q \n",
+ "7 2.0 3 1 349909 21.0750 NaN S "
+ ]
+ },
+ "execution_count": 58,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "class_23 = titanic[(titanic[\"Pclass\"] == 2) | (titanic[\"Pclass\"] == 3)]\n",
+ "class_23.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you cannot use the Python keywords `or`/`and`; use the `|` (or) operator and the `&` (and) operator instead."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ For conditional (boolean) indexing, see :ref:`indexing.boolean`. For specific information on `isin`, see :ref:`indexing.basics.indexing_isin`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I want to work with passenger data for which the age is known."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " PassengerId Survived Pclass \\\n",
+ "0 1 0 3 \n",
+ "1 2 1 1 \n",
+ "2 3 1 3 \n",
+ "3 4 1 1 \n",
+ "4 5 0 3 \n",
+ "\n",
+ " Name Sex Age SibSp \\\n",
+ "0 Braund, Mr. Owen Harris male 22.0 1 \n",
+ "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
+ "2 Heikkinen, Miss. Laina female 26.0 0 \n",
+ "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
+ "4 Allen, Mr. William Henry male 35.0 0 \n",
+ "\n",
+ " Parch Ticket Fare Cabin Embarked \n",
+ "0 0 A/5 21171 7.2500 NaN S \n",
+ "1 0 PC 17599 71.2833 C85 C \n",
+ "2 0 STON/O2. 3101282 7.9250 NaN S \n",
+ "3 0 113803 53.1000 C123 S \n",
+ "4 0 373450 8.0500 NaN S "
+ ]
+ },
+ "execution_count": 59,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "age_no_na = titanic[titanic[\"Age\"].notna()]\n",
+ "age_no_na.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `notna` conditional function returns `True` for each row whose value is not a `Null` value. As such, this can be combined with the selection brackets `[]` to filter the data table."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You might wonder what actually changed, as the first 5 rows still have the same values. One way to verify is to check if the shape has changed:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(714, 12)"
+ ]
+ },
+ "execution_count": 78,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "age_no_na.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ For more dedicated functions on missing values, see :ref:`missing-data`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### How do I select specific rows and columns from a `DataFrame`? "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I'm interested in the names of the passengers older than 35 years."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1 Cumings, Mrs. John Bradley (Florence Briggs Th...\n",
+ "6 McCarthy, Mr. Timothy J\n",
+ "11 Bonnell, Miss. Elizabeth\n",
+ "13 Andersson, Mr. Anders Johan\n",
+ "15 Hewlett, Mrs. (Mary D Kingcome) \n",
+ "Name: Name, dtype: object"
+ ]
+ },
+ "execution_count": 60,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "adult_names = titanic.loc[titanic[\"Age\"] > 35, \"Name\"]\n",
+ "adult_names.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this case, a subset of both rows and columns is made in one go and just using selection brackets `[]` is not sufficient anymore. The `loc`/`iloc` operators are required in front of the selection brackets `[]`. When using `loc`/`iloc`, the part before the comma is the rows you want, and the part after the comma is the columns you want to select.\n",
+ "\n",
+ "When using column names, row labels or a conditional expression, use the `loc` operator in front of the selection brackets `[]`. For both the part before and after the comma, you can use a single label, a list of labels, a slice of labels, a conditional expression or a colon. Using a colon specifies that you want to select all rows or columns."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "> I'm interested in rows 10 to 25 and columns 3 to 5."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Pclass Name Sex\n",
+ "9 2 Nasser, Mrs. Nicholas (Adele Achem) female\n",
+ "10 3 Sandstrom, Miss. Marguerite Rut female\n",
+ "11 1 Bonnell, Miss. Elizabeth female\n",
+ "12 3 Saundercock, Mr. William Henry male\n",
+ "13 3 Andersson, Mr. Anders Johan male\n",
+ "14 3 Vestrom, Miss. Hulda Amanda Adolfina female\n",
+ "15 2 Hewlett, Mrs. (Mary D Kingcome) female\n",
+ "16 3 Rice, Master. Eugene male\n",
+ "17 2 Williams, Mr. Charles Eugene male\n",
+ "18 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female\n",
+ "19 3 Masselmani, Mrs. Fatima female\n",
+ "20 2 Fynney, Mr. Joseph J male\n",
+ "21 2 Beesley, Mr. Lawrence male\n",
+ "22 3 McGowan, Miss. Anna \"Annie\" female\n",
+ "23 1 Sloper, Mr. William Thompson male\n",
+ "24 3 Palsson, Miss. Torborg Danira female"
+ ]
+ },
+ "execution_count": 61,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic.iloc[9:25, 2:5]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Again, a subset of both rows and columns is made in one go and just using selection brackets `[]` is not sufficient anymore. When specifically interested in certain rows and/or columns based on their position in the table, use the `iloc` operator in front of the selection brackets `[]`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When selecting specific rows and/or columns with `loc` or `iloc`, new values can be assigned to the selected data. For example, to assign the name `anonymous` to the first 3 elements of the fourth column (the `Name` column, at position 3):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " PassengerId Survived Pclass \\\n",
+ "0 1 0 3 \n",
+ "1 2 1 1 \n",
+ "2 3 1 3 \n",
+ "3 4 1 1 \n",
+ "4 5 0 3 \n",
+ "\n",
+ " Name Sex Age SibSp Parch \\\n",
+ "0 anonymous male 22.0 1 0 \n",
+ "1 anonymous female 38.0 1 0 \n",
+ "2 anonymous female 26.0 0 0 \n",
+ "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 \n",
+ "4 Allen, Mr. William Henry male 35.0 0 0 \n",
+ "\n",
+ " Ticket Fare Cabin Embarked \n",
+ "0 A/5 21171 7.2500 NaN S \n",
+ "1 PC 17599 71.2833 C85 C \n",
+ "2 STON/O2. 3101282 7.9250 NaN S \n",
+ "3 113803 53.1000 C123 S \n",
+ "4 373450 8.0500 NaN S "
+ ]
+ },
+ "execution_count": 40,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "titanic.iloc[0:3, 3] = \"anonymous\"\n",
+ "titanic.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ For a more detailed description of selecting subsets of a data table, see :ref:`indexing.choice`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## REMEMBER\n",
+ "\n",
+ "- When selecting subsets of data, square brackets `[]` are used.\n",
+ "- Inside these brackets, you can use a single column/row label, a list of column/row labels, a slice of labels, a conditional expression or a colon.\n",
+ "- Select specific rows and/or columns using `loc` when using the row and column names.\n",
+ "- Select specific rows and/or columns using `iloc` when using the positions in the table.\n",
+ "- You can assign new values to a selection based on `loc`/`iloc`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "__To user guide:__ Further details about indexing are provided in :ref:`indexing`"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}