From 9cfadd4e8ce419d0e7a43bde76e73f6f906f8fba Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Date: Mon, 21 Oct 2019 16:19:22 +0200
Subject: [PATCH] Proofread

---
 notebooks/1_table_oriented.ipynb | 27 +++++++++++++++++++--------
 notebooks/2_read_write.ipynb     | 14 +++++++-------
 notebooks/3_subset_data.ipynb    | 13 +++++++++----
 notebooks/4_plotting.ipynb       |  2 +-
 4 files changed, 36 insertions(+), 20 deletions(-)
diff --git a/notebooks/1_table_oriented.ipynb b/notebooks/1_table_oriented.ipynb
index 432f6c6..64a8dfe 100644
--- a/notebooks/1_table_oriented.ipynb
+++ b/notebooks/1_table_oriented.ipynb
@@ -134,7 +134,7 @@
    "source": [
     "A `DataFrame` is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the `data.frame` in R. \n",
     "\n",
-    "- The table has 3 columns, each of them with a column label. The column labels are respectively `Name`, `Age` and `Sex`.\n",
+    "- The table above has 3 columns, each of them with a column label. The column labels are `Name`, `Age` and `Sex`, respectively.\n",
     "- The column `Name` consists of textual data with each value a string, the column `Age` are numbers and the column `Sex` is textual data.\n",
     "\n",
     "In spreadsheet software, the table representation of our data would look very similar:\n",
@@ -142,6 +142,17 @@
     "![](../schemas/01_table_spreadsheet.png)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div class=\"alert alert-info\">\n",
+    "    \n",
+    "__Note__: You probably do not want to manually input the data of a DataFrame! In most situations, data stored in a file format are the starting point of an analysis. We will get to that later!\n",
+    "\n",
+    "</div>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -199,7 +210,7 @@
    "source": [
     "<div class=\"alert alert-info\">\n",
     "    \n",
-    "If you are familiar to Python :ref:`dictionaries <python:tut-dictionaries>`, the selection of a single column is very similar to selection of dictionary values based on the key.\n",
+    "If you are familiar with Python :ref:`dictionaries <python:tut-dictionaries>`, the selection of a single column is very similar to the selection of dictionary values based on the key.\n",
     "\n",
     "</div>"
    ]
@@ -287,7 +298,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Or to the `Series`:"
+    "Or on the `Series`:"
    ]
   },
   {
@@ -314,7 +325,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionalities each of them a _method_ you can apply to a `DataFrame` or `Series`. As methods are functions, do not forget to use parentheses `()`."
+    "As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionality for working with `DataFrame` or `Series`, often defined as methods on those objects. As methods are functions, do not forget to use parentheses `()`."
    ]
   },
   {
@@ -415,7 +426,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The `describe` method provides quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `Series`.\n",
+    "The `describe` method provides a quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `DataFrame`.\n",
     "\n",
     "\n",
     "__To user guide:__ check more options on `describe` :ref:`basics.describe`"
@@ -438,10 +449,10 @@
    "source": [
     "## REMEMBER\n",
     "\n",
-    "- Import the package, aka `import Pandas as pd`\n",
+    "- Import the package, i.e. `import pandas as pd`\n",
     "- A table of data is stored as a pandas `DataFrame`\n",
     "- Each column in a `DataFrame` is a `Series`\n",
-    "- You can do things by applying a method to a `DataFrame` or `Series`"
+    "- You can do things by calling a method on a `DataFrame` or `Series`"
    ]
   },
   {
@@ -472,5 +483,5 @@
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
diff --git a/notebooks/2_read_write.ipynb b/notebooks/2_read_write.ipynb
index e21c689..e693e7b 100644
--- a/notebooks/2_read_write.ipynb
+++ b/notebooks/2_read_write.ipynb
@@ -17,14 +17,14 @@
     "    \n",
     "This tutorial uses the titanic data set, stored as CSV. The data consists of the following data columns:\n",
     "\n",
-    "- PassengerId: Id of every passenger.\n",
-    "- Survived: This feature have value 0 and 1. 0 for not survived and 1 for survived.\n",
+    "- PassengerId: ID of every passenger.\n",
+    "- Survived: This feature has value 0 or 1: 0 for not survived and 1 for survived.\n",
     "- Pclass: There are 3 classes: Class 1, Class 2 and Class 3.\n",
     "- Name: Name of passenger.\n",
     "- Sex: Gender of passenger.\n",
     "- Age: Age of passenger.\n",
     "- SibSp: Indication that passenger have siblings and spouse.\n",
-    "- Parch: Whether a passenger is alone or have family.\n",
+    "- Parch: Whether a passenger is alone or has family.\n",
     "- Ticket: Ticket number of passenger.\n",
     "- Fare: Indicating the fare.\n",
     "- Cabin: The cabin of passenger.\n",
@@ -561,7 +561,7 @@
    "source": [
     "<div class=\"alert alert-info\">\n",
     "    \n",
-    "__Note__: Interested in the last N rows instead? Pandas also provides a `tail` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n",
+    "__Note__: Interested in the last N rows instead? Pandas also provides a `tail()` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n",
     "\n",
     "</div>"
    ]
@@ -570,7 +570,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "A check on how Pandas interpreted each of the column data types can be done by requesting the Pandas `dtypes` attribute:"
+    "A check on how Pandas interpreted each of the column data types can be done by requesting the `dtypes` attribute:"
    ]
   },
   {
@@ -643,7 +643,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Whereas `read_*` fucntions are used to read data to Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an excel file. In the example here, the `sheet_name` is named _passengers_ instead of the default _Sheet1_. By setting `index=False` the row index labels are not saved in the spreadsheet."
+    "Whereas `read_*` functions are used to read data to Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an Excel file. In the example here, the `sheet_name` is set to _passengers_ instead of the default _Sheet1_. By setting `index=False` the row index labels are not saved in the spreadsheet."
    ]
   },
   {
@@ -908,5 +908,5 @@
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
diff --git a/notebooks/3_subset_data.ipynb b/notebooks/3_subset_data.ipynb
index 03d1524..6c514bc 100644
--- a/notebooks/3_subset_data.ipynb
+++ b/notebooks/3_subset_data.ipynb
@@ -292,7 +292,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parantheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
+    "`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parentheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
    ]
   },
   {
@@ -389,7 +389,12 @@
     "\n",
     "<div class=\"alert alert-info\">\n",
     "    \n",
-    "__Note:__ The inner square brackets define a :ref:`Python list <python:tut-morelists>` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame` as seen in the previous example.\n",
+    "__Note:__ The inner square brackets define a :ref:`Python list <python:tut-morelists>` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame`. The previous example can therefore also be written as:\n",
+    "\n",
+    "```python\n",
+    "columns_to_select = [\"Age\", \"Sex\"]\n",
+    "titanic[columns_to_select]\n",
+    "```\n",
     "\n",
     "</div>"
    ]
@@ -1020,7 +1025,7 @@
    "source": [
     "<div class=\"alert alert-info\">\n",
     "    \n",
-    "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you can not use `or`/`and` but need to use the `or` operator `|` and the `and` operator `&`.\n",
+    "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you cannot use `or`/`and` but need to use the \"or\" operator `|` and the \"and\" operator `&`.\n",
     "\n",
     "</div>"
    ]
@@ -1674,5 +1679,5 @@
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
diff --git a/notebooks/4_plotting.ipynb b/notebooks/4_plotting.ipynb
index 741cdb5..5b235a3 100644
--- a/notebooks/4_plotting.ipynb
+++ b/notebooks/4_plotting.ipynb
@@ -493,5 +493,5 @@
   }
  },
  "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }