From 9cfadd4e8ce419d0e7a43bde76e73f6f906f8fba Mon Sep 17 00:00:00 2001 From: Joris Van den Bossche Date: Mon, 21 Oct 2019 16:19:22 +0200 Subject: [PATCH] Proofread --- notebooks/1_table_oriented.ipynb | 27 +++++++++++++++++++-------- notebooks/2_read_write.ipynb | 14 +++++++------- notebooks/3_subset_data.ipynb | 13 +++++++++---- notebooks/4_plotting.ipynb | 2 +- 4 files changed, 36 insertions(+), 20 deletions(-) diff --git a/notebooks/1_table_oriented.ipynb b/notebooks/1_table_oriented.ipynb index 432f6c6..64a8dfe 100644 --- a/notebooks/1_table_oriented.ipynb +++ b/notebooks/1_table_oriented.ipynb @@ -134,7 +134,7 @@ "source": [ "A `DataFrame` is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or the `data.frame` in R. \n", "\n", - "- The table has 3 columns, each of them with a column label. The column labels are respectively `Name`, `Age` and `Sex`.\n", + "- The table above has 3 columns, each of them with a column label. The column labels are `Name`, `Age` and `Sex`, respectively.\n", "- The column `Name` consists of textual data with each value a string, the column `Age` are numbers and the column `Sex` is textual data.\n", "\n", "In spreadsheet software, the table representation of our data would look very similar:\n", @@ -142,6 +142,17 @@ "![](../schemas/01_table_spreadsheet.png)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + " \n", + "__Note__: You probably do not want to manually input the data of a DataFrame! In most situations, data stored in a file format are the starting point of an analysis. We will get to that later!\n", + "\n", + "
" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -199,7 +210,7 @@ "source": [ "
\n", " \n", - "If you are familiar to Python :ref:`dictionaries `, the selection of a single column is very similar to selection of dictionary values based on the key.\n", + "If you are familiar with Python :ref:`dictionaries `, the selection of a single column is very similar to the selection of dictionary values based on the key.\n", "\n", "
" ] @@ -287,7 +298,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Or to the `Series`:" + "Or on the `Series`:" ] }, { @@ -314,7 +325,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionalities each of them a _method_ you can apply to a `DataFrame` or `Series`. As methods are functions, do not forget to use parentheses `()`." + "As illustrated by the `max()` method, you can _do_ things with a `DataFrame` or `Series`. Pandas provides a lot of functionality for working with `DataFrame` or `Series`, often defined as methods on those objects. As methods are functions, do not forget to use parentheses `()`." ] }, { @@ -415,7 +426,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `describe` method provides quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. The `describe` method is an example of a pandas operation returning a pandas `Series`.\n", + "The `describe` method provides a quick overview of the numerical data in a `DataFrame`. As the `Name` and `Sex` columns are textual data, these are by default not taken into account by the `describe` method. Many pandas operations return a `DataFrame` or a `Series`. 
The `describe` method is an example of a pandas operation returning a pandas `DataFrame`.\n", "\n", "\n", "__To user guide:__ check more options on `describe` :ref:`basics.describe`" @@ -438,10 +449,10 @@ "source": [ "## REMEMBER\n", "\n", - "- Import the package, aka `import Pandas as pd`\n", + "- Import the package, aka `import pandas as pd`\n", "- A table of data is stored as a pandas `DataFrame`\n", "- Each column in a `DataFrame` is a `Series`\n", - "- You can do things by applying a method to a `DataFrame` or `Series`" + "- You can do things by calling a method on a `DataFrame` or `Series`" ] }, { @@ -472,5 +483,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/notebooks/2_read_write.ipynb b/notebooks/2_read_write.ipynb index e21c689..e693e7b 100644 --- a/notebooks/2_read_write.ipynb +++ b/notebooks/2_read_write.ipynb @@ -17,14 +17,14 @@ " \n", "This tutorial uses the titanic data set, stored as CSV. The data consists of the following data columns:\n", "\n", - "- PassengerId: Id of every passenger.\n", - "- Survived: This feature have value 0 and 1. 0 for not survived and 1 for survived.\n", + "- PassengerId: ID of every passenger.\n", + "- Survived: This feature has value 0 and 1. 0 for not survived and 1 for survived.\n", "- Pclass: There are 3 classes: Class 1, Class 2 and Class 3.\n", "- Name: Name of passenger.\n", "- Sex: Gender of passenger.\n", "- Age: Age of passenger.\n", "- SibSp: Indication that passenger have siblings and spouse.\n", - "- Parch: Whether a passenger is alone or have family.\n", + "- Parch: Whether a passenger is alone or has family.\n", "- Ticket: Ticket number of passenger.\n", "- Fare: Indicating the fare.\n", "- Cabin: The cabin of passenger.\n", @@ -561,7 +561,7 @@ "source": [ "
\n", " \n", - "__Note__: Interested in the last N rows instead? Pandas also provides a `tail` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n", + "__Note__: Interested in the last N rows instead? Pandas also provides a `tail()` method. For example, `titanic.tail(10)` will return the last 10 rows of the DataFrame.\n", "\n", "
" ] @@ -570,7 +570,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "A check on how Pandas interpreted each of the column data types can be done by requesting the Pandas `dtypes` attribute:" + "A check on how Pandas interpreted each of the column data types can be done by requesting the `dtypes` attribute:" ] }, { @@ -643,7 +643,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Whereas `read_*` fucntions are used to read data to Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an excel file. In the example here, the `sheet_name` is named _passengers_ instead of the default _Sheet1_. By setting `index=False` the row index labels are not saved in the spreadsheet." + "Whereas `read_*` functions are used to read data into Pandas, the `to_*` methods are used to store data. The `to_excel` method stores the data as an Excel file. In the example here, the `sheet_name` is named _passengers_ instead of the default _Sheet1_. By setting `index=False`, the row index labels are not saved in the spreadsheet." ] }, { @@ -908,5 +908,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/notebooks/3_subset_data.ipynb b/notebooks/3_subset_data.ipynb index 03d1524..6c514bc 100644 --- a/notebooks/3_subset_data.ipynb +++ b/notebooks/3_subset_data.ipynb @@ -292,7 +292,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parantheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned." + "`shape` is an attribute (remember [previous tutorial](./2_read_write.ipynb), no parentheses for attributes) of a pandas `Series` and `DataFrame` containing the number of rows and columns: _(nrows, ncolumns)_. A pandas Series is 1-dimensional and only the number of rows is returned."
] }, { @@ -389,7 +389,12 @@ "\n", "
\n", " \n", - "__Note:__ The inner square brackets define a :ref:`Python list ` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame` as seen in the previous example.\n", + "__Note:__ The inner square brackets define a :ref:`Python list ` with column names, whereas the outer brackets are used to select the data from a pandas `DataFrame`. The previous example can therefore also be written as:\n", + "\n", + "```python\n", + "columns_to_select = [\"Age\", \"Sex\"]\n", + "titanic[columns_to_select]\n", + "```\n", "\n", "
" ] @@ -1020,7 +1025,7 @@ "source": [ "
\n", " \n", - "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you can not use `or`/`and` but need to use the `or` operator `|` and the `and` operator `&`.\n", + "__Note:__ When combining multiple conditional statements, each condition must be surrounded by parentheses `()`. Moreover, you cannot use `or`/`and`, but need to use the \"or\" operator `|` and the \"and\" operator `&`.\n", "\n", "
" ] @@ -1674,5 +1679,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/notebooks/4_plotting.ipynb b/notebooks/4_plotting.ipynb index 741cdb5..5b235a3 100644 --- a/notebooks/4_plotting.ipynb +++ b/notebooks/4_plotting.ipynb @@ -493,5 +493,5 @@ } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 }
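
This is a small sketch for checking the behaviour described in the edited tutorial text. It covers four points: `describe()` returns a `DataFrame` that summarises only the numeric columns, a list of column names selects the same data whether it is stored in a variable or written inline, `tail()` returns the last rows, and conditions are combined with `&`/`|`. It assumes a recent pandas release. Instead of the titanic CSV, it uses a hypothetical three-row table with `Name`, `Age` and `Sex` columns.

```python
import pandas as pd

# Hypothetical three-row table with the Name/Age/Sex columns used in the tutorial text
# (illustrative values, not loaded from the titanic CSV).
df = pd.DataFrame(
    {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# describe() skips the textual columns by default and returns a DataFrame.
summary = df.describe()
print(type(summary).__name__, list(summary.columns))  # DataFrame ['Age']

# A list stored in a variable selects the same columns as an inline list.
columns_to_select = ["Age", "Sex"]
same_selection = df[columns_to_select].equals(df[["Age", "Sex"]])

# tail() is a method, so it needs parentheses; it returns the last N rows.
last_two = df.tail(2)

# Wrap each condition in parentheses and combine with & ("and") or | ("or").
older_women = df[(df["Age"] > 30) & (df["Sex"] == "female")]
young_or_old = df[(df["Age"] < 30) | (df["Age"] > 50)]

# Python's `and` asks for the truth value of a whole Series, which pandas refuses.
try:
    df[(df["Age"] > 30) and (df["Sex"] == "female")]
    and_raises = False
except ValueError:
    and_raises = True
```

The `try`/`except` block only shows why the note insists on `&` and `|`. In real code you would simply write the `&` form.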