Commit 5c57a97

Review pass
craigsdennis committed
1 parent 7e61274 commit 5c57a97

1 file changed

+120
-24
lines changed

s2n8-manipulating-text.ipynb

Lines changed: 120 additions & 24 deletions
@@ -1,12 +1,5 @@
 {
 "cells": [
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/treehouse-projects/python-introducing-pandas/master?filepath=s2n6-manipulating-text.ipynb)"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -15,7 +8,7 @@
 "\n",
 "Oftentimes, there will be something a bit off with the string data in your dataset. You may want to replace some characters, change the case, or strip the whitespace. You know, anything you normally need to do with strings.\n",
 "\n",
-"Now this might lead you to want to loop through each row and manipulate the data, but before you do that, step back and remember about **vectorization**. \n",
+"Now this might lead you to want to loop through each row and manipulate the data, but before you do that, step back and lean into **vectorization**. \n",
 "\n",
 "A `Series` provides a way to use vectorized string methods in a property named [`str`](https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling) and the vectorized methods are then available.\n",
 "\n",
@@ -35,6 +28,7 @@
 "\n",
 "from utils import make_chaos\n",
 "\n",
+"pd.options.display.max_rows = 10\n",
 "transactions = pd.read_csv(os.path.join('data', 'transactions.csv'), index_col=0)\n",
 "# Pay no attention to the person behind the curtain\n",
 "make_chaos(transactions, 42, ['sender'], lambda val: '$' + val)\n",
@@ -47,9 +41,9 @@
 "source": [
 "## Replacing Text\n",
 "\n",
-"When CashBox first got started, usernames were allowed to start with a dollar sign. As time progressed, they changed their mind. They made a mass update to the system. However, someone on the Customer Support team reported that the data in the **`transactions`** `DataFrame` was still showing some senders whose user name still had the $ prefix.\n",
+"When CashBox first got started, usernames were allowed to start with a dollar sign. As time progressed, they changed their mind. They made a mass update to the system. However, someone on the Customer Support team reported that there are some records in the **`transactions`** `DataFrame` still showing some senders whose user name still had the $ prefix.\n",
 "\n",
-"So in order to get ahold of those rows where the sender starts with a $, we can use the [`Series.str.startswith`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.startswith.html#pandas.Series.str.startswith) method. This will return a boolean `Series` which we can use as an index."
+"In order to get ahold of those rows where the sender starts with a $, we can use the [`Series.str.startswith`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.startswith.html#pandas.Series.str.startswith) method. This will return a boolean `Series` which we can use as an index."
 ]
 },
 {
@@ -120,17 +114,68 @@
 " <td>9.31</td>\n",
 " <td>2018-07-04</td>\n",
 " </tr>\n",
+" <tr>\n",
+" <th>...</th>\n",
+" <td>...</td>\n",
+" <td>...</td>\n",
+" <td>...</td>\n",
+" <td>...</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>877</th>\n",
+" <td>$april9082</td>\n",
+" <td>jacob.davis</td>\n",
+" <td>50.37</td>\n",
+" <td>2018-09-21</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>889</th>\n",
+" <td>$victor</td>\n",
+" <td>anthony1788</td>\n",
+" <td>39.06</td>\n",
+" <td>2018-09-21</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>900</th>\n",
+" <td>$andersen</td>\n",
+" <td>corey.ingram</td>\n",
+" <td>4.81</td>\n",
+" <td>2018-09-22</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>927</th>\n",
+" <td>$janet.williams</td>\n",
+" <td>bsmith</td>\n",
+" <td>50.15</td>\n",
+" <td>2018-09-23</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>934</th>\n",
+" <td>$robert8280</td>\n",
+" <td>roger</td>\n",
+" <td>98.35</td>\n",
+" <td>2018-09-24</td>\n",
+" </tr>\n",
 " </tbody>\n",
 "</table>\n",
+"<p>42 rows × 4 columns</p>\n",
 "</div>"
 ],
 "text/plain": [
-" sender receiver amount sent_date\n",
-"59 $porter gail7896 75.16 2018-05-14\n",
-"70 $emily.lewis kevin 5.49 2018-05-21\n",
-"158 $robinson rodriguez 8.91 2018-06-25\n",
-"168 $nancy margaret265 84.15 2018-06-26\n",
-"198 $acook adam.saunders 9.31 2018-07-04"
+" sender receiver amount sent_date\n",
+"59 $porter gail7896 75.16 2018-05-14\n",
+"70 $emily.lewis kevin 5.49 2018-05-21\n",
+"158 $robinson rodriguez 8.91 2018-06-25\n",
+"168 $nancy margaret265 84.15 2018-06-26\n",
+"198 $acook adam.saunders 9.31 2018-07-04\n",
+".. ... ... ... ...\n",
+"877 $april9082 jacob.davis 50.37 2018-09-21\n",
+"889 $victor anthony1788 39.06 2018-09-21\n",
+"900 $andersen corey.ingram 4.81 2018-09-22\n",
+"927 $janet.williams bsmith 50.15 2018-09-23\n",
+"934 $robert8280 roger 98.35 2018-09-24\n",
+"\n",
+"[42 rows x 4 columns]"
 ]
 },
 "execution_count": 2,
@@ -139,7 +184,7 @@
 }
 ],
 "source": [
-"transactions[transactions.sender.str.startswith('$')].head()"
+"transactions[transactions.sender.str.startswith('$')]"
 ]
 },
 {
@@ -253,17 +298,68 @@
 " <td>85.21</td>\n",
 " <td>2018-04-26</td>\n",
 " </tr>\n",
+" <tr>\n",
+" <th>...</th>\n",
+" <td>...</td>\n",
+" <td>...</td>\n",
+" <td>...</td>\n",
+" <td>...</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>963</th>\n",
+" <td>stanley7729</td>\n",
+" <td>JOSEPH.LOPEZ</td>\n",
+" <td>50.84</td>\n",
+" <td>2018-09-25</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>977</th>\n",
+" <td>martha6969</td>\n",
+" <td>PATRICIA</td>\n",
+" <td>87.33</td>\n",
+" <td>2018-09-25</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>987</th>\n",
+" <td>alvarado</td>\n",
+" <td>PAMELA</td>\n",
+" <td>48.74</td>\n",
+" <td>2018-09-25</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>990</th>\n",
+" <td>robert</td>\n",
+" <td>HEATHER.WADE</td>\n",
+" <td>86.44</td>\n",
+" <td>2018-09-25</td>\n",
+" </tr>\n",
+" <tr>\n",
+" <th>992</th>\n",
+" <td>pamela</td>\n",
+" <td>CALEB</td>\n",
+" <td>25.01</td>\n",
+" <td>2018-09-25</td>\n",
+" </tr>\n",
 " </tbody>\n",
 "</table>\n",
+"<p>88 rows × 4 columns</p>\n",
 "</div>"
 ],
 "text/plain": [
-" sender receiver amount sent_date\n",
-"2 rose.eaton EMILY.LEWIS 62.67 2018-02-15\n",
-"5 francis.hernandez LMOORE 91.46 2018-03-14\n",
-"14 palmer CHAD.CHEN 36.27 2018-04-07\n",
-"28 elang DONNA1922 26.07 2018-04-23\n",
-"34 payne GRIFFIN4992 85.21 2018-04-26"
+" sender receiver amount sent_date\n",
+"2 rose.eaton EMILY.LEWIS 62.67 2018-02-15\n",
+"5 francis.hernandez LMOORE 91.46 2018-03-14\n",
+"14 palmer CHAD.CHEN 36.27 2018-04-07\n",
+"28 elang DONNA1922 26.07 2018-04-23\n",
+"34 payne GRIFFIN4992 85.21 2018-04-26\n",
+".. ... ... ... ...\n",
+"963 stanley7729 JOSEPH.LOPEZ 50.84 2018-09-25\n",
+"977 martha6969 PATRICIA 87.33 2018-09-25\n",
+"987 alvarado PAMELA 48.74 2018-09-25\n",
+"990 robert HEATHER.WADE 86.44 2018-09-25\n",
+"992 pamela CALEB 25.01 2018-09-25\n",
+"\n",
+"[88 rows x 4 columns]"
 ]
 },
 "execution_count": 4,
@@ -272,7 +368,7 @@
272368
}
273369
],
274370
"source": [
275-
"transactions[transactions.receiver.str.isupper()].head()"
371+
"transactions[transactions.receiver.str.isupper()]"
276372
]
277373
},
278374
{
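
For readers skimming the diff, here is a minimal sketch of the boolean-mask filtering that the two changed cells rely on. It is not part of the commit. The small `DataFrame` below is a hypothetical stand-in for `data/transactions.csv`, reusing a few values from the outputs shown above, and the names `dollar_mask`, `dollar_senders`, and `upper_receivers` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data/transactions.csv; values borrowed from
# the cell outputs shown in the diff, not loaded from the real file.
transactions = pd.DataFrame(
    {
        "sender": ["$porter", "rose.eaton", "$emily.lewis", "palmer"],
        "receiver": ["gail7896", "EMILY.LEWIS", "kevin", "CHAD.CHEN"],
        "amount": [75.16, 62.67, 5.49, 36.27],
    },
    index=[59, 2, 70, 14],
)

# Series.str.startswith is vectorized: it returns one boolean per row.
dollar_mask = transactions.sender.str.startswith("$")

# Indexing the DataFrame with that boolean Series keeps only the True rows.
dollar_senders = transactions[dollar_mask]

# The same pattern works with Series.str.isupper on the receiver column.
upper_receivers = transactions[transactions.receiver.str.isupper()]

print(dollar_senders)
print(upper_receivers)
```

Dropping `.head()`, as this commit does, returns every matching row; the added `pd.options.display.max_rows = 10` setting is what keeps the displayed output truncated.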