@@ -215,14 +215,6 @@ These operations produce a pandas object the same type as the left-hand-side input
that is of dtype ``bool``. These ``boolean`` objects can be used in indexing operations,
see :ref:`here <indexing.boolean>`

- As of v0.13.1, Series, DataFrames and Panels have an equals method to compare if
- two such objects are equal.
-
- .. ipython:: python
-
-    df.equals(df)
-    df.equals(df2)
-
.. _basics.reductions:

Boolean Reductions
@@ -281,6 +273,35 @@ To evaluate single-element pandas objects in a boolean context, use the method ``.bool()``:

See :ref:`gotchas <gotchas.truth>` for a more detailed discussion.

+ .. _basics.equals:
+
+ Often you may find there is more than one way to compute the same
+ result. As a simple example, consider ``df+df`` and ``df*2``. To test
+ that these two computations produce the same result, given the tools
+ shown above, you might imagine using ``(df+df == df*2).all()``. But in
+ fact, this expression is False:
+
+ .. ipython:: python
+
+    df+df == df*2
+    (df+df == df*2).all()
+
+ Notice that the boolean DataFrame ``df+df == df*2`` contains some False values!
+ That is because NaNs do not compare as equals:
+
+ .. ipython:: python
+
+    np.nan == np.nan
+
+ So, as of v0.13.1, NDFrames (such as Series, DataFrames, and Panels)
+ have an ``equals`` method for testing equality, with NaNs in corresponding
+ locations treated as equal.
+
+ .. ipython:: python
+
+    (df+df).equals(df*2)
+
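+ The NaN-aware behavior holds for Series as well; a minimal illustration
+ (a constructed example, not taken from the surrounding docs):
+
+ .. ipython:: python
+
+    Series([np.nan]) == Series([np.nan])
+    Series([np.nan]).equals(Series([np.nan]))
+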

Combining overlapping data sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -497,7 +518,7 @@ of a 1D array of values. It can also be used as a function on regular arrays:
   s.value_counts()
   value_counts(data)

- Similarly, you can get the most frequently occuring value(s) (the mode) of the values in a Series or DataFrame:
+ Similarly, you can get the most frequently occurring value(s) (the mode) of the values in a Series or DataFrame:

.. ipython:: python
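
   # a sketch of the example elided at this hunk boundary (hypothetical data):
   # mode() returns the most frequently occurring value(s) as a Series
   s5 = Series([1, 1, 3, 3, 3, 5])
   s5.mode()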
@@ -783,7 +804,7 @@ DataFrame's index.
pre-aligned data**. Adding two unaligned DataFrames internally triggers a
reindexing step. For exploratory analysis you will hardly notice the
difference (because ``reindex`` has been heavily optimized), but when CPU
- cycles matter sprinking a few explicit ``reindex`` calls here and there can
+ cycles matter, sprinkling a few explicit ``reindex`` calls here and there can
have an impact.
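
For example, when the same two frames are combined repeatedly, aligning once up
front avoids the implicit reindex on every operation. A sketch only; ``df`` and
``df2`` stand for any two frames with overlapping row labels:

.. ipython:: python

   df2_aligned = df2.reindex(df.index)  # pay the row-alignment cost once
   df + df2_aligned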

.. _basics.reindex_like:
@@ -1013,7 +1034,7 @@ containing the data in each row:
      ...: print('%s\n%s' % (row_index, row))
      ...:

- For instance, a contrived way to transpose the dataframe would be:
+ For instance, a contrived way to transpose the DataFrame would be:

.. ipython:: python
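
   # a sketch of the example elided at this hunk boundary: build the
   # transpose row by row (df2 is the frame iterated over above)
   df2_t = DataFrame(dict((idx, values) for idx, values in df2.iterrows()))
   df2_t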
@@ -1160,12 +1181,12 @@ relies on strict ``re.match``, while ``contains`` relies on ``re.search``.

This old, deprecated behavior of ``match`` is still the default. As
demonstrated above, use the new behavior by setting ``as_indexer=True``.
- In this mode, ``match`` is analagous to ``contains``, returning a boolean
+ In this mode, ``match`` is analogous to ``contains``, returning a boolean
Series. The new behavior will become the default behavior in a future
release.

Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
- an extra ``na`` arguement so missing values can be considered True or False:
+ an extra ``na`` argument so missing values can be considered True or False:

.. ipython:: python
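
   # a sketch of the example elided at this hunk boundary (hypothetical
   # data): na= decides how missing values are treated in the result
   s4 = Series(['A', 'B', np.nan, 'Aaba'])
   s4.str.contains('A', na=False)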
@@ -1189,7 +1210,7 @@ Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
   ``slice_replace``,Replace slice in each string with passed value
   ``count``,Count occurrences of pattern
   ``startswith``,Equivalent to ``str.startswith(pat)`` for each element
- ``endswidth``,Equivalent to ``str.endswith(pat)`` for each element
+ ``endswith``,Equivalent to ``str.endswith(pat)`` for each element
   ``findall``,Compute list of all occurrences of pattern/regex for each string
   ``match``,"Call ``re.match`` on each element, returning matched groups as list"
   ``extract``,"Call ``re.match`` on each element, as ``match`` does, but return matched groups as strings for convenience."
@@ -1364,7 +1385,7 @@ from the current type (say ``int`` to ``float``)
   df3.dtypes

The ``values`` attribute on a DataFrame returns the *lower-common-denominator* of the dtypes, meaning
- the dtype that can accomodate **ALL** of the types in the resulting homogenous dtyped numpy array. This can
+ the dtype that can accommodate **ALL** of the types in the resulting homogeneously dtyped numpy array. This can
force some *upcasting*.

.. ipython:: python
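
   # a sketch of the example elided at this hunk boundary: the dtype of
   # .values reflects the upcasting described above (df3 is mixed-dtype)
   df3.values.dtype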
@@ -1376,7 +1397,7 @@ astype

.. _basics.cast:

- You can use the ``astype`` method to explicity convert dtypes from one to another. These will by default return a copy,
+ You can use the ``astype`` method to explicitly convert dtypes from one to another. These will by default return a copy,
even if the dtype was unchanged (pass ``copy=False`` to change this behavior). In addition, they will raise an
exception if the astype operation is invalid.
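
A minimal sketch of the idea (``df3`` is the frame from the examples above; the
target dtype here is arbitrary):

.. ipython:: python

   df3.astype('float32').dtypes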
@@ -1411,7 +1432,7 @@ they will be set to ``np.nan``.
   df3.dtypes

To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
- This will convert any datetimelike object to dates, forcing other values to ``NaT``.
+ This will convert any datetime-like object to dates, forcing other values to ``NaT``.
This might be useful if you are reading in data which is mostly dates,
but occasionally has non-dates intermixed and you want to represent them as missing.
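
As a sketch (assuming ``datetime`` and ``Timestamp`` are imported, as elsewhere
in these docs, and that the conversion goes through ``convert_objects``):

.. ipython:: python

   s = Series([datetime(2001, 1, 1, 0, 0), 'foo', 1.0, 1,
               Timestamp('20010104'), '20010105'], dtype='O')
   s.convert_objects(convert_dates='coerce')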
@@ -1598,7 +1619,7 @@ For instance:


The ``set_printoptions`` function has a number of options for controlling how
- floating point numbers are formatted (using hte ``precision`` argument) in the
+ floating point numbers are formatted (using the ``precision`` argument) in the
console. The ``max_rows`` and ``max_columns`` control how many rows and
columns of DataFrame objects are shown by default. If ``max_columns`` is set to
0 (the default, in fact), the library will attempt to fit the DataFrame's