Commit 174ecf8

DOC: Explain the use of NDFrame.equals
1 parent 984aa8e commit 174ecf8

2 files changed: +40 additions, -19 deletions

doc/source/basics.rst

Lines changed: 39 additions & 18 deletions
@@ -215,14 +215,6 @@ These operations produce a pandas object the same type as the left-hand-side inp
that if of dtype ``bool``. These ``boolean`` objects can be used in indexing operations,
see :ref:`here<indexing.boolean>`

-As of v0.13.1, Series, DataFrames and Panels have an equals method to compare if
-two such objects are equal.
-
-.. ipython:: python
-
-   df.equals(df)
-   df.equals(df2)
-
.. _basics.reductions:

Boolean Reductions
@@ -281,6 +273,35 @@ To evaluate single-element pandas objects in a boolean context, use the method `
See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.

+.. _basics.equals:
+
+Often you may find there is more than one way to compute the same
+result. As a simple example, consider ``df+df`` and ``df*2``. To test
+that these two computations produce the same result, given the tools
+shown above, you might imagine using ``(df+df == df*2).all()``. But in
+fact, this expression is False:
+
+.. ipython:: python
+
+   df+df == df*2
+   (df+df == df*2).all()
+
+Notice that the boolean DataFrame ``df+df == df*2`` contains some False values!
+That is because NaNs do not compare as equals:
+
+.. ipython:: python
+
+   np.nan == np.nan
+
+So, as of v0.13.1, NDFrames (such as Series, DataFrames, and Panels)
+have an ``equals`` method for testing equality, with NaNs in corresponding
+locations treated as equal.
+
+.. ipython:: python
+
+   (df+df).equals(df*2)
+
+

Combining overlapping data sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
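As an aside, here is a minimal, self-contained sketch of the behaviour this hunk documents; the ``df`` below is a small made-up frame rather than the one built earlier in basics.rst::

    import numpy as np
    import pandas as pd

    # Any frame containing NaN shows the effect.
    df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

    # Elementwise comparison: positions holding NaN compare as unequal.
    (df + df == df * 2).all()        # per-column booleans, some False
    (df + df == df * 2).all().all()  # collapses to a single False

    # equals() treats NaNs in matching locations as equal.
    (df + df).equals(df * 2)         # True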
@@ -497,7 +518,7 @@ of a 1D array of values. It can also be used as a function on regular arrays:
   s.value_counts()
   value_counts(data)

-Similarly, you can get the most frequently occuring value(s) (the mode) of the values in a Series or DataFrame:
+Similarly, you can get the most frequently occurring value(s) (the mode) of the values in a Series or DataFrame:

.. ipython:: python
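For context, a tiny self-contained example of the two operations referred to around this hunk, ``value_counts`` and ``mode`` (the data is invented for illustration)::

    import pandas as pd

    s = pd.Series([3, 1, 2, 3, 4, 3, 2])

    s.value_counts()  # frequency of each distinct value, most common first
    s.mode()          # the most frequently occurring value(s); here a Series holding 3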
@@ -783,7 +804,7 @@ DataFrame's index.
pre-aligned data**. Adding two unaligned DataFrames internally triggers a
reindexing step. For exploratory analysis you will hardly notice the
difference (because ``reindex`` has been heavily optimized), but when CPU
-cycles matter sprinking a few explicit ``reindex`` calls here and there can
+cycles matter sprinkling a few explicit ``reindex`` calls here and there can
have an impact.

.. _basics.reindex_like:
@@ -1013,7 +1034,7 @@ containing the data in each row:
   ...: print('%s\n%s' % (row_index, row))
   ...:

-For instance, a contrived way to transpose the dataframe would be:
+For instance, a contrived way to transpose the DataFrame would be:

.. ipython:: python
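As a companion to the "contrived transpose" line, a minimal sketch of rebuilding a frame from ``iterrows`` output; the frame is made up, and this only round-trips cleanly when all columns share one dtype::

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Each (index, row-Series) pair becomes a column of the new frame.
    df_t = pd.DataFrame({idx: row for idx, row in df.iterrows()})

    df_t.equals(df.T)  # True for this homogeneous frame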
@@ -1160,12 +1181,12 @@ relies on strict ``re.match``, while ``contains`` relies on ``re.search``.
This old, deprecated behavior of ``match`` is still the default. As
demonstrated above, use the new behavior by setting ``as_indexer=True``.
-In this mode, ``match`` is analagous to ``contains``, returning a boolean
+In this mode, ``match`` is analogous to ``contains``, returning a boolean
Series. The new behavior will become the default behavior in a future
release.

Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
-an extra ``na`` arguement so missing values can be considered True or False:
+an extra ``na`` argument so missing values can be considered True or False:

.. ipython:: python
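A short sketch of the ``na`` keyword described just above, on a throwaway Series::

    import numpy as np
    import pandas as pd

    s = pd.Series(["apple", np.nan, "avocado", "banana"])

    s.str.startswith("a")            # the missing entry propagates as NaN
    s.str.startswith("a", na=False)  # the missing entry is treated as False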
@@ -1189,7 +1210,7 @@ Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
``slice_replace``,Replace slice in each string with passed value
``count``,Count occurrences of pattern
``startswith``,Equivalent to ``str.startswith(pat)`` for each element
-``endswidth``,Equivalent to ``str.endswith(pat)`` for each element
+``endswith``,Equivalent to ``str.endswith(pat)`` for each element
``findall``,Compute list of all occurrences of pattern/regex for each string
``match``,"Call ``re.match`` on each element, returning matched groups as list"
``extract``,"Call ``re.match`` on each element, as ``match`` does, but return matched groups as strings for convenience."
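A few of the table's entries exercised on a tiny, invented Series::

    import pandas as pd

    s = pd.Series(["cat", "dog", "catalog"])

    s.str.count("a")       # number of pattern occurrences per element
    s.str.endswith("og")   # elementwise str.endswith
    s.str.findall("[ao]")  # list of every match in each string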
@@ -1364,7 +1385,7 @@ from the current type (say ``int`` to ``float``)
   df3.dtypes

The ``values`` attribute on a DataFrame return the *lower-common-denominator* of the dtypes, meaning
-the dtype that can accomodate **ALL** of the types in the resulting homogenous dtyped numpy array. This can
+the dtype that can accommodate **ALL** of the types in the resulting homogenous dtyped numpy array. This can
force some *upcasting*.

.. ipython:: python
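A minimal sketch of the upcasting the surrounding text describes, using a made-up mixed-dtype frame::

    import pandas as pd

    df = pd.DataFrame({"ints": [1, 2], "floats": [1.5, 2.5]})

    df.dtypes        # one int64 column, one float64 column
    df.values.dtype  # float64: the dtype able to hold every column's values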
@@ -1376,7 +1397,7 @@ astype
.. _basics.cast:

-You can use the ``astype`` method to explicity convert dtypes from one to another. These will by default return a copy,
+You can use the ``astype`` method to explicitly convert dtypes from one to another. These will by default return a copy,
even if the dtype was unchanged (pass ``copy=False`` to change this behavior). In addition, they will raise an
exception if the astype operation is invalid.
@@ -1411,7 +1432,7 @@ they will be set to ``np.nan``.
   df3.dtypes

To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
-This will convert any datetimelike object to dates, forcing other values to ``NaT``.
+This will convert any datetime-like object to dates, forcing other values to ``NaT``.
This might be useful if you are reading in data which is mostly dates,
but occasionally has non-dates intermixed and you want to represent as missing.
@@ -1598,7 +1619,7 @@ For instance:
The ``set_printoptions`` function has a number of options for controlling how
-floating point numbers are formatted (using hte ``precision`` argument) in the
+floating point numbers are formatted (using the ``precision`` argument) in the
console and . The ``max_rows`` and ``max_columns`` control how many rows and
columns of DataFrame objects are shown by default. If ``max_columns`` is set to
0 (the default, in fact), the library will attempt to fit the DataFrame's
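``set_printoptions`` itself is long gone; assuming the options named here map onto today's ``pd.set_option`` keys, a rough modern equivalent looks like::

    import pandas as pd

    pd.set_option("display.precision", 4)    # floating point formatting precision
    pd.set_option("display.max_rows", 60)    # rows shown before output is truncated
    pd.set_option("display.max_columns", 0)  # 0/None: size to the terminal where possible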

doc/source/v0.13.1.txt

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ API changes
equal have equal axes, dtypes, and values. Added the
``array_equivalent`` function to compare if two ndarrays are
equal. NaNs in identical locations are treated as
-equal. (:issue:`5283`)
+equal. (:issue:`5283`) See also :ref:`the docs<basics.equals>` for a motivating example.

.. ipython:: python
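To make "NaNs in identical locations are treated as equal" concrete, a NumPy-only sketch of that comparison rule (not pandas' actual ``array_equivalent`` implementation, which also handles non-float dtypes)::

    import numpy as np

    def nan_aware_equal(left, right):
        """Elementwise equality where NaNs in matching positions count as equal."""
        left = np.asarray(left, dtype=float)
        right = np.asarray(right, dtype=float)
        if left.shape != right.shape:
            return False
        both_nan = np.isnan(left) & np.isnan(right)
        return bool(np.all((left == right) | both_nan))

    nan_aware_equal([1.0, np.nan], [1.0, np.nan])  # True
    nan_aware_equal([1.0, np.nan], [np.nan, 1.0])  # False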
