Commit 013c2ce

DOC: follow on merge_asof
closes #13358
1 parent b06bc7a commit 013c2ce

3 files changed: +73 -83 lines changed

doc/source/merging.rst

Lines changed: 58 additions & 74 deletions
@@ -78,7 +78,7 @@ some configurable handling of "what to do with the other axes":
 ::

 pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
-keys=None, levels=None, names=None, verify_integrity=False)
+keys=None, levels=None, names=None, verify_integrity=False)

 - ``objs``: a sequence or mapping of Series, DataFrame, or Panel objects. If a
 dict is passed, the sorted keys will be used as the `keys` argument, unless
@@ -510,48 +510,45 @@ standard database join operations between DataFrame objects:

 ::

-merge(left, right, how='inner', on=None, left_on=None, right_on=None,
-left_index=False, right_index=False, sort=True,
-suffixes=('_x', '_y'), copy=True, indicator=False)
-
-Here's a description of what each argument is for:
-
-- ``left``: A DataFrame object
-- ``right``: Another DataFrame object
-- ``on``: Columns (names) to join on. Must be found in both the left and
-right DataFrame objects. If not passed and ``left_index`` and
-``right_index`` are ``False``, the intersection of the columns in the
-DataFrames will be inferred to be the join keys
-- ``left_on``: Columns from the left DataFrame to use as keys. Can either be
-column names or arrays with length equal to the length of the DataFrame
-- ``right_on``: Columns from the right DataFrame to use as keys. Can either be
-column names or arrays with length equal to the length of the DataFrame
-- ``left_index``: If ``True``, use the index (row labels) from the left
-DataFrame as its join key(s). In the case of a DataFrame with a MultiIndex
-(hierarchical), the number of levels must match the number of join keys
-from the right DataFrame
-- ``right_index``: Same usage as ``left_index`` for the right DataFrame
-- ``how``: One of ``'left'``, ``'right'``, ``'outer'``, ``'inner'``. Defaults
-to ``inner``. See below for more detailed description of each method
-- ``sort``: Sort the result DataFrame by the join keys in lexicographical
-order. Defaults to ``True``, setting to ``False`` will improve performance
-substantially in many cases
-- ``suffixes``: A tuple of string suffixes to apply to overlapping
-columns. Defaults to ``('_x', '_y')``.
-- ``copy``: Always copy data (default ``True``) from the passed DataFrame
-objects, even when reindexing is not necessary. Cannot be avoided in many
-cases but may improve performance / memory usage. The cases where copying
-can be avoided are somewhat pathological but this option is provided
-nonetheless.
-- ``indicator``: Add a column to the output DataFrame called ``_merge``
-with information on the source of each row. ``_merge`` is Categorical-type
-and takes on a value of ``left_only`` for observations whose merge key
-only appears in ``'left'`` DataFrame, ``right_only`` for observations whose
-merge key only appears in ``'right'`` DataFrame, and ``both`` if the
-observation's merge key is found in both.
-
-.. versionadded:: 0.17.0
-
+pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
+left_index=False, right_index=False, sort=True,
+suffixes=('_x', '_y'), copy=True, indicator=False)
+
+- ``left``: A DataFrame object
+- ``right``: Another DataFrame object
+- ``on``: Columns (names) to join on. Must be found in both the left and
+right DataFrame objects. If not passed and ``left_index`` and
+``right_index`` are ``False``, the intersection of the columns in the
+DataFrames will be inferred to be the join keys
+- ``left_on``: Columns from the left DataFrame to use as keys. Can either be
+column names or arrays with length equal to the length of the DataFrame
+- ``right_on``: Columns from the right DataFrame to use as keys. Can either be
+column names or arrays with length equal to the length of the DataFrame
+- ``left_index``: If ``True``, use the index (row labels) from the left
+DataFrame as its join key(s). In the case of a DataFrame with a MultiIndex
+(hierarchical), the number of levels must match the number of join keys
+from the right DataFrame
+- ``right_index``: Same usage as ``left_index`` for the right DataFrame
+- ``how``: One of ``'left'``, ``'right'``, ``'outer'``, ``'inner'``. Defaults
+to ``inner``. See below for more detailed description of each method
+- ``sort``: Sort the result DataFrame by the join keys in lexicographical
+order. Defaults to ``True``, setting to ``False`` will improve performance
+substantially in many cases
+- ``suffixes``: A tuple of string suffixes to apply to overlapping
+columns. Defaults to ``('_x', '_y')``.
+- ``copy``: Always copy data (default ``True``) from the passed DataFrame
+objects, even when reindexing is not necessary. Cannot be avoided in many
+cases but may improve performance / memory usage. The cases where copying
+can be avoided are somewhat pathological but this option is provided
+nonetheless.
+- ``indicator``: Add a column to the output DataFrame called ``_merge``
+with information on the source of each row. ``_merge`` is Categorical-type
+and takes on a value of ``left_only`` for observations whose merge key
+only appears in ``'left'`` DataFrame, ``right_only`` for observations whose
+merge key only appears in ``'right'`` DataFrame, and ``both`` if the
+observation's merge key is found in both.
+
+.. versionadded:: 0.17.0

 The return type will be the same as ``left``. If ``left`` is a ``DataFrame``
 and ``right`` is a subclass of DataFrame, the return type will still be
@@ -573,11 +570,11 @@ terminology used to describe join operations between two SQL-table like
 structures (DataFrame objects). There are several cases to consider which are
 very important to understand:

-- **one-to-one** joins: for example when joining two DataFrame objects on
-their indexes (which must contain unique values)
-- **many-to-one** joins: for example when joining an index (unique) to one or
-more columns in a DataFrame
-- **many-to-many** joins: joining columns on columns.
+- **one-to-one** joins: for example when joining two DataFrame objects on
+their indexes (which must contain unique values)
+- **many-to-one** joins: for example when joining an index (unique) to one or
+more columns in a DataFrame
+- **many-to-many** joins: joining columns on columns.

 .. note::

@@ -714,15 +711,15 @@ The merge indicator

 .. ipython:: python

-df1 = DataFrame({'col1':[0,1], 'col_left':['a','b']})
-df2 = DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
-merge(df1, df2, on='col1', how='outer', indicator=True)
+df1 = pd.DataFrame({'col1': [0, 1], 'col_left':['a', 'b']})
+df2 = pd.DataFrame({'col1': [1, 2, 2],'col_right':[2, 2, 2]})
+pd.merge(df1, df2, on='col1', how='outer', indicator=True)

 The ``indicator`` argument will also accept string arguments, in which case the indicator function will use the value of the passed string as the name for the indicator column.

 .. ipython:: python

-merge(df1, df2, on='col1', how='outer', indicator='indicator_column')
+pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')


 .. _merging.join.index:
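
The three ``_merge`` labels described in the ``indicator`` entry can be illustrated without pandas. What follows is a minimal pure-Python sketch of the labelling rule only, not pandas' implementation: the helper name ``outer_merge_with_indicator`` and the tuple layout are hypothetical, and ``None`` stands in for a missing value. The inputs reuse the keys of ``df1`` and ``df2`` from the hunk above.

```python
def outer_merge_with_indicator(left, right):
    """Outer-join two lists of (key, value) pairs and label each output row.

    Hypothetical helper for illustration; returns tuples of
    (key, left_value, right_value, indicator).
    """
    left_keys = {key for key, _ in left}
    rows = []
    for lkey, lval in left:
        matches = [rval for rkey, rval in right if rkey == lkey]
        if matches:
            # Key present on both sides: one output row per matching right row.
            rows.extend((lkey, lval, rval, 'both') for rval in matches)
        else:
            rows.append((lkey, lval, None, 'left_only'))
    for rkey, rval in right:
        if rkey not in left_keys:
            rows.append((rkey, None, rval, 'right_only'))
    # Order by join key; sorted() is stable, so ties keep their input order.
    return sorted(rows, key=lambda row: row[0])


left = [(0, 'a'), (1, 'b')]        # col1, col_left
right = [(1, 2), (2, 2), (2, 2)]   # col1, col_right
rows = outer_merge_with_indicator(left, right)
for row in rows:
    print(row)
# (0, 'a', None, 'left_only')
# (1, 'b', 2, 'both')
# (2, None, 2, 'right_only')
# (2, None, 2, 'right_only')
```
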
@@ -924,7 +921,7 @@ a level name of the multi-indexed frame.

 left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
 'B': ['B0', 'B1', 'B2']},
-index=Index(['K0', 'K1', 'K2'], name='key'))
+index=pd.Index(['K0', 'K1', 'K2'], name='key'))

 index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
 ('K2', 'Y2'), ('K2', 'Y3')],
@@ -1116,28 +1113,20 @@ Timeseries friendly merging
 Merging Ordered Data
 ~~~~~~~~~~~~~~~~~~~~

-The ``pd.merge_ordered()`` function allows combining time series and other
+A :func:`pd.merge_ordered` function allows combining time series and other
 ordered data. In particular it has an optional ``fill_method`` keyword to
 fill/interpolate missing data:

 .. ipython:: python

-left = DataFrame({'k': ['K0', 'K1', 'K1', 'K2'],
-'lv': [1, 2, 3, 4],
-'s': ['a', 'b', 'c', 'd']})
-
-right = DataFrame({'k': ['K1', 'K2', 'K4'],
-'rv': [1, 2, 3]})
+left = pd.DataFrame({'k': ['K0', 'K1', 'K1', 'K2'],
+'lv': [1, 2, 3, 4],
+'s': ['a', 'b', 'c', 'd']})

-result = pd.merge_ordered(left, right, fill_method='ffill', left_by='s')
-
-.. ipython:: python
-:suppress:
+right = pd.DataFrame({'k': ['K1', 'K2', 'K4'],
+'rv': [1, 2, 3]})

-@savefig merging_ordered_merge.png
-p.plot([left, right], result,
-labels=['left', 'right'], vertical=True);
-plt.close('all');
+pd.merge_ordered(left, right, fill_method='ffill', left_by='s')

 .. _merging.merge_asof:

@@ -1146,12 +1135,7 @@ Merging AsOf

 .. versionadded:: 0.18.2

-An ``pd.merge_asof()`` this is similar to an ordered left-join except that we
-match on nearest key rather than equal keys.
-
-For each row in the ``left`` DataFrame, we select the last row in the ``right``
-DataFrame whose ``on`` key is less than the left's key. Both DataFrames must
-be sorted by the key.
+A :func:`pd.merge_asof` is similar to an ordered left-join except that we match on nearest key rather than equal keys. For each row in the ``left`` DataFrame, we select the last row in the ``right`` DataFrame whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key.

 Optionally an asof merge can perform a group-wise merge. This matches the ``by`` key equally,
 in addition to the nearest match on the ``on`` key.
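
The "last row in the right frame at or before the left key" rule can be sketched in pure Python with the standard ``bisect`` module. This is a rough illustration of the matching rule only, not how pandas implements it, and it ignores the ``by`` grouping. The function name ``asof_match`` and the ``allow_exact`` flag are hypothetical. The prose above says "less than", while the docstring examples in ``pandas/tools/merge.py`` mention excluding exact matches as an option, so the sketch assumes exact matches are included by default and exposes the strict variant as a flag.

```python
from bisect import bisect_left, bisect_right


def asof_match(left_keys, right_keys, right_values, allow_exact=True):
    """For each left key, return the value of the last right row whose key
    is <= the left key (strictly < when allow_exact is False).

    right_keys must be sorted ascending. None marks "no qualifying row".
    Hypothetical helper for illustration only.
    """
    matched = []
    for key in left_keys:
        if allow_exact:
            pos = bisect_right(right_keys, key)  # number of right keys <= key
        else:
            pos = bisect_left(right_keys, key)   # number of right keys < key
        matched.append(right_values[pos - 1] if pos else None)
    return matched


left_keys = [1, 5, 10]
right_keys = [1, 2, 3, 6, 7]
right_values = [1, 2, 3, 6, 7]

inclusive = asof_match(left_keys, right_keys, right_values)
strict = asof_match(left_keys, right_keys, right_values, allow_exact=False)
print(inclusive)  # [1, 3, 7]
print(strict)     # [None, 3, 7]
```

Binary search is only valid here because the right keys are sorted, which is the same precondition the text places on both DataFrames.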

doc/source/whatsnew/v0.18.2.txt

Lines changed: 3 additions & 2 deletions
@@ -9,6 +9,7 @@ We recommend that all users upgrade to this version.

 Highlights include:

+- ``pd.merge_asof()`` for asof-style time-series joining, see :ref:`here <whatsnew_0182.enhancements.asof_merge>`

 .. contents:: What's new in v0.18.2
 :local:
@@ -28,7 +29,7 @@ A long-time requested feature has been added through the :func:`merge_asof` func
 support asof style joining of time-series. (:issue:`1870`). Full documentation is
 :ref:`here <merging.merge_asof>`

-The :func:`merge_asof`` performs an asof merge, which is similar to a left-join
+The :func:`merge_asof` performs an asof merge, which is similar to a left-join
 except that we match on nearest key rather than equal keys.

 .. ipython:: python
@@ -108,7 +109,7 @@ that forward filling happens automatically taking the most recent non-NaN value.
 by='ticker')

 This returns a merged DataFrame with the entries in the same order as the original left
-passed DataFrame (``trades`` in this case). With the fields of the ``quotes`` merged.
+passed DataFrame (``trades`` in this case), with the fields of the ``quotes`` merged.

 .. _whatsnew_0182.enhancements.read_csv_dupe_col_names_support:

pandas/tools/merge.py

Lines changed: 12 additions & 7 deletions
@@ -243,6 +243,8 @@ def _merger(x, y):
 result = _merger(left, right)
 return result

+ordered_merge.__doc__ = merge_ordered.__doc__
+

 def merge_asof(left, right, on=None,
 left_on=None, right_on=None,
@@ -335,7 +337,7 @@ def merge_asof(left, right, on=None,
 1 5 b 3.0
 2 10 c 7.0

-For this example, we can achieve a similar result thru pd.merge_ordered,
+For this example, we can achieve a similar result thru ``pd.merge_ordered()``,
 though its not nearly as performant.


@@ -348,7 +350,7 @@ def merge_asof(left, right, on=None,
 3 5 b 3.0
 6 10 c 7.0

-Here is a real-worth times-series example
+Here is a real-world times-series example

 >>> quotes
 time ticker bid ask
@@ -369,7 +371,8 @@ def merge_asof(left, right, on=None,
 3 2016-05-25 13:30:00.048 GOOG 720.92 100
 4 2016-05-25 13:30:00.048 AAPL 98.00 100

-# by default we are taking the asof of the quotes
+By default we are taking the asof of the quotes
+
 >>> pd.asof_merge(trades, quotes,
 ... on='time',
 ... by='ticker')
@@ -380,7 +383,8 @@ def merge_asof(left, right, on=None,
 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN

-# we only asof within 2ms betwen the quote time and the trade time
+We only asof within 2ms betwen the quote time and the trade time
+
 >>> pd.asof_merge(trades, quotes,
 ... on='time',
 ... by='ticker',
@@ -392,9 +396,10 @@ def merge_asof(left, right, on=None,
 3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93
 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN

-# we only asof within 10ms betwen the quote time and the trade time
-# and we exclude exact matches on time. However *prior* data will
-# propogate forward
+We only asof within 10ms betwen the quote time and the trade time
+and we exclude exact matches on time. However *prior* data will
+propogate forward
+
 >>> pd.asof_merge(trades, quotes,
 ... on='time',
 ... by='ticker',
