API: Implement interval-point joins #21901

Open
@jschendel

For clarification, by "interval-point" joins I mean joining an IntervalIndex/IntervalArray against the point values contained in the intervals, e.g. joining a numeric IntervalIndex against a Float64Index. I want to keep this discussion separate from interval-interval merges for the time being.

For example, the following join currently produces no matches (the same is true of merge on column data):

In [2]: df1 = pd.DataFrame({'A': [10, 20, 30]}, index=pd.interval_range(0, 3))

In [3]: df2 = pd.DataFrame({'B': ['foo', 'bar', 'baz', 'qux']},
   ...:                    index=[0.5, 1, 2.71828, 3.14159])

In [4]: df1
Out[4]: 
         A
(0, 1]  10
(1, 2]  20
(2, 3]  30

In [5]: df2
Out[5]: 
           B
0.50000  foo
1.00000  bar
2.71828  baz
3.14159  qux

In [6]: df1.join(df2)
Out[6]: 
         A    B
(0, 1]  10  NaN
(1, 2]  20  NaN
(2, 3]  30  NaN
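
For reference, here is a minimal sketch of how the left join above could be emulated today. It is not a proposed implementation: `interval_point_left_join` and the temporary `_pos` column are hypothetical names, and it assumes the intervals do not overlap, so that `IntervalIndex.get_indexer` can map each point to at most one interval.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [10, 20, 30]}, index=pd.interval_range(0, 3))
df2 = pd.DataFrame({"B": ["foo", "bar", "baz", "qux"]},
                   index=[0.5, 1, 2.71828, 3.14159])


def interval_point_left_join(intervals_df, points_df):
    """Hypothetical helper: keep every interval row, attach matching points."""
    # Position of the interval containing each point; -1 means "no interval".
    pos = intervals_df.index.get_indexer(points_df.index)

    # Move both indexes into columns and join on the integer position instead.
    left = intervals_df.rename_axis("interval").reset_index()
    left["_pos"] = np.arange(len(left))

    right = points_df.rename_axis("point").reset_index()
    right["_pos"] = pos
    right = right[right["_pos"] >= 0]  # drop points outside every interval

    return left.merge(right, on="_pos", how="left").drop(columns="_pos")


result = interval_point_left_join(df1, df2)
print(result)
```

With the frames above, `(0, 1]` matches both `0.5` and `1` (the intervals are closed on the right), `(1, 2]` has no match, `(2, 3]` matches `2.71828`, and `3.14159` is dropped because no interval contains it.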

I think the behavior of such a join/merge is straightforward for left/right joins, but it is less clear for inner/outer joins. For an inner (outer) join, one takes the intersection (union) of both indexes as the resulting index values. This makes sense when both indexes contain the same type of objects, but that is not the case for interval-point joins. I can't think of a consistent way to handle inner/outer joins, and I'm not entirely sure they even make sense. A few options:

  • Do not support inner/outer interval-point joins
  • For inner joins:
    • always keep the intervals and filter any non-matches?
    • default to the left index and filter any non-matches?
  • For outer joins:
    • union any non-matching point values for an object dtype?
      • obviously non-performant and a bit weird
    • coerce non-matching points to degenerate intervals (left == right) and union for an interval dtype?
  • Use a new API for non-exact interval joins?
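
To make two of these options concrete, here is a rough sketch of the "always keep the intervals" inner join and the "degenerate intervals" outer join. It only illustrates what the results could look like, assuming non-overlapping intervals. The `_pos` and `point` columns are hypothetical helpers, and the bounds are cast to float so that integer and float endpoints can share one index.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [10, 20, 30]}, index=pd.interval_range(0, 3))
df2 = pd.DataFrame({"B": ["foo", "bar", "baz", "qux"]},
                   index=[0.5, 1, 2.71828, 3.14159])

pos = df1.index.get_indexer(df2.index)  # -1 marks points outside every interval
hit = pos >= 0

# Inner join, keeping the intervals: one row per matched point,
# indexed by the interval that contains it.
inner = pd.concat(
    [df1.iloc[pos[hit]].reset_index(drop=True),
     df2[hit].reset_index(drop=True)],
    axis=1,
)
inner.index = df1.index[pos[hit]]

# Outer join: unmatched points become degenerate intervals (left == right).
left_keys = pd.DataFrame({"_pos": np.arange(len(df1)), "A": df1["A"].to_numpy()})
right_keys = pd.DataFrame({"_pos": pos, "B": df2["B"].to_numpy(),
                           "point": df2.index.to_numpy(dtype=float)})
merged = left_keys.merge(right_keys, on="_pos", how="outer")

p = merged["_pos"].to_numpy()
pt = merged["point"].to_numpy(dtype=float)
# Where p == -1 the lookup result is ignored: np.where picks the point instead.
lo = np.where(p >= 0, df1.index.left.to_numpy(dtype=float)[p], pt)
hi = np.where(p >= 0, df1.index.right.to_numpy(dtype=float)[p], pt)

outer = merged[["A", "B"]].copy()
outer.index = pd.IntervalIndex.from_arrays(lo, hi, closed=df1.index.closed)

print(inner)
print(outer)
```

One wrinkle with the degenerate-interval option: with `closed='right'`, an interval whose endpoints are equal is empty (`pd.Interval(x, x, closed='right').is_empty` is `True`), so the coerced interval would not contain the point it was built from unless it is closed on both sides.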

I'm leaning towards just using the existing API and not supporting inner/outer joins for the time being, but would appreciate any thoughts.

Labels: Enhancement; Interval (Interval data type); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)