strange dtype behaviour as a function of series length #7332

Found when tracking down what was going on with [this question](http://stackoverflow.com/questions/24028281/pandas-series-operations-very-slow-after-upgrade#comment37041169_24028281) about performance.

First the case that makes sense:

```
>>> s = pd.Series(range(10**3), dtype=np.int32)
>>> s.dtype
dtype('int32')
>>> s.dtype.type
<type 'numpy.int32'>
>>> s.dtype.type in pd.lib._TYPE_MAP
True
>>> 
>>> orig_sum_type = (s+s).dtype.type
>>> orig_sum_type
<type 'numpy.int32'>
>>> orig_sum_type in pd.lib._TYPE_MAP
True
```

Now let's increase the length of the series.

```
>>> s = pd.Series(range(10**5), dtype=np.int32)
>>> s.dtype
dtype('int32')
>>> s.dtype.type
<type 'numpy.int32'>
>>> s.dtype.type in pd.lib._TYPE_MAP
True
>>> 
>>> new_sum_type = (s+s).dtype.type
>>> new_sum_type
<type 'numpy.int32'>
>>> new_sum_type in pd.lib._TYPE_MAP
False
```

.. wait, what?

```
>>> orig_sum_type, new_sum_type
(<type 'numpy.int32'>, <type 'numpy.int32'>)
>>> orig_sum_type == new_sum_type
False
>>> orig_sum_type is new_sum_type
False
>>> np.int32 is orig_sum_type
True
>>> np.int32 is new_sum_type
False
```

We've now got a second `numpy.int32` type object floating around, one that is neither equal nor identical to `np.int32`. The crossover seems to be at a length of 10k, with 10001 being the first length that fails:

```
>>> def find_first():
...     for i in range(1, 10**5):
...         s = pd.Series(range(i), dtype=np.int32)
...         if (s+s).dtype.type not in pd.lib._TYPE_MAP:
...             return i
...
>>> find_first()
10001

ISTM that because this dtype isn't recognized as being in `_TYPE_MAP`, `infer_dtype` never takes the early exit it would normally take on recognizing an integer dtype, and that slows things down considerably.
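To show why a duplicate type object would defeat a lookup like this, here is a minimal sketch in plain Python, with no NumPy or pandas involved. It assumes `_TYPE_MAP` behaves like an ordinary dict keyed by type objects; `TYPE_MAP`, `make_int32` and `infer_kind` are hypothetical names made up for the illustration, not pandas internals.

```python
# Hypothetical stand-in for a type-keyed lookup table such as pd.lib._TYPE_MAP.
# Assumption: the real table is keyed by type objects, like this dict.


def make_int32():
    # Each call builds a brand-new class object that merely shares the name.
    return type("int32", (object,), {})


original = make_int32()
duplicate = make_int32()

TYPE_MAP = {original: "integer"}


def infer_kind(scalar_type):
    """Return the mapped kind, or None when the fast-path lookup misses."""
    return TYPE_MAP.get(scalar_type)


print(original.__name__ == duplicate.__name__)  # True: same name
print(original is duplicate)                    # False: different objects
print(infer_kind(original))                     # integer
print(infer_kind(duplicate))                    # None
```

Classes hash and compare by identity by default, so the second `int32` misses the table even though it prints exactly the same. That matches the symptoms above, though it says nothing about where the duplicate type actually comes from.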

