duplicated() performance and bug on long rows regression from 0.15.2->0.16.0

The following runs quickly in 0.15.2, but the last operation, `df.T.duplicated()`, has a performance problem in 0.16.0 and 0.16.1.
Also, on a private data set that works in 0.15.2, I get an error from the same operation in 0.16.0 and 0.16.1.

Code:

``` python
import numpy
import pandas

# Small frame: 1000 identical rows, two identical columns.
df = pandas.DataFrame({'A': [1] * 1000,
                       'B': [1] * 1000})

print(numpy.count_nonzero(df.duplicated()))
print(numpy.count_nonzero(df.T.duplicated()))

# Large frame: 1,000,000 rows, so df.T has 1,000,000 columns.
df = pandas.DataFrame({'A': [1] * 1000000,
                       'B': [1] * 1000000})

print(numpy.count_nonzero(df.duplicated()))
# This last call is the one that is slow in 0.16.0 and 0.16.1.
print(numpy.count_nonzero(df.T.duplicated()))


```

This is the error I get on the private data set (not yet reproduced with synthetic data):

```
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2867, in duplicated
    labels, shape = map(list, zip( * map(f, vals)))
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2856, in f
    labels, shape = factorize(vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
  File "C:\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 135, in factorize
    labels = table.get_labels(vals, uniques, 0, na_sentinel)
  File "pandas\hashtable.pyx", line 813, in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:14025)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
```
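To compare the two versions more precisely, the calls above can be timed separately. This is only a rough sketch, not a proper benchmark: the helper name `time_duplicated` and the frame sizes are made up for illustration, and the second timing includes the cost of the transpose itself.

```python
import time

import numpy
import pandas


def time_duplicated(n_rows):
    """Time duplicated() on a constant two-column frame and on its transpose.

    Hypothetical helper for illustration; not part of pandas.
    """
    df = pandas.DataFrame({'A': [1] * n_rows, 'B': [1] * n_rows})

    # Row-wise duplicates on the tall frame (n_rows x 2).
    start = time.perf_counter()
    rows_dupes = int(numpy.count_nonzero(df.duplicated()))
    rows_seconds = time.perf_counter() - start

    # Same check on the wide frame (2 x n_rows); includes the transpose.
    start = time.perf_counter()
    transposed_dupes = int(numpy.count_nonzero(df.T.duplicated()))
    transposed_seconds = time.perf_counter() - start

    return {
        'rows_dupes': rows_dupes,
        'rows_seconds': rows_seconds,
        'transposed_dupes': transposed_dupes,
        'transposed_seconds': transposed_seconds,
    }


if __name__ == '__main__':
    print(pandas.__version__)
    # Sizes are arbitrary; increase them gradually to see how the time grows.
    for n in (1000, 10000):
        print(n, time_duplicated(n))
```

Running the same script under 0.15.2 and 0.16.x and comparing `transposed_seconds` as the size grows should show where the slowdown starts.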


duplicated() performance and bug on long rows regression from 0.15.2->0.16.0 #10161
