
duplicated() performance and bug on long rows regression from 0.15.2->0.16.0 #10161

Closed
@eyaler

Description


The following runs quickly in 0.15.2, but in 0.16.0 and 0.16.1 the last operation, df.T.duplicated(), has a performance problem.
Also, on a private data set that works on 0.15.2, the same operation raises an error on 0.16.0 and 0.16.1.

code:

import numpy
import pandas

# Two identical constant columns, 1,000 rows
df = pandas.DataFrame({'A': [1] * 1000,
                       'B': [1] * 1000})

print(numpy.count_nonzero(df.duplicated()))
print(numpy.count_nonzero(df.T.duplicated()))  # df.T has 2 rows and 1,000 columns

# Same shape, 1,000,000 rows
df = pandas.DataFrame({'A': [1] * 1000000,
                       'B': [1] * 1000000})

print(numpy.count_nonzero(df.duplicated()))
print(numpy.count_nonzero(df.T.duplicated()))  # slow in 0.16.0/0.16.1: df.T has 1,000,000 columns
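A scaled-down way to check both the results and the slowdown on whatever pandas version is installed. The expected counts follow from the data itself: every row after the first repeats the first (so n - 1 duplicates), and in the transposed frame row B repeats row A (so 1 duplicate). The helpers `make_frame` and `time_duplicated` are hypothetical names for this sketch, and absolute timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows):
    # Hypothetical helper: two identical constant columns, as in the report.
    return pd.DataFrame({'A': [1] * n_rows, 'B': [1] * n_rows})


def time_duplicated(frame):
    # Hypothetical helper: returns (number of duplicated rows, elapsed seconds).
    start = time.perf_counter()
    n_dup = int(np.count_nonzero(frame.duplicated()))
    return n_dup, time.perf_counter() - start


for n_rows in (1_000, 5_000, 25_000):
    df = make_frame(n_rows)
    rows_dup, rows_t = time_duplicated(df)
    # df.T has 2 rows and n_rows columns, so its cost grows with the column count.
    cols_dup, cols_t = time_duplicated(df.T)
    print(f"{n_rows:>7} rows: df -> {rows_dup} ({rows_t:.4f}s), "
          f"df.T -> {cols_dup} ({cols_t:.4f}s)")
```

If the df.T timing grows roughly in step with n_rows while the df timing stays small, that matches the report's observation that the transposed (very wide) frame is the slow case.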


This is the error I get on the private data set (I have not yet reproduced it with synthetic data):
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2867, in duplicated
    labels, shape = map(list, zip( * map(f, vals)))
  File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2856, in f
    labels, shape = factorize(vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
  File "C:\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 135, in factorize
    labels = table.get_labels(vals, uniques, 0, na_sentinel)
  File "pandas\hashtable.pyx", line 813, in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:14025)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
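The traceback shows the shape of the 0.16 code path: frame.py maps a helper `f` over `vals`, and `f` calls `factorize` once per array, after which the per-column labels are presumably combined into one group id per row. The sketch below imitates that idea in plain pandas/NumPy; `duplicated_by_factorize` is a hypothetical name and its label-combining step is a simplification, not pandas' own internals. It makes two things visible. First, the number of `factorize` calls equals the number of columns, which for `df.T` in the report would be 1,000,000 calls on 2-element arrays. Second, `factorize` expects 1-D input. One way to hand it a 2-D block, offered only as a guess about the private data set, is a repeated column label: `frame[label]` then returns a DataFrame rather than a Series, and after `.T` any repeated index labels become repeated column labels.

```python
import numpy as np
import pandas as pd


def duplicated_by_factorize(frame):
    """Hypothetical sketch: mark rows that repeat an earlier row (keep='first')."""
    n_rows = len(frame)
    group_ids = np.zeros(n_rows, dtype=np.int64)
    for position in range(frame.shape[1]):
        # Positional access always yields one 1-D column, even when labels
        # repeat; label access (frame[label]) would return a 2-D DataFrame then.
        values = frame.iloc[:, position].to_numpy()
        codes, uniques = pd.factorize(values)  # one factorize call per column
        # Shift codes by 1 so the NaN sentinel (-1) cannot collide with a real
        # code, combine with the ids so far, then re-factorize to keep ids compact.
        combined = group_ids * (len(uniques) + 1) + (codes + 1)
        group_ids, _ = pd.factorize(combined)
    # A row is a duplicate unless it is the first occurrence of its group id.
    _, first_pos = np.unique(group_ids, return_index=True)
    result = np.ones(n_rows, dtype=bool)
    result[first_pos] = False
    return pd.Series(result, index=frame.index)


df = pd.DataFrame({'A': [1] * 5, 'B': [1] * 5})
print(duplicated_by_factorize(df).tolist())    # [False, True, True, True, True]
print(duplicated_by_factorize(df.T).tolist())  # [False, True]

# Hypothetical trigger: repeated index labels become repeated column labels after .T
dup = pd.DataFrame({'x': [1, 2], 'y': [1, 2]}, index=['r', 'r'])
print(dup.T['r'].to_numpy().ndim)  # 2: a 2-D block, not a single column
```

Iterating by position rather than by label is the design choice that sidesteps the 2-D case; whether repeated labels are really what triggers the error on the private data set is not established by this report.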

Metadata


Assignees

No one assigned

Labels

Performance (memory or execution speed), Regression (functionality that used to work in a prior pandas version), Reshaping (concat, merge/join, stack/unstack, explode)
