Description
Code Sample
import pandas as pd
x1 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 1, 1]})
x2 = pd.DataFrame({'a': [2, 2, 2], 'b': [1, 1, 1]})
aggregation = {'a': 'unique', 'b': 'unique'}
agg1 = x1.agg(aggregation)
agg2 = x2.agg(aggregation)
print("First aggregation:", type(agg1))
print(agg1)
print("Second aggregation:", type(agg2))
print(agg2)
Output
First aggregation: <class 'pandas.core.series.Series'>
a    [1, 2, 3]
b          [1]
dtype: object
Second aggregation: <class 'pandas.core.frame.DataFrame'>
   a  b
0  2  1
Problem description
When performing 'unique' aggregations on a DataFrame, the type of the result (Series or DataFrame) depends on the data in an unexpected way.
Generally, when performing a 'unique' aggregation on several columns of a DataFrame as done above, a pandas.Series of numpy arrays is returned, with one element per aggregation column. I think this is the expected behavior, and it is demonstrated by the first aggregation above.
However, there is a special case. When every aggregation column has exactly 1 unique element, a one-row pandas.DataFrame is returned instead. I'm fairly sure this is unintended behavior, and it forces callers to special-case the result when doing such aggregations.
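Until the special case is resolved, one possible workaround is to skip DataFrame.agg for this purpose and build the Series of unique values directly. The helper `unique_per_column` below is hypothetical (it is not a pandas API); it is a minimal sketch that assumes you only need the per-column result of `Series.unique()` and no other aggregation functions.

```python
import numpy as np
import pandas as pd


def unique_per_column(df, columns):
    """Return a Series mapping each column name to an array of its unique values.

    Hypothetical workaround helper, not part of pandas. It avoids DataFrame.agg,
    so the result is always a Series regardless of how many unique values each
    column has.
    """
    # Pre-allocate a 1-D object array so each per-column array is stored as a
    # single element, rather than equal-length arrays being stacked into 2-D.
    values = np.empty(len(columns), dtype=object)
    for i, col in enumerate(columns):
        values[i] = df[col].unique()
    return pd.Series(values, index=list(columns))


x1 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 1, 1]})
x2 = pd.DataFrame({'a': [2, 2, 2], 'b': [1, 1, 1]})
print("First:", type(unique_per_column(x1, ['a', 'b'])))
print("Second:", type(unique_per_column(x2, ['a', 'b'])))
```

Both calls return a Series whose elements are arrays, which matches the shape shown under Expected Output. The trade-off is that, unlike `agg`, this helper cannot mix 'unique' with other aggregation functions in a single call.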
Expected Output
First aggregation: <class 'pandas.core.series.Series'>
a    [1, 2, 3]
b          [1]
dtype: object
Second aggregation: <class 'pandas.core.series.Series'>
a    [2]
b    [1]
dtype: object
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.1
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.5
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None