Effect of assigning list to DataFrame cell depends on unrelated column's type #25806

Open
@RauliRuohonen

Code Sample, a copy-pastable example if possible

import pandas as pd

# Case 1: .loc, 'bar' has dtype object
df = pd.DataFrame([{'foo': None, 'bar': None}], index=['a'])
df.loc['a', 'foo'] = ['123']
print(df)
print()

# Case 2: .loc, 'bar' converted to float
df = pd.DataFrame([{'foo': None, 'bar': None}], index=['a'])
df['bar'] = df['bar'].astype(float)
df.loc['a', 'foo'] = ['123']
print(df)
print()

# Case 3: .at, 'bar' has dtype object
df = pd.DataFrame([{'foo': None, 'bar': None}], index=['a'])
df.at['a', 'foo'] = ['123']
print(df)
print()

# Case 4: .at, 'bar' converted to float and index converted to category
df = pd.DataFrame([{'foo': None, 'bar': None}], index=['a'])
df['bar'] = df['bar'].astype(float)
df.index = df.index.astype('category')
df.at['a', 'foo'] = ['123']
print(df)

Problem description

Output is:

    bar    foo
a  None  [123]

   bar  foo
a  NaN  123

    bar    foo
a  None  [123]

   bar  foo
a  NaN  123

The behavior of df.loc['a', 'foo'] = ['123'] depends on the dtype of the bar column, even though bar is not involved in the assignment. A function that relies on such an assignment will break as soon as a new column holding unrelated data of a different dtype is added to the data frame. The dependency on unrelated columns is also surprising.

If bar has dtype object, the list ['123'] itself is stored in the cell. If bar has dtype float, the string '123' is stored instead. In the latter case, a list with more than one element raises ValueError: Must have equal len keys and value when setting with an iterable.
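Since the printed frame does not show quotes, it can be hard to tell a one-element list from a plain string in the output. The following is a minimal sketch for checking what actually ends up in the cell on a given pandas version. The helper name stored_type_after_loc is made up for illustration, and the result may differ between pandas releases, so no expected output is shown.

```python
import pandas as pd


def stored_type_after_loc(bar_dtype):
    """Hypothetical helper: report the Python type stored in df.loc['a', 'foo']."""
    df = pd.DataFrame([{'foo': None, 'bar': None}], index=['a'])
    # Only the unrelated 'bar' column changes between runs.
    df['bar'] = df['bar'].astype(bar_dtype)
    try:
        df.loc['a', 'foo'] = ['123']
    except ValueError:
        # Some code paths reject list values outright.
        return 'ValueError'
    return type(df['foo'].iloc[0]).__name__


for bar_dtype in (object, float):
    print(bar_dtype.__name__, '->', stored_type_after_loc(bar_dtype))
```

If the two lines print different type names, the installed version still shows the dependency on bar's dtype described above.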

Replacing loc with at in the first two examples gives consistent results. However, if the index has dtype category instead of object, as in the last example, at shows the same inconsistency as loc. This is even more surprising: longer lists raise the same ValueError: Must have equal len keys and value when setting with an iterable, which suggests that at is attempting a multi-element assignment even though it is supposed to assign a single element.
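Until loc and at behave consistently, one possible workaround is to avoid cell-level assignment altogether and rebuild the target column as an object-dtype Series. This is only a sketch and not something proposed in the report itself: the helper name set_cell_to_object is hypothetical, and it assumes the index is unique so that get_loc returns a single integer position.

```python
import pandas as pd


def set_cell_to_object(df, row_label, column, value):
    """Hypothetical helper: store `value` unchanged in a single cell.

    Rebuilds the whole column, so other columns' dtypes play no role.
    Assumes a unique index, so get_loc returns one integer position.
    """
    pos = df.index.get_loc(row_label)
    cells = df[column].tolist()
    cells[pos] = value
    # dtype=object keeps each list as a single cell value.
    df[column] = pd.Series(cells, index=df.index, dtype=object)
    return df


# Same setup as the last example: float 'bar' column and categorical index.
df = pd.DataFrame([{'foo': None, 'bar': None}], index=['a'])
df['bar'] = df['bar'].astype(float)
df.index = df.index.astype('category')

set_cell_to_object(df, 'a', 'foo', ['123', '456'])
print(type(df['foo'].iloc[0]))
```

Rebuilding the column costs time proportional to the number of rows for every assignment, so this is only reasonable for small frames or occasional writes.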

Expected Output

    bar    foo
a  None  [123]

   bar  foo
a  NaN  [123]

    bar    foo
a  None  [123]

   bar  foo
a  NaN  [123]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: None
pip: 18.1
setuptools: 39.0.1
Cython: 0.28.5
numpy: 1.15.2
scipy: 1.1.0
pyarrow: None
xarray: 0.10.9
IPython: 6.5.0
sphinx: 1.8.1
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Labels

Bug
Indexing: Related to indexing on series/frames, not to indexes themselves
Nested Data: Data where the values are collections (lists, sets, dicts, objects, etc.).
