BUG: Series.update() raises ValueError if dtype="string" #33980

Closed
@RagBlufThim

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
a = pd.Series(["a", None, "c"], dtype="string")
b = pd.Series([None, "b", None], dtype="string")
a.update(b)

results in:

Traceback (most recent call last):

  File "<ipython-input-15-b9da8f25067a>", line 1, in <module>
    a.update(b)

  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\series.py", line 2810, in update
    self._data = self._data.putmask(mask=mask, new=other, inplace=True)

  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\internals\managers.py", line 564, in putmask
    return self.apply("putmask", **kwargs)

  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\internals\managers.py", line 442, in apply
    applied = getattr(b, f)(**kwargs)

  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\internals\blocks.py", line 1676, in putmask
    new_values[mask] = new

  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\arrays\string_.py", line 248, in __setitem__
    super().__setitem__(key, value)

  File "C:\tools\anaconda3\envs\Simple\lib\site-packages\pandas\core\arrays\numpy_.py", line 252, in __setitem__
    self._ndarray[key] = value

ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 1 output values where the mask is true
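For reference, the final ValueError can be reproduced with plain NumPy. This is only a sketch of what the traceback suggests is happening: the mask selects one position, but the full three-element replacement is assigned into it. The arrays and names below are made up to mirror the example; this is not the pandas code itself.

```python
import numpy as np

# Stand-ins for the data in the report (illustrative, not pandas internals).
values = np.array(["a", None, "c"], dtype=object)
new = np.array([None, "b", None], dtype=object)
mask = np.array([False, True, False])  # positions where `new` is not missing

error_message = None
try:
    # Three input values, but the mask selects only one output slot.
    values[mask] = new
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Indexing the replacement with the same mask makes the sizes agree.
values[mask] = new[mask]
print(values)
```

The second assignment succeeds because both sides are reduced to the masked positions.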

Problem description

The example works if I leave off the dtype="string" (resulting in the implicit dtype object).
IMO update should work for all dtypes, not only the "old" ones.

The same example with dtype="Int16" (a = pd.Series([1, None, 3], dtype="Int16") etc.) also raises ValueError, while it works with dtype="float64".

It looks as if update doesn't work with the new nullable dtypes (the ones with pd.NA).
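A quick way to check which dtypes are affected on a given install is to run the same update for each one and record whether it raises. This is a minimal sketch: the `cases` and `results` names are mine, and the outcome will depend on the pandas version, so no expected output is shown for the nullable dtypes.

```python
import pandas as pd

# Same shape of data as in the report, once per dtype.
cases = {
    "object": (["a", None, "c"], [None, "b", None]),
    "string": (["a", None, "c"], [None, "b", None]),
    "float64": ([1, None, 3], [None, 2, None]),
    "Int16": ([1, None, 3], [None, 2, None]),
}

results = {}
for dtype, (left, right) in cases.items():
    a = pd.Series(left, dtype=dtype)
    b = pd.Series(right, dtype=dtype)
    try:
        a.update(b)
        results[dtype] = "ok"
    except Exception as exc:
        # Record the exception type instead of stopping at the first failure.
        results[dtype] = type(exc).__name__

for dtype, outcome in results.items():
    print(f"{dtype:8} {outcome}")
```

On pandas 1.0.3, going by the description above, "string" and "Int16" should report ValueError while "object" and "float64" report ok.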

Expected Output

The expected result is that a.update(b) updates a without raising an exception, not only for object and float64, but also for string, Int16, etc.
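To make the expected values concrete, here is a sketch that applies the same update by hand with a masked assignment. `update_non_na` is a hypothetical helper, not a pandas function, and it assumes both Series share the same index (Series.update also aligns on the index). I have not verified that it avoids the error on 1.0.3.

```python
import pandas as pd

def update_non_na(target, other):
    """Hypothetical helper: copy the non-missing values of `other` into `target` in place."""
    mask = other.notna()
    # Both sides are restricted to the masked rows, so the sizes match.
    target.loc[mask] = other.loc[mask]

a = pd.Series(["a", None, "c"], dtype="string")
b = pd.Series([None, "b", None], dtype="string")
update_non_na(a, b)
print(a.tolist())

a2 = pd.Series([1, None, 3], dtype="Int16")
b2 = pd.Series([None, 2, None], dtype="Int16")
update_non_na(a2, b2)
print(a2.tolist())
```

These are the values a.update(b) would be expected to leave in a and a2, with the dtype unchanged.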

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 ..., GenuineIntel
...

pandas : 1.0.3
numpy : 1.18.1
...

Metadata


Labels

  • Bug
  • NA - MaskedArrays: Related to pd.NA and nullable extension arrays
  • Strings: String extension data type and string data
