BUG: Setting a numpy array as a column in Pandas uses only the first column of the array.  #57765

Open
@kgourgou

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

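import numpy as np
import pandas as pd

# 2-D array with shape (2, 3)
p = np.array([
    [1, 2, 3],
    [1, 2, 3],
])

df = pd.DataFrame(columns=['n', 'p'])
df['n'] = np.array([0, 0])
df['p'] = p  # no error is raised; only the first column of p is stored

print(df)
#    n  p
# 0  0  1
# 1  0  1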

Issue Description

When the columns are assigned as separate arrays and one of them has the wrong dimensions (2-D with shape N×M, here 2×3, instead of 1-D with length N), the assignment to column 'p' succeeds without any warning, but only the first column of the array is actually stored in 'p'.

While this is unlikely to happen by accident, it is not impossible.
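Until the behavior is settled, a caller-side check on dimensionality is one way to catch the mistake before assignment. The following is only a sketch: `assign_1d` is a hypothetical helper, not a pandas function, and the frame is built from a dict here purely to keep the example small.

```python
import numpy as np
import pandas as pd


def assign_1d(df, name, values):
    """Hypothetical helper: assign `values` to df[name] only if it is 1-D."""
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError(
            f"expected a 1-D array for column {name!r}, got shape {arr.shape}"
        )
    df[name] = arr


df = pd.DataFrame({"n": np.array([0, 0])})
p = np.array([[1, 2, 3], [1, 2, 3]])

message = None
try:
    assign_1d(df, "p", p)  # 2-D input is rejected instead of being truncated
except ValueError as exc:
    message = str(exc)

print(message)
```

Note that this strict check also rejects nested lists, since `np.asarray` turns them into a 2-D array; whether that is desirable depends on the caller.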

Expected Behavior

Either store each row of the array in the corresponding row of the dataframe, or raise a warning/error when an N×M array is assigned as a single column.

For example, assigning a nested list instead of a NumPy array stores each inner list in its row:

import pandas as pd

# Nested list (2 rows, 2 values each) instead of a NumPy array
p = [
    [1, 2],
    [1, 2],
]

df = pd.DataFrame(columns=['n', 'p'])
df['n'] = [0, 0]
df['p'] = p  # each inner list is stored as one cell

print(df)
#    n       p
# 0  0  [1, 2]
# 1  0  [1, 2]
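As a possible workaround for getting the same row-wise result from a NumPy array, the array can be converted to a list of rows and wrapped in a Series before assignment. This is a sketch under the assumption that an object-dtype column of lists is acceptable; the frame is built from a dict here to keep the example self-contained.

```python
import numpy as np
import pandas as pd

p = np.array([[1, 2, 3], [1, 2, 3]])
df = pd.DataFrame({"n": [0, 0]})

# Convert the 2-D array to a list of row lists, then wrap it in a Series
# aligned to the frame's index so each row lands in one cell.
df["p"] = pd.Series(p.tolist(), index=df.index)

print(df)
```

Each cell of 'p' then holds a plain Python list, so vectorized numeric operations on that column are no longer available.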

Installed Versions

INSTALLED VERSIONS

commit : bdc79c1
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.58+
Version : #1 SMP PREEMPT_DYNAMIC Sat Nov 18 15:31:17 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.25.2
pytz : 2023.4
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.1.2
Cython : 3.0.8
pytest : 7.4.4
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.4
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.3
IPython : 7.34.0
pandas_datareader : 0.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : 3.7.1
numba : 0.58.1
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pandas_gbq : 0.19.2
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.28
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.7.0
xlrd : 2.0.1
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Metadata

Assignees

No one assigned

    Labels

    Bug; Indexing (Related to indexing on series/frames, not to indexes themselves); Needs Discussion (Requires discussion from core team before further action)
