BUG: uint8 silently converted to int8 during dataframe creation #43733

Closed
@ntjess

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

# !pip install -U pandas
import pandas as pd
import numpy as np

# Create arbitrary 3-channel 8-bit image
data = np.random.randint(0, 256, size=(50, 50, 3), dtype='uint8')

# Replicate this image and treat each image as a row of pixel features
rows = [data.reshape(-1)]*2
assert all(r.dtype == np.uint8 for r in rows)

print(pd.DataFrame(rows).dtypes)
# int8???

print(pd.DataFrame(np.vstack(rows)).dtypes)
# uint8 (expected)

Issue Description

Apologies if this issue already exists. Type-related issues are hard for me to navigate on GitHub because there are too many to search through easily.

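When a DataFrame is constructed from a list of 1-D uint8 NumPy arrays, the resulting columns are silently converted to int8. Stacking the same arrays into a single 2-D array first (np.vstack) preserves uint8.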
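To illustrate why this conversion matters for image data: int8 cannot represent values above 127, so a cast from uint8 to int8 changes the brighter pixel values. The sketch below uses a plain NumPy `astype` call as a stand-in for such a cast. The pixel values are made up for illustration, and whether the DataFrame constructor alters the values in exactly this way is an assumption that this report does not confirm.

```python
import numpy as np

# Hypothetical pixel values chosen for illustration; not taken from the report.
pixels = np.array([0, 127, 128, 200, 255], dtype=np.uint8)

# An explicit uint8 -> int8 cast wraps every value above 127 to a negative number.
as_int8 = pixels.astype(np.int8)

print(pixels.dtype, pixels.tolist())
print(as_int8.dtype, as_int8.tolist())
```

If the constructor behaves like this cast, any downstream code that treats the columns as 0-255 intensities would see negative values for bright pixels.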

I tried testing on master, but the build fails for me on Windows (build output below):

building 'pandas._libs.algos' extension
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  ----------------------------------------
  ERROR: Failed building wheel for pandas
Failed to build pandas
ERROR: Could not build wheels for pandas which use PEP 517 and cannot be installed directly

Expected Behavior

Every element in each column is a uint8, so I would expect the resulting columns to be uint8.
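Until the constructor behaves as expected, a possible workaround is to stack the rows into one 2-D array before building the DataFrame, which is the path the reproducible example reports as preserving uint8. The following is a minimal sketch; `frame_from_uint8_rows` is a hypothetical helper name, not a pandas API, and it assumes all rows have the same length.

```python
import numpy as np
import pandas as pd


def frame_from_uint8_rows(rows):
    """Hypothetical helper: build a DataFrame from equal-length 1-D arrays."""
    # Stack into a single 2-D array so NumPy, not pandas, decides the dtype.
    stacked = np.vstack(rows)
    return pd.DataFrame(stacked)


# Same setup as the reproducible example: two copies of a flattened image.
data = np.random.randint(0, 256, size=(50, 50, 3), dtype="uint8")
rows = [data.reshape(-1)] * 2

df = frame_from_uint8_rows(rows)

# Every column should keep the original dtype and the original values.
assert all(dt == np.uint8 for dt in df.dtypes)
assert np.array_equal(df.to_numpy(), np.vstack(rows))
```

The trade-off is that np.vstack copies the data into a new array, which may matter for large image batches.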

Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.3.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.23.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.15
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

Labels

  • Bug
  • Constructors (Series/DataFrame/Index/pd.array Constructors)
  • Dtype Conversions (Unexpected or buggy dtype conversions)
