Pandas reads SciPy sparse matrix incorrectly #29814

Closed
@m7142yosuke

Description

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> from scipy.sparse import coo_matrix
>>> sparse_data = coo_matrix((
...     [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
...     ([0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1])
... ))
>>> sparse_data.todense()
matrix([[0., 0., 0.],
        [0., 1., 1.],
        [1., 0., 1.],
        [1., 1., 0.]])
>>> pd.DataFrame.sparse.from_spmatrix(sparse_data)
     0    1    2
0  0.0  0.0  0.0
1  0.0  1.0  1.0
2  0.0  0.0  1.0
3  1.0  1.0  0.0
>>> pd.DataFrame.sparse.from_spmatrix(sparse_data.tocsr())
     0    1    2
0  0.0  0.0  0.0
1  0.0  1.0  1.0
2  0.0  0.0  1.0
3  1.0  1.0  0.0

Problem description

The output of sparse_data.todense() and the two DataFrames should be identical, but they are not.
(i.e. the value at [2, 0] is 1 in the dense matrix but 0.0 in both DataFrames; it should be 1.)
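
One detail of the reproducer that may be worth checking when narrowing this down: the first stored value is 0.0, so the matrix holds an explicitly stored zero at [0, 0]. The sketch below only counts stored versus nonzero entries using SciPy's own attributes; whether this is related to the discrepancy is an assumption, not something established in this report.

```python
from scipy.sparse import coo_matrix

# Same values and coordinates as the reproducer in this report.
sparse_data = coo_matrix((
    [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    ([0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1]),
))

stored = sparse_data.nnz               # entries physically stored, zeros included
nonzero = sparse_data.count_nonzero()  # stored entries whose value is not zero
explicit_zeros = stored - nonzero      # stored entries that are exactly zero

print(stored, nonzero, explicit_zeros)
```

If the explicit zero turns out to matter, building the same matrix without the leading 0.0 entry would be a quick way to compare behaviour.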

Expected Output

When a SciPy sparse matrix is converted to a pandas DataFrame, the values of the matrix should be preserved.
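
The expectation above can be written as a small round-trip check. This is a minimal sketch: `roundtrip_matches` is a hypothetical helper name introduced here for illustration, and the result depends on the installed pandas version (on the versions listed in this report, the reproducer would be expected to fail the check).

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def roundtrip_matches(spmatrix):
    """Hypothetical helper: True if the DataFrame holds the same values as the matrix."""
    frame = pd.DataFrame.sparse.from_spmatrix(spmatrix)
    dense_from_pandas = frame.sparse.to_dense().to_numpy()
    dense_from_scipy = spmatrix.toarray()
    return bool(np.array_equal(dense_from_pandas, dense_from_scipy))


# Same values and coordinates as the reproducer in this report.
sparse_data = coo_matrix((
    [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    ([0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 2, 0, 1]),
))

# Check both input formats used in the report.
for label, matrix in [("coo", sparse_data), ("csr", sparse_data.tocsr())]:
    print(label, roundtrip_matches(matrix))
```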

Output of pd.show_versions()

python : 3.7.5.final.0
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0
Cython : None
pytest : 5.3.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
