Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
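import pandas as pd

# One integer column: five 1s and a single 0
df = pd.DataFrame([[1, 1, 1, 1, 0, 1]]).T
# Sparse with fill_value=1, so only the 0 is stored explicitly
sdf1 = df.astype(pd.SparseDtype(int, fill_value=1))
# Convert to a SparseDtype with a different fill value
sdf2 = sdf1.astype(pd.SparseDtype(int, fill_value=0))
# String alias; for int, the default fill_value is 0
sdf3 = sdf1.astype('Sparse[int]')
print(sdf1)
print(sdf2)
print(sdf3)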
Output:
>>> print(sdf1)
   0
0  1
1  1
2  1
3  1
4  0
5  1
>>> print(sdf2)
   0
0  0
1  0
2  0
3  0
4  0
5  0
>>> print(sdf3)
   0
0  0
1  0
2  0
3  0
4  0
5  0
Expected Output
>>> print(sdf1)
   0
0  1
1  1
2  1
3  1
4  0
5  1
>>> print(sdf2)
   0
0  1
1  1
2  1
3  1
4  0
5  1
>>> print(sdf3)
   0
0  1
1  1
2  1
3  1
4  0
5  1
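Until the conversion preserves values, one way to obtain the expected output above is to go through a dense intermediate. This is a sketch, not an official fix: it assumes materialising every value with `DataFrame.sparse.to_dense()` is acceptable for the data size, and the variable names simply mirror the code sample.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1, 0, 1]]).T
sdf1 = df.astype(pd.SparseDtype(int, fill_value=1))

# Workaround (an assumption, not a pandas-endorsed fix): densify first so every
# value is explicit, then re-sparsify with the new fill_value.
sdf2 = sdf1.sparse.to_dense().astype(pd.SparseDtype(int, fill_value=0))

print(sdf2)
```

Going through the dense form costs memory proportional to the full column, but it cannot lose values, because the new sparse array is rebuilt from the explicit data rather than from the old stored values.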
Problem description
Calling `astype` should not change the data in the DataFrame. Currently, calling `astype` on a DataFrame with sparse columns to convert them to a `SparseDtype` with a different `fill_value` results in data loss: in the example above, every value in the sparse int column becomes 0. I suspect a similar problem exists for other sparse dtypes, but I have not tested that.
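The stored representation makes the symptom easier to read. A sparse array keeps only the values that differ from `fill_value` (`sp_values`) plus their positions (`sp_index`); everything else is implied by `fill_value`. The sketch below builds the column from the example as a `pd.arrays.SparseArray`, then, as a hypothetical reconstruction of the suspected failure mode (not confirmed against the pandas source), pairs the same stored values and positions with `fill_value=0`. The name `relabelled` is illustrative only.

```python
import numpy as np
import pandas as pd

# The column from the report, stored with fill_value=1:
# only the entry that differs from the fill value is kept explicitly.
arr = pd.arrays.SparseArray([1, 1, 1, 1, 0, 1], fill_value=1)

print(arr.fill_value)        # value implied at every unstored position
print(arr.sp_values)         # explicitly stored values
print(arr.sp_index.indices)  # positions of the stored values
print(np.asarray(arr))       # dense view of the column

# Hypothetical: reuse the stored values and positions but swap only the
# fill value. Every implied 1 now reads as 0, and the stored 0 stays 0.
relabelled = pd.arrays.SparseArray(
    arr.sp_values, sparse_index=arr.sp_index, fill_value=0
)
print(np.asarray(relabelled))
```

The all-zero dense view of `relabelled` matches the reported `sdf2` and `sdf3` output, which is consistent with (but does not prove) the conversion keeping the old stored values without re-densifying.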
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-179-generic
Version : #209-Ubuntu SMP Fri Apr 24 17:48:44 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.2
setuptools : 46.1.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.1