Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
from scipy.sparse import eye
import pandas as pd
df = pd.DataFrame.sparse.from_spmatrix(eye(10))
df.sparse.density
# Should output 0.1
df.loc[range(5)]
#      0    1    2    3    4    5    6    7    8    9
# 0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
# 1  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
# 2  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
# 3  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
# 4  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
df.loc[range(5)].sparse.density
# outputs 0.1
df.loc[range(5)].loc[range(3)]
#      0    1    2    3    4    5    6    7    8    9
# 0  1.0  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN  NaN
# 1  0.0  1.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN  NaN
# 2  0.0  0.0  1.0  0.0  0.0  NaN  NaN  NaN  NaN  NaN
df.loc[range(5)].loc[range(3)].sparse.density
# outputs 0.6
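The jump from 0.1 to 0.6 is consistent with how sparse density is defined: the fraction of entries that differ from the fill value (0.0 here) and are therefore stored explicitly. NaN never equals the fill value, so every NaN in columns 5 to 9 counts as a stored point: 3 ones + 15 NaNs out of 30 entries. A minimal sketch of that arithmetic, using plain NumPy arrays in place of the sparse frames (the helper name density is hypothetical, not a pandas API):

```python
import numpy as np


def density(values: np.ndarray, fill_value: float = 0.0) -> float:
    """Fraction of entries a sparse array with this fill_value would store.

    NaN == fill_value is False, so NaN entries count as stored points.
    """
    stored = ~(values == fill_value)
    return float(stored.sum()) / values.size


# What df.loc[range(3)] should contain: the first 3 rows of a 10x10 identity.
expected = np.eye(10)[:3]

# What df.loc[range(5)].loc[range(3)] returned: columns 5-9 became NaN.
observed = expected.copy()
observed[:, 5:] = np.nan

print(density(expected))  # 3 stored / 30 entries
print(density(observed))  # (3 + 15) stored / 30 entries
```

In other words, each NaN introduced by the chained selection is materialised in the sparse storage, which is why the density (and memory use) grows.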
Problem description
It seems that a sparse DataFrame extracted with loc does not behave like the original one: selecting rows a second time turns the all-zero columns (5 to 9) into NaN, and the density rises from 0.1 to 0.6. This creates inconsistencies in our processing pipelines: depending on the filtering and selection that has been applied, the result sometimes contains NaNs, which severely increases memory consumption and computation time.
Expected Output
The result of loc should not depend on whether the selection is done in one step or in several, i.e.
df.loc[range(5)].loc[range(3)] should equal df.loc[range(3)]
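The expected behaviour can be written as a small regression check. This is a sketch assuming pandas and NumPy are installed; to avoid the scipy dependency it builds the same 10x10 identity frame column by column with pd.arrays.SparseArray (fill value 0.0) instead of from_spmatrix, and the helper name chained_loc_matches is hypothetical. Given the output reported above, it would be expected to return False on the affected version and True once chained and direct selection agree.

```python
import numpy as np
import pandas as pd


def chained_loc_matches(df: pd.DataFrame, outer, inner) -> bool:
    """Return True if df.loc[outer].loc[inner] equals df.loc[inner]."""
    chained = df.loc[outer].loc[inner]
    direct = df.loc[inner]
    try:
        # Compares values, dtypes (including the sparse dtype) and labels.
        pd.testing.assert_frame_equal(chained, direct)
    except AssertionError:
        return False
    return True


# 10x10 identity stored as sparse float columns with fill value 0.0,
# equivalent in content to pd.DataFrame.sparse.from_spmatrix(eye(10)).
dense = np.eye(10)
df = pd.DataFrame(
    {i: pd.arrays.SparseArray(dense[:, i], fill_value=0.0) for i in range(10)}
)

print(df.sparse.density)
print(chained_loc_matches(df, range(5), range(3)))
```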
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
pandas : 1.0.4
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 9.0.3
setuptools : 39.0.1
Cython : 0.29.19
pytest : 5.4.3
hypothesis : 5.16.1
sphinx : 3.1.0
blosc : 1.9.1
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : 0.4.0
gcsfs : None
lxml.etree : 4.5.1
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.9
numba : 0.49.1