
pandas slow replace with int64 columns in dataframe #28084

Closed
@apiszcz

I reproduced the issue in two parts, first without the n2 and n3 columns and then with them. See the prior issue:
#12257

DataFrame.replace runs roughly 12x slower (21.4 s vs. 1.73 s) when the frame contains int64 columns.
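A scaled-down way to compare the two cases is sketched below. It is not the reporter's benchmark: the helper names `build_frame` and `time_replace` and the 1,000,000-row size are hypothetical choices, and absolute timings will depend on the pandas version and the machine.

```python
import time

import numpy as np
import pandas as pd


def build_frame(n_rows, with_int_cols):
    """Hypothetical helper mirroring the frames in this issue at a smaller size."""
    df = pd.DataFrame(np.full((n_rows, 1), np.inf))  # column 0: all inf, float64
    df['a1'] = pd.Series([''] * n_rows, dtype='category')
    df['n1'] = 0.0
    if with_int_cols:
        # Two int64 columns, as in the "with n2, n3" case.
        df['n2'] = np.zeros(n_rows, dtype=np.int64)
        df['n3'] = np.zeros(n_rows, dtype=np.int64)
    return df


def time_replace(df):
    """Return (elapsed seconds, result) for one replace call."""
    start = time.perf_counter()
    result = df.replace([np.inf, -np.inf], np.nan)
    return time.perf_counter() - start, result


if __name__ == '__main__':
    for with_ints in (False, True):
        elapsed, _ = time_replace(build_frame(1_000_000, with_ints))
        print(f'with int64 columns={with_ints}: {elapsed:.3f} s')
```

Running both cases in one process keeps the comparison on the same pandas build; only the presence of the two int64 columns differs.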

Without n2, n3: 1.73 seconds

import numpy as np
import pandas as pd

a1=np.zeros((40000000,1))
a1[:,:]=np.inf
df=pd.DataFrame(a1)
df['a1']=''
df['n1']=0.0
# df['n2']=0.0
# df['n2']=df['n2'].astype(np.int64)
# df['n3']=0.0
# df['n3']=df['n3'].astype(np.int64)
df['n1'].astype('datetime64[ns]')  # note: result is discarded, so df['n1'] stays float64
df['a1']=df['a1'].astype('category')
%time df.replace([np.inf, -np.inf], np.nan)  # IPython magic: times a single replace call
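To see which column dtypes replace receives in this first case, the same construction can be run on a handful of rows and df.dtypes printed. This is a sketch: the 5-row size is arbitrary, and the astype('datetime64[ns]') line is left out because its return value is never assigned, so it does not change df.

```python
import numpy as np
import pandas as pd

# Same construction as the "without n2, n3" case, but with 5 rows.
a1 = np.zeros((5, 1))
a1[:, :] = np.inf
df = pd.DataFrame(a1)
df['a1'] = ''
df['n1'] = 0.0
df['a1'] = df['a1'].astype('category')

# Column 0 and n1 are float64, a1 is category; there are no int64 columns.
print(df.dtypes)
```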

With n2, n3: 21.4 seconds

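import numpy as np
import pandas as pd

a1=np.zeros((40000000,1))
a1[:,:]=np.inf
df=pd.DataFrame(a1)
df['a1']=''
df['n1']=0.0
df['n2']=0.0
df['n2']=df['n2'].astype(np.int64)  # n2 becomes an int64 column
df['n3']=0.0
df['n3']=df['n3'].astype(np.int64)  # n3 becomes an int64 column
df['n1'].astype('datetime64[ns]')  # note: result is discarded, so df['n1'] stays float64
df['a1']=df['a1'].astype('category')
%time df.replace([np.inf, -np.inf], np.nan)  # IPython magic: times a single replace call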
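One possible workaround, offered as an untested assumption rather than a fix confirmed in this issue: int64 columns and a category column of strings cannot contain inf, so the replace can be restricted to the float columns. The helper name replace_inf_in_float_columns is hypothetical, and whether it avoids the slowdown on the pandas version in this report has not been measured.

```python
import numpy as np
import pandas as pd


def replace_inf_in_float_columns(df):
    """Hypothetical helper: replace +/-inf with NaN only in float columns.

    Integer and categorical-string columns are skipped because they cannot
    hold inf, so their values are unchanged either way.
    """
    out = df.copy()
    float_cols = [col for col in out.columns if out[col].dtype.kind == 'f']
    for col in float_cols:
        out[col] = out[col].replace([np.inf, -np.inf], np.nan)
    return out


if __name__ == '__main__':
    n = 5
    df = pd.DataFrame(np.full((n, 1), np.inf))
    df['a1'] = pd.Series([''] * n, dtype='category')
    df['n1'] = 0.0
    df['n2'] = np.zeros(n, dtype=np.int64)
    df['n3'] = np.zeros(n, dtype=np.int64)
    print(replace_inf_in_float_columns(df).dtypes)
```

Selecting columns by dtype.kind == 'f' covers float16, float32 and float64 without listing each dtype name.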
