BUG: Memory leak on v1.5.0 with Future Warning #49166

Closed
@klaucos

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import tracemalloc

import numpy as np
import pandas as pd

# Un-comment the next two lines on the first run to generate a random df and write out.csv
# df = pd.DataFrame(np.random.randn(10000, 4), columns=list('ABCD'))
# df.to_csv('out.csv')


def read_csv():
    df = pd.read_csv('out.csv')
    double_df = df.append(df)
    print(len(double_df))


tracemalloc.start()
for i in range(100):
    read_csv()
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    print(f'Iteration {i} ##############')
    for stat in top_stats[:10]:
        if 'pandas' in str(stat):
            print(stat)

Issue Description

Calling df.append emits a FutureWarning, and this probably prevents the memory allocated during the call from being released afterwards, which shows up as a memory leak. This did not occur in 1.4.3.
Here is the comparison of iteration stats from tracemalloc:

Iteration 1 ##############
python3.8/site-packages/pandas/core/internals/managers.py:2279: size=782 KiB, count=10, average=78.2 KiB
python3.8/site-packages/pandas/core/indexes/range.py:203: size=158 KiB, count=7, average=22.5 KiB
python3.8/site-packages/pandas/core/construction.py:775: size=3424 B, count=4, average=856 B

Iteration 99 ##############
python3.8/site-packages/pandas/core/internals/managers.py:2279: size=38.2 MiB, count=500, average=78.2 KiB
python3.8/site-packages/pandas/core/indexes/range.py:203: size=7823 KiB, count=204, average=38.3 KiB
python3.8/site-packages/pandas/core/internals/managers.py:182: size=65.6 KiB, count=1399, average=48 B
python3.8/site-packages/pandas/core/internals/managers.py:1837: size=33.7 KiB, count=603, average=57 B
python3.8/site-packages/pandas/core/internals/managers.py:2243: size=30.3 KiB, count=402, average=77 B
python3.8/site-packages/pandas/core/indexes/base.py:703: size=30.2 KiB, count=595, average=52 B

The issue was not present in pandas 1.4.3:

# Pandas 1.4.3
Iteration 99 ##############
python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py:75: size=4332 B, count=108, average=40 B
python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py:225: size=3530 B, count=18, average=196 B
python3.8/site-packages/pandas/core/construction.py:728: size=3424 B, count=4, average=856 B
python3.8/site-packages/pandas/util/_decorators.py:311: size=1976 B, count=3, average=659 B
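
The growth between iterations can also be quantified by diffing two tracemalloc snapshots instead of reading them side by side. The following is a minimal sketch using only the standard library; `leaky` and `growth_between` are hypothetical names, and `leaky` is a stand-in for `read_csv()` from the example above that deliberately retains memory so the diff has something to show.

```python
import tracemalloc

_retained = []  # hypothetical stand-in for whatever keeps the memory alive


def leaky():
    # Stand-in for read_csv(): deliberately retains about 100 KB per call.
    _retained.append(bytearray(100_000))


def growth_between(func, iterations):
    """Call func repeatedly and diff the snapshot after the last call against the first."""
    tracemalloc.start()
    func()
    first = tracemalloc.take_snapshot()
    for _ in range(iterations - 1):
        func()
    last = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, largest absolute change first.
    return last.compare_to(first, 'lineno')


diffs = growth_between(leaky, 20)
for diff in diffs[:5]:
    print(diff)
```

Swapping `leaky` for the real `read_csv()` should list the pandas source lines whose retained size grows from one iteration to the next.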

This bug probably affects every code path in pandas 1.5.0 that emits a warning or raises an error, not only df.append.
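
One way to narrow this down is to check whether the memory is permanently leaked or only held by a reference cycle until the garbage collector runs. The sketch below is plain Python, not pandas code. It assumes, without confirmation from this report, that a warning path might keep a reference to a stack frame; `Payload` and `keeps_own_frame` are hypothetical names used only to show how such a reference delays the release of a function's locals.

```python
import gc
import inspect
import weakref


class Payload:
    """Stand-in for a large object such as a DataFrame."""


def keeps_own_frame():
    payload = Payload()
    # Assumption for illustration: the function stores its own frame in a local.
    # The frame then references itself through its locals, forming a cycle.
    frame = inspect.currentframe()
    return weakref.ref(payload)


gc.disable()  # keep the automatic collector out of the picture while measuring
try:
    ref = keeps_own_frame()
    alive_before_collect = ref() is not None
    gc.collect()  # an explicit collection breaks the frame cycle
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)
```

If adding `gc.collect()` inside the loop of the reproducible example makes the reported sizes stop growing, the memory is cyclic garbage waiting for collection rather than a permanent leak.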

Expected Behavior

pandas should not leak memory, even when a FutureWarning is emitted.
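
This expectation could be written as a simple check: call a function that emits a FutureWarning many times and require that traced memory stays roughly flat. The following is a sketch using only the standard library; `emits_future_warning` and `traced_growth` are hypothetical names, and the 100 KB threshold is an arbitrary assumption equal to one temporary buffer.

```python
import tracemalloc
import warnings


def emits_future_warning():
    # Hypothetical stand-in for a deprecated call such as df.append.
    data = bytearray(100_000)  # temporary buffer that should be freed on return
    warnings.warn("deprecated", FutureWarning, stacklevel=2)
    return len(data)


def traced_growth(func, iterations=50):
    """Return the change in traced memory (bytes) across repeated calls to func."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        func()  # warm-up call so one-time allocations are not counted
        tracemalloc.start()
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        after, _ = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return after - before


growth = traced_growth(emits_future_warning)
print(f"traced growth over 50 calls: {growth} bytes")
assert growth < 100_000, "memory grew by more than one buffer"
```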

Installed Versions

commit : 87cfe4e
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-50-generic
Version : 20.04.1-Ubuntu
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 45.2.0
pip : 20.0.2
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : 5.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4
jinja2 : 3.0.1
IPython : None
pandas_datareader: 0.10.0
bs4 : 4.8.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2021.07.0
gcsfs : None
matplotlib : 3.4.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)