memory leak, consumes entire system after just days of usage #1181

Closed
@pseudotensor

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

Progressive memory leak despite use of close(). Eventually leads to OOM even on large-memory systems after just a few days.

Related: #820

To Reproduce

I'm using 1.12.0 and still hit this issue.

Using text completion. I'm also calling close() in a try/finally, so close() does not help. Enforcing a single global client connection doesn't make sense; the old pre-1.x client had global attributes that made the API poor, and I presume some legacy parts are still in place.

In h2oGPT, we use OpenAI client for OpenAI or vLLM connections, and I see a 5GB memory leak for every 6000 connections. This happens whether I yield the generator for streaming or just exit after creating the completion.

Normally connections are not this intense, but the leak was easily reproducible by bisecting the OpenAI creation/generation parts of the code. For typical workloads this leads to OOM on a 256GB system after just a few days of usage.

Here is a repro. Please set base_url to an endpoint you have set up, such as vLLM, TGI, or gpt-3.5-turbo, so it is not expensive; choose api_key and model accordingly.

import os

import psutil
from openai import OpenAI

for i in range(6000):
    # Fresh client per request; base_url, api_key, and model are placeholders.
    client_args = dict(base_url='<choose>', api_key="EMPTY")
    client = OpenAI(**client_args)

    responses = client.completions.create(
        model='h2oai/h2ogpt-4096-llama2-13b-chat',
        prompt="Say exactly one word.",
        stream=True,
    )
    client.close()

    # Report process memory after every create/close cycle.
    p = psutil.Process(os.getpid())
    print(p.memory_full_info())

The memory consumed does not increase at every loop step, but it does monotonically increase from pss=48523264 to pss=107862016 within a few minutes (i.e., it doubles) and continues indefinitely.
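To tell Python-heap growth apart from native (C-level) growth, a stdlib tracemalloc check can be wrapped around the create/close cycle. This is only a sketch; measure_growth is a hypothetical helper, and tracemalloc sees Python allocations only, so native leaks (which the PSS numbers above would capture) will not show up here:

```python
import tracemalloc

def measure_growth(fn, iters=10):
    # Run fn() repeatedly and return net Python-heap growth in bytes.
    tracemalloc.start()
    fn()  # warm-up so one-time caches are not counted
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

# A deliberately leaky function keeps references alive, so growth is reported.
leaked = []
print(measure_growth(lambda: leaked.append(bytearray(1024))) >= 10 * 1024)  # → True
```

In the real repro, fn would be the OpenAI client create/complete/close cycle; a result near zero would point the finger at native allocations rather than Python objects.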

The problem seems even worse with concurrent requests in a multi-threaded setup, as if the clean-up is not thread-safe. I'm trying to put together a repro that shows the 5GB leak after 6000 connections and takes only half an hour to run, but perhaps the above is sufficient.
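A multi-threaded stress harness along these lines should do it (a sketch; make_request is a hypothetical stand-in — in a real repro its body would be the OpenAI(**client_args) / completions.create / client.close() cycle from above, with memory sampled via psutil as before):

```python
from concurrent.futures import ThreadPoolExecutor

def make_request(i):
    # Stand-in for one create/complete/close cycle against the endpoint;
    # the real repro would construct OpenAI(...), call
    # client.completions.create(...), then client.close() here.
    return i

def run_concurrent(n_requests, n_workers):
    # Issue n_requests cycles across n_workers threads to exercise
    # any clean-up paths that may not be thread-safe.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(1 for _ in pool.map(make_request, range(n_requests)))

print(run_concurrent(100, 8))  # completes 100 request cycles
```

With 8 workers, 6000 real requests should finish far faster than the serial loop, which would make the 5GB figure reproducible in one run.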

Code snippets

No response

OS

Ubuntu 22

Python version

Python v3.10

Library version

openai v1.12.0

Labels

bug (Something isn't working)