memory leak, consumes entire system after just days of usage #1181

Closed
@pseudotensor

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

Progressive memory leak despite use of close(). Eventually leads to OOM even on large-memory systems after just a few days.

Related: #820

To Reproduce

I'm using 1.12.0 and still hit this issue.

Using text completion. I'm also calling close() in a try/finally, so close() does not help. Enforcing a single global client connection doesn't make sense; the old pre-1.x client had global attributes that made the API poor, and I presume some legacy parts are still in place.

In h2oGPT, we use OpenAI client for OpenAI or vLLM connections, and I see a 5GB memory leak for every 6000 connections. This happens whether I yield the generator for streaming or just exit after creating the completion.

Normally connections are not this intense, but the leak was easily reproducible by bisecting the OpenAI creation/generation parts of the code. For typical workloads this leads to OOM on a 256GB system after just a few days of usage.

Here is a repro. Please set base_url to an endpoint you have set up, such as vLLM, TGI, or gpt-3.5-turbo, so it is not expensive; choose api_key and model accordingly.

import os

import psutil
from openai import OpenAI

for i in range(6000):
    # Fresh client per request; base_url, api_key, and model are placeholders.
    client_args = dict(base_url='<choose>', api_key="EMPTY")
    client = OpenAI(**client_args)

    responses = client.completions.create(
        model='h2oai/h2ogpt-4096-llama2-13b-chat',
        prompt="Say exactly one word.",
        stream=True,
    )
    client.close()

    # Report process memory after every create/close cycle.
    p = psutil.Process(os.getpid())
    print(p.memory_full_info())

The memory consumed does not increase at every loop step, but it does monotonically increase from pss=48523264 to pss=107862016 within a few minutes (i.e., it doubles) and continues indefinitely.
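To tell Python-heap growth apart from native (C-level) growth, a stdlib tracemalloc check can be wrapped around the create/close cycle. This is only a sketch; measure_growth is a hypothetical helper, and tracemalloc sees Python allocations only, so native leaks (which the PSS numbers above would capture) will not show up here:

```python
import tracemalloc

def measure_growth(fn, iters=10):
    # Run fn() repeatedly and return net Python-heap growth in bytes.
    tracemalloc.start()
    fn()  # warm-up so one-time caches are not counted
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iters):
        fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

# A deliberately leaky function keeps references alive, so growth is reported.
leaked = []
print(measure_growth(lambda: leaked.append(bytearray(1024))) >= 10 * 1024)  # → True
```

In the real repro, fn would be the OpenAI client create/complete/close cycle; a result near zero would point the finger at native allocations rather than Python objects.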

The problem seems even worse with concurrent requests in a multi-threaded setup, as if the clean-up is not thread-safe. I'm trying to put together a repro that shows the 5GB leak after 6000 connections and takes only half an hour to run, but perhaps the above is sufficient.
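A multi-threaded stress harness along these lines should do it (a sketch; make_request is a hypothetical stand-in — in a real repro its body would be the OpenAI(**client_args) / completions.create / client.close() cycle from above, with memory sampled via psutil as before):

```python
from concurrent.futures import ThreadPoolExecutor

def make_request(i):
    # Stand-in for one create/complete/close cycle against the endpoint;
    # the real repro would construct OpenAI(...), call
    # client.completions.create(...), then client.close() here.
    return i

def run_concurrent(n_requests, n_workers):
    # Issue n_requests cycles across n_workers threads to exercise
    # any clean-up paths that may not be thread-safe.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(1 for _ in pool.map(make_request, range(n_requests)))

print(run_concurrent(100, 8))  # completes 100 request cycles
```

With 8 workers, 6000 real requests should finish far faster than the serial loop, which would make the 5GB figure reproducible in one run.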

Code snippets

No response

OS

Ubuntu 22

Python version

Python v3.10

Library version

openai v1.12.0

Labels

bug (Something isn't working)