Description
Describe the bug
Since we started using openai.ChatCompletion.create (with gpt-3.5-turbo), we've been getting an intermittent
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
without a clear reproduction. At first I thought it was #91, caused by too many open connections to the OpenAI servers. Now I think it looks more like #368, but I have some hypotheses of my own, so I'm opening a separate issue in case the two are different. If this is a duplicate, feel free to fold my details into #368.
My hypothesis is that when a long-running process (like a web server) calls out to OpenAI, periods of inactivity cause the server side to terminate the connection, and it takes a long time for the client to reestablish it. I dug into related issues on the requests side (like this one, psf/requests#4937) that hinted at the root cause. Essentially, what I think is happening is this:
- A first request is made to OpenAI and returns a result; requests keeps the connection open under the hood with its default keep-alive
- Some time passes; in my experience, around 10 minutes is enough
- A new request is made to OpenAI (presumably over the kept-alive connection), and the client throws a ConnectionResetError
- A new call after this succeeds
I believe that the OpenAI servers are terminating the connection after a short idle period (perhaps minutes), while the client still treats it as alive and tries to reuse it.
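The sequence above can be imitated locally without OpenAI at all. This is only a rough sketch of the hypothesis, using the standard library: a toy server that closes idle connections after IDLE_TIMEOUT, and an http.client connection that is reused across calls. The server, the timeout value, and the names serve, handle and run_demo are all made up for illustration; nothing here reflects how the OpenAI servers or openai-python actually behave.

```python
import http.client
import socket
import threading
import time

IDLE_TIMEOUT = 0.3  # seconds; hypothetical stand-in for the server-side idle limit
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"


def handle(conn):
    """Answer each request on one connection; hang up after IDLE_TIMEOUT of silence."""
    conn.settimeout(IDLE_TIMEOUT)
    buf = b""
    try:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            buf += chunk
            # One response per complete request head (the demo sends no bodies).
            while b"\r\n\r\n" in buf:
                _, _, buf = buf.partition(b"\r\n\r\n")
                conn.sendall(RESPONSE)
    except socket.timeout:
        pass  # idle for too long: fall through and close, like the suspected server
    finally:
        conn.close()


def serve(listener):
    """Accept connections until the listening socket is closed."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def run_demo():
    """Make three calls on one client: immediately, after an idle gap, then again."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    threading.Thread(target=serve, args=(listener,), daemon=True).start()

    client = http.client.HTTPConnection("127.0.0.1", port)
    outcomes = []
    for pause in (0, IDLE_TIMEOUT * 3, 0):
        time.sleep(pause)
        try:
            client.request("GET", "/")
            client.getresponse().read()
            outcomes.append("ok")
        except ConnectionError as exc:
            # The kept-alive socket was already closed by the server.
            outcomes.append(type(exc).__name__)
            client.close()  # drop the dead socket so the next call reconnects
    listener.close()
    return outcomes


if __name__ == "__main__":
    print(run_demo())
```

On my understanding, the first and third calls succeed and the middle one fails with some ConnectionError subclass (which one can depend on timing and platform), which matches the pattern I'm seeing against OpenAI.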
I think this is worth reporting as a bug because the client code could be changed to handle these server-side settings more gracefully. Changing some of the keep-alive settings from their defaults would help several people using this library.
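In the meantime, one caller-side workaround might be to retry once when the connection turns out to be dead, since the call right after the failure succeeds in my experience. Below is a minimal sketch of that idea; call_with_retry is a hypothetical helper of my own, not part of openai-python or requests, and the exception types to retry on are left as a parameter because the built-in ConnectionError is a different class from requests.exceptions.ConnectionError.

```python
import time


def call_with_retry(fn, *args, retry_on=(ConnectionError,), attempts=2, delay=0.0, **kwargs):
    """Call fn(*args, **kwargs), retrying when one of `retry_on` is raised.

    `attempts` is the total number of tries; the last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)  # optional pause before trying again


# Assumed usage with the call from this issue (not run here):
#   call_with_retry(
#       openai.ChatCompletion.create,
#       retry_on=(requests.exceptions.ConnectionError,),
#       model="gpt-3.5-turbo",
#       messages=[...],
#   )
```

This only papers over the problem: a retried request may be sent twice if the first attempt actually reached the server, so it is a stopgap rather than a fix for the keep-alive behavior itself.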
To Reproduce
- Write a long-running program. In our case, we have a Python web server running FastAPI
- As part of a route for the server, call OpenAI to do some work. In our case, we're calling openai.ChatCompletion.create with gpt-3.5-turbo to manipulate some input language and respond back with it
- Run the server and call the endpoint once
- Wait 10 minutes
- Call the endpoint again
- You'll likely get a "Connection reset by peer" error on the second call
Code snippets
No response
OS
Linux
Python version
Python v3.8
Library version
openai-python 0.27.2