Skip to content

Messages with non-latin languages texts returns server errors (500 status code) #1217

Closed as not planned
@yovelcohen

Description

@yovelcohen

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

I'm passing a message to openai, that might include some JSON in the message. those JSONs might include non-latin languages such as Hebrew, Arabic, and Japanese.
I think there's something wrong when the library builds the "Request" object, it encodes the string (or httpx does it) in a way that causes openai servers to return 500 status codes.

I debugged the request-making process. it seems that httpx is to blame here. If I look at the library's code in the AsyncAPIClient code.

When I print the "options" parameter async def _reuqest I see that a Hebrew string is fine, when I access request.content I can see that it's the hebrew is now encoded.

Now, I'm not sure how much the openai API or GPT4 care about working with those encoded texts, but I do know, that request with nested jsons in msg causes lots of 500 responses from GPT4-1106 on Azure.

To Reproduce

  1. build a message that contains a JSON string with some Hebrew/Arabic/Japanese text in it.
  2. try to send it to gpt-4 with openai Library.
  3. put a breakpoint in the row: raise self._make_status_error_from_response(err.response) from None
  4. see that it returns a 500, and accessing request.content, you see the message with the non-Latin encoded.

Code snippets

No response

OS

macOS

Python version

Python v3.11.7

Library version

openai v1.13.3

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions