Skip to content

How to integrate audio input in agent sdk? (not realtime or voice pipeline, e.g., gpt-4o audio) #738

Open
@101scholar

Description

@101scholar

Currently, input types include text, file and image. Is it possible to add audio input support, so that we can directly process audio input with potential function calling? Especially for models such as gpt-4o audio, and gemini-2.5-flash-preview.

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionQuestion about using the SDK

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions