Feature request for transformers use-cases #673

Closed
@zucchini-nlp

Description

🚀 The feature

Hi 👋

First of all, huge thanks to you and the team: the latest torchcodec release with audio support is fantastic! It's a long-awaited feature.

I'm the maintainer of multimodal models in transformers, and I'm planning to use torchcodec to load multimodal data for MLLMs. Looking forward to a stable version being released. For now, I've been testing the latest release and noticed a few points that might be useful to consider for future support.

  1. Mono-channel audio support: Some audio models (like Whisper from Hugging Face) only accept mono-channel input. It would be helpful if audio loading allowed channel selection or offered optional stereo-to-mono conversion.

  2. Fallback for video files with no audio: When loading audio from a video file that has no audio stream, an error is currently raised. A more flexible behavior would be to return None, similar to how moviepy handles it, where the result can be checked with `if clip.audio is not None`.

  3. Loading from URL: Loading audio/video from URLs seems to work for some URLs I have tested, though I couldn't find in the docs whether URL input is officially supported. I hope it will be officially supported in the stable release.

  4. Video decoder issues with the AVI format: When loading AVI files, the decoder fails to infer duration and related metadata, which prevents sampling frames by timestamp. Loading the same video saved as MP4 resolves the issue. You can try this video as an example.
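For point 1, the conversion itself is simple: decoded audio is typically laid out as one sample sequence per channel, and a mono downmix is the per-sample mean across channels. A minimal, dependency-free sketch (plain Python lists standing in for the decoder's channel-major sample tensor, which is an assumption about the layout):

```python
def downmix_to_mono(channels):
    """Average per-channel sample lists (e.g. [left, right]) into one mono track."""
    num_channels = len(channels)
    # zip(*channels) walks the channels sample-by-sample; averaging each
    # tuple gives the mono signal.
    return [sum(frame) / num_channels for frame in zip(*channels)]

stereo = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # two channels, three samples
mono = downmix_to_mono(stereo)  # [0.5, 0.5, 0.5]
```

With a real decoder output tensor of shape (num_channels, num_samples), the same idea is a mean over the channel dimension.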

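For point 2, until such a fallback exists in the library, a user-side wrapper can emulate the moviepy-style `is not None` check. This is only a sketch: the exception type raised for a missing audio stream is an assumption, and `fail` is a stand-in for a decoder constructor hitting a silent video.

```python
def open_audio_or_none(open_decoder, source):
    """Return a decoder for `source`, or None if it has no audio stream."""
    try:
        return open_decoder(source)
    except RuntimeError:  # assumed error type when no audio stream exists
        return None

def fail(_source):
    # Stand-in for a decoder constructor on a video with no audio track.
    raise RuntimeError("no audio stream")

audio = open_audio_or_none(fail, "video_without_audio.mp4")
if audio is not None:
    ...  # feed samples to the model
```

Having the library return None (or expose a cheap has-audio check in metadata) would make this wrapper unnecessary.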
Let me know if you'd like me to file any of these separately or provide reproducible examples. Thanks again for the awesome work!

Motivation, pitch

No response
