Skip to content

Return pts of first frame in audio API #552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 12, 2025

Conversation

NicolasHug
Copy link
Member

I thought we didn't need this, but we actually need it in order to implement the public sample-based API (which is out of scope for this PR).

@NicolasHug NicolasHug requested a review from scotts March 12, 2025 18:07
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 12, 2025
@@ -838,7 +838,7 @@ VideoDecoder::FrameBatchOutput VideoDecoder::getFramesPlayedInRange(
return frameBatchOutput;
}

torch::Tensor VideoDecoder::getFramesPlayedInRangeAudio(
VideoDecoder::AudioFramesOutput VideoDecoder::getFramesPlayedInRangeAudio(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to create a new AudioFramesOutput struct instead of relying on the existing FrameOutput, which contains unnecessary fields like streamIndex and durationSeconds. For now, those aren't needed for audio. No super strong opinion though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the right call, since the tensors are going to be very different.

tensors.push_back(frameOutput.data);
firstFramePtsSeconds =
std::min(firstFramePtsSeconds, frameOutput.ptsSeconds);
frames.push_back(frameOutput.data);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now a bit ugly that we are manipulating both a FrameOutput and an AudioFramesOutput here in this function.

This is mostly because convertAVFrameToFrameOutput returns a FrameOutput, but maybe we could bypass it and directly call convertAudioAVFrameToFrameOutputOnCPU. I'll write a TODO to investigate this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, that's cumbersome. If we're in an audio-only call, we shouldn't have to deal with video structures.

@NicolasHug NicolasHug merged commit d75fc58 into pytorch:main Mar 12, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants