-
Notifications
You must be signed in to change notification settings - Fork 39
Return pts of first frame in audio API #552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -838,7 +838,7 @@ VideoDecoder::FrameBatchOutput VideoDecoder::getFramesPlayedInRange( | |
return frameBatchOutput; | ||
} | ||
|
||
torch::Tensor VideoDecoder::getFramesPlayedInRangeAudio( | ||
VideoDecoder::AudioFramesOutput VideoDecoder::getFramesPlayedInRangeAudio( | ||
double startSeconds, | ||
std::optional<double> stopSecondsOptional) { | ||
validateActiveStream(AVMEDIA_TYPE_AUDIO); | ||
|
@@ -854,7 +854,7 @@ torch::Tensor VideoDecoder::getFramesPlayedInRangeAudio( | |
|
||
if (startSeconds == stopSeconds) { | ||
// For consistency with video | ||
return torch::empty({0}); | ||
return AudioFramesOutput{torch::empty({0}), 0.0}; | ||
} | ||
|
||
StreamInfo& streamInfo = streamInfos_[activeStreamIndex_]; | ||
|
@@ -871,17 +871,24 @@ torch::Tensor VideoDecoder::getFramesPlayedInRangeAudio( | |
// TODO-AUDIO Pre-allocate a long-enough tensor instead of creating a vec + | ||
// cat(). This would save a copy. We know the duration of the output and the | ||
// sample rate, so in theory we know the number of output samples. | ||
std::vector<torch::Tensor> tensors; | ||
std::vector<torch::Tensor> frames; | ||
|
||
double firstFramePtsSeconds = std::numeric_limits<double>::max(); | ||
auto stopPts = secondsToClosestPts(stopSeconds, streamInfo.timeBase); | ||
auto finished = false; | ||
while (!finished) { | ||
try { | ||
AVFrameStream avFrameStream = decodeAVFrame([startPts](AVFrame* avFrame) { | ||
return startPts < avFrame->pts + getDuration(avFrame); | ||
}); | ||
// TODO: it's not great that we are getting a FrameOutput, which is | ||
// intended for videos. We should consider bypassing | ||
// convertAVFrameToFrameOutput and directly call | ||
// convertAudioAVFrameToFrameOutputOnCPU. | ||
auto frameOutput = convertAVFrameToFrameOutput(avFrameStream); | ||
tensors.push_back(frameOutput.data); | ||
firstFramePtsSeconds = | ||
std::min(firstFramePtsSeconds, frameOutput.ptsSeconds); | ||
frames.push_back(frameOutput.data); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's now a bit ugly that we are manipulating both a FrameOutput and an This is mostly because There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, yeah, that's cumbersome. If we're in an audio-only call, we shouldn't have to deal with video structures. |
||
} catch (const EndOfFileException& e) { | ||
finished = true; | ||
} | ||
|
@@ -895,7 +902,8 @@ torch::Tensor VideoDecoder::getFramesPlayedInRangeAudio( | |
finished |= (streamInfo.lastDecodedAvFramePts) <= stopPts && | ||
(stopPts <= lastDecodedAvFrameEnd); | ||
} | ||
return torch::cat(tensors, 1); | ||
|
||
return AudioFramesOutput{torch::cat(frames, 1), firstFramePtsSeconds}; | ||
} | ||
|
||
// -------------------------------------------------------------------------- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to create a new
AudioFramesOutput
struct instead of relying on the existingFrameOutput
, which contains unnecessary fields likestreamIndex
anddurationSeconds
. For now, those aren't needed for audio. No super strong opinion though.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's the right call, since the tensors are going to be very different.