Description
In nasa_13013.mp4.audio.mp3.stream0.all_frames_info.json, which we created using ffprobe, the first frame of nasa_13013.mp4.audio.mp3 has pts 0.138125 and duration 0.005875.
However, the corresponding AVFrame's pts and duration fields do not match these values. Instrumenting our decoder to print the info of the first few decoded frames (the ones we return):
diff --git a/src/torchcodec/decoders/_core/VideoDecoder.cpp b/src/torchcodec/decoders/_core/VideoDecoder.cpp
index 0e287a5..5e9cc8a 100644
--- a/src/torchcodec/decoders/_core/VideoDecoder.cpp
+++ b/src/torchcodec/decoders/_core/VideoDecoder.cpp
@@ -1152,6 +1152,8 @@ VideoDecoder::FrameOutput VideoDecoder::convertAVFrameToFrameOutput(
avFrame->pts, formatContext_->streams[streamIndex]->time_base);
frameOutput.durationSeconds = ptsToSeconds(
getDuration(avFrame), formatContext_->streams[streamIndex]->time_base);
+
+ printf("AVFrame pts = %f, duration = %f, num_samples = %d\n", frameOutput.ptsSeconds, frameOutput.durationSeconds, avFrame->nb_samples);
if (streamInfo.avMediaType == AVMEDIA_TYPE_AUDIO) {
convertAudioAVFrameToFrameOutputOnCPU(
avFrameStream, frameOutput, preAllocatedOutputTensor);
AVFrame pts = 0.072000, duration = 0.072000, num_samples = 47
AVFrame pts = 0.144000, duration = 0.072000, num_samples = 576
AVFrame pts = 0.216000, duration = 0.072000, num_samples = 576
We can see that the two sources disagree on the first frame. It's likely that ffprobe is correct here, and that the correct pts and duration are 0.138125 and 0.005875: the file has a sample rate of 8000, and 47 samples at this rate yield exactly 0.005875 seconds, while 0.144000 - 0.005875 == 0.138125. In contrast, it's impossible for the frame duration to be 0.072000 with only 47 samples at this sample rate.
I don't really know how to fix this for now. The pts we return comes from FFmpeg itself (it is set on the AVFrame!), and we're just trusting it, but clearly it's wrong for this first frame. ffprobe seems to do something smarter, and I don't know what it is. It's possible that there's a field in AVFrame that I'm missing? Or ffprobe just realizes that with 47 samples at this rate, the start of the first frame must be 0.138125, but that means it's looking at the second frame and trusting that its values are correct??
I don't know.
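Short of a real fix, the mismatch can at least be detected. Below is a minimal sketch of such a check, assuming the reported duration, the sample count and the sample rate are all available for a frame. `durationMatchesSampleCount` is a hypothetical name and the tolerance is an arbitrary choice; this is a diagnostic only, not what ffprobe does and not a proposed fix.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical diagnostic, not part of torchcodec or FFmpeg: returns true
// when the reported duration agrees with the duration implied by the number
// of samples at the given sample rate (in Hz).
bool durationMatchesSampleCount(
    double durationSeconds,
    int64_t numSamples,
    int sampleRate,
    double toleranceSeconds = 1e-6) { // arbitrary tolerance, an assumption
  double expectedSeconds = static_cast<double>(numSamples) / sampleRate;
  return std::fabs(durationSeconds - expectedSeconds) <= toleranceSeconds;
}
```

For the frames printed above, the check fails for the first frame (0.072 seconds reported for 47 samples at 8000 Hz) and passes for the following ones (0.072 seconds for 576 samples).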
Note: This bug only affects the first frame of nasa_13013.mp4.audio.mp3. All other frames are fine, as can be seen in #554.