Closed
Description
Alternative title: The C++ and core ops work fine as long as we add only one stream. They break if we add more than one stream.
Example 1:
from torchcodec.decoders import _core as core
# This video has stream 0 with dimensions torch.Size([3, 180, 320]) and stream 3 with dimensions torch.Size([3, 270, 480])
decoder = core.create_from_file("test/resources/nasa_13013.mp4")
core.add_video_stream(decoder, stream_index=0)
core.add_video_stream(decoder, stream_index=3)
for frame_index in range(100):
    frame, _, _ = core.get_frame_at_index(decoder, stream_index=0, frame_index=frame_index)
    print(frame.shape)  # torch.Size([3, 270, 480]). This is stream 3, not stream 0.
Example 2:
from torchcodec.decoders import _core as core
decoder = core.create_from_file("test/resources/nasa_13013.mp4")
core.add_video_stream(decoder, stream_index=0)
frame, _, _ = core.get_frame_at_index(decoder, stream_index=3, frame_index=5) # This should error but doesn't
print(frame.shape)  # torch.Size([3, 180, 320]). This is stream 0, not stream 3.
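For Example 2, the missing error could come from a guard along these lines. This is only a sketch: `ToyDecoder`, `active_streams`, and `validate_stream` are hypothetical names used for illustration and are not part of torchcodec.

```python
class ToyDecoder:
    """Hypothetical stand-in for the decoder; not torchcodec's API."""

    def __init__(self):
        # Stream indices that were registered through add_video_stream().
        self.active_streams = set()

    def add_video_stream(self, stream_index):
        self.active_streams.add(stream_index)

    def validate_stream(self, stream_index):
        # Reject a stream that was never added instead of silently
        # returning frames from some other stream.
        if stream_index not in self.active_streams:
            raise ValueError(
                f"stream_index={stream_index} was not added; "
                f"active streams: {sorted(self.active_streams)}"
            )


decoder = ToyDecoder()
decoder.add_video_stream(0)
decoder.validate_stream(0)  # stream 0 was added, so this passes
try:
    decoder.validate_stream(3)  # stream 3 was never added
except ValueError as exc:
    print(exc)
```

Calling such a check at the top of every entry point that takes a stream index would turn Example 2 into an explicit error.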
None of the core APIs or C++ APIs actually do any demuxing. That is, the stream_index
parameter is never used to filter or select frames; it is only used to seek.
This may be clearer from the call stack of our decoding entry points.
All but one rely on getFrameAtIndexInternal, which will use the streamIndex to set the cursor:
torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp
Lines 1254 to 1255 in 288bb83
but then immediately return the frame that is returned by getNextFrameNoDemuxInternal(), which doesn't demux anything.
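The gap can be illustrated with a toy model in Python. It is a sketch under stated assumptions, not torchcodec's implementation: `Packet`, `next_frame_no_demux`, and `next_frame_demuxed` are hypothetical names, and the container is modelled as a plain list of interleaved packets.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Packet:
    stream_index: int  # which stream this packet belongs to
    frame_index: int   # position of the frame within its own stream


def next_frame_no_demux(packets: Iterator[Packet], stream_index: int) -> Packet:
    # Toy version of the reported behavior: stream_index is accepted but
    # never used to filter, so the next packet is returned as-is.
    return next(packets)


def next_frame_demuxed(packets: Iterator[Packet], stream_index: int) -> Packet:
    # Skip packets from other streams until one matches the request.
    for packet in packets:
        if packet.stream_index == stream_index:
            return packet
    raise ValueError(f"no packet found for stream {stream_index}")


# Two streams interleaved in one container (hypothetical layout).
container: List[Packet] = [
    Packet(stream_index=3, frame_index=0),
    Packet(stream_index=0, frame_index=0),
    Packet(stream_index=3, frame_index=1),
    Packet(stream_index=0, frame_index=1),
]

print(next_frame_no_demux(iter(container), stream_index=0).stream_index)  # 3
print(next_frame_demuxed(iter(container), stream_index=0).stream_index)   # 0
```

In this model the fix is the `if packet.stream_index == stream_index` filter; where the equivalent check would live in the C++ decoder is not specified here.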