add opencv benchmark #711


Status: Open · wants to merge 10 commits into main
Conversation

Contributor

@Dan-Flores Dan-Flores commented Jun 4, 2025

This PR updates the changes in #674, which added a benchmark for decoding with the OpenCV library.

Changes in this PR:

  • Removed iteration through available backends
  • Removed API/ABI compatibility checks
  • Added FFMPEG as the default backend for OpenCV
  • Added an option to select stream_index in TorchCodecPublic and TorchAudioDecoder

Benchmark 1: nasa_13013.mp4 (320x180)

  • The file nasa_13013.mp4 has multiple video streams. This can be seen by running ffprobe test/resources/nasa_13013.mp4.
  • For accurate benchmarking, we must ensure that every decoder is decoding the same video stream. This was not the case in the original benchmark in #674, hence the unexpected results there.
python benchmarks/decoders/benchmark_decoders.py --decoders torchcodec_public:seek_mode=approximate+stream_index=0,torchcodec_public:seek_mode=exact+stream_index=0,opencv,torchaudio:stream_index=0 --min-run-seconds 40
[--------------- video=/home/danielflores3/github/Dan-Flores/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------------]
                                                             |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
      OpenCV[backend=FFMPEG]                                 |            32.6            |            32.5           |        9.9       |        10.9       |        26.6      
      TorchAudio:stream_index=0                              |            93.3            |            88.6           |       10.5       |        11.9       |        31.4      
      TorchCodecPublic:seek_mode=approximate+stream_index=0  |            35.3            |            31.7           |       10.3       |        11.1       |        25.3      
      TorchCodecPublic:seek_mode=exact+stream_index=0        |            37.1            |            33.5           |       12.3       |        13.1       |        27.9      

Times are in milliseconds (ms).


Benchmark 2: mandelbrot_1920x1080_120s.mp4

  • To generate a video for this benchmark, the command below was used.
    • The generated video file has one video stream to ensure the decoders select the same video stream.
ffmpeg -y -f lavfi -i mandelbrot=s=1920x1080 -t 120 -c:v h264 -r 60 -g 600 -pix_fmt yuv420p mandelbrot_1920x1080_120s.mp4
python benchmarks/decoders/benchmark_decoders.py --decoders torchcodec_public:seek_mode=approximate,torchcodec_public:seek_mode=exact,opencv,torchaudio --min-run-seconds 40  --video-paths mandelbrot_1920x1080_120s.mp4
[----------------------------------------------------- video=mandelbrot_1920x1080_120s.mp4 h264 1920x1080, 120.0s 60.0fps -----------------------------------------------------]
                                                             |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
      OpenCV[backend=FFMPEG]                                 |          15146.3           |          15064.6          |       50.8       |        80.1       |       363.6      
      TorchAudio:stream_index=0                              |           6944.1           |           8415.5          |       57.5       |       100.3       |       555.6      
      TorchCodecPublic:seek_mode=approximate+stream_index=0  |           6804.0           |           5428.5          |       50.3       |        83.4       |       449.2      
      TorchCodecPublic:seek_mode=exact+stream_index=0        |           6936.3           |           5591.3          |      188.8       |       223.1       |       592.1      

Times are in milliseconds (ms).


Interpreting the Results:

  • The updated results for benchmarking nasa_13013.mp4 on the lower-resolution stream show similar performance from TorchCodec and OpenCV.
  • The results of benchmarking the high-resolution video are more varied.
    • OpenCV's decoder reads frames sequentially. This results in lower performance when decoding random and uniformly sampled frames, but its grab() and retrieve() functions may be better optimized for decoding frames in order.
    • TorchCodec decoders perform significantly better than OpenCV at decoding random and uniformly sampled frames, but are slower than OpenCV when decoding a large number of sequential frames.
      • TorchCodec in exact mode sees a large performance drop when decoding only a few sequential frames, because it first performs a full scan of the file.
      • TorchCodec in approximate mode maintains higher performance by avoiding that full scan.
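The sequential-vs-seeking trade-off above can be sketched with a toy cost model (my own simplification, not TorchCodec's or OpenCV's actual logic): a purely sequential decoder must decode every frame up to the furthest request, while a seeking decoder only decodes forward from the nearest keyframe. The keyframe interval of 600 matches the -g 600 flag used to generate the mandelbrot video.

```python
def cost_sequential(requested_indices):
    # A decoder with no seeking (cf. a grab()/retrieve() loop) must decode
    # every frame from 0 up to the furthest requested frame.
    return max(requested_indices) + 1

def cost_seeking(requested_indices, keyframe_interval):
    # A seeking decoder jumps to the nearest preceding keyframe, then
    # decodes forward to the requested frame.
    return sum(idx % keyframe_interval + 1 for idx in requested_indices)

# 10 uniformly spaced frames from a 7200-frame video (120 s at 60 fps),
# with a keyframe every 600 frames, as in the generated mandelbrot clip.
uniform = list(range(0, 7200, 720))
print(cost_sequential(uniform))    # 6481 frames decoded
print(cost_seeking(uniform, 600))  # 2410 frames decoded
```

For sequential playback ("first 100 frames"), both strategies decode roughly the same number of frames, which is consistent with OpenCV pulling ahead only in that column.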

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 4, 2025
@Dan-Flores Dan-Flores marked this pull request as ready for review June 4, 2025 19:04
Member

@NicolasHug NicolasHug left a comment


Thanks for the PR @Dan-Flores! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that all the decoders we're benchmarking are returning similar frames.

self._print_each_iteration_time = False

def decode_frames(self, video_file, pts_list):
    import cv2
Member


It's best to only import in __init__ and then store the module as a self.cv2 attribute. Otherwise, we'd be benchmarking the import statement when calling the decoding method, and it may add noise to the results.
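A minimal sketch of that pattern (class and parameter names are illustrative, not the benchmark's actual code; module_name is parameterized only so the pattern can be exercised with any installed module):

```python
import importlib

class OpenCVFrameDecoder:
    """Illustrative sketch: import the heavy dependency once, up front."""

    def __init__(self, module_name="cv2"):
        # One-time import; the timed decode path below never pays for it.
        self.cv2 = importlib.import_module(module_name)

    def decode_frames(self, video_file, pts_list):
        cv2 = self.cv2  # no `import cv2` inside the benchmarked method
        # ... decoding with cv2.VideoCapture(video_file) would go here ...
        raise NotImplementedError("decoding elided in this sketch")
```

Python caches imports, so a repeated `import cv2` is not a full reload, but the cached lookup is still measurable work inside a hot loop.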

import cv2

frames = [
    cv2.resize(frame, (width, height))
Member


We should note that OpenCV doesn't apply antialiasing, while the rest of the decode_and_resize() methods apply it by default. That makes these methods less comparable.

I would suggest not exposing decode_and_resize for OpenCV if we can, so as to prevent any confusion, but if we need to expose it for technical reasons then let's at least add a comment pointing out this discrepancy about antialiasing.

CC @scotts as it's relevant to the whole "resize inconsistency chaos"
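To make the discrepancy concrete, here is a 1-D toy of my own (unrelated to the benchmark code): plain subsampling, like a resize without antialiasing, can drop high-frequency content entirely, while an area-averaging (antialiased) resize preserves its energy.

```python
def downsample_no_antialias(samples, factor):
    # Pick every Nth sample, like a resize with no low-pass filtering.
    return samples[::factor]

def downsample_area_average(samples, factor):
    # Average each window first, like an antialiased (area) resize.
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples), factor)]

signal = [0, 1] * 8  # a high-frequency alternating "texture"
print(downsample_no_antialias(signal, 2))   # [0, 0, 0, 0, 0, 0, 0, 0]
print(downsample_area_average(signal, 2))   # [0.5, 0.5, ..., 0.5]
```

The subsampled output loses the texture entirely (all zeros), so pixel-wise comparisons between an antialiased and a non-antialiased resize of the same frame can differ substantially.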

Contributor Author


decode_and_resize() is needed to generate data for the README, so I've left a comment here.

@Dan-Flores
Contributor Author

> Thanks for the PR @Dan-Flores! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that all the decoders we're benchmarking are returning similar frames.

cc @NicolasHug - It seems OpenCV only supports certain frame resolutions, so the benchmark on nasa_13013.mp4 was using a smaller resolution and was not a fair comparison (see image below). I was unfortunately unable to modify the VideoCapture's resolution; OpenCV's documentation on these properties suggests that modifying them can fail unexpectedly.

[Screenshot 2025-06-06 at 1:35 PM]

I ran the benchmark using a generated mandelbrot video at 1920x1080, a resolution OpenCV is able to handle.

@NicolasHug
Member

NicolasHug commented Jun 9, 2025

Thanks for checking @Dan-Flores !

I think what's happening with nasa_13013.mp4 is simply that opencv is decoding a different stream! We can see the streams with ffprobe:

~/dev/torchcodec  » ffprobe test/resources/nasa_13013.mp4                                        
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test/resources/nasa_13013.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:13.06, start: 0.000000, bitrate: 412 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 320x180 [SAR 1:1 DAR 16:9], 71 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, stereo, fltp, 72 kb/s (default)
    Metadata:
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler
  Stream #0:3[0x4](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 480x270 [SAR 1:1 DAR 16:9], 128 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:4[0x5](eng): Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:5[0x6](eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler

The mp4 container has two video streams: one at 320x180 and one at 480x270. OpenCV decodes the first one, while we decode the second, because that's what FFmpeg considers to be the "best" stream.
So to get a fair comparison we just need to tell OpenCV to decode the same stream.

And BTW, we probably want to double-check that we were decoding the same streams with our other benchmark backends (torchaudio etc.), but that can be done separately.

@@ -828,7 +829,7 @@ def run_benchmarks(
         # are using different random pts values across videos.
         random_pts_list = (torch.rand(num_samples) * duration).tolist()

-        for decoder_name, decoder in decoder_dict.items():
+        for decoder_name, decoder in sorted(decoder_dict.items(), key=lambda x: x[0]):
Contributor Author


This change was added to make it easier to compare benchmark runs; I'm open to alternative approaches to sorting, or to removing it entirely.
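The effect of the change is just a deterministic row order. A tiny sketch with made-up decoder names:

```python
decoders = {
    "torchcodec_public:seek_mode=exact": object(),
    "opencv": object(),
    "torchaudio": object(),
}
# Sorting by name keeps the table's row order stable across runs, so
# results from different runs line up when compared side by side.
ordered = [name for name, _ in sorted(decoders.items(), key=lambda kv: kv[0])]
print(ordered)  # ['opencv', 'torchaudio', 'torchcodec_public:seek_mode=exact']
```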
