add opencv benchmark #711


Status: Open · wants to merge 10 commits into main
Conversation

Contributor

@Dan-Flores Dan-Flores commented Jun 4, 2025

This PR updates the changes in #674, which added a benchmark for decoding with the OpenCV library.

Changes in this PR:

  • Removed iteration through available backends
  • Removed API/ABI compatibility checks
  • Added FFMPEG as the default backend for OpenCV
  • Added an option to select stream_index in TorchCodecPublic and TorchAudioDecoder

Benchmark 1: nasa_13013.mp4 (320x180)

  • The file nasa_13013.mp4 has multiple video streams. This can be seen by running ffprobe test/resources/nasa_13013.mp4.
  • For accurate benchmarking, we must ensure that every decoder is decoding the same video stream. This was not the case in the original benchmark in #674, hence the unexpected results there.
python benchmarks/decoders/benchmark_decoders.py --decoders torchcodec_public:seek_mode=approximate+stream_index=0,torchcodec_public:seek_mode=exact+stream_index=0,opencv,torchaudio:stream_index=0 --min-run-seconds 40
[--------------- video=/home/danielflores3/github/Dan-Flores/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------------]
                                                             |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
      OpenCV[backend=FFMPEG]                                 |            32.6            |            32.5           |        9.9       |        10.9       |        26.6      
      TorchAudio:stream_index=0                              |            93.3            |            88.6           |       10.5       |        11.9       |        31.4      
      TorchCodecPublic:seek_mode=approximate+stream_index=0  |            35.3            |            31.7           |       10.3       |        11.1       |        25.3      
      TorchCodecPublic:seek_mode=exact+stream_index=0        |            37.1            |            33.5           |       12.3       |        13.1       |        27.9      

Times are in milliseconds (ms).


Benchmark 2: mandelbrot_1920x1080_120s.mp4

  • To generate a video for this benchmark, the command below was used.
    • The generated video file has one video stream to ensure the decoders select the same video stream.
ffmpeg -y -f lavfi -i mandelbrot=s=1920x1080 -t 120 -c:v h264 -r 60 -g 600 -pix_fmt yuv420p mandelbrot_1920x1080_120s.mp4
python benchmarks/decoders/benchmark_decoders.py --decoders torchcodec_public:seek_mode=approximate,torchcodec_public:seek_mode=exact,opencv,torchaudio --min-run-seconds 40  --video-paths mandelbrot_1920x1080_120s.mp4
[----------------------------------------------------- video=mandelbrot_1920x1080_120s.mp4 h264 1920x1080, 120.0s 60.0fps -----------------------------------------------------]
                                                             |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
      OpenCV[backend=FFMPEG]                                 |          15146.3           |          15064.6          |       50.8       |        80.1       |       363.6      
      TorchAudio:stream_index=0                              |           6944.1           |           8415.5          |       57.5       |       100.3       |       555.6      
      TorchCodecPublic:seek_mode=approximate+stream_index=0  |           6804.0           |           5428.5          |       50.3       |        83.4       |       449.2      
      TorchCodecPublic:seek_mode=exact+stream_index=0        |           6936.3           |           5591.3          |      188.8       |       223.1       |       592.1      

Times are in milliseconds (ms).


Interpreting the Results:

  • The updated results for benchmarking nasa_13013.mp4 on the lower-resolution stream show similar performance from TorchCodec and OpenCV.
  • The results of benchmarking the high-resolution video are more varied.
    • OpenCV's decoder reads frames sequentially. This results in lower performance when decoding random and uniformly sampled frames, but its grab() and retrieve() functions may be better optimized for decoding frames in order.
    • TorchCodec decoders perform significantly better than OpenCV at decoding random and uniformly sampled frames, but are slower than OpenCV when decoding a large number of sequential frames.
      • TorchCodec in exact mode sees a large performance drop when decoding only a few sequential frames, because it first performs a full scan of the file.
      • TorchCodec in approximate mode maintains higher performance by avoiding that full scan.
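The sequential-vs-seeking trade-off above can be sketched with a toy cost model (my own simplification, not TorchCodec's or OpenCV's actual logic): a purely sequential decoder must decode every frame up to the furthest request, while a seeking decoder only decodes forward from the nearest keyframe. The keyframe interval of 600 matches the -g 600 flag used to generate the mandelbrot video.

```python
def cost_sequential(requested_indices):
    # A decoder with no seeking (cf. a grab()/retrieve() loop) must decode
    # every frame from 0 up to the furthest requested frame.
    return max(requested_indices) + 1

def cost_seeking(requested_indices, keyframe_interval):
    # A seeking decoder jumps to the nearest preceding keyframe, then
    # decodes forward to the requested frame.
    return sum(idx % keyframe_interval + 1 for idx in requested_indices)

# 10 uniformly spaced frames from a 7200-frame video (120 s at 60 fps),
# with a keyframe every 600 frames, as in the generated mandelbrot clip.
uniform = list(range(0, 7200, 720))
print(cost_sequential(uniform))    # 6481 frames decoded
print(cost_seeking(uniform, 600))  # 2410 frames decoded
```

For sequential playback ("first 100 frames"), both strategies decode roughly the same number of frames, which is consistent with OpenCV pulling ahead only in that column.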

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 4, 2025
@Dan-Flores Dan-Flores marked this pull request as ready for review June 4, 2025 19:04
Member

@NicolasHug NicolasHug left a comment


Thanks for the PR @Dan-Flores! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that all the decoders we're benchmarking are returning similar frames.

self._print_each_iteration_time = False

def decode_frames(self, video_file, pts_list):
    import cv2
Member


It's best to only import in __init__ and then store the module as a self.cv2 attribute. Otherwise, we'd be benchmarking the import statement when calling the decoding method, and it may add noise to the results.
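A minimal sketch of that pattern (class and parameter names are illustrative, not the benchmark's actual code; module_name is parameterized only so the pattern can be exercised with any installed module):

```python
import importlib

class OpenCVFrameDecoder:
    """Illustrative sketch: import the heavy dependency once, up front."""

    def __init__(self, module_name="cv2"):
        # One-time import; the timed decode path below never pays for it.
        self.cv2 = importlib.import_module(module_name)

    def decode_frames(self, video_file, pts_list):
        cv2 = self.cv2  # no `import cv2` inside the benchmarked method
        # ... decoding with cv2.VideoCapture(video_file) would go here ...
        raise NotImplementedError("decoding elided in this sketch")
```

Python caches imports, so a repeated `import cv2` is not a full reload, but the cached lookup is still measurable work inside a hot loop.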

import cv2

frames = [
    cv2.resize(frame, (width, height))
Member


We should note that OpenCV doesn't apply antialiasing, while the rest of the decode_and_resize() methods apply it by default. That makes these methods less comparable.

I would suggest not exposing decode_and_resize for OpenCV if we can, so as to prevent any confusion, but if we need to expose it for technical reasons then let's at least add a comment pointing out this discrepancy about antialiasing.

CC @scotts as it's relevant to the whole "resize inconsistency chaos"
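To make the discrepancy concrete, here is a 1-D toy of my own (unrelated to the benchmark code): plain subsampling, like a resize without antialiasing, can drop high-frequency content entirely, while an area-averaging (antialiased) resize preserves its energy.

```python
def downsample_no_antialias(samples, factor):
    # Pick every Nth sample, like a resize with no low-pass filtering.
    return samples[::factor]

def downsample_area_average(samples, factor):
    # Average each window first, like an antialiased (area) resize.
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples), factor)]

signal = [0, 1] * 8  # a high-frequency alternating "texture"
print(downsample_no_antialias(signal, 2))   # [0, 0, 0, 0, 0, 0, 0, 0]
print(downsample_area_average(signal, 2))   # [0.5, 0.5, ..., 0.5]
```

The subsampled output loses the texture entirely (all zeros), so pixel-wise comparisons between an antialiased and a non-antialiased resize of the same frame can differ substantially.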

Contributor Author


decode_and_resize() is needed to generate data for the README, so I've left a comment here.

@Dan-Flores
Contributor Author

> Thanks for the PR @Dan-Flores! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that all the decoders we're benchmarking are returning similar frames.

cc @NicolasHug - It seems OpenCV only supports certain frame resolutions, so the benchmark on nasa_13013.mp4 was using a smaller resolution and was not a fair comparison (see image below). I was unfortunately unable to modify the VideoCapture's resolution; OpenCV's documentation on these properties suggests that modifying them can fail unexpectedly.

[Screenshot 2025-06-06 at 1:35 PM]

I ran the benchmark using a generated mandelbrot video at 1920x1080, a resolution OpenCV is able to handle.

@NicolasHug
Member

NicolasHug commented Jun 9, 2025

Thanks for checking @Dan-Flores !

I think what's happening with nasa_13013.mp4 is simply that opencv is decoding a different stream! We can see the streams with ffprobe:

~/dev/torchcodec  » ffprobe test/resources/nasa_13013.mp4                                        
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test/resources/nasa_13013.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:13.06, start: 0.000000, bitrate: 412 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 320x180 [SAR 1:1 DAR 16:9], 71 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, stereo, fltp, 72 kb/s (default)
    Metadata:
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler
  Stream #0:3[0x4](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 480x270 [SAR 1:1 DAR 16:9], 128 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:4[0x5](eng): Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:5[0x6](eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler

The mp4 container has two video streams: one at 320x180 and one at 480x270. OpenCV decodes the first one, while we decode the second, because that's what FFmpeg considers to be the "best" stream.
So to get a fair comparison we just need to tell OpenCV to decode the same stream.

And BTW, we probably want to double-check that we were decoding the same streams with our other benchmark backends (torchaudio etc.), but that can be done separately.

@@ -828,7 +829,7 @@ def run_benchmarks(
         # are using different random pts values across videos.
         random_pts_list = (torch.rand(num_samples) * duration).tolist()

-        for decoder_name, decoder in decoder_dict.items():
+        for decoder_name, decoder in sorted(decoder_dict.items(), key=lambda x: x[0]):
Contributor Author


This change was added to make it easier to compare benchmark runs; I'm open to alternative approaches to sorting, or to removing it entirely.
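The effect of the change is just a deterministic row order. A tiny sketch with made-up decoder names:

```python
decoders = {
    "torchcodec_public:seek_mode=exact": object(),
    "opencv": object(),
    "torchaudio": object(),
}
# Sorting by name keeps the table's row order stable across runs, so
# results from different runs line up when compared side by side.
ordered = [name for name, _ in sorted(decoders.items(), key=lambda kv: kv[0])]
print(ordered)  # ['opencv', 'torchaudio', 'torchcodec_public:seek_mode=exact']
```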
