
add opencv benchmark #711


Merged
merged 14 commits into pytorch:main on Jun 12, 2025

Conversation

Contributor

@Dan-Flores Dan-Flores commented Jun 4, 2025

This PR updates the changes in #674, which added a benchmark for decoding with the OpenCV library.

Changes in this PR:

  • Removed iteration through available backends
  • Removed API/ABI compatibility checks
  • Added FFMPEG as default backend for OpenCV
  • Added option to select stream_index in TorchCodecPublic, TorchAudioDecoder

Benchmark 1: nasa_13013.mp4 (320x180)

  • The file "nasa_13013.mp4" has multiple video streams. This can be seen by calling ffprobe test/resources/nasa_13013.mp4.
  • For accurate benchmarking, we should ensure the same video streams are being decoded. This was not true for the original benchmark in add opencv benchmark #674, hence the unexpected results.
python benchmarks/decoders/benchmark_decoders.py --decoders torchcodec_public:seek_mode=approximate+stream_index=0,opencv,torchaudio:stream_index=0 --min-run-seconds 40
[--------------- video=/home/danielflores3/github/Dan-Flores/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------------]
                                                             |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
      OpenCV[backend=FFMPEG]                                 |            32.9            |            32.0           |        9.9       |        10.9       |        26.6      
      TorchAudio:stream_index=0                              |            92.9            |            89.5           |       10.6       |        11.9       |        31.2      
      TorchCodecPublic:seek_mode=approximate+stream_index=0  |            35.8            |            31.6           |       10.3       |        11.1       |        24.9      

Times are in milliseconds (ms).


Benchmark 2: mandelbrot_1920x1080_120s.mp4

  • To generate a video for this benchmark, the command below was used.
    • The generated video file has one video stream to ensure the decoders select the same video stream.
ffmpeg -y -f lavfi -i mandelbrot=s=1920x1080 -t 120 -c:v h264 -r 60 -g 600 -pix_fmt yuv420p mandelbrot_1920x1080_120s.mp4
python benchmarks/decoders/benchmark_decoders.py --decoders torchcodec_public:seek_mode=approximate,opencv,torchaudio --min-run-seconds 40  --video-paths mandelbrot_1920x1080_120s.mp4
[---------------------------------------------- video=mandelbrot_1920x1080_120s.mp4 h264 1920x1080, 120.0s 60.0fps ---------------------------------------------]
                                              |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: ------------------------------------------------------------------------------------------------------------------------------------------------------
      OpenCV[backend=FFMPEG]                  |          15314.1           |          15102.9          |       50.9       |        76.6       |       336.6      
      TorchAudio                              |           6798.7           |           8445.0          |       55.0       |        95.9       |       554.7      
      TorchCodecPublic:seek_mode=approximate  |           6824.3           |           5454.0          |       50.7       |        79.2       |       446.6      

Times are in milliseconds (ms).

Interpreting the Results:

  • The updated results for benchmarking nasa_13013.mp4 on the lower-resolution stream show similar performance from TorchCodec and OpenCV.
  • The results of benchmarking the high-resolution video are more varied.
    • OpenCV's decoder decodes frames sequentially. This results in lower performance when decoding random and uniform frames, but it may be better optimized for decoding frames in order, thanks to the grab() and retrieve() functions.
    • The TorchCodec decoders perform significantly better than OpenCV at decoding random and uniform frames, but are slower than OpenCV when decoding a large number of sequential frames.
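The sequential-vs-random gap above can be illustrated with a toy cost model (not a real decoder): the mandelbrot video was encoded with -g 600, i.e. a keyframe every 600 frames, and a decoder seeking to a random frame must typically decode forward from the nearest preceding keyframe. The keyframe placement and cost accounting here are simplifying assumptions for illustration.

```python
# Toy cost model: why random access is expensive with a large keyframe
# interval. Assumption: keyframes sit exactly at multiples of 600 (-g 600),
# and showing frame n requires decoding every frame from the nearest
# preceding keyframe up to n.

KEYFRAME_INTERVAL = 600  # matches the -g 600 used to encode the mandelbrot video

def frames_decoded_for(target: int) -> int:
    """Frames that must be decoded to display `target`, counting from
    the nearest preceding keyframe (keyframes at 0, 600, 1200, ...)."""
    nearest_keyframe = (target // KEYFRAME_INTERVAL) * KEYFRAME_INTERVAL
    return target - nearest_keyframe + 1

# Sequential: each new frame costs one decode after the previous one.
sequential_cost = 100  # decoding the first 100 frames in order

# Random access: 10 uniformly spaced frames in a 7200-frame (120 s @ 60 fps) video.
targets = [i * 720 for i in range(10)]
random_cost = sum(frames_decoded_for(t) for t in targets)

print(sequential_cost)  # 100
print(random_cost)      # 2410
```

Under this model, fetching 10 uniformly spaced frames decodes roughly 24x more data than reading the first 100 frames in order, which is consistent with the direction (though not the magnitude) of the benchmark numbers above.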

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 4, 2025
@Dan-Flores Dan-Flores marked this pull request as ready for review June 4, 2025 19:04
Member

@NicolasHug NicolasHug left a comment

Thanks for the PR @Dan-Flores ! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that all the decoders we're benchmarking are returning similar frames.


@Dan-Flores
Contributor Author

> Thanks for the PR @Dan-Flores ! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that all the decoders we're benchmarking are returning similar frames.

cc @NicolasHug - It seems OpenCV only supports certain frame resolutions, so the benchmark on nasa_13013.mp4 was using a smaller resolution, and was not a fair comparison (see image below). I was unfortunately unable to modify the VideoCapture's resolution - OpenCV's documentation on these properties suggests that modifying them can fail unexpectedly.

[Screenshot (2025-06-06): comparison of frame resolutions across decoders]

I ran the benchmark using a generated mandelbrot video at 1920x1080, which OpenCV is able to support.

@NicolasHug
Member

NicolasHug commented Jun 9, 2025

Thanks for checking @Dan-Flores !

I think what's happening with nasa_13013.mp4 is simply that opencv is decoding a different stream! We can see the streams with ffprobe:

~/dev/torchcodec  » ffprobe test/resources/nasa_13013.mp4                                        
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test/resources/nasa_13013.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:13.06, start: 0.000000, bitrate: 412 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 320x180 [SAR 1:1 DAR 16:9], 71 kb/s, 25 fps, 25 tbr, 12800 tbn (default)
    Metadata:
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 8000 Hz, stereo, fltp, 72 kb/s (default)
    Metadata:
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler
  Stream #0:3[0x4](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 480x270 [SAR 1:1 DAR 16:9], 128 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:4[0x5](eng): Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
  Stream #0:5[0x6](eng): Subtitle: mov_text (tx3g / 0x67337874), 0 kb/s (default)
    Metadata:
      handler_name    : SubtitleHandler

The mp4 container has 2 video streams: one with 320x180 resolution and one with 480x270 resolution. OpenCV decodes the first one while we decode the second one, because it's what ffmpeg considers to be the "best" stream.
So to get a fair comparison we just need to tell opencv to decode the same stream.

And BTW, we probably want to double check we were decoding the same streams with our other benchmark backends (torchaudio etc.), but that can be done separately.

@@ -828,7 +829,7 @@ def run_benchmarks(
# are using different random pts values across videos.
random_pts_list = (torch.rand(num_samples) * duration).tolist()

-    for decoder_name, decoder in decoder_dict.items():
+    for decoder_name, decoder in sorted(decoder_dict.items(), key=lambda x: x[0]):
Contributor Author

This change was added to make it easier to compare benchmarks; I'm open to alternative approaches to sorting, or to removing it entirely.

Contributor

Let's keep this, as it ensures that we always get the results printed in the same order. But a comment stating that is good, since it may not be obvious why. (We're actually doing the experiments in the same order every time, which means we populate the results in the same way, which makes them displayed in the same way.)
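The point about deterministic ordering can be sketched in isolation: sorting the dict items by key fixes the display order regardless of the order in which decoders were registered. The decoder names below are illustrative, not the benchmark's exact keys.

```python
# Sketch: sorting decoder entries by name gives a deterministic print
# order, regardless of registration order. (Plain iteration would follow
# insertion order, which varies with how --decoders was specified.)
decoder_dict = {
    "torchaudio": "decoder_c",
    "opencv": "decoder_b",
    "torchcodec_public": "decoder_a",
}

ordered = [name for name, _ in sorted(decoder_dict.items(), key=lambda x: x[0])]
print(ordered)  # ['opencv', 'torchaudio', 'torchcodec_public']
```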

Comment on lines 177 to 181
# OpenCV uses BGR, change to RGB
frame = self.cv2.cvtColor(frame, self.cv2.COLOR_BGR2RGB)
# Update to C, H, W
frame = np.transpose(frame, (2, 0, 1))
frame = torch.from_numpy(frame)
Member

Nit: move this "BGR array --> RGB tensor" logic into a separate helper that we can re-use across methods
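The conversion being discussed amounts to reversing the channel axis and moving it from last to first. A dependency-free sketch of that index logic, using nested lists in place of arrays (the real helper would use cv2.cvtColor and np.transpose as in the snippet above):

```python
# Dependency-free sketch of the "BGR array -> RGB tensor" helper logic:
# reverse the channel order, then move channels from (H, W, C) to (C, H, W).
# A real implementation would use cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
# and np.transpose(frame, (2, 0, 1)) followed by torch.from_numpy.

def bgr_hwc_to_rgb_chw(frame):
    """frame: nested lists with shape (H, W, 3) in BGR channel order."""
    height, width = len(frame), len(frame[0])
    # Output channel c is input channel (2 - c): R<-index 2, G<-1, B<-0.
    return [
        [[frame[h][w][2 - c] for w in range(width)] for h in range(height)]
        for c in range(3)
    ]

# A 1x2 "image": one blue pixel, one red pixel (BGR order).
frame = [[[255, 0, 0], [0, 0, 255]]]
rgb_chw = bgr_hwc_to_rgb_chw(frame)
print(rgb_chw[0])  # R channel: [[0, 255]]
print(rgb_chw[2])  # B channel: [[255, 0]]
```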

self._num_ffmpeg_threads = num_ffmpeg_threads
self._device = device
self._seek_mode = seek_mode
self._stream_index = int(stream_index) if stream_index else None
Member

I was a bit surprised to see this written like this, because if stream_index is 0 then it becomes None, which isn't what we want. Then I realized stream_index is a str (!), not an int, so the logic is sound.

I rarely advocate for type annotations but in this case, I think it could help to annotate the stream_index parameter of __init__ as Optional[str]. We might as well do the same for num_ffmpeg_threads. (same below)
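The subtlety described above can be pinned down with the annotation in place: because stream_index is a string (parsed from the CLI option), truthiness distinguishes "not provided" from an explicit "0", which it would not do for an int. The function name here is illustrative, not the PR's exact code.

```python
from typing import Optional

# Sketch of the parsing discussed above: stream_index arrives as a str
# parsed from the --decoders CLI option, so the empty string / None both
# mean "not provided", while "0" is a real stream index.
def parse_stream_index(stream_index: Optional[str]) -> Optional[int]:
    return int(stream_index) if stream_index else None

print(parse_stream_index("0"))   # 0  -- an int argument of 0 would wrongly map to None
print(parse_stream_index(None))  # None
```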

Member

@NicolasHug NicolasHug left a comment

Thank you for the PR @Dan-Flores !

frames = [
self.cv2.resize(frame, (width, height))
for frame in self.decode_frames(video_file, pts_list)
]
Contributor

@Dan-Flores, @NicolasHug, I actually agree that we should not expose this until we apply antialias. It's fine that we won't be able to generate a README graph before we do that. I'd rather that we fail when trying to do a bogus comparison than succeed and not realize it's a bogus comparison. :)

Member

@NicolasHug NicolasHug Jun 11, 2025

SGTM. Instead of not implementing the method, I suggest defining it and just raising, so that we remember why we didn't implement it in the first place:

def decode_and_resize(self, *args, **kwargs):
    raise ValueError("OpenCV doesn't apply antialias while pytorch does by default, this is potentially an unfair comparison")

@Dan-Flores Dan-Flores merged commit dd44f57 into pytorch:main Jun 12, 2025
37 of 40 checks passed