-
Notifications
You must be signed in to change notification settings - Fork 38
add opencv benchmark #711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add opencv benchmark #711
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Dan-Flores ! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that the all the decoders we're benchmarking are returning similar frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Dan-Flores ! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that the all the decoders we're benchmarking are returning similar frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Dan-Flores ! It looks good, I shared a few comments below. As we just discussed offline, it might be worth checking the output frames for validity, to make sure that the all the decoders we're benchmarking are returning similar frames.
cc @NicolasHug - It seems OpenCV only supports certain frame resolutions, so the benchmark on ![]() I ran the benchmark using a generated mandelbrot video at 1920x1080, which OpenCV is able to support. |
Thanks for checking @Dan-Flores ! I think what's happening with
the mp4 containers has 2 video streams: one with 320x180 resolution and one with 480x270 resolution. OpenCV decodes the first one while we decode the second one, because it's what ffmpeg considers to be the "best" stream. And BTW, we probably want to double check we were decoding the same streams with our other benchmark backends (torchaudio etc.), but that can be done separately. |
@@ -828,7 +829,7 @@ def run_benchmarks( | |||
# are using different random pts values across videos. | |||
random_pts_list = (torch.rand(num_samples) * duration).tolist() | |||
|
|||
for decoder_name, decoder in decoder_dict.items(): | |||
for decoder_name, decoder in sorted(decoder_dict.items(), key=lambda x: x[0]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was added to make it easier to compare benchmarks, I'm open to alternative approaches to sorting or removing it entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this, as it ensures that we always get the results printed in the same order. But a comment stating that is good, since it may not be obvious why. (We're actually doing the experiments in the same order every time, which means we populate the results in the same way, which makes them displayed in the same way.)
# OpenCV uses BGR, change to RGB | ||
frame = self.cv2.cvtColor(frame, self.cv2.COLOR_BGR2RGB) | ||
# Update to C, H, W | ||
frame = np.transpose(frame, (2, 0, 1)) | ||
frame = torch.from_numpy(frame) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: move this "BGR array --> RGB tensor" logic in a separate helper that we can re-use across methods
self._num_ffmpeg_threads = num_ffmpeg_threads | ||
self._device = device | ||
self._seek_mode = seek_mode | ||
self._stream_index = int(stream_index) if stream_index else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit surprised to see this written like this, because if stream_index
is 0
then it becomes None
, which isn't what we want. Then I realized stream_index
is a str
(!), not an int
, so the logic is sound.
I rarely advocate for type annotations but in this case, I think it could help to annotate the stream_index
parameter of __init__
as Optional[str]
. We might as well do the same for num_ffmpeg_threads
. (same below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @Dan-Flores !
frames = [ | ||
self.cv2.resize(frame, (width, height)) | ||
for frame in self.decode_frames(video_file, pts_list) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dan-Flores, @NicolasHug, I actually agree that we should not expose this until we apply antialias. It's fine that we won't be able to generate a README graph before we do that. I'd rather that we fail when trying to do a bogus comparison than succeed and not realize it's a bogus comparison. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, instead of not implementing the method, I suggest to define it and just raise, so that we remember why we didn't implement it in the first place:
def decode_and_resize(self, *args, **kwags*):
raise ValueError("OpenCV doesn't apply antialias while pytorch does by default, this is potentially an unfair comparison")
4a78f83
to
96b8e18
Compare
This PR updates the changes in #674, which added a benchmark for decoding with the OpenCV library.
Changes in this PR:
stream_index
in TorchCodecPublic, TorchAudioDecoderBenchmark 1: nasa_13013.mp4 (320x180)
ffprobe test/resources/nasa_13013.mp4
.stream_index
to TorchCodecPublic and TorchAudioDecoder.Benchmark 2: mandelbrot_1920x1080_120s.mp4
Interpreting the Results:
nasa_13013.mp4
on the lower resolution stream shows similar performance by TorchCodec and OpenCV.