Add the ability to benchmark throughput using multiple threads #359


Merged: 8 commits merged into pytorch:main on Nov 12, 2024

Conversation

@ahmadsharif1 (Contributor) commented on Nov 11, 2024

Batch mode is a new mode that creates 40 copies of the decoder and runs the decode work across 8 threads to measure throughput.

Tested:

video=/home/ahmads/personal/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4, decoder=TorchCodecPublic
[---------------------------------------------------------------- video=/home/ahmads/personal/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps -----------------------------------------------------------------]
                        |  uniform 10 seek()+next()  |  batch uniform 10 seek()+next()  |  random 10 seek()+next()  |  batch random 10 seek()+next()  |  1 next()  |  batch 1 next()  |  10 next()  |  batch 10 next()  |  100 next()  |  batch 100 next()  |  create()+next()
1 threads: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      TorchCodecPublic  |            67.0            |              841.9               |            60.5           |              743.9              |    21.4    |      219.4       |     24.1    |       276.5       |     69.9     |       812.5        |                 
      TorchCodecCore    |                            |                                  |                           |                                 |            |                  |             |                   |              |                    |        18.5     

Times are in milliseconds (ms).
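
For context, a minimal sketch of the kind of multi-threaded throughput run described above, assuming torchcodec's public VideoDecoder API; the BatchParameters field names below are hypothetical, since the PR's actual fields are not shown in this excerpt:

from concurrent.futures import ThreadPoolExecutor, wait
from typing import NamedTuple

from torchcodec.decoders import VideoDecoder

class BatchParameters(NamedTuple):
    num_threads: int = 8   # hypothetical field name
    batch_size: int = 40   # hypothetical field name

def decode_one(video_path: str) -> None:
    # Each task creates its own decoder so the copies are independent.
    decoder = VideoDecoder(video_path)
    decoder[0]  # decode the first frame

def run_batch(video_path: str, params: BatchParameters) -> None:
    # Timing this call end to end measures throughput: batch_size
    # independent decode tasks spread over num_threads worker threads.
    with ThreadPoolExecutor(max_workers=params.num_threads) as executor:
        futures = [
            executor.submit(decode_one, video_path)
            for _ in range(params.batch_size)
        ]
        wait(futures)

run_batch("test/resources/nasa_13013.mp4", BatchParameters())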

@facebook-github-bot added the CLA Signed label on Nov 11, 2024
@scotts (Contributor) commented on Nov 12, 2024

I assume the removal of the other decoders is temporary while getting everything working?

For the chart generated by generate_readme_*.py, I think we want to be selective about what we add. I think we want no more than four experiments per row. This is in contrast to the output from benchmark_decoders.py, where we can have many experiments. I see benchmark_decoders.py as a perf development tool, and generate_readme_*.py as our external showcase.

@ahmadsharif1 (Contributor, Author) left a comment

I actually don't want to update the chart or data in this PR.

I just want the CLI tool to have the option to benchmark throughput for now.

CUDA and decord may not work in the same process (I haven't tested those). I just wanted the ability to benchmark throughput, and I already have an interesting finding: the CUDA decoder is slower on some videos.

@ahmadsharif1 changed the title from "Add batch benchmarks and cuda decoder" to "Add the ability to benchmark throughput using multiple threads" on Nov 12, 2024
@ahmadsharif1 marked this pull request as ready for review on November 12, 2024 at 15:50
@@ -479,13 +480,50 @@ def get_metadata(video_file_path: str) -> VideoStreamMetadata:
return VideoDecoder(video_file_path).metadata


class BatchParameters(NamedTuple):

Contributor:

Is there a specific reason we're using NamedTuple and not a dataclass?

@ahmadsharif1 (Contributor, Author):

I thought it would be more lightweight than a dataclass, but I am not too sure. Do you have a preference?

Contributor:

I prefer dataclasses. There are some instances where you need to use a namedtuple, but in general, I consider dataclasses to have supplanted namedtuples.
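
As a rough illustration of the trade-off being discussed (a sketch; the field names below are hypothetical, since the PR's actual fields are not shown in this excerpt):

from dataclasses import dataclass
from typing import NamedTuple

# NamedTuple version: immutable and tuple-like (iterable, indexable, unpackable).
class BatchParameters(NamedTuple):
    num_threads: int = 8   # hypothetical field name
    batch_size: int = 40   # hypothetical field name

# Equivalent dataclass: plain attribute access, and easier to grow with
# methods, validation, or mutability later; frozen=True keeps it immutable.
@dataclass(frozen=True)
class BatchParametersDC:
    num_threads: int = 8   # hypothetical field name
    batch_size: int = 40   # hypothetical field name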

@ahmadsharif1 ahmadsharif1 merged commit bd9d5cb into pytorch:main Nov 12, 2024
5 checks passed

Labels: CLA Signed

3 participants