[RFC] Enable Intel GPU support in torchcodec (pytorch xpu backend device) #559

Open
@dvrogozh

This is an RFC issue for the following PR:

The PR enables Intel GPU support in torchcodec by adding ffmpeg-vaapi decoding and connecting it with the PyTorch XPU device backend.

There are a few items I would like to bring up for discussion in this RFC issue.

Selection of ffmpeg backend. Several are available for Intel GPUs:

  • ffmpeg-vaapi
  • ffmpeg-dx11/dx12
  • ffmpeg-qsv (libvpl based one)
  • ffmpeg-vulkan

None of the above media backends is preferable in terms of easier context/memory sharing. Intel media APIs, drivers and libraries do not directly intersect with the Intel compute stack. In particular, Intel compute's Unified Shared Memory pointers are not recognized by media APIs and cannot be accepted directly. This means that memory sharing between media and compute must (at the moment) be done via lower-level APIs such as DMA fds on Linux and NT handles on Windows, which introduces an OS-specific dependency.

I suggest considering ffmpeg-vaapi for Linux and ffmpeg-dx12 for Windows to enable Intel GPUs. These backends are based on Intel media driver APIs. ffmpeg-qsv is based on a higher-level library (libvpl) and does not avoid the vaapi/dx dependency, since we will in any case need these APIs to get to the underlying surface memory. I also think that ffmpeg-vaapi is used by AMD GPUs, so adding VAAPI support can help there as well. (See also #444)
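To make the per-OS proposal concrete, here is a minimal sketch of how the backend choice could be expressed. The function name `proposed_hwaccel` is hypothetical and not part of the PR, and mapping "ffmpeg-dx12" to FFmpeg's `d3d12va` hwaccel name is my assumption about what is meant.

```python
import sys

# Assumed mapping from a sys.platform prefix to an FFmpeg hwaccel name.
# "vaapi" and "d3d12va" are FFmpeg hwaccel names; pairing them with these
# platforms follows the suggestion above and is not taken from the PR.
_PROPOSED_HWACCEL = {
    "linux": "vaapi",
    "win32": "d3d12va",
}


def proposed_hwaccel(platform: str = sys.platform) -> str:
    """Return the hwaccel name suggested for Intel GPUs on this platform."""
    for prefix, hwaccel in _PROPOSED_HWACCEL.items():
        if platform.startswith(prefix):
            return hwaccel
    raise RuntimeError(f"no Intel GPU media backend proposed for {platform!r}")
```

Keeping the mapping in one table would leave room to add ffmpeg-vulkan later without touching callers.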

ffmpeg-vulkan might be interesting due to its eventual cross-vendor capabilities. However, 1) media support in Vulkan is relatively new and I am not sure that all required features are available (for example, color space conversion to RGBA), and 2) media support in the Vulkan driver for Intel GPUs is a community effort rather than an Intel effort and has a different implementation compared to the Intel media driver. Overall, I think ffmpeg-vulkan might be an interesting next step after enabling torchcodec with ffmpeg-vaapi.

Selection of color conversion algorithm. Under the current torchcodec architecture, the decoded output (typically NV12) needs to be color converted to RGB24. In the current implementation I chose to implement color conversion directly on VAAPI since that's fairly quick and trivial. I believe that's good enough for the first implementation. I do suggest considering ffmpeg-vaapi for color space conversion going forward, but this will require additional effort on top of the PR I currently provide. A couple of notes on conversion:

  • Current Intel media APIs do not support RGB24 since it is considered a suboptimal format (due to odd alignment), so they support only RGB32 formats. For that reason I added slicing of the output RGB32 surface and copying of the result to the final output surface.
  • Conversion algorithms might give different results. Current tests rely on per-pixel absolute/relative difference. To handle the bigger difference between the converted output and what CUDA produces, I used the PSNR metric through torcheval for checks in tests (changed only for Intel GPU).
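The two notes above can be illustrated with a small pure-Python sketch. It is not the code from the PR: the PR operates on GPU surfaces and uses torcheval, while this version works on plain `bytes` so the idea is easy to follow. The function names, the RGBX byte order and the row `pitch` parameter are assumptions made for illustration.

```python
import math


def rgbx_to_rgb24(surface: bytes, width: int, height: int, pitch: int) -> bytes:
    """Drop the 4th byte of every pixel and pack the rows tightly.

    Assumes RGBX byte order and `pitch` bytes per source row (pitch may
    exceed width * 4 when rows are padded).
    """
    if pitch < width * 4:
        raise ValueError("pitch must be at least width * 4 bytes")
    if len(surface) < pitch * height:
        raise ValueError("surface is smaller than pitch * height")
    out = bytearray(width * height * 3)
    for y in range(height):
        row = surface[y * pitch : y * pitch + width * 4]
        dst = y * width * 3
        for c in range(3):  # copy the R, G and B planes of this row
            out[dst + c : dst + width * 3 : 3] = row[c::4]
    return bytes(out)


def psnr(a: bytes, b: bytes, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized buffers."""
    if len(a) != len(b) or not a:
        raise ValueError("buffers must be non-empty and of equal length")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical buffers
    return 10.0 * math.log10(max_value**2 / mse)
```

A test could then accept a frame when `psnr(intel_frame, reference_frame)` exceeds some threshold, which tolerates small per-pixel deviations that a strict absolute/relative comparison would reject. The right threshold is a tuning decision and is not specified here.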

Planned changes.

CC: @scotts, @NicolasHug, @EikanWang
