Description
An API can only be given entire GPU units. Add support for fractional values for the GPU resource. Here's an example:

```yaml
# cortex.yaml
- name: <string> # API name (required)
  # ...
  compute:
    gpu: 300m
```
Motivation
Better GPU resource utilization across multiple APIs, which can reduce overall costs for the user. If an API runs very rarely and doesn't need much inference capacity, then requesting 100m of the GPU resource might be desirable - the current alternative is to dedicate an entire GPU to the API, which can be expensive/wasteful.
Additional context
There are two ways to address this:
- At the driver level (i.e. a device plugin for the GPU). This is the preferred method.
- At the pod level, by having a single pod per instance that handles the prediction requests of all API replicas of all APIs residing on that instance. This may incur significant performance penalties, and the single pod is also a single point of failure, so this may be undesirable.
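To make the fractional notation concrete, here is a minimal sketch of how Kubernetes-style quantities (e.g. `300m` = 0.3 of a GPU) could be parsed into milli-units and summed to check whether several API replicas fit on one physical GPU. The function names are illustrative, not part of Cortex or the device plugin API:

```python
def parse_gpu_millis(quantity: str) -> int:
    """Convert a quantity like '300m' or '1' into milli-GPU units."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)

def fits_on_one_gpu(requests: list[str]) -> bool:
    """True if the summed fractional requests fit within a single GPU (1000m)."""
    return sum(parse_gpu_millis(r) for r in requests) <= 1000

print(parse_gpu_millis("300m"))                   # 300
print(fits_on_one_gpu(["300m", "300m", "300m"]))  # True  (900m total)
print(fits_on_one_gpu(["300m", "800m"]))          # False (1100m total)
```

Under the first approach, the device plugin would advertise each physical GPU as a pool of such milli-units, and the scheduler's bin-packing would work against that pool.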
Open questions
- If a pod uses more GPU than requested, is there a way to evict it?
Useful links for the first approach (where the device plugin handles all of this):