
Commit 6ed6674

tkatila and eero-t committed
operator: add support for npu plugin
Co-authored-by: Eero Tamminen <eero.t.tamminen@intel.com>
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
1 parent 3df7c56 commit 6ed6674

16 files changed: 969 additions, 2 deletions

README.md

Lines changed: 9 additions & 1 deletion
@@ -22,6 +22,7 @@ Table of Contents
 * [DSA device plugin](#dsa-device-plugin)
 * [DLB device plugin](#dlb-device-plugin)
 * [IAA device plugin](#iaa-device-plugin)
+* [NPU device plugin](#npu-device-plugin)
 * [Device Plugins Operator](#device-plugins-operator)
 * [XeLink XPU Manager sidecar](#xelink-xpu-manager-sidecar)
 * [Intel GPU Level-Zero sidecar](#intel-gpu-levelzero)
@@ -182,12 +183,17 @@ Balancer accelerator(DLB).
 The [IAA device plugin](cmd/iaa_plugin/README.md) supports acceleration using
 the Intel Analytics accelerator(IAA).

+### NPU Device Plugin
+
+The [NPU device plugin](cmd/npu_plugin/README.md) supports acceleration using
+the Intel Neural Processing Unit(NPU).
+
 ## Device Plugins Operator

 To simplify the deployment of the device plugins, a unified device plugins
 operator is implemented.

-Currently the operator has support for the DSA, DLB, FPGA, GPU, IAA, QAT, and
+Currently the operator has support for the DSA, DLB, FPGA, GPU, IAA, QAT, NPU, and
 Intel SGX device plugins. Each device plugin has its own custom resource
 definition (CRD) and the corresponding controller that watches CRUD operations
 to those custom resources.
@@ -247,6 +253,8 @@ The summary of resources available via plugins in this repository is given in th
 * [crypto-perf-dpdk-pod-requesting-qat-cy.yaml](deployments/qat_dpdk_app/crypto-perf/crypto-perf-dpdk-pod-requesting-qat-cy.yaml)
 * `sgx.intel.com` : `epc`
 * [intelsgx-job.yaml](deployments/sgx_enclave_apps/base/intelsgx-job.yaml)
+* `npu.intel.com` : `npu`
+* TODO

 ## Developers

cmd/operator/main.go

Lines changed: 3 additions & 1 deletion
@@ -39,6 +39,7 @@ import (
 	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/controllers/fpga"
 	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/controllers/gpu"
 	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/controllers/iaa"
+	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/controllers/npu"
 	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/controllers/qat"
 	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/controllers/sgx"
 	"github.com/intel/intel-device-plugins-for-kubernetes/pkg/fpgacontroller"
@@ -65,7 +66,7 @@ type devicePluginControllerAndWebhook map[string](func(ctrl.Manager, string, boo

 type flagList []string

-var supportedDevices = flagList{"dsa", "dlb", "fpga", "gpu", "iaa", "qat", "sgx"}
+var supportedDevices = flagList{"dsa", "dlb", "fpga", "gpu", "iaa", "qat", "sgx", "npu"}
 var devices flagList

 func (flag *flagList) String() string {
@@ -170,6 +171,7 @@ func main() {
 		"iaa": iaa.SetupReconciler,
 		"qat": qat.SetupReconciler,
 		"sgx": sgx.SetupReconciler,
+		"npu": npu.SetupReconciler,
 	}

 	tlsCfgFuncs := createTLSCfgs(enableHTTP2)
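
The hunks above register `"npu"` in two places: the list of supported device names and the map from device name to `SetupReconciler`. A minimal sketch of that name-to-setup dispatch follows. The `setupFunc` type, `registry`, and `setupSelected` are hypothetical names made up for illustration; the real `SetupReconciler` functions take a controller-runtime manager and further arguments that this diff only shows in truncated form.

```go
package main

import (
	"fmt"
	"sort"
)

// setupFunc is a hypothetical stand-in for the operator's SetupReconciler
// signature; it only receives the device name here.
type setupFunc func(name string) error

// registry maps each supported device name to its setup function,
// mirroring the shape of the map literal in main().
func registry() map[string]setupFunc {
	ok := func(string) error { return nil }
	return map[string]setupFunc{
		"dsa": ok, "dlb": ok, "fpga": ok, "gpu": ok,
		"iaa": ok, "qat": ok, "sgx": ok, "npu": ok,
	}
}

// setupSelected runs the setup function for every requested device and
// returns the sorted names it configured. An unknown name is an error.
func setupSelected(reg map[string]setupFunc, devices []string) ([]string, error) {
	done := make([]string, 0, len(devices))
	for _, d := range devices {
		setup, found := reg[d]
		if !found {
			return nil, fmt.Errorf("unsupported device %q", d)
		}
		if err := setup(d); err != nil {
			return nil, fmt.Errorf("setting up %q: %w", d, err)
		}
		done = append(done, d)
	}
	sort.Strings(done)
	return done, nil
}

func main() {
	done, err := setupSelected(registry(), []string{"npu", "gpu"})
	fmt.Println(done, err)
}
```

With this shape, adding a device is a one-line map entry plus one entry in the supported list, which matches the size of the change in `cmd/operator/main.go`.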

deployments/daemonsets.go

Lines changed: 7 additions & 0 deletions
@@ -71,6 +71,13 @@ func SGXPluginDaemonSet() *apps.DaemonSet {
 	return getDaemonset(contentSGX).DeepCopy()
 }

+//go:embed npu_plugin/base/*plugin*.yaml
+var contentNPU []byte
+
+func NPUPluginDaemonSet() *apps.DaemonSet {
+	return getDaemonset(contentNPU).DeepCopy()
+}
+
 // getDaemonset unmarshalls yaml content into a DaemonSet object.
 func getDaemonset(content []byte) *apps.DaemonSet {
 	var result apps.DaemonSet
Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
+---
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  annotations:
+    controller-gen.kubebuilder.io/version: v0.17.0
+  name: npudeviceplugins.deviceplugin.intel.com
+spec:
+  group: deviceplugin.intel.com
+  names:
+    kind: NpuDevicePlugin
+    listKind: NpuDevicePluginList
+    plural: npudeviceplugins
+    singular: npudeviceplugin
+  scope: Cluster
+  versions:
+  - additionalPrinterColumns:
+    - jsonPath: .status.desiredNumberScheduled
+      name: Desired
+      type: integer
+    - jsonPath: .status.numberReady
+      name: Ready
+      type: integer
+    - jsonPath: .spec.nodeSelector
+      name: Node Selector
+      type: string
+    - jsonPath: .metadata.creationTimestamp
+      name: Age
+      type: date
+    name: v1
+    schema:
+      openAPIV3Schema:
+        description: |-
+          NpuDevicePlugin is the Schema for the npudeviceplugins API. It represents
+          the NPU device plugin responsible for advertising Intel NPU hardware resources to
+          the kubelet.
+        properties:
+          apiVersion:
+            description: |-
+              APIVersion defines the versioned schema of this representation of an object.
+              Servers should convert recognized schemas to the latest internal value, and
+              may reject unrecognized values.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
+            type: string
+          kind:
+            description: |-
+              Kind is a string value representing the REST resource this object represents.
+              Servers may infer this from the endpoint the client submits requests to.
+              Cannot be updated.
+              In CamelCase.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
+            type: string
+          metadata:
+            type: object
+          spec:
+            description: NpuDevicePluginSpec defines the desired state of NpuDevicePlugin.
+            properties:
+              image:
+                description: Image is a container image with NPU device plugin executable.
+                type: string
+              logLevel:
+                description: LogLevel sets the plugin's log level.
+                minimum: 0
+                type: integer
+              nodeSelector:
+                additionalProperties:
+                  type: string
+                description: NodeSelector provides a simple way to constrain device
+                  plugin pods to nodes with particular labels.
+                type: object
+              sharedDevNum:
+                description: SharedDevNum is a number of containers that can share
+                  the same NPU device.
+                minimum: 1
+                type: integer
+              tolerations:
+                description: Specialized nodes (e.g., with accelerators) can be Tainted
+                  to make sure unwanted pods are not scheduled on them. Tolerations
+                  can be set for the plugin pod to neutralize the Taint.
+                items:
+                  description: |-
+                    The pod this Toleration is attached to tolerates any taint that matches
+                    the triple <key,value,effect> using the matching operator <operator>.
+                  properties:
+                    effect:
+                      description: |-
+                        Effect indicates the taint effect to match. Empty means match all taint effects.
+                        When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
+                      type: string
+                    key:
+                      description: |-
+                        Key is the taint key that the toleration applies to. Empty means match all taint keys.
+                        If the key is empty, operator must be Exists; this combination means to match all values and all keys.
+                      type: string
+                    operator:
+                      description: |-
+                        Operator represents a key's relationship to the value.
+                        Valid operators are Exists and Equal. Defaults to Equal.
+                        Exists is equivalent to wildcard for value, so that a pod can
+                        tolerate all taints of a particular category.
+                      type: string
+                    tolerationSeconds:
+                      description: |-
+                        TolerationSeconds represents the period of time the toleration (which must be
+                        of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
+                        it is not set, which means tolerate the taint forever (do not evict). Zero and
+                        negative values will be treated as 0 (evict immediately) by the system.
+                      format: int64
+                      type: integer
+                    value:
+                      description: |-
+                        Value is the taint value the toleration matches to.
+                        If the operator is Exists, the value should be empty, otherwise just a regular string.
+                      type: string
+                  type: object
+                type: array
+            type: object
+          status:
+            description: NpuDevicePluginStatus defines the observed state of NpuDevicePlugin.
+            properties:
+              controlledDaemonSet:
+                description: ControlledDaemoSet references the DaemonSet controlled
+                  by the operator.
+                properties:
+                  apiVersion:
+                    description: API version of the referent.
+                    type: string
+                  fieldPath:
+                    description: |-
+                      If referring to a piece of an object instead of an entire object, this string
+                      should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2].
+                      For example, if the object reference is to a container within a pod, this would take on a value like:
+                      "spec.containers{name}" (where "name" refers to the name of the container that triggered
+                      the event) or if no container name is specified "spec.containers[2]" (container with
+                      index 2 in this pod). This syntax is chosen only to have some well-defined way of
+                      referencing a part of an object.
+                    type: string
+                  kind:
+                    description: |-
+                      Kind of the referent.
+                      More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
+                    type: string
+                  name:
+                    description: |-
+                      Name of the referent.
+                      More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
+                    type: string
+                  namespace:
+                    description: |-
+                      Namespace of the referent.
+                      More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
+                    type: string
+                  resourceVersion:
+                    description: |-
+                      Specific resourceVersion to which this reference is made, if any.
+                      More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
+                    type: string
+                  uid:
+                    description: |-
+                      UID of the referent.
+                      More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids
+                    type: string
+                type: object
+                x-kubernetes-map-type: atomic
+              desiredNumberScheduled:
+                description: |-
+                  The total number of nodes that should be running the device plugin
+                  pod (including nodes correctly running the device plugin pod).
+                format: int32
+                type: integer
+              nodeNames:
+                description: The list of Node names where the device plugin pods are
+                  running.
+                items:
+                  type: string
+                type: array
+              numberReady:
+                description: |-
+                  The number of nodes that should be running the device plugin pod and have one
+                  or more of the device plugin pod running and ready.
+                format: int32
+                type: integer
+            required:
+            - desiredNumberScheduled
+            - numberReady
+            type: object
+        type: object
+    served: true
+    storage: true
+    subresources:
+      status: {}
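
The schema above puts two numeric bounds on the spec: `logLevel` has `minimum: 0` and `sharedDevNum` has `minimum: 1`. As a rough illustration of what those bounds mean for a submitted object, here is a minimal Go sketch. The `npuSpec` struct and `validate` function are hypothetical and are not the operator's types or its webhook logic. The sketch also assumes both fields are set explicitly, whereas the schema applies each bound only when the field is present.

```go
package main

import (
	"errors"
	"fmt"
)

// npuSpec is a hypothetical, trimmed-down mirror of the CRD spec fields.
type npuSpec struct {
	Image        string
	LogLevel     int
	SharedDevNum int
}

// validate checks the two numeric bounds declared in the schema and
// returns every violation joined into one error, or nil if there are none.
func validate(s npuSpec) error {
	var errs []error
	if s.LogLevel < 0 {
		errs = append(errs, errors.New("logLevel must be >= 0"))
	}
	if s.SharedDevNum < 1 {
		errs = append(errs, errors.New("sharedDevNum must be >= 1"))
	}
	return errors.Join(errs...)
}

func main() {
	// Values taken from the sample NpuDevicePlugin in this commit.
	sample := npuSpec{Image: "intel/intel-npu-plugin:0.32.0", LogLevel: 4, SharedDevNum: 1}
	fmt.Println("sample valid:", validate(sample) == nil)

	bad := npuSpec{LogLevel: -1, SharedDevNum: 0}
	fmt.Println(validate(bad))
}
```

In a cluster the API server enforces these bounds from the OpenAPI schema itself, so a check like this is only useful for local tooling or tests.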

deployments/operator/crd/kustomization.yaml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ resources:
 - bases/deviceplugin.intel.com_dsadeviceplugins.yaml
 - bases/deviceplugin.intel.com_iaadeviceplugins.yaml
 - bases/deviceplugin.intel.com_dlbdeviceplugins.yaml
+- bases/deviceplugin.intel.com_npudeviceplugins.yaml
 - bases/fpga.intel.com_acceleratorfunctions.yaml
 - bases/fpga.intel.com_fpgaregions.yaml
 # +kubebuilder:scaffold:crdkustomizeresource
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+resources:
+- ../../default
+
+patches:
+- path: npu.yaml
+  target:
+    kind: Deployment
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: intel-deviceplugins-controller-manager
+  namespace: inteldeviceplugins-system
+spec:
+  template:
+    spec:
+      containers:
+      - args:
+        - --metrics-bind-address=127.0.0.1:8080
+        - --leader-elect
+        - --devices=npu
+        name: manager

deployments/operator/rbac/role.yaml

Lines changed: 3 additions & 0 deletions
@@ -64,6 +64,7 @@ rules:
   - fpgadeviceplugins
   - gpudeviceplugins
   - iaadeviceplugins
+  - npudeviceplugins
   - qatdeviceplugins
   - sgxdeviceplugins
   verbs:
@@ -82,6 +83,7 @@ rules:
   - fpgadeviceplugins/finalizers
   - gpudeviceplugins/finalizers
   - iaadeviceplugins/finalizers
+  - npudeviceplugins/finalizers
   - qatdeviceplugins/finalizers
   - sgxdeviceplugins/finalizers
   verbs:
@@ -94,6 +96,7 @@ rules:
   - fpgadeviceplugins/status
   - gpudeviceplugins/status
   - iaadeviceplugins/status
+  - npudeviceplugins/status
   - qatdeviceplugins/status
   - sgxdeviceplugins/status
   verbs:
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+apiVersion: deviceplugin.intel.com/v1
+kind: NpuDevicePlugin
+metadata:
+  name: npudeviceplugin-sample
+spec:
+  image: intel/intel-npu-plugin:0.32.0
+  sharedDevNum: 1
+  logLevel: 4
+  nodeSelector:
+    intel.feature.node.kubernetes.io/npu: "true"

deployments/operator/webhook/manifests.yaml

Lines changed: 40 additions & 0 deletions
@@ -104,6 +104,26 @@ webhooks:
     resources:
     - iaadeviceplugins
   sideEffects: None
+- admissionReviewVersions:
+  - v1
+  clientConfig:
+    service:
+      name: webhook-service
+      namespace: system
+      path: /mutate-deviceplugin-intel-com-v1-npudeviceplugin
+  failurePolicy: Fail
+  name: mnpudeviceplugin.kb.io
+  rules:
+  - apiGroups:
+    - deviceplugin.intel.com
+    apiVersions:
+    - v1
+    operations:
+    - CREATE
+    - UPDATE
+    resources:
+    - npudeviceplugins
+  sideEffects: None
 - admissionReviewVersions:
   - v1
   clientConfig:
@@ -290,6 +310,26 @@ webhooks:
     resources:
     - iaadeviceplugins
   sideEffects: None
+- admissionReviewVersions:
+  - v1
+  clientConfig:
+    service:
+      name: webhook-service
+      namespace: system
+      path: /validate-deviceplugin-intel-com-v1-npudeviceplugin
+  failurePolicy: Fail
+  name: vnpudeviceplugin.kb.io
+  rules:
+  - apiGroups:
+    - deviceplugin.intel.com
+    apiVersions:
+    - v1
+    operations:
+    - CREATE
+    - UPDATE
+    resources:
+    - npudeviceplugins
+  sideEffects: None
 - admissionReviewVersions:
   - v1
   clientConfig:
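
The two webhook paths added above, `/mutate-deviceplugin-intel-com-v1-npudeviceplugin` and `/validate-deviceplugin-intel-com-v1-npudeviceplugin`, look like they are built from the API group, version, and kind: dots in the group become dashes and the kind is lowercased. The sketch below reproduces that pattern. The function name `webhookPath` is hypothetical, and the rule is inferred from the paths in this diff rather than taken from the operator's code.

```go
package main

import (
	"fmt"
	"strings"
)

// webhookPath builds "/<prefix>-<group with dots as dashes>-<version>-<lowercased kind>".
// The scheme is inferred from the manifest entries in this commit.
func webhookPath(prefix, group, version, kind string) string {
	return "/" + prefix + "-" +
		strings.ReplaceAll(group, ".", "-") + "-" +
		version + "-" +
		strings.ToLower(kind)
}

func main() {
	for _, prefix := range []string{"mutate", "validate"} {
		fmt.Println(webhookPath(prefix, "deviceplugin.intel.com", "v1", "NpuDevicePlugin"))
	}
}
```

A helper like this can be handy in a test that checks the manifest paths against the kinds registered in the operator.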
