Scaling Operators  #877

Open
@csviri

Description

DISCLAIMER: Note that the need for scaling operators is a disputable problem. Putting it here since this is a quite simple solution that is feasible to implement. We do not plan to implement this; it is rather meant as part of the discussion, to explore the topic and the options we have on the table regarding performance improvements and scaling.

The Problem

Currently we can run only 1 instance of an operator. The fundamental problem is that an operator instance receives all the events, so multiple instances would process the same events in parallel. In the following we will build up the idea and see how to overcome this problem, first with a simple approach that has drawbacks, and then take it a step further so that events can be handled more efficiently (at the cost of added complexity).

Solution - First Step

The solution, in short, is to deploy the operator instances as a StatefulSet. Every pod / instance (for the sake of simplicity let's talk about 3 instances) will receive an ordinal index: 0, 1, 2. So we will have three pods, and the operators running inside them will be aware of their own ordinal index and of the number of replicas (in our case 3).
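As a minimal sketch of how an instance could figure out its own ordinal index and the replica count, assuming the pod name is exposed via the HOSTNAME environment variable (StatefulSet pods are named like my-operator-0, my-operator-1, ...) and the replica count is injected as an env var (the names here are illustrative, not part of any existing API):

```java
// Minimal sketch: derive the ordinal index from the StatefulSet pod name.
// Assumes the pod name is available via the HOSTNAME env var (e.g. "my-operator-2")
// and the replica count is injected as a REPLICAS env var (illustrative names).
public class ShardingConfig {

    private final int ordinalIndex;
    private final int replicas;

    public ShardingConfig() {
        String hostname = System.getenv("HOSTNAME"); // e.g. "my-operator-2"
        this.ordinalIndex = Integer.parseInt(hostname.substring(hostname.lastIndexOf('-') + 1));
        this.replicas = Integer.parseInt(System.getenv("REPLICAS"));
    }

    public int ordinalIndex() {
        return ordinalIndex;
    }

    public int replicas() {
        return replicas;
    }
}
```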

The trick is that the operator will filter the events. All the events received (either related to the custom resource or to dependent resources, mostly from informers) are eventually propagated further as an internal event with a ResourceID (the name + namespace of the target custom resource) in it. This ResourceID identifies the target custom resource we want to reconcile. What we can do is hash this ResourceID and take it modulo the number of replicas: hash(ResourceID) mod [number of replicas], in our case 3, and only process the event when this number equals the ordinal index of the pod.

In this way we will still receive all the events from the K8s API, but we will simply drop the events that are not relevant for us based on the approach above, so we do not process them at all, not even cache them.
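A minimal sketch of such a filter (not the actual JOSDK types; ResourceID is modeled here as the name + namespace pair described above):

```java
import java.util.Objects;

// Sketch of the event filter: hash(ResourceID) mod replicas == ordinal index.
// ResourceID is modeled as a simple name + namespace pair for illustration.
public class ShardFilter {

    record ResourceID(String name, String namespace) {}

    private final int ordinalIndex;
    private final int replicas;

    public ShardFilter(int ordinalIndex, int replicas) {
        this.ordinalIndex = ordinalIndex;
        this.replicas = replicas;
    }

    /** Returns true if this instance is responsible for the resource, false to drop the event. */
    public boolean accept(ResourceID id) {
        int hash = Objects.hash(id.name(), id.namespace());
        // Math.floorMod avoids negative results for negative hash codes.
        return Math.floorMod(hash, replicas) == ordinalIndex;
    }
}
```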


This works nicely with a predefined number of instances. To have it auto-scaled, we would need to reconfigure the instances (mainly the number-of-replicas input; this is doable quite nicely using config maps, or by just watching the StatefulSet from inside the operator).
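For the "watching the StatefulSet" variant, a sketch of reading the current replica count with the Fabric8 Kubernetes client (the namespace and StatefulSet name are placeholders):

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

// Sketch: read the current replica count from the operator's own StatefulSet,
// so the hash-mod filter can be reconfigured when the set is scaled.
// "operator-namespace" and "my-operator" are placeholder values.
public class ReplicaCountProvider {

    public static int currentReplicas() {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            return client.apps().statefulSets()
                    .inNamespace("operator-namespace")
                    .withName("my-operator")
                    .get()
                    .getSpec()
                    .getReplicas();
        }
    }
}
```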

Improvement: Pre-Hashing with Mutating Webhook

In the previous approach we would still receive all the events regarding all the resources. What we could do instead is use label selectors and ask only for the events related to the particular operator instance. For this we need to provide a mutating webhook that adds the labels to the custom resources. Again we can use the same formula, just adding it as a label:

```yaml
metadata:
  labels:
    operatorid: [hash(name+namespace?) mod replica-count]
```

(Or just in a round-robin fashion?)

This can be done on creation of the resource. Note that for the secondary resources the operator can add the labels when it creates them; we need the admission controller only for the main custom resource. We also have to make sure that the label is not removed (but it can be updated), which might be done via a validating admission controller.
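A sketch of the core mutation such a webhook would perform on creation; the admission-review plumbing (HTTPS endpoint, JSONPatch response) is omitted, and the label key and replica count are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

import io.fabric8.kubernetes.api.model.HasMetadata;

// Sketch of the label the mutating webhook would add to the custom resource on creation.
// "operatorid" and the replica count are illustrative values.
public class OperatorIdMutator {

    private final int replicas;

    public OperatorIdMutator(int replicas) {
        this.replicas = replicas;
    }

    public void addOperatorIdLabel(HasMetadata resource) {
        String key = resource.getMetadata().getName() + "/" + resource.getMetadata().getNamespace();
        int operatorId = Math.floorMod(key.hashCode(), replicas);

        Map<String, String> labels = resource.getMetadata().getLabels();
        if (labels == null) {
            labels = new HashMap<>();
            resource.getMetadata().setLabels(labels);
        }
        labels.put("operatorid", String.valueOf(operatorId));
    }
}
```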

Although for auto-scaling (mainly downscaling) this might mean reprocessing the resources, which is not desirable.


Remarks

  • If a pod crashes and restarts, it will process the events it just missed, since it will receive the same ordinal index.
  • Note that this is not necessarily about auto-scaling; with little tweaks that works too.

Implementation

To implement this we just need filters or label selectors added to the Event Sources, which are already possible.
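For example, a label-selector based reconciler for the instance with ordinal index 1 could look roughly like this, assuming a JOSDK version where @ControllerConfiguration exposes a labelSelector attribute (MyCustomResource is a placeholder primary resource type; in practice the selector would be set programmatically per instance rather than hardcoded):

```java
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

// Sketch: restrict the primary informer of one operator instance to its own shard.
// "operatorid=1" is the value for the instance with ordinal index 1.
@ControllerConfiguration(labelSelector = "operatorid=1")
public class MyCustomResourceReconciler implements Reconciler<MyCustomResource> {

    @Override
    public UpdateControl<MyCustomResource> reconcile(MyCustomResource resource,
                                                     Context<MyCustomResource> context) {
        // ... reconciliation logic ...
        return UpdateControl.noUpdate();
    }
}
```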

And, for the improved case, have a simple webhook implemented.

(Maybe we can think about a generic filtering mechanism (think of a Servlet filter) that cuts across all event sources; will create a separate issue for that.)
