OWLS-88571 - Potential fixes for domain startup issues in a large k8s cluster when watch events are not delivered #2305
Potential fixes for the domain startup issue in the large GBU CNE k8s cluster, where watch events are not reliably delivered by k8s.

Currently, the introspection pod is not started when the domain is deleted and recreated using the `kubectl apply` command while fibers from the previous domain are still running. In such cases the domain status is null, so the first change starts a new fiber whenever the domain status is null.

The second change discards the API client and its associated HTTP client, and creates a new client instance, if the client gets into a bad state due to a `ProtocolException` (and possibly a bug in the okhttp3/okio library).

I'm creating this draft PR so these changes can be reviewed and provided to the GBU CNE team for further testing while we discuss other approaches. We're also discussing a unit-testing approach for diagnosing this type of issue when watch events are missed, and for verifying the fixes.
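The two changes can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code in this PR. `FiberStartPolicy`, `shouldStartNewFiber`, `RecyclingClientHolder`, `ClientCall` and `DomainStartupSketch` are hypothetical names. The domain status is modeled as a plain `Object`. The client is any object built by a `Supplier`, standing in for the Kubernetes API client and its HTTP client. The only real API used is `java.net.ProtocolException`.

```java
import java.io.IOException;
import java.net.ProtocolException;
import java.util.function.Supplier;

// Hypothetical demo driver; not part of the operator.
class DomainStartupSketch {
  public static void main(String[] args) throws IOException {
    // Recreated domain: an old fiber is still running, but the status is null.
    System.out.println(FiberStartPolicy.shouldStartNewFiber(true, null)); // prints true

    RecyclingClientHolder<Object> holder = new RecyclingClientHolder<>(Object::new);
    holder.call(client -> "first request");
    try {
      holder.call(client -> { throw new ProtocolException("unexpected end of stream"); });
    } catch (ProtocolException expected) {
      // The broken client has been dropped; the next call builds a new one.
    }
    holder.call(client -> "after recovery");
    System.out.println(holder.clientsCreated()); // prints 2
  }
}

// Hypothetical policy for the first change (domain status null => start a new fiber).
final class FiberStartPolicy {
  private FiberStartPolicy() {}

  /**
   * A domain deleted and recreated with `kubectl apply` comes back with a null
   * status, so a fiber still running for the old domain must not block a new start.
   */
  static boolean shouldStartNewFiber(boolean fiberAlreadyRunning, Object domainStatus) {
    return !fiberAlreadyRunning || domainStatus == null;
  }
}

/** Hypothetical: one request made with a client that may fail with an I/O error. */
@FunctionalInterface
interface ClientCall<C, R> {
  R apply(C client) throws IOException;
}

/** Hypothetical holder for the second change: drop the client after a ProtocolException. */
final class RecyclingClientHolder<C> {
  private final Supplier<C> factory;
  private C client;
  private int clientsCreated;

  RecyclingClientHolder(Supplier<C> factory) {
    this.factory = factory;
  }

  <R> R call(ClientCall<C, R> call) throws IOException {
    C current = currentClient();
    try {
      return call.apply(current);
    } catch (ProtocolException e) {
      // The connection state may be corrupt, so this instance is never reused.
      discard(current);
      throw e;
    }
  }

  synchronized int clientsCreated() {
    return clientsCreated;
  }

  // Lazily creates a client when none is held.
  private synchronized C currentClient() {
    if (client == null) {
      client = factory.get();
      clientsCreated++;
    }
    return client;
  }

  // Only clears the slot if another thread has not already replaced the broken client.
  private synchronized void discard(C broken) {
    if (client == broken) {
      client = null;
    }
  }
}
```

Discarding only when the held client is still the broken instance prevents one thread from throwing away a replacement that another thread already created. The sketch rethrows the `ProtocolException` instead of retrying the request. A new client is built on the next call. The description above does not state whether the failed request is retried.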