OWLS-91212 - Fix for the introspector retry behavior after the job times out. #2580

Merged
merged 17 commits into from
Oct 27, 2021

ankedia (Member) commented Oct 20, 2021

This change fixes the introspector retry behavior following an introspection job timeout. We changed the logic for incrementing the introspector job failure count. We also changed the logic for when to create a new introspection job if a job already exists (as part of OWLS-90180), because the job might have been created before the operator started.

I noticed that we currently rely on the failure count in the domain status when the introspector fails with a SEVERE error in its logs. However, when the introspector job timed out, we were relying on the retryCount (failure count) stored in DPI. I have changed this to use the failure count in the domain status for all types of failures.
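As an illustration only, the idea of a single failure counter for every failure type can be sketched as below. This is not the operator's code: `DomainStatusSketch`, `FailureKind`, `IntrospectorRetrySketch`, and the `maxFailures` limit are hypothetical names invented for this sketch, and the retry rule shown (retry while the count is below a limit) is an assumption.

```java
/** Hypothetical stand-in for the failure count kept in the domain status. */
final class DomainStatusSketch {
    private int introspectJobFailureCount;

    int getIntrospectJobFailureCount() {
        return introspectJobFailureCount;
    }

    void incrementIntrospectJobFailureCount() {
        introspectJobFailureCount++;
    }
}

/** Hypothetical failure categories: a SEVERE error in the logs, or a job timeout. */
enum FailureKind { SEVERE_LOG_ERROR, JOB_TIMED_OUT }

/** Hypothetical helper that routes every failure kind through the same counter. */
final class IntrospectorRetrySketch {
    private final DomainStatusSketch status;
    private final int maxFailures; // assumed limit, not a real operator setting

    IntrospectorRetrySketch(DomainStatusSketch status, int maxFailures) {
        this.status = status;
        this.maxFailures = maxFailures;
    }

    /**
     * Records the failure in the domain status and reports whether another
     * attempt is allowed. The kind is deliberately not branched on: a timeout
     * and a SEVERE log error both increment the same count.
     */
    boolean recordFailureAndShouldRetry(FailureKind kind) {
        status.incrementIntrospectJobFailureCount();
        return status.getIntrospectJobFailureCount() < maxFailures;
    }
}
```

With a limit of 3 in this sketch, two failures of either kind still allow a retry, and the third does not.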

Fixed a couple of NPEs: (1) one that occurred when the domain resource had a missing secret, and (2) one in the call that prints the ASYNC_NO_RETRY log message, which was thrown when callResponse.getRequestParams() was null.
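The second NPE is the usual pattern of dereferencing a possibly null accessor while building a log message. A minimal sketch of the guard, assuming hypothetical types (`RequestParamsSketch`, `CallResponseSketch`, `NoRetryMessageSketch`) and an invented message format rather than the operator's actual classes or ASYNC_NO_RETRY text:

```java
/** Hypothetical request details; the field names are assumptions for this sketch. */
final class RequestParamsSketch {
    final String call;
    final String namespace;

    RequestParamsSketch(String call, String namespace) {
        this.call = call;
        this.namespace = namespace;
    }
}

/** Hypothetical response whose request params may legitimately be null. */
final class CallResponseSketch {
    private final RequestParamsSketch requestParams;

    CallResponseSketch(RequestParamsSketch requestParams) {
        this.requestParams = requestParams;
    }

    RequestParamsSketch getRequestParams() {
        return requestParams;
    }
}

final class NoRetryMessageSketch {
    /** Builds the log text without dereferencing null request params. */
    static String describe(CallResponseSketch response) {
        RequestParamsSketch params = response.getRequestParams();
        if (params == null) {
            // Fall back to a generic message instead of throwing an NPE.
            return "ASYNC_NO_RETRY: request details unavailable";
        }
        return String.format("ASYNC_NO_RETRY: call=%s, namespace=%s",
                params.call, params.namespace);
    }
}
```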

Finally, this PR also has changes to capture the WDT log files in the LOG_HOME directory and a potential fix for an intermittent failure in the domain status integration test. Integration test results are at:
https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/6899/
https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/6884/

ankedia (Member, Author) commented Oct 21, 2021

The integration test run has 5 failures -> https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/6817/
The ItMiiDynamicUpdate failures are intermittent, and the tests pass when run individually -> https://build.weblogick8s.org:8443/job/weblogic-kubernetes-operator-kind-new/6818/ .
ItKubernetesEvents.testDomainK8SEventsFailed is failing consistently. I think that test needs to be changed so that it triggers an introspector failure, instead of removing the weblogicCredentialsSecret, in order for the DOMAIN_PROCESSING_ABORTED event to be generated.

@ankedia ankedia marked this pull request as ready for review October 21, 2021 14:43
ankedia (Member, Author) commented Oct 21, 2021

Created OWLS-93380 to track the change to the ItKubernetesEvents.testDomainK8SEventsFailed integration test.

@ankedia ankedia changed the base branch from owls_93071 to release/3.3 October 25, 2021 14:36