Skip to content

Anil: remove stranded dpi's that are no longer present in recheck domain list #2314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 15, 2021

Conversation

rjeberhard
Copy link
Member

Anil, I'm giving you a look at this first...

While working on this, I was reminded that the original "stranded" domain use case was where a customer had stopped the operator and then deleted a domain resource. When the operator would later restart, there would be pods, services and perhaps other resources left over, but no domain resource. This is why the dpi would have some resources but no domain.

We are now dealing with a different type of stranded domain -- where the operator has been running the whole time but where the watch events are not firing. In this case, there will still be a cached dpi that has the domain even as a recheck listing of domains will not include that domain. Therefore, we need to treat as deleted any cached domain that isn't found in the recheck list.

@rjeberhard rjeberhard requested a review from ankedia April 14, 2021 22:58
Copy link
Member

@ankedia ankedia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me (just one minor comment). I'm not sure if we need any checks to guard against the null dpi (after stranded dpi has been removed) in the suspended (waiting) fiber after it wakes up. It's possible that this change takes care of that and I missed it.

if ((liveInfo.getDomain() != null) && (!isNewDomain(cachedInfo))
&& (liveInfo.getDomain().getMetadata().getCreationTimestamp()
.isAfter(cachedInfo.getDomain().getMetadata().getCreationTimestamp()))) {
if (liveInfo.getDomain() != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is probably not needed now since the getCreationTimestamp() method uses Optional.ofNullable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge in to your original branch and then I'll see if we can remove that check. (I've not looked at what isNewDomain does)

@rjeberhard rjeberhard merged commit db7395b into owls_88571 Apr 15, 2021
rjeberhard added a commit that referenced this pull request Apr 15, 2021
…luster when watch events are not delivered. (#2305)

* OWLS-88571 - Potential fix for GBU CNE issue to start a new fiber for a recreated domain when previous fiber is stuck.

* Change for the ProtocolException and NPE issue to discard the API client instance and create a new one.

* Change to check if the creationTimestamp of the live domain is newer compared to the cached domain creation timestamp.

* remove stranded dpi's that are no longer present in recheck domain list (#2314)

* Clean-up stranded domain presence info when watches aren't occurring

Co-authored-by: Ryan Eberhard <ryan.eberhard@oracle.com>
@rjeberhard rjeberhard deleted the owls_88571_rje branch January 31, 2022 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants