Reflector list should handle expired resource version from watch handler as well #1628

@aaruna
Contributor

aaruna commented Apr 2, 2021

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aaruna
To complete the pull request process, please assign yue9944882 after the PR has been reviewed.
You can assign the PR to them by writing /assign @yue9944882 in a comment when ready.

@k8s-ci-robot added the cncf-cla: yes and size/M labels on Apr 2, 2021
@aaruna force-pushed the setIsLastSyncResourceVersionUnavailable branch from 7eed8bb to 36edb18 on April 2, 2021 15:53
@brendandburns
Contributor

This looks good to me, but I want @yue9944882 to take a look at this also, since he was the primary author of this code.

@aaruna
Contributor Author

aaruna commented Apr 8, 2021

@yue9944882 Any thoughts on this?

"ResourceVersion {} and Watch connection expired: {} , will retry w/o resourceVersion next time",
getRelistResourceVersion(),
item.status.getMessage());
isLastSyncResourceVersionUnavailable = true;
Member

@aaruna Upon a 410 status code, the watch handler is supposed to return, and we deal w/ the RV (resource version) expiration at the list call. This is working as designed. https://github.com/kubernetes/kubernetes/blob/a96000311f5beca111debc8727018e42cbb5dc79/staging/src/k8s.io/client-go/tools/cache/reflector.go#L129-L131

Contributor Author

Are you referring to the next list call? Because the current watch handler won't throw an ApiException, and only that is handled in the catch block above. (The unit test added in this PR shows an example of this scenario.)

CC @jiangzhou in case I have missed anything in the scenario above.
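
The control-flow point in this comment can be sketched as follows. This is a minimal, hypothetical model, not the client's code: `WatchFlowSketch`, `StatusException`, `runCycle`, and the boolean parameters are all invented for illustration, with `StatusException` standing in for an API exception that carries an HTTP status code. The sketch assumes only what the thread describes: a 410 from the list call arrives as an exception, while a 410 inside a watch ERROR event makes the handler return.

```java
import java.net.HttpURLConnection;

/** Hypothetical model of one list-then-watch cycle; the names are not the client's. */
class WatchFlowSketch {
  /** Stand-in for an API exception that carries an HTTP status code. */
  static class StatusException extends Exception {
    final int code;

    StatusException(int code) {
      super("HTTP " + code);
      this.code = code;
    }
  }

  boolean resourceVersionUnavailable = false;

  /** Models a list call that is rejected with 410 by throwing. */
  void list(boolean expired) throws StatusException {
    if (expired) {
      throw new StatusException(HttpURLConnection.HTTP_GONE);
    }
  }

  /** Models a watch handler that receives a 410 ERROR event and simply returns. */
  void watchHandler(boolean expired) {
    if (expired) {
      return; // nothing is thrown, so the catch block in runCycle never sees it
    }
  }

  /** Runs one cycle; the flag is set only inside the catch block. */
  void runCycle(boolean listExpired, boolean watchExpired) {
    try {
      list(listExpired);
      watchHandler(watchExpired);
    } catch (StatusException e) {
      if (e.code == HttpURLConnection.HTTP_GONE) {
        resourceVersionUnavailable = true; // only reached when list() throws
      }
    }
  }

  public static void main(String[] args) {
    WatchFlowSketch listCase = new WatchFlowSketch();
    listCase.runCycle(true, false);
    WatchFlowSketch watchCase = new WatchFlowSketch();
    watchCase.runCycle(false, true);
    System.out.println("flag after 410 from list:  " + listCase.resourceVersionUnavailable);
    System.out.println("flag after 410 from watch: " + watchCase.resourceVersionUnavailable);
  }
}
```

In this model the flag is set only on the exception path, which is the gap the comment points at.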

Member

> Are you referring to the next list call?

Yeah. IIRC the watch handler doesn't directly throw an ApiException, but it will deal w/ the error status nested in the decoded watch event.

    if (eventType.get() == EventType.ERROR) {
      if (item.status != null && item.status.getCode() == HttpURLConnection.HTTP_GONE) {
        log.info("Watch connection expired: {}", item.status.getMessage());
        return;
      } else {
        String errorMessage =
            String.format("got ERROR event and its status: %s", item.status.toString());
        log.error(errorMessage);
        throw new RuntimeException(errorMessage);
      }
    }

Contributor Author

But that path doesn't set isLastSyncResourceVersionUnavailable = true. The unit test below has an example.
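
To show why the flag matters for the next list call, here is a rough sketch under stated assumptions. `ReflectorSketch` and its constructor switch `markUnavailableOnWatchGone` are hypothetical and exist only to compare the two behaviours. The rule in `getRelistResourceVersion()` (empty string when the last version is unavailable, `"0"` before the first sync, otherwise the last synced version) is an assumption modelled on the client-go reflector linked earlier in the thread, not a copy of the Java client's method of the same name.

```java
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of how the relist resource version is chosen. */
class ReflectorSketch {
  private String lastSyncResourceVersion = "";
  private boolean isLastSyncResourceVersionUnavailable = false;
  private final boolean markUnavailableOnWatchGone; // hypothetical switch for the comparison
  final List<String> listCalls = new ArrayList<>(); // resource version sent with each list call

  ReflectorSketch(boolean markUnavailableOnWatchGone) {
    this.markUnavailableOnWatchGone = markUnavailableOnWatchGone;
  }

  /** Resource version to send with the next list request (assumed rule, see lead-in). */
  String getRelistResourceVersion() {
    if (isLastSyncResourceVersionUnavailable) {
      return ""; // drop the expired version and let the server pick a current one
    }
    return lastSyncResourceVersion.isEmpty() ? "0" : lastSyncResourceVersion;
  }

  /** Models a successful list: records the requested version, stores the returned one. */
  void list(String returnedResourceVersion) {
    listCalls.add(getRelistResourceVersion());
    lastSyncResourceVersion = returnedResourceVersion;
    isLastSyncResourceVersionUnavailable = false;
  }

  /** Models the handling of an ERROR watch event with the given status code. */
  void onWatchError(int statusCode) {
    if (statusCode == HttpURLConnection.HTTP_GONE) {
      if (markUnavailableOnWatchGone) {
        isLastSyncResourceVersionUnavailable = true;
      }
      return;
    }
    throw new RuntimeException("got ERROR event with status " + statusCode);
  }

  public static void main(String[] args) {
    for (boolean mark : new boolean[] {false, true}) {
      ReflectorSketch r = new ReflectorSketch(mark);
      r.list("100");                               // first list
      r.onWatchError(HttpURLConnection.HTTP_GONE); // watch reports the version as expired
      r.list("200");                               // relist
      System.out.println("mark=" + mark + " list calls=" + r.listCalls);
    }
  }
}
```

With the switch off, the relist in this model sends the same version the watch just reported as expired. With it on, the relist sends an empty version instead.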

@yue9944882
Member

/hold

for discussion

@k8s-ci-robot added the do-not-merge/hold label on Apr 12, 2021
@brendandburns
Contributor

@yue9944882 what is the status of this PR/Discussion?

@k8s-ci-robot added the needs-rebase label on Aug 24, 2021
@tony-clarke-amdocs
Contributor

I have upgraded to client 13.0.0 and now we are flooded with these messages. It looks like a very clear regression. I am getting deja vu. It looks like the same pattern: we are not resetting isLastSyncResourceVersionUnavailable back to true when HTTP_GONE occurs. @aaruna's change looks good to me. 13.0.0 is unusable as it is. Waiting for an update on this. @aaruna can you rebase?

@aaruna force-pushed the setIsLastSyncResourceVersionUnavailable branch from 36edb18 to 1f9413d on September 3, 2021 04:36
@k8s-ci-robot removed the needs-rebase label on Sep 3, 2021
@aaruna
Contributor Author

aaruna commented Sep 3, 2021

@tony-clarke-amdocs I have rebased the PR.

@tony-clarke-amdocs
Contributor

Thanks @aaruna. @brendandburns @yue9944882 Can you guys review when you get a chance? It's a blocker for us.

@tony-clarke-amdocs
Contributor

@brendandburns @yue9944882 Any ETA for this PR? If there is anything we can do to move it forward please let us know.

rdesgroppes added a commit to rdesgroppes/micronaut-kubernetes that referenced this pull request Oct 11, 2021
Because io.kubernetes client 13.0.0 keeps reinstalling the watch on
expired resources.

Fixes micronaut-projects#374, until kubernetes-client/java#1628 gets released.
@yue9944882
Member

/hold cancel

@yue9944882
Member

/close

picked and merged #1927

@k8s-ci-robot
Contributor

@yue9944882: Closed this PR.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
5 participants