Description
Describe the bug
As of 0.17.0
, there is an increased number of error
events emitted from the ListWatch
, which results in, if you follow the example code, in unnecessary restarts of the informer.
Client Version
0.17.1
Server Version
1.22.x
To Reproduce
- Keep the informer open for long enough and observe the behavior:
- In
0.15.0
, you will notice a lot of reconnections with calls to/list
to repopulate the cache - In
0.16.0
, you will notice "freezes" (likely Informer example requires fix: infomer restarted after onerror event doesn't dispatch other events #795) - In
0.17.0
, you will notice a lot oferror
events emitted withECONNRESET
, happening afterreq.abort()
is called, i.e. they're no longer relevant.
- In
Expected behavior
- The informer should quietly reconnect to the API, restart watches when necessary, repopulate cache by calling
listFn()
when neccessary
Example Code
Assuming it's reproducible with cache-example.js
.
Environment (please complete the following information):
Likely not relevant.
Additional context
Here's my attempt to reconstruct what happened:
- Attempt to start list watch from last resourceVersion #711 was merged and released as part of
0.16.0
. It included caching improvements (🎉) which avoid callinglistFn()
when unnecessary. - The above PR has an issue, because it [likely?] resulted in new
ERROR
events getting emitted via the watch feed - they were not being handled. fix(cache): watch errors must call done handler #781 addressed this, by callingdoneHandler()
with an error.- I believe that Informer example requires fix: infomer restarted after onerror event doesn't dispatch other events #795 describes the issue, as well as this comment on #596, as it reports using
0.16.3
and we've experienced similar problems when we've tried to upgrade to 0.16. The fix was included in0.17.0
- I believe that Informer example requires fix: infomer restarted after onerror event doesn't dispatch other events #795 describes the issue, as well as this comment on #596, as it reports using
- The fix in fix(cache): watch errors must call done handler #781 introduced additional calls to
doneHandler()
, and by virtue of that - to the_stop()
method which callsthis.request.abort();
. Callingrequest.abort()
occasionally results in anECONNRESET
(depending on who closes the socket first - the client or the server) emitted asrequest.on('error')
. In the case of handling theERROR
event by aborting the existing request, this started happening a lot more often in0.17.0
.- In
<=0.15.x
, this was likely being hidden by the fact thatwatch.ts
ensures thatdone()
is only called once - i.e. there's likely the streamclose
or some such event arriving before therequest.on('error')
is emitted with theECONNRESET
.
- In
Proposed fix options
I haven't yet made up my mind on what is the best approach for the fix :) Internally, we'll probably monkeypatch the _stop()
method with the first option until a proper fix is rolled out.
- Option 1. Disable the
error
handler in theListWatch._stop()
, something along the lines of:
+ this.request.removeAllListeners('error');
+ this.request.on('error', () => { /* void */ })
this.request.abort();
this.request = undefined;
I don't like this, as it is using removeAllListeners
, as it does not have access or knowledge of the doneHandler
to only remove the specific handler that Watch.watch()
creates. It's fine for my internal purposes, where we're sure we don't have any other error
handlers, but might be less so for generic library code?
That said, it's definitely the easiest one and might just be the most practical one, depending on when the 1.x
/ switch to fetch()
is coming?
- Option 2. Wrap the
req.abort()
method inDefaultRequest
to remove the error handlers. I don't particularly like this, as it still has the same issue withremoveAllListeners
not knowing about which specific handler to remove, let alone monkeypatching theabort()
. - Option 3. Wrap the
req.abort()
method inWatch.watch()
to calloff('error', doneCallOnce)
for both - the request and the stream. Same as above, but avoidsremoveAllListeners
in favor of anoff
for a specific callback.
Another downside of all of the three of the above is that they are effectively request
specific - not sure if it will be relevant for the fetch()
based approach - it is also technically a breaking change, since the RequestResult
will require a new method to be exported (removeAllListeners
) for people who provide their own requestImpl
.
- Option 4. Wrap the
req.abort()
and set anaborted
variable inside the closure, and then ignore any request errors that come after it is set (we probably still need to handle stream closure asdoneCallOnce(null)
?). Same as above - I don't like monkeypatching theabort()
, but at least there is no breaking change in theRequestResult
. - Option 5. Refactor
Watch.watch()
in such a way that we can do aWatch.abort()
or something like that, so that at least we avoid monkeypatchingrequest.abort()
. This is probably the least disruptive option, but it feels like it's a bit more work than the others...
Are there any better ideas for implementation? Happy to take a stab at a PR, if there's an agreement on the approach.
For longer term, I would think that maybe the Watch
interface needs a re-haul anyways - it takes two callbacks and returns a promise - that alone feels like it should be an event emitter of its own?