Skip to content

Unnecessary error events in ListWatch (cache/informer) #881

Closed
@dominykas

Description

@dominykas

Describe the bug
As of 0.17.0, there is an increased number of error events emitted from the ListWatch, which results in, if you follow the example code, in unnecessary restarts of the informer.

Client Version
0.17.1

Server Version
1.22.x

To Reproduce

Expected behavior

  • The informer should quietly reconnect to the API, restart watches when necessary, repopulate cache by calling listFn() when neccessary

Example Code

Assuming it's reproducible with cache-example.js.

Environment (please complete the following information):

Likely not relevant.

Additional context

Here's my attempt to reconstruct what happened:

  1. Attempt to start list watch from last resourceVersion #711 was merged and released as part of 0.16.0. It included caching improvements (🎉) which avoid calling listFn() when unnecessary.
  2. The above PR has an issue, because it [likely?] resulted in new ERROR events getting emitted via the watch feed - they were not being handled. fix(cache): watch errors must call done handler #781 addressed this, by calling doneHandler() with an error.
  3. The fix in fix(cache): watch errors must call done handler #781 introduced additional calls to doneHandler(), and by virtue of that - to the _stop() method which calls this.request.abort();. Calling request.abort() occasionally results in an ECONNRESET (depending on who closes the socket first - the client or the server) emitted as request.on('error'). In the case of handling the ERROR event by aborting the existing request, this started happening a lot more often in 0.17.0.
    • In <=0.15.x, this was likely being hidden by the fact that watch.ts ensures that done() is only called once - i.e. there's likely the stream close or some such event arriving before the request.on('error') is emitted with the ECONNRESET.

Proposed fix options

I haven't yet made up my mind on what is the best approach for the fix :) Internally, we'll probably monkeypatch the _stop() method with the first option until a proper fix is rolled out.

  • Option 1. Disable the error handler in the ListWatch._stop(), something along the lines of:
+           this.request.removeAllListeners('error');
+           this.request.on('error', () => { /* void */ })
            this.request.abort();
            this.request = undefined;

I don't like this, as it is using removeAllListeners, as it does not have access or knowledge of the doneHandler to only remove the specific handler that Watch.watch() creates. It's fine for my internal purposes, where we're sure we don't have any other error handlers, but might be less so for generic library code?

That said, it's definitely the easiest one and might just be the most practical one, depending on when the 1.x / switch to fetch() is coming?

  • Option 2. Wrap the req.abort() method in DefaultRequest to remove the error handlers. I don't particularly like this, as it still has the same issue with removeAllListeners not knowing about which specific handler to remove, let alone monkeypatching the abort().
  • Option 3. Wrap the req.abort() method in Watch.watch() to call off('error', doneCallOnce) for both - the request and the stream. Same as above, but avoids removeAllListeners in favor of an off for a specific callback.

Another downside of all of the three of the above is that they are effectively request specific - not sure if it will be relevant for the fetch() based approach - it is also technically a breaking change, since the RequestResult will require a new method to be exported (removeAllListeners) for people who provide their own requestImpl.

  • Option 4. Wrap the req.abort() and set an aborted variable inside the closure, and then ignore any request errors that come after it is set (we probably still need to handle stream closure as doneCallOnce(null)?). Same as above - I don't like monkeypatching the abort(), but at least there is no breaking change in the RequestResult.
  • Option 5. Refactor Watch.watch() in such a way that we can do a Watch.abort() or something like that, so that at least we avoid monkeypatching request.abort(). This is probably the least disruptive option, but it feels like it's a bit more work than the others...

Are there any better ideas for implementation? Happy to take a stab at a PR, if there's an agreement on the approach.

For longer term, I would think that maybe the Watch interface needs a re-haul anyways - it takes two callbacks and returns a promise - that alone feels like it should be an event emitter of its own?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions