Unnecessary `error` events in `ListWatch` (cache/informer)

**Describe the bug**
As of `0.17.0`, there is an increased number of `error` events emitted from the `ListWatch`, which results in, if you follow the example code, in unnecessary restarts of the informer.

**Client Version**
`0.17.1`

**Server Version**
`1.22.x`

**To Reproduce**

- Keep the informer open for long enough and observe the behavior:
    - In `0.15.0`, you will notice a lot of reconnections with calls to `/list` to repopulate the cache
    - In `0.16.0`, you will notice "freezes" (likely #795)
    - In `0.17.0`, you will notice a lot of `error` events emitted with `ECONNRESET`, happening _after_ `req.abort()` is called, i.e. they're no longer relevant.

**Expected behavior**

- The informer should quietly reconnect to the API, restart watches when necessary, repopulate cache by calling `listFn()` when neccessary

**Example Code**

Assuming it's reproducible with `cache-example.js`.

**Environment (please complete the following information):**

Likely not relevant.

**Additional context**

Here's my attempt to reconstruct what happened:

1. #711 was merged and released as part of `0.16.0`. It included caching improvements (🎉) which avoid calling `listFn()` when unnecessary.
2. The above PR has an issue, because it [likely?] resulted in new `ERROR` events getting emitted via the watch feed - they were not being handled. #781 addressed this, by calling `doneHandler()` with an error.
    - I believe that #795 describes the issue, as well as [this comment on #596](https://github.com/kubernetes-client/javascript/issues/596#issuecomment-1155491107), as it reports using `0.16.3` and we've experienced similar problems when we've tried to upgrade to 0.16. The fix was included in `0.17.0`
3. The fix in #781 introduced additional calls to `doneHandler()`, and by virtue of that - to the `_stop()` method which calls `this.request.abort();`. Calling `request.abort()` occasionally results in an `ECONNRESET` (depending on who closes the socket first - the client or the server) emitted as `request.on('error')`. In the case of handling the `ERROR` event by aborting the existing request, this started happening a lot more often in `0.17.0`.
	- In `<=0.15.x`, this was likely being hidden by the fact that `watch.ts` ensures that `done()` is only called once - i.e. there's likely the stream `close` or some such event arriving before the `request.on('error')` is emitted with the `ECONNRESET`.

**Proposed fix options**

I haven't yet made up my mind on what is the best approach for the fix :) Internally, we'll probably monkeypatch the `_stop()` method with the first option until a proper fix is rolled out.

- **Option 1**. Disable the `error` handler in the `ListWatch._stop()`, something along the lines of:
```diff
+           this.request.removeAllListeners('error');
+           this.request.on('error', () => { /* void */ })
            this.request.abort();
            this.request = undefined;
```

I don't like this, as it is using `removeAllListeners`, as it does not have access or knowledge of the `doneHandler` to only remove the specific handler that `Watch.watch()` creates. It's fine for my internal purposes, where we're sure we don't have any other `error` handlers, but might be less so for generic library code?

That said, it's definitely the easiest one and might just be the most practical one, depending on when the `1.x` / switch to `fetch()` is coming?

- **Option 2**. Wrap the `req.abort()` method in `DefaultRequest` to remove the error handlers. I don't particularly like this, as it still has the same issue with `removeAllListeners` not knowing about which specific handler to remove, let alone monkeypatching the `abort()`.
- **Option 3**. Wrap the `req.abort()` method in `Watch.watch()` to call `off('error', doneCallOnce)` for both - the request and the stream. Same as above, but avoids `removeAllListeners` in favor of an `off` for a specific callback.

Another downside of all of the three of the above is that they are effectively `request` specific - not sure if it will be relevant for the `fetch()` based approach - it is also technically a breaking change, since the `RequestResult` will require a new method to be exported (`removeAllListeners`) for people who provide their own `requestImpl`.

- **Option 4**. Wrap the `req.abort()` and set an `aborted` variable inside the closure, and then ignore any request errors that come after it is set (we probably still need to handle stream closure as `doneCallOnce(null)`?). Same as above - I don't like monkeypatching the `abort()`, but at least there is no breaking change in the `RequestResult`.
- **Option 5**. Refactor `Watch.watch()` in such a way that we can do a `Watch.abort()` or something like that, so that at least we avoid monkeypatching `request.abort()`. This is probably the least disruptive option, but it feels like it's a bit more work than the others...

Are there any better ideas for implementation? Happy to take a stab at a PR, if there's an agreement on the approach.

For longer term, I would think that maybe the `Watch` interface needs a re-haul anyways - it takes two callbacks and returns a promise - that alone feels like it should be an event emitter of its own?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Unnecessary `error` events in `ListWatch` (cache/informer) #881

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Unnecessary error events in ListWatch (cache/informer) #881

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Unnecessary `error` events in `ListWatch` (cache/informer) #881