Skip to content

GEP-1742: HTTPRoute Timeouts API #1997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jun 26, 2023
Merged
Show file tree
Hide file tree
Changes from 26 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
182 changes: 174 additions & 8 deletions geps/gep-1742.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first read this I wondered why it only focused on HTTPRoute, but later I realized the original GEP specified that focus. It would be a good idea to change the title to "HTTPRoute Timeouts API", or else mention in Non-Goals that other route types are not covered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title changed

Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# GEP-1742: Timeouts
# GEP-1742: HTTPRoute Timeouts

* Issue: [#1742](https://github.com/kubernetes-sigs/gateway-api/issues/1742)
* Status: Provisional
* Status: Implementable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we end up back at Implementable here, did I miss something after https://github.com/kubernetes-sigs/gateway-api/pull/1997/files#r1204513701?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shaneutt, it's been a couple of weeks now, and we've changed it to be Extended support, so I'm suggesting that we can make it Implementable now, so that we can also merge the related interfaces PR (#2013).

See #1997 (comment)

Let me know if you think it's still too soon, and I'll switch it back to Provisional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I just missed the comment. I can be convinced that we should move forward with Implementable at this point, I just wanna make sure nobody else has any confusion about that.

/cc @robscott @youngnick

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think going direct to implementable makes sense here, but you'll also need to make a corresponding change to update the status in the navigation as well:

- geps/gep-1742.md


(See status definitions [here](overview.md#status).)

Expand All @@ -12,12 +12,12 @@ timeouts for different types of connection.

## Goals

- Create some method to configure some timeouts
- Create some method to configure some timeouts.
- Timeout config must be applicable to most if not all Gateway API implementations.

## Non-Goals

- TBD
- A standard API for every possible timeout that implementations may support.

## Introduction

Expand Down Expand Up @@ -298,16 +298,182 @@ Could not find any HTTP specific timeouts. PRs welcomed. 😊

Could not find any HTTP specific timeouts. PRs welcomed. 😊


## API

TBD.
The above diagrams show that there are many different kinds of configurable timeouts
supported by Gateway implementations: connect, idle, request, upstream, downstream.
Although there may be opportunity for the specification of a common API for more of
them in the future, this GEP will focus on the L7 timeouts in HTTPRoutes that are
most valuable to clients.
Comment on lines +306 to +307
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the most important questions here is where the timeout config belongs. This proposes HTTPRouteRule, but I think that both HTTPBackendRef and/or a new policy to extend Service would be compelling alternatives. It would be helpful if this GEP described why HTTPRouteRule is proposed and why the others alternatives were not chosen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some text in the Alternatives section.


From the above analysis, it appears that most implementations are capable of
supporting the configuration of simple client downstream request timeouts on HTTPRoute
rules. This is a relatively small addition that would benefit many users.

Some implementations support configuring a timeout for individual backend requests,
separate from the overall client request timeout. This is particularly useful if a
client HTTP request to a gateway can result in more than one call from the gateway
to the destination backend service, for example, if automatic retries are supported.
Adding support for this would also benefit many users.

### Timeout values

There are 2 kinds of timeouts that can be configured in an `HTTPRouteRule`:

1. `timeouts.request` is the timeout for the Gateway API implementation to send a
response to a client HTTP request. Whether the gateway starts the timeout before
or after the entire client request stream has been received, is implementation dependent.
This field is optional `Extended` support.

1. `timeouts.backendRequest` is a timeout for a single request from the gateway to a backend.
This field is optional `Extended` support. Typically used in conjuction with retry configuration,
if supported by an implementation.
Note that retry configuration will be the subject of a separate GEP (GEP-1731).

```mermaid
sequenceDiagram
participant C as Client
participant P as Proxy
participant U as Upstream
C->>P: Connection Started
note left of P: timeouts.request start time (min)
C->>P: Starts sending Request
C->>P: Finishes Headers
C->>P: Finishes request
note left of P: timeouts.request start time (max)
P->>U: Connection Started
note right of P: timeouts.backendRequest start time
P->>U: Starts sending Request
P->>U: Finishes request
P->>U: Finishes Headers
U->>P: Starts Response
U->>P: Finishes Headers
note right of P: timeouts.backendRequest end time
note left of P: timeouts.request end time
U->>P: Finishes Response
note right of P: Repeat if retry
P->>C: Starts Response
P->>C: Finishes Headers
P->>C: Finishes Response
Note right of P: Repeat if connection sharing
U->>C: Connection ended
```

Both timeout fields are string duration values as specified by
[Golang time.ParseDuration](https://pkg.go.dev/time#ParseDuration) and MUST be >= 1ms
or 0 to disable (no timeout).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify if the client request covers the entire connection from client to backend, or just the downstream request from client to gateway? There is also at least one tunnel timeout (HAProxy) and perhaps some other timeouts that cover the entire connection, not just the downstream and upstream halves of the connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a diagram in the "Timeout values" section that shows the relationships, but you need to look at the rendered markdown to see it. https://github.com/kubernetes-sigs/gateway-api/blob/7e20c3d25368b2c91d0952dfbd842f46e3324b36/geps/gep-1742.md#timeout-values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see the diagram, yet still wanted to get the explanation in words and in the godoc of the implementation section for Request and BackendRequest. I asked about something similar in that section.

I don't see any timeout that would cover the entire transaction, unless Request does.

### GO

```go
type HTTPRouteRule struct {
// Timeouts defines the timeouts that can be configured for an HTTP request.
//
// Support: Extended
//
// +optional
// <gateway:experimental>
Timeouts *HTTPRouteTimeouts `json:"timeouts,omitempty"`

// ...
}

// HTTPRouteTimeouts defines timeouts that can be configured for an HTTPRoute.
// Timeout values are formatted like 1h/1m/1s/1ms as parsed by Golang time.ParseDuration
// and MUST BE >= 1ms or 0 to disable (no timeout).
type HTTPRouteTimeouts struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add a name and/or type to the HTTPRouteTimeouts in case there are multiple needed for any given route. For example, maybe an idle timeout would be a different duration than a connect timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you're suggesting here. If we wanted (in the future) to add another timeout (e.g., idle timeout), wouldn't we just add another field to HTTPRouteTimeouts?

rules:
- backendRef:
    ...
  timeouts:
    request: 2s
    idle: 10s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, so maybe you need to clarify in Goal and Non-Goals which timeout values you're addressing. If you call it "request", that doesn't mean much. What transactions are covered in "request"?

// Request specifies the duration for processing an HTTP client request after which the
// gateway will time out if unable to send a response.
//
// For example, setting the `rules.timeouts.request` field to the value `10s` in an
// `HTTPRoute` will cause a timeout if a client request is taking longer than 10 seconds
// to complete.
//
// This timeout is intended to cover as close to the whole request-response transaction
// as possible although an implementation MAY choose to start the timeout after the entire
// request stream has been received instead of immediately after the transaction is
// initiated by the client.
//
// When this field is unspecified, request timeout behavior is implementation-dependent.
//
// Support: Extended
//
// +optional
// +kubebuilder:validation:Format=duration
Request *metav1.Duration `json:"request,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Request *metav1.Duration `json:"request,omitempty"`
ClientRequest *metav1.Duration `json:"clientRequest,omitempty"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to prefer keeping it short-and-sweet (Request) but will switch if others also think ClientRequest would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally prefer greater specificity too, but this timeout as designed is broadly similar to what's called a "request timeout" by a few of the proxy implementations at least, so I think in this case, the more general term is okay.


// BackendRequest specifies a timeout for an individual request from the gateway
// to a backend service. This covers the time from when the request first starts being
// sent from the gateway to when the full response has been received from the backend.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the definition in the diagram below, but this struct needs to have a full definition of what BackendRequest means.

That is, there should be something like this here:

The BackendRequest timeout covers the time from when the first request starts being sent from the Gateway to when the full response has been received from the backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// An entire client HTTP transaction with a gateway, covered by the Request timeout,
// may result in more than one call from the gateway to the destination backend service,
// for example, if automatic retries are supported.
//
// Because the Request timeout encompasses the BackendRequest timeout,
// the value of BackendRequest defaults to and must be <= the value of Request timeout.
//
// Support: Extended
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you have 2 routes with same backend, and define 2 different backend timeouts ?

Will it mean that the backend request will have different timeouts depending on path ? Are all implementations capable of doing this ?

Copy link
Contributor Author

@frankbu frankbu May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it mean that the backend request will have different timeouts depending on path ? Are all implementations capable of doing this ?

Yes, the backend request timeout is client request specific. Envoy supports this (perTryTimeout) but all implementations probably not, so that's why it's Extended (instead of Core) support.

//
// +optional
// +kubebuilder:validation:Format=duration
BackendRequest *metav1.Duration `json:"backendRequest,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document it should be LET Request timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzxuzhonghu I don't quite understand what you are suggesting. Can you clarify please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be LET

Oh, did you mean to say LESS? If so, good idea. Done.

}
```

### YAML

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
name: timeout-example
spec:
...
rules:
- backendRefs:
- name: some-service
port: 8080
timeouts:
request: 10s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not ISO 8601 as specified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to say format is as time.ParseDuration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats probably my fault, I added a link to the POC PR mentioning ISO 8601 but didnt read much further into the actual format

apimachinery metav1.Duration is pretty light on documentation too, can maybe just rip the field documentation from go if we need to be specific: https://pkg.go.dev/maze.io/x/duration#ParseDuration

backendRequest: 2s
```

## Alternatives

(List other design alternatives and why we did not go in that
direction)
Timeouts could be configured using policy attachments or in objects other than `HTTPRouteRule`.

### Policy Attachment

Instead of configuring timeouts directly on an API object, they could be configured using policy
attachments. The advantage to this approach would be that timeout policies can be not only
configured for an `HTTPRouteRule`, but can also be added/overriden at a more fine
(e.g., `HTTPBackendRef`) or course (e.g. `HTTPRoute`) level of granularity.

The downside, however, is complexity introduced for the most common use case, adding a simple
timeout for an HTTP request. Setting a single field in the route rule, instead of needing to
create a policy resource, for this simple case seems much better.

In the future, we could consider using policy attachments to configure less common kinds of
timeouts that may be needed, but it would probably be better to instead extend the proposed API
to support those timeouts as well.

The default values of the proposed timeout fields could also be overriden
using policy attachments in the future. For example, a policy attachment could be used to set the
default value of `rules.timeouts.request` for all routes under an `HTTPRoute` or `Gateway`.

### Other API Objects

The new timeouts field could be added to a different API struct, instead of `HTTPRouteRule`.

Putting it on an `HTTPBackendRef`, for example, would allow users to set different timeouts for different
backends. This is a feature that we believe has not been requested by existing proxy or service mesh
clients and is also not implementable using available timeouts of most proxies.

Another alternative is to move the timeouts configuration up a level in the API to `HTTPRoute`. This
would be convenient when a user wants the same timeout on all rules, but would be overly restrictive.
Using policy attachments to override the default timeout value for all rules, as described in the
previous section, is likely a better way to handle timeout configuration above the route rule level.

## References

Expand Down
5 changes: 3 additions & 2 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -79,12 +79,13 @@ nav:
- Declined:
- geps/gep-735.md
- Provisional:
- geps/gep-1742.md
- geps/gep-1426.md
- geps/gep-1324.md
- geps/gep-1282.md
- Prototyping:
- geps/gep-1709.md
- Implementable:
- geps/gep-1742.md
- geps/gep-1426.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch that we forgot to move this one as well previously, thanks.

- Experimental:
- geps/gep-1748.md
- geps/gep-1323.md
Expand Down