-
Notifications
You must be signed in to change notification settings - Fork 542
GEP-1742: HTTPRoute Timeouts API #1997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
accce11
0525880
7696daa
09ca210
18b6894
2799811
90606a8
f07a259
ad0e9c7
db6688b
224d342
7a01f3e
29cc9f0
26e451a
7b6315b
bef97ed
7e20c3d
650874d
659d478
3c0e8e6
6c01fa8
4e7c538
2a1c178
f9181ec
12c731a
ab8283d
bad35be
f0ac5ac
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -301,8 +301,87 @@ Could not find any HTTP specific timeouts. PRs welcomed. 😊 | |
|
||
## API | ||
|
||
TBD. | ||
From the above analysis, it appears that most implementations are capable of supporting | ||
the configuration of simple downstream request timeouts on HTTPRoute rules. This is a | ||
relatively small addition (one new field) that would benefit many users. | ||
|
||
Because request timeouts are very much related to retry attempts, we also propose | ||
addressing issue: [#1731](https://github.com/kubernetes-sigs/gateway-api/issues/1731) | ||
by adding another field for retry conifguration in an HTTPRoute rule. This would also | ||
benefit many users and seems to be implementable by many (most?) Gateway implementations. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you clarify if the client request covers the entire connection from client to backend, or just the downstream request from client to gateway? There is also at least one tunnel timeout (HAProxy) and perhaps some other timeouts that cover the entire connection, not just the downstream and upstream halves of the connection. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a diagram in the "Timeout values" section that shows the relationships, but you need to look at the rendered markdown to see it. https://github.com/kubernetes-sigs/gateway-api/blob/7e20c3d25368b2c91d0952dfbd842f46e3324b36/geps/gep-1742.md#timeout-values There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did see the diagram, yet still wanted to get the explanation in words and in the godoc of the implementation section for I don't see any timeout that would cover the entire transaction, unless |
||
### GO | ||
|
||
```go | ||
type HTTPRouteRule struct { | ||
// Timeout for HTTP requests, disabled by default. | ||
Timeout *time.Duration `json:"timeout,omitempty"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we expand the specification for this a bit? Given the GEP already documents probably 20 types of "timeout" in https://gateway-api.sigs.k8s.io/geps/gep-1742/?h=1742#flow-diagrams-with-available-timeouts, having just a single field as "timeout" with no further explanation seems a bit ambiguous. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a paragraph above. PTAL There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. instead of a flat field, should this be tiered, so you can start off with a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @arkodg Good question. I think it depends on whether implementations support configuring other kinds of timeouts at the HTTPRoute level. Based on the diagrams in the previous sections, I think Envoy does not, but unclear about the others. @youngnick do you have more insight on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that this GEP is actually defining two separate timeouts here... 🙂
Both of these timeouts are useful! though, per the diagrams above, a given gateway may not be able to honor both. Overall, I would prefer HTTPRouteRule:
timeouts:
request: duration # end-to-end as seen by the client
workloadRequest: duration # single request as seen by the gateway with no There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kflynn This suggestion sounds quite good to me. The reason that I dragged retries into this GEP in the first place is because perTryTimeout was so tightly coupled to retries, but moving it out and generalizing it like you suggest allows for better separation of concerns and to only focus on Timeouts in this GEP. I'm happy to make this change but will give others some time to comment first. |
||
// Retry policy for HTTP requests. | ||
Retries *HTTPRetry `json:"retries,omitempty"` | ||
// ... | ||
} | ||
|
||
type HTTPRetry struct { | ||
// Number of retries to be allowed for a given request. | ||
Attempts int32 `json:"attempts,omitempty"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have quite a few comments on retries, but I really think that the retry portion here needs to be pulled out into a different GEP. 🙂 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 to this |
||
// Timeout per attempt for a given request, including the initial call and any retries. | ||
// Default is the value of the HTTPRouteRule request Timeout. | ||
PerTryTimeout *time.Duration `json:"per_try_timeout,omitempty"` | ||
// Specifies the conditions under which retry takes place. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What conditions are valid? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I explain below that HTTP error codes are valid and must be supported by implementations. Other implementation-specific identifier are also valid, depending on the implementation.
|
||
// One or more conditions can be specified using a ‘,’ delimited list. | ||
RetryOn string `json:"retry_on,omitempty"` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If its a list it should be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated |
||
// Flag to specify whether the retries should retry to other localities. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems under-specified given there is no mention of "localities" anywhere else in the repo. Can we expand on this a bit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added another section below. |
||
RetryRemoteLocalities *bool `json:"retry_remote_localities,omitempty"` | ||
} | ||
``` | ||
|
||
### YAML | ||
|
||
```yaml | ||
apiVersion: gateway.networking.k8s.io/v1beta1 | ||
kind: HTTPRoute | ||
metadata: | ||
name: timeout-example | ||
spec: | ||
... | ||
rules: | ||
- backendRefs: | ||
- name: some-service | ||
port: 8080 | ||
timeout: 10s | ||
retries: | ||
attempts: 3 | ||
perTryTimeout: 2s | ||
retryOn: 503 | ||
``` | ||
|
||
### Timeout values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this section should go before the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
||
Timeout and PerTryTimeout values are formatted as 1h/1m/1s/1ms and MUST BE >= 1ms. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this used anywhere else in gateway-api or core k8s? Is there a more formal spec? is 1h5m valid? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I couldn't find any other use of duration in gateway-api. I couldn't find it in core k8s either, but maybe not looking in the right place? Otherwise, I think this is just the Go time.Duration string serialization that we'd use? |
||
|
||
### Retry attempts | ||
|
||
By default, the number of retry attempts is implementation dependent. An implementation MAY | ||
provide external mechanisims for controlling the default retry behavior. | ||
|
||
When a request Timeout in the HTTPRouteRule or PerTryTimeout value is configured, the actual | ||
number of retries attempted may be reduced depending on the specified request Timeout and | ||
PerTryTimeout values. | ||
|
||
An implementation MUST ensure that the number of retries never exceeds the Attempts value, | ||
if specified. | ||
|
||
The interval between retries is implementation dependent but SHOULD be a minimum of 25ms. | ||
|
||
### RetryOn Conditions | ||
|
||
The value of an individual RetryOn condition can be either an HTTP status codes (e.g., 503) | ||
or some other identifier known to the implementation. | ||
|
||
An implementation MUST support any HTTP status code value for a RetryOn condition. | ||
|
||
The default conditions that will trigger retries, if no RetryOn conditions are specified, | ||
is implementation dependent. | ||
|
||
## Alternatives | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first read this I wondered why it only focused on HTTPRoute, but later I realized the original GEP specified that focus. It would be a good idea to change the title to "HTTPRoute Timeouts API", or else mention in Non-Goals that other route types are not covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Title changed