Description
Today, if a call encounters an HTTP timeout, the metrics do not include how much time was spent in that particular attempt. Given that SDK metrics exist precisely to debug such issues, it would be very helpful if the ApiCallAttempt metric collection included the ServiceCallDuration metric.
I reproduced the issue as follows.
A simple program that makes an SDK call with a very small timeout, so the attempt is almost guaranteed to throw before the call can finish:
KinesisClient kc = ...;
kc.listStreams(
    ListStreamsRequest.builder()
        .overrideConfiguration(c -> c.apiCallAttemptTimeout(Duration.ofMillis(5))) // almost guaranteed to time out
        .build()
);
Running this code logs the following metrics:
MetricCollection(
  name=ApiCall,
  metrics=[
    MetricRecord(metric=MarshallingDuration, value=PT0.04227928S),
    MetricRecord(metric=RetryCount, value=3),
    MetricRecord(metric=ApiCallSuccessful, value=false),
    MetricRecord(metric=OperationName, value=ListStreams),
    MetricRecord(metric=ApiCallDuration, value=PT0.754417876S),
    MetricRecord(metric=CredentialsFetchDuration, value=PT1.542674972S),
    MetricRecord(metric=ServiceId, value=Kinesis)
  ],
  children=[
    MetricCollection(
      name=ApiCallAttempt,
      metrics=[
        MetricRecord(metric=BackoffDelayDuration, value=PT0S),
        MetricRecord(metric=SigningDuration, value=PT0.019085065S)
      ],
      children=[
        MetricCollection(
          name=HttpClient,
          metrics=[
            MetricRecord(metric=HttpClientName, value=Apache)
          ],
          children=[]
        )
      ]
    ),
    MetricCollection(
      name=ApiCallAttempt,
      metrics=[
        MetricRecord(metric=BackoffDelayDuration, value=PT0.091S),
        MetricRecord(metric=SigningDuration, value=PT0.001483931S)
      ],
      children=[
        MetricCollection(
          name=HttpClient,
          metrics=[
            MetricRecord(metric=HttpClientName, value=Apache)
          ],
          children=[]
        )
      ]
    ),
    MetricCollection(
      name=ApiCallAttempt,
      metrics=[
        MetricRecord(metric=BackoffDelayDuration, value=PT0.172S),
        MetricRecord(metric=SigningDuration, value=PT0.001785582S)
      ],
      children=[
        MetricCollection(
          name=HttpClient,
          metrics=[
            MetricRecord(metric=HttpClientName, value=Apache)
          ],
          children=[]
        )
      ]
    ),
    MetricCollection(
      name=ApiCallAttempt,
      metrics=[
        MetricRecord(metric=BackoffDelayDuration, value=PT0.205S),
        MetricRecord(metric=SigningDuration, value=PT0.001690817S)
      ],
      children=[
        MetricCollection(
          name=HttpClient,
          metrics=[
            MetricRecord(metric=HttpClientName, value=Apache)
          ],
          children=[]
        )
      ]
    )
  ]
)
We can see that none of the ApiCallAttempt metric collections includes the ServiceCallDuration metric.
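To illustrate the behavior being requested, here is a minimal sketch in plain Java of recording an attempt's elapsed time in a finally block, so the duration is captured even when the attempt throws. This is not the SDK's implementation; the class AttemptTimingSketch, the method timedAttempt, and the recorded list are hypothetical names used only for this example.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names come from the AWS SDK.
final class AttemptTimingSketch {

    // One entry per attempt, whether it succeeded or failed.
    // Not thread-safe; a plain list keeps the sketch short.
    static final List<Duration> recorded = new ArrayList<>();

    // Runs one attempt and records its elapsed time even if it throws.
    static <T> T timedAttempt(Callable<T> attempt) throws Exception {
        long start = System.nanoTime();
        try {
            return attempt.call();
        } finally {
            // Executes on normal return and on exception (for example a timeout).
            recorded.add(Duration.ofNanos(System.nanoTime() - start));
        }
    }

    public static void main(String[] args) {
        try {
            timedAttempt(() -> {
                Thread.sleep(5);
                throw new TimeoutException("simulated attempt timeout");
            });
        } catch (Exception e) {
            System.out.println("attempt failed: " + e.getMessage());
        }
        System.out.println("recorded attempts: " + recorded.size());
        System.out.println("first attempt took: " + recorded.get(0));
    }
}
```

With this pattern, a timed-out attempt would still contribute a duration, which is the information missing from the ApiCallAttempt collections above.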