Skip to content

Improve "active" metrics handling in WebClient observations #31702

Closed
@jtorkkel

Description

@jtorkkel

Current behavior:

WebClient long task timer task active metrics does not show "destination host" in "client_name" label, instead always shows "none".

http_client_requests_active_seconds_active_count{client_name="none",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 7.0

In service which has more than 1 webClient instance to different remote host all long task timer active metrics are aggregated and metrics cannot detect the number of pending requests per destination host. This decrease the value of the metrics as breakdown per destination host calls cannot be done.

When MVC is used instead of webflux then "client_name" label contains correct destination host".

http_client_requests_active_seconds_active_count{client_name="serviceB.local",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 5.0
http_client_requests_active_seconds_active_count{client_name="serviceC.local",exception="none",method="none",outcome="UNKNOWN",status="CLIENT_ERROR",uri="none",} 2.0

Expected behavior:
webClient should report "client_name" label correctly matching remote destination address shown in normal metrics.

Potential root cause:

DefaultWebClient exchange() store the requestsBuilder() to carrier() before observation start, but "request" is stored to context only after observation is started just before applying the filter.

DefaultClientRequestObservationConvention/clientName() reads the "client_name" from context when long task timer is started and get "null" because context is not set. Thus "client_name" is not available for long task timer active start but only for normal metrics stop.

MVC works because "client_name" is read from carrier.

There were some other changes in DefaultWebClient blame but I am not sure if those change or broke the the "client_name" because context is still set after start.

public class DefaultClientRequestObservationConvention implements ClientRequestObservationConvention {
	protected KeyValue clientName(ClientRequestObservationContext context) {
		if (context.getRequest() != null && context.getRequest().url().getHost() != null) {
			return KeyValue.of(LowCardinalityKeyNames.CLIENT_NAME, context.getRequest().url().getHost());
		}
		return CLIENT_NAME_NONE;
	}
}
final class DefaultWebClient implements WebClient {
		@SuppressWarnings("deprecation")
		@Override
		public Mono<ClientResponse> exchange() {
			ClientRequestObservationContext observationContext = new ClientRequestObservationContext();
			ClientRequest.Builder requestBuilder = initRequestBuilder();
			return Mono.deferContextual(contextView -> {
				Observation observation = ClientHttpObservationDocumentation.HTTP_REACTIVE_CLIENT_EXCHANGES.observation(observationConvention,
						DEFAULT_OBSERVATION_CONVENTION, () -> observationContext, observationRegistry);
				observationContext.setCarrier(requestBuilder); // carrier available in start
				observation
						.parentObservation(contextView.getOrDefault(ObservationThreadLocalAccessor.KEY, null))
						.start();
				ExchangeFilterFunction filterFunction = new ObservationFilterFunction(observationContext);
				if (filterFunctions != null) {
					filterFunction = filterFunctions.andThen(filterFunction);
				}
				ClientRequest request = requestBuilder.build();
				observationContext.setUriTemplate((String) request.attribute(URI_TEMPLATE_ATTRIBUTE).orElse(null));
				observationContext.setRequest(request); // context saved after start
				Mono<ClientResponse> responseMono = filterFunction.apply(exchangeFunction)
						.exchange(request)
						.checkpoint("Request to " + this.httpMethod.name() + " " + this.uri + " [DefaultWebClient]")
						.switchIfEmpty(NO_HTTP_CLIENT_RESPONSE_ERROR);
				if (this.contextModifier != null) {
					responseMono = responseMono.contextWrite(this.contextModifier);
				}
				final AtomicBoolean responseReceived = new AtomicBoolean();
				return responseMono
						.doOnNext(response -> responseReceived.set(true))
						.doOnError(observationContext::setError)
						.doFinally(signalType -> {
							if (signalType == SignalType.CANCEL && !responseReceived.get()) {
								observationContext.setAborted(true);
							}
							observation.stop();
						})
						.contextWrite(context -> context.put(ObservationThreadLocalAccessor.KEY, observation));
			});
		}
}

Possible customization workaround is reading the clientName from context.getCarrier().build() instead from context. But not clear if this is safe way in case webClient implementation change again.

@Configuration
class NdfClientRequestObservationConvention extends DefaultClientRequestObservationConvention {
    private static final KeyValue CLIENT_NAME_NONE = KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, KeyValue.NONE_VALUE);

    @Override
    protected KeyValue clientName(ClientRequestObservationContext context) {
        if (context.getRequest() != null && context.getRequest().url().getHost() != null) {
            return KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, context.getRequest().url().getHost());
        }
        if (context.getCarrier() != null && context.getCarrier().build().url().getHost() != null) {
            return KeyValue.of(ClientHttpObservationDocumentation.LowCardinalityKeyNames.CLIENT_NAME, context.getCarrier().build().url().getHost());
        }
        return CLIENT_NAME_NONE;
    }
}

Environment:
Java: 17/21
SpringBoot 3.1.5, same issue in 3.1.6 and 3.2.0

Metadata

Metadata

Assignees

Labels

in: webIssues in web modules (web, webmvc, webflux, websocket)theme: observabilityAn issue related to observability and tracingtype: enhancementA general enhancement

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions