Differentiate back-off exceptions from 'real' application errors in Listener Micrometer timer metrics for retry topics #2237

Open
@theopigott

Description

Expected Behavior

As per the docs on 'Monitoring Listener Performance', there are Micrometer timers called spring.kafka.listener which are tagged with a result (success or failure) and exception. I would expect the metrics generated with the failure tag to capture true failures (e.g. an IOException from some resource that is used to process records). Any back-off exceptions, which are expected to occur for topics with a delay configured, should be treated separately, e.g. with a different tag value for result or exception.

Current Behavior

A failure timer is recorded whenever a RuntimeException occurs while processing a record. With retry topics, this includes the KafkaBackoffException that may be thrown inside invokeOnMessage (or the batch equivalent) when the listener determines from the latest record's timestamp that the record is not yet due for processing. The exception is always recorded as ListenerExecutionFailedException, so there is no way to differentiate back-off exceptions from other exceptions.
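
To make the request concrete, here is a minimal, dependency-free sketch of how the tag values could be derived. It is not the framework's code: the two nested exception classes are simplified stand-ins for the real KafkaBackoffException and ListenerExecutionFailedException, the "backoff" tag value is only a suggestion, and deriving the exception tag from the simple class name of the thrown exception is an assumption based on the behavior described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class BackoffTagSketch {

    // Simplified stand-in for the framework's back-off exception (not the real class).
    public static class KafkaBackoffException extends RuntimeException {
        public KafkaBackoffException(String message) {
            super(message);
        }
    }

    // Simplified stand-in for the wrapper that listener exceptions end up in (not the real class).
    public static class ListenerExecutionFailedException extends RuntimeException {
        public ListenerExecutionFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Assumed current behavior: the tag is the simple name of the outermost exception,
    // so a wrapped back-off looks the same as any other wrapped failure.
    public static String exceptionTag(Throwable thrown) {
        return thrown.getClass().getSimpleName();
    }

    // Proposed behavior: walk the cause chain and report back-offs under their own value.
    // "backoff" is a hypothetical tag value, not one the framework defines.
    public static String resultTag(Throwable thrown) {
        if (thrown == null) {
            return "success";
        }
        Throwable current = thrown;
        // The depth limit guards against a cyclic cause chain.
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (current instanceof KafkaBackoffException) {
                return "backoff";
            }
            current = current.getCause();
        }
        return "failure";
    }

    public static void main(String[] args) {
        Throwable backoff = new ListenerExecutionFailedException(
                "Listener failed", new KafkaBackoffException("Record not due yet"));
        Throwable realFailure = new ListenerExecutionFailedException(
                "Listener failed", new UncheckedIOException(new IOException("Database unavailable")));

        // Both cases share the same exception tag, but the result tag now differs.
        System.out.println(exceptionTag(backoff) + " -> " + resultTag(backoff));
        System.out.println(exceptionTag(realFailure) + " -> " + resultTag(realFailure));
    }
}
```

With a distinction like this, a dashboard could filter on the result tag to count only application failures, while back-offs remain visible as their own series.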

Context

I would like to analyze the listener metrics to gain insight into failures (how often they happen, the performance impact, etc.), but I'm interested in application logic failures (e.g. the database is unavailable) rather than expected framework-level failures (back-off exceptions). I was surprised to see my metrics indicating many failures despite the application logs showing that all records were successfully processed, until I realized that the failures must actually be due to these KafkaBackoffExceptions.

I could implement my own timers/metrics inside my KafkaListener, but I would prefer to be able to use the existing timers that are provided by the framework.
