Description
Expected Behavior
As per the docs on 'Monitoring Listener Performance', there are Micrometer timers called `spring.kafka.listener` which are tagged with a `result` (`success` or `failure`) and an `exception`. I would expect the metrics generated with the `failure` tag to capture true failures (e.g. an `IOException` from some resource that is used to process records). Any back-off exceptions, which are expected to occur for topics with a delay configured, should be treated separately, e.g. with a different tag value for `result` or `exception`.
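To illustrate the kind of classification I have in mind, here is a minimal sketch in plain Java. It is not framework code: `BackoffException` is a local stand-in for `KafkaBackoffException` so the snippet compiles without Spring Kafka, and the `backoff` tag value is a hypothetical name, not something the framework emits today.

```java
class ResultTagSketch {

    // Local stand-in for KafkaBackoffException (assumption: declared here
    // only so the sketch compiles without Spring Kafka on the classpath).
    static class BackoffException extends RuntimeException {
        BackoffException(String message) {
            super(message);
        }
    }

    // Returns the value for the "result" tag. "backoff" is a hypothetical
    // tag value used for illustration.
    static String resultTag(Throwable error) {
        if (error == null) {
            return "success";
        }
        // Walk the cause chain so a wrapped back-off exception is still found.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof BackoffException) {
                return "backoff";
            }
        }
        return "failure";
    }
}
```

With something like this, a wrapped `IOException` would still be tagged `failure`, while a wrapped back-off exception would get its own tag value and could be filtered out of failure dashboards.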
Current Behavior
A `failure` timer is recorded whenever a `RuntimeException` occurs while processing a record. When dealing with retry topics, this includes a `KafkaBackoffException`, which may be thrown inside `invokeOnMessage` (or the batch equivalent) when the listener determines, based on its timestamp, that the latest record is not ready to be processed yet. The `exception` tag is always recorded as `ListenerExecutionFailedException`, so there is no way to differentiate back-off exceptions from other exceptions.
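The effect on the `exception` tag can be reproduced in isolation. The sketch below assumes the tag value is derived from the simple class name of the outermost exception and that the back-off exception ends up as a cause of the wrapper; both are my reading of the observed behavior, not confirmed framework internals. The two exception classes are local stand-ins that only share names with the Spring Kafka ones.

```java
class ExceptionTagSketch {

    // Local stand-ins (assumption): same simple names as the Spring Kafka
    // classes, declared here so the sketch has no external dependencies.
    static class ListenerExecutionFailedException extends RuntimeException {
        ListenerExecutionFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class KafkaBackoffException extends RuntimeException {
        KafkaBackoffException(String message) {
            super(message);
        }
    }

    // Tag taken from the outermost exception: every failure looks the same.
    static String wrapperTag(Throwable error) {
        return error.getClass().getSimpleName();
    }

    // Tag taken from the root cause: back-off and real failures differ.
    static String rootCauseTag(Throwable error) {
        Throwable root = error;
        while (root.getCause() != null) {
            root = root.getCause();
        }
        return root.getClass().getSimpleName();
    }
}
```

Tagging by root cause is only one possible way to make the two cases distinguishable; a dedicated `result` value would work as well.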
Context
I would like to analyze the listener metrics to gain insight into failures (how often they happen, the performance impact, etc.), but I'm interested in application logic failures (e.g. database is unavailable) rather than expected framework-level failures (back-off exceptions). I was surprised to see my metrics indicating many failures despite the application logs showing that all records were successfully processed, until I realized that the failures must actually be due to these `KafkaBackoffException`s.
I could implement my own timers/metrics inside my `KafkaListener`, but I would prefer to be able to use the existing timers that are provided by the framework.
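For reference, the workaround I would like to avoid looks roughly like the sketch below. It is plain Java with no dependencies: `ListenerTimingSketch`, `timed` and the `recorder` callback are hypothetical names, and the callback marks the place where a Micrometer timer would be updated in a real application. It relies on the assumption that the back-off exception is raised by the framework before the listener body runs, so the body only ever sees application failures.

```java
import java.util.function.BiConsumer;

class ListenerTimingSketch {

    // Hypothetical callback: receives the result tag and the elapsed time
    // in nanoseconds. A real application would update a timer here.
    private final BiConsumer<String, Long> recorder;

    ListenerTimingSketch(BiConsumer<String, Long> recorder) {
        this.recorder = recorder;
    }

    // Runs the listener body, records its duration as "success" or
    // "failure", and rethrows so normal error handling still applies.
    void timed(Runnable body) {
        long start = System.nanoTime();
        String result = "success";
        try {
            body.run();
        } catch (RuntimeException e) {
            result = "failure";
            throw e;
        } finally {
            recorder.accept(result, System.nanoTime() - start);
        }
    }
}
```

Every listener method would need to wrap its logic in `timed(...)`, which duplicates what the framework timers already do; that is the main reason I would prefer a framework-level solution.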