Add cache to optimize header match performance. #3879

Open
@chickenchickenlove

Context

I'm currently working on issue #3067.
My implementation of that issue uses PatternMatchUtils.
If #3874 is merged, it would be worth introducing a cache to optimize pattern matching.

A consumer typically subscribes to only a few topics, and it is very likely that all ConsumerRecords for those topics carry the same headers.
Therefore, the following assumptions seem reasonable:

  • The number of distinct header types won’t grow without bound (i.e., low cardinality).
  • Once a header has been resolved via a pattern match, the result can be reused thereafter without any issues.
  • The number of records to process per second may be in the thousands, tens of thousands, or even hundreds of thousands.

So, if we cache the result of each pattern match and reuse it for subsequent records,
the performance of multi-value header handling should improve drastically.
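
To make the idea concrete, here is a minimal sketch of memoizing the match result per header name. `CachedHeaderMatcher` and `simpleMatch` are hypothetical names for illustration only; `simpleMatch` is a simplified stand-in for the PatternMatchUtils call (exact match or a single trailing `*`), and this is not the actual change proposed for #3874.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of spring-kafka): memoizes header-name pattern matching. */
class CachedHeaderMatcher {

    private final List<String> patterns;

    // Header name -> match result. Unbounded here, which is only safe under
    // the low-cardinality assumption stated above.
    private final Map<String, Boolean> cache = new ConcurrentHashMap<>();

    // Counts how often the patterns were actually evaluated (cache misses).
    private final AtomicLong evaluations = new AtomicLong();

    CachedHeaderMatcher(List<String> patterns) {
        this.patterns = List.copyOf(patterns);
    }

    /** Returns the cached result, evaluating the patterns only on the first lookup. */
    boolean matches(String headerName) {
        return cache.computeIfAbsent(headerName, this::evaluate);
    }

    long evaluations() {
        return evaluations.get();
    }

    private boolean evaluate(String headerName) {
        evaluations.incrementAndGet();
        for (String pattern : patterns) {
            if (simpleMatch(pattern, headerName)) {
                return true;
            }
        }
        return false;
    }

    // Simplified stand-in for the real matcher: exact match, or prefix match
    // when the pattern ends with '*'.
    static boolean simpleMatch(String pattern, String value) {
        if (pattern.endsWith("*")) {
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(value);
    }
}
```

With this shape, a million records carrying the same three header names would trigger only three pattern evaluations; every other lookup is a single map read.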

As artembilan mentioned before, a cache like this can cause a memory leak problem.
I think it would be better to introduce Caffeine as a bounded LFU cache.
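
As a rough illustration of bounding the cache, the sketch below caps the number of entries using only the JDK. Note the swap: it uses an access-ordered `LinkedHashMap`, which evicts the least recently used entry (LRU), not Caffeine's frequency-based policy. `BoundedMatchCache` is a hypothetical name, and in real code Caffeine's `maximumSize` setting would presumably play the role of `maxEntries`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache: an LRU stand-in for a Caffeine-backed cache. */
class BoundedMatchCache {

    private final Map<String, Boolean> cache;

    BoundedMatchCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the least recently used entry once the cap is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached result, or computes and stores it on a miss. */
    synchronized boolean get(String headerName, Function<String, Boolean> matcher) {
        return cache.computeIfAbsent(headerName, matcher);
    }

    synchronized boolean contains(String headerName) {
        return cache.containsKey(headerName);
    }

    synchronized int size() {
        return cache.size();
    }
}
```

The cap means an unexpected stream of unique header names can no longer grow the map without limit; the cost is that rarely seen names may be re-evaluated after eviction.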

If this issue makes sense to you, I will take a crack at it. 🙇‍♂

Simple Performance Result

https://gist.github.com/chickenchickenlove/8c0935883bab446a1ff2efe728b6f99b

| Case | Without Cache (ms) | With Cache (ms) |
| --- | --- | --- |
| First Match – 10,000,000 | 91 | 0 |
| First Match – 50,000,000 | 479 | 0 |
| First Match – 100,000,000 | 480 | 0 |
| Last Match – 10,000,000 | 2500 | 0 |
| Last Match – 50,000,000 | 627 | 0 |
| Last Match – 100,000,000 | 1112 | 0 |
| Last Match and more patterns – 10,000,000 | 202 | 0 |
| Last Match and more patterns – 50,000,000 | 1023 | 0 |
| Last Match and more patterns – 100,000,000 | 2148 | 0 |
