Description
Context
I'm currently working on issue #3067.
To implement it, I used PatternMatchUtils.
If #3874 is merged, it would be worth considering a cache to speed up pattern matching.
A consumer will subscribe to only a few topics, and it’s very likely that all ConsumerRecords
for those topics will carry the same headers.
Therefore, the following assumptions are reasonable:
- The number of distinct header types won’t grow without bound (i.e., low cardinality).
- Once a header has been resolved via a pattern match, the result can be reused thereafter without any issues.
- The number of records that need to be processed per second may be in the thousands, tens of thousands, or even hundreds of thousands.
So if we cache the result of each pattern match and reuse it for subsequent records,
the performance of multi-value header handling should improve drastically.
As artembilan mentioned before, a cache that is never evicted can cause a memory leak.
I think it would be better to introduce Caffeine as an LFU cache.
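To make the idea concrete, here is a minimal sketch of caching match results per header name. It is only an illustration under assumptions, not the proposed implementation: `HeaderMatchCache`, `matches`, `cachedEntries`, and the local `simpleMatch` helper are hypothetical names, the helper is a simplified stand-in for PatternMatchUtils that handles `*` only, and a size-bounded LRU built on `LinkedHashMap` stands in for Caffeine so the example needs nothing beyond the JDK.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: remembers, per header name, whether any pattern matched. */
class HeaderMatchCache {

    private final List<String> patterns;
    private final Map<String, Boolean> cache;

    HeaderMatchCache(List<String> patterns, int maxEntries) {
        this.patterns = List.copyOf(patterns);
        // Access-ordered LinkedHashMap: once maxEntries is exceeded, the least
        // recently used entry is evicted, so the cache cannot grow without bound.
        // (Stand-in for Caffeine; this is LRU, not the LFU policy proposed above.)
        this.cache = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached result if present; otherwise scans the patterns once and stores the result. */
    synchronized boolean matches(String headerName) {
        Boolean cached = cache.get(headerName);
        if (cached != null) {
            return cached;
        }
        boolean result = false;
        for (String pattern : patterns) {
            if (simpleMatch(pattern, headerName)) {
                result = true;
                break;
            }
        }
        cache.put(headerName, result);
        return result;
    }

    /** Number of header names currently held in the cache. */
    synchronized int cachedEntries() {
        return cache.size();
    }

    /** Simplified glob matcher (assumption: only '*' is special, matching any run of characters). */
    static boolean simpleMatch(String pattern, String text) {
        int p = 0, t = 0, star = -1, mark = 0;
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;   // remember the wildcard position
                mark = t;     // and where in the text it started matching
            } else if (p < pattern.length() && pattern.charAt(p) == text.charAt(t)) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1; // backtrack: let the last '*' absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

With a cache like this, `new HeaderMatchCache(List.of("trace-*", "*-id"), 1000).matches("trace-id")` scans the patterns only on the first call for a given header name; later calls for the same name are a single map lookup. The `synchronized` methods are the simplest way to keep the sketch thread-safe; a real implementation would likely rely on the cache library's own concurrency support.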
If this makes sense to you, I'd be happy to take a crack at it. 🙇‍♂️
Simple Performance Result
https://gist.github.com/chickenchickenlove/8c0935883bab446a1ff2efe728b6f99b
| Case | Without Cache (ms) | With Cache (ms) |
|---|---|---|
| First Match – 10,000,000 | 91 | 0 |
| First Match – 50,000,000 | 479 | 0 |
| First Match – 100,000,000 | 480 | 0 |
| Last Match – 10,000,000 | 2500 | 0 |
| Last Match – 50,000,000 | 627 | 0 |
| Last Match – 100,000,000 | 1112 | 0 |
| Last Match and more patterns – 10,000,000 | 202 | 0 |
| Last Match and more patterns – 50,000,000 | 1023 | 0 |
| Last Match and more patterns – 100,000,000 | 2148 | 0 |