Split log entries so each entry contains a maximum of 100 datapoints #64
Conversation
aws_embedded_metrics/constants.py (outdated)

@@ -14,3 +14,4 @@
 DEFAULT_NAMESPACE = "aws-embedded-metrics"
 MAX_DIMENSIONS = 9
 MAX_METRICS_PER_EVENT = 100
+MAX_DATAPOINTS_PER_EVENT = 100
Consider renaming to MAX_DATAPOINTS_PER_METRIC.
Updated!
@@ -53,24 +55,53 @@ def create_body() -> Dict[str, Any]:
 current_body: Dict[str, Any] = create_body()
 event_batches: List[str] = []
 num_metrics_in_current_body = 0
+num_datapoints_in_current_body = 0
I'm not sure I'm following why we need to track the number of datapoints in the body. There are two requirements:
- You cannot have more than 100 metric names in an EMF event
- A given metric name cannot have more than 100 samples
These two requirements bound the number of metric datapoints in an event to 10_000.
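The interplay of the two limits can be sketched as a standalone batching helper. This is only a sketch of the constraint, not the library's actual serializer: `split_metrics` is a hypothetical function, and the constant names follow the `MAX_DATAPOINTS_PER_METRIC` rename suggested earlier in this review.

```python
from typing import Dict, List

# Assumed limits from the discussion (hypothetical constant names):
MAX_METRICS_PER_EVENT = 100      # at most 100 metric names per EMF event
MAX_DATAPOINTS_PER_METRIC = 100  # at most 100 samples per metric name


def split_metrics(metrics: Dict[str, List[float]]) -> List[Dict[str, List[float]]]:
    """Split a metric map into event bodies that respect both limits."""
    batches: List[Dict[str, List[float]]] = []
    current: Dict[str, List[float]] = {}
    for name, values in metrics.items():
        # Slice each metric's samples into chunks of at most 100.
        for start in range(0, len(values), MAX_DATAPOINTS_PER_METRIC):
            chunk = values[start:start + MAX_DATAPOINTS_PER_METRIC]
            # Start a new event when the name limit is hit, or when this
            # metric already has a chunk in the current event.
            if len(current) >= MAX_METRICS_PER_EVENT or name in current:
                batches.append(current)
                current = {}
            current[name] = chunk
    if current:
        batches.append(current)
    return batches
```

Since each event holds at most 100 names with at most 100 samples each, no single event produced this way can exceed 10,000 datapoints.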
I didn't understand the spec properly; I thought the 100-sample limit applied to the whole body. I will correct it so the limit is 100 samples per metric name.
I've updated the logic to limit to 100 datapoints per metric per batch.
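The per-metric slicing described here can be illustrated with a small generator. This is a hypothetical helper for illustration, not the PR's actual implementation:

```python
from typing import Iterator, List

MAX_DATAPOINTS_PER_METRIC = 100  # the limit discussed in this thread


def chunk_values(values: List[float],
                 size: int = MAX_DATAPOINTS_PER_METRIC) -> Iterator[List[float]]:
    # Yield successive slices of at most `size` samples, so that no
    # single metric carries more than 100 datapoints in one batch.
    for start in range(0, len(values), size):
        yield values[start:start + size]
```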
 event_batches: List[str] = []
 num_metrics_in_current_body = 0
+missing_data = True
Can you add some comments about what this is, and consider a more intuitive name? It's not really "missing" data; it's more like data left to process.
I've added comments and renamed to "remaining_data".
 # assert
 datapoints_count = Counter()
 for batch in results:
Should we have an assertion on the number of expected events?
I've added an extra assert on the batch count.
I've also added logic and an assert to validate that the slicing logic doesn't drop any datapoints.
Compute the full expected results so we ensure the slicing logic doesn't cause any datapoint to be lost.
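A check along these lines might count datapoints across all serialized batches and compare against the input. This is a sketch: `count_datapoints` is a hypothetical helper, and the body layout assumes the standard EMF structure with metric definitions under `_aws.CloudWatchMetrics`.

```python
import json
from collections import Counter
from typing import List


def count_datapoints(batches: List[str]) -> Counter:
    """Count datapoints per metric name across all serialized EMF batches."""
    counts: Counter = Counter()
    for batch in batches:
        body = json.loads(batch)
        for directive in body["_aws"]["CloudWatchMetrics"]:
            for metric in directive["Metrics"]:
                name = metric["Name"]
                value = body[name]
                # A metric value may be a list of samples or a single scalar.
                counts[name] += len(value) if isinstance(value, list) else 1
    return counts
```

Comparing this `Counter` against one built from the original inputs would confirm that slicing preserves every datapoint.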
Thanks for the PR and for your time!
Issue #, if available: #10
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.