
Split log entries so each entry contains a maximum of 100 datapoints #64


Merged: 8 commits into awslabs:master on Jan 28, 2021

Conversation

@paulez (Contributor) commented on Jan 15, 2021

Issue #, if available: #10

Description of changes:

  • Submit no more than 100 datapoints per log entry.
  • Split log entries so each entry contains a maximum of 100 datapoints.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@@ -14,3 +14,4 @@
DEFAULT_NAMESPACE = "aws-embedded-metrics"
MAX_DIMENSIONS = 9
MAX_METRICS_PER_EVENT = 100
MAX_DATAPOINTS_PER_EVENT = 100
Member commented:

Consider renaming to MAX_DATAPOINTS_PER_METRIC.

Author (@paulez) replied:

Updated!

@@ -53,24 +55,53 @@ def create_body() -> Dict[str, Any]:
current_body: Dict[str, Any] = create_body()
event_batches: List[str] = []
num_metrics_in_current_body = 0
num_datapoints_in_current_body = 0
Member commented:

I'm not sure I'm following why we need to track the number of datapoints in the body. There are 2 requirements:

  1. You cannot have more than 100 metric names in an EMF event
  2. A given metric name cannot have more than 100 samples

These two requirements bound the number of metric datapoints in an event to 10_000.
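A minimal sketch of the arithmetic behind that bound, using the constant name suggested later in this review (the code itself is illustrative, not the PR's):

MAX_METRICS_PER_EVENT = 100      # requirement 1: at most 100 metric names per event
MAX_DATAPOINTS_PER_METRIC = 100  # requirement 2: at most 100 samples per metric name

# Together the two limits cap a single EMF event at 10_000 datapoints.
assert MAX_METRICS_PER_EVENT * MAX_DATAPOINTS_PER_METRIC == 10_000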

Author (@paulez) replied:

I didn't understand the spec properly; I thought the limit of 100 samples applied to the whole body. I will correct this to limit each metric name to 100 samples.

@paulez (Contributor Author) commented on Jan 18, 2021

I've updated the logic to limit each metric to 100 datapoints per batch.
I've also added a test that combines the 100-datapoints and 100-metrics per-batch limits.
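A minimal sketch of the per-metric chunking described here (the helper chunk_datapoints is hypothetical, not the PR's actual code):

MAX_DATAPOINTS_PER_METRIC = 100

def chunk_datapoints(values):
    # Yield successive slices of at most 100 datapoints for one metric.
    for start in range(0, len(values), MAX_DATAPOINTS_PER_METRIC):
        yield values[start:start + MAX_DATAPOINTS_PER_METRIC]

# A metric with 250 samples spans 3 batches: 100, 100, then 50.
assert [len(c) for c in chunk_datapoints(list(range(250)))] == [100, 100, 50]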

event_batches: List[str] = []
num_metrics_in_current_body = 0
missing_data = True
Member commented:
Can you add some comments about what this is, and consider a more intuitive name? It's not really "missing" data; it's more like data left to process.

Author (@paulez) replied:

I've added comments and renamed to "remaining_data".
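For reference, a hypothetical sketch of the remaining_data pattern (an assumed shape, not the serializer's actual implementation): the flag records whether any metric still has unserialized datapoints, so the loop keeps emitting batches until everything is consumed.

metrics = {"Latency": list(range(250)), "Errors": list(range(30))}
cursor = {name: 0 for name in metrics}  # next unserialized index per metric
batches = []

remaining_data = True
while remaining_data:
    remaining_data = False
    body = {}
    for name, values in metrics.items():
        chunk = values[cursor[name]:cursor[name] + 100]
        if chunk:
            body[name] = chunk
            cursor[name] += len(chunk)
        if cursor[name] < len(values):
            remaining_data = True  # at least one metric still has data left
    batches.append(body)

assert len(batches) == 3  # Latency's 250 datapoints force 3 batches of <= 100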


# assert
datapoints_count = Counter()
for batch in results:
Member commented:
Should we have an assertion on the number of expected events?

Author (@paulez) replied:

I've added an extra assert on the batch count.

I've also added logic and an assert to validate that the slicing logic doesn't drop any datapoints.
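A minimal sketch of that kind of check, mirroring the Counter usage in the test excerpt above (the sample results data is hypothetical):

from collections import Counter
import json

results = ['{"Metric-0": [1, 2, 3]}', '{"Metric-0": [4, 5]}']  # hypothetical batches

# Count how many datapoints each metric carries across all batches.
datapoints_count = Counter()
for batch in results:
    for metric_name, values in json.loads(batch).items():
        datapoints_count[metric_name] += len(values)

# The slicing logic must not drop any of the original 5 datapoints.
assert datapoints_count["Metric-0"] == 5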

Compute the full expected results so we ensure the slicing logic doesn't
cause any datapoint to be lost.
@jaredcnance (Member) left a review comment:

Thanks for the PR and for your time!

@jaredcnance jaredcnance merged commit 3e5ccce into awslabs:master Jan 28, 2021
@jaredcnance mentioned this pull request on Feb 8, 2021
@hussam789 commented:
PR Code Suggestions ✨

Category: Possible issue
Fix test batch range

The test is checking the wrong batches. It iterates through
range(expected_batches), but it should check the batches produced by the
extra metric, which fall in the range expected_batches to expected_batches +
expected_extra_batches - 1.

tests/serializer/test_log_serializer.py [165-171]

 # extra metric with more datapoints
-for batch_index in range(expected_batches):
+for batch_index in range(expected_batches, expected_batches + expected_extra_batches):
     result_json = results[batch_index]
     result_obj = json.loads(result_json)
     metric_name = f"Metric-{metrics}"
     expected_datapoint_count = extra_datapoints % 100 if (batch_index == expected_batches + expected_extra_batches - 1) else 100
     assert len(result_obj[metric_name]) == expected_datapoint_count
Suggestion importance [1-10]: 7

Why: Adjusting the iteration range to cover the extra batches for the metric with more datapoints addresses a likely bug in the test, ensuring all batches are properly verified.

Impact: Medium
