Skip to content

Oversampling Mitigation #366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 10, 2022

Conversation

carolabadeer
Copy link
Contributor

@carolabadeer carolabadeer commented Nov 8, 2022

Issue #, if available: N/A

Description of changes: Implemented oversampling mitigation to support creating subsegments without sampling (control sampling decisions at the subsegment level) and an SQS message helper class.

Note: Since it is possible to modify the sampled flag of a subsegment without the need to create a new API, this is the design implemented in the changes below. For instance, to create a new unsampled subsegment and add it to a Lambda facade segment:

    segment = FacadeSegment('facade_segment', 'id', 'id', True)
    subsegment = Subsegment('sampled_subsegment', 'local', segment)
    subsegment.sampled = False
    segment.add_subsegment(subsegment)

Similarly, the begin_subsegment API has been modified to include a default sampling parameter which defaults to True, and can be set to False when creating a new unsampled subsegment.

    segment = xray_recorder.begin_segment('segment')
    subsegment = xray_recorder.begin_subsegment('unsampled_subsegment', sampling=False)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@carolabadeer carolabadeer requested a review from a team as a code owner November 8, 2022 00:50
@carolabadeer
Copy link
Contributor Author

This change has been tested in a Lambda environment using the following Lambda function:

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core.models.subsegment import Subsegment
from aws_xray_sdk.core.utils.sqs_message_helper import SqsMessageHelper
import logging 

def lambda_handler(event, context):
    
    for message in event['Records']:
        if SqsMessageHelper.isSampled(message):
            print('sampled - doing batch work')
            subsegment = xray_recorder.begin_subsegment('sampled_subsegment', sampling=True)

        else:
            print('unsampled - doing batch work')
            subsegment = xray_recorder.begin_subsegment('unsampled_subsegment', sampling=False)
    
    xray_recorder.end_subsegment()    

X-Ray service map when Sampled=1 in AWSTraceHeader:
image (7)

X-Ray service map when Sampled=0 in AWSTraceHeader:
image (8)

@@ -81,6 +81,7 @@ def add_subsegment(self, subsegment):
"""
self._check_ended()
subsegment.parent_id = self.id
subsegment.sampled = self.sampled if subsegment.sampled == True else False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing parent based sampling if the subsegment itself is sampled? Do we need this line at all?

Copy link
Contributor Author

@carolabadeer carolabadeer Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically for the case when a subsegment is added to an unsampled subsegment. We have to ensure that once an unsampled subsegment is created, all of its child subsegments are also unsampled.

So for example, it is possible to do this:

    segment = Segment('parent_segment', 'id', 'id', True)
    subsegment = Subsegment('subsegment1', 'local', segment)
    subsegment.sampled = False
    segment.add_subsegment(subsegment)
    subsegment2 = Subsegment('subsegment2', 'local', segment)
    subsegment.add_subsegment(subsegment2)

In this case, it would be illegal for subsegment2 to be sampled, so it has to take the decision of its direct parent subsegment - subsegment1- which is unsampled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to move this assignment to the "begin_subsegment" function in recorder.py and just base it off of the parent there? Then we can just base it off of the entity here "current_entity = self.get_trace_entity()". That entity would be the direct parent no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to check two levels like that is a code smell to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two different ways to create subsegments, which is why this had to be done on two different levels - @srprash please correct me if I'm missing something

Comment on lines 287 to 288
:param bool sampling: sampling decision for the subsegment being created,
defaults to True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sampling parameter is not the same in meaning with the sampled flag on an entity.
sampling=True here means that the subsegment will inherit the sampling decision of its parent entity, and could either be sampled or not.
sampling=False would mean that the subsegment will not inherit the sampling decision and will be unsampled.

I hope I was able to explain the difference. Please reword the description. :)

@@ -335,7 +340,7 @@ def end_subsegment(self, end_time=None):

# if segment is already close, we check if we can send entire segment
# otherwise we check if we need to stream some subsegments
if self.current_segment().ready_to_send():
if self.current_segment().ready_to_send():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary white space.

def isSampled(sqs_message):
attributes = sqs_message['attributes']

if 'AWSTraceHeader' not in attributes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making AWSTraceHeader into a constant and reuse it in both the places in this method.

@@ -35,7 +35,7 @@ def inject_trace_header(headers, entity):
else:
header = entity.get_origin_trace_header()
data = header.data if header else None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary white spaces

@atshaw43
Copy link
Contributor

atshaw43 commented Nov 8, 2022

Can we also get a unit test for the emitter?

@srprash
Copy link
Contributor

srprash commented Nov 8, 2022

@carolabadeer I know we decided on this approach in our conversation. But now when I think of it again, I agree with @atshaw43 that we should keep the SDKs consistent. There could be confusion among customers why the a begin_subsegment_without_sampling API exists in other SDKs but not in this.

if not segment.sampled:
current_entity = self.get_trace_entity()

if not current_entity.sampled or not sampling:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would looking at just current_entity.sampled be enough here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If the current entity (segment or subsegment) is sampled and a new unsampled subsegment is being added, it is necessary to check the sampling flag being set on the newly created subsegment.

@atshaw43 atshaw43 closed this Nov 8, 2022
@carolabadeer carolabadeer reopened this Nov 8, 2022
@carolabadeer
Copy link
Contributor Author

Updated lambda handler function with the new begin_subsegment_without_sampling API:

def lambda_handler(event, context):

    for message in event['Records']:
        if SqsMessageHelper.isSampled(message):
            print('sampled - doing batch work')
            subsegment = xray_recorder.begin_subsegment('sampled_subsegment')

        else:
            print('unsampled - doing batch work')
            subsegment = xray_recorder.begin_subsegment_without_sampling('unsampled_subsegment')
    
    xray_recorder.end_subsegment()    

subsegment = DummySubsegment(segment, name)
subsegment.sampled = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else:
subsegment = Subsegment(name, namespace, segment)

self.context.put_subsegment(subsegment)

return subsegment

def begin_subsegment_without_sampling(self, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's lot of duplicate code here with begin_subsegment. Can you move the common code to another function and reuse?

return None

subsegment = DummySubsegment(segment, name)
subsegment.sampled = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need not set sampled = False on a DummySubsegment.

'''
Helper method to begin_subsegment and begin_subsegment_without_sampling
'''
result = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is still problematic. I'm not sure I get the point of using result and returning it.
Also, result seem to be of variable types. Here it is a boolean but later it can be a Subsegment. This can easily confusion and bugs.

I believe there is a better way to leverage this helper method than it's callers having to check its return types and values.
_begin_subsegment_helper can accept a parameter as flag for sampling or not. The callers can simply call this method with the flag and get back a Subsegment depending on the flag's value. How does that sound?
The method definition could look like:

def _begin_subsegment_helper(name, namespace='local', beginWithoutSampling=False)

Copy link
Contributor

@srprash srprash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

@carolabadeer carolabadeer merged commit 6b64982 into aws:master Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants