Skip to content

Add ExecutionInfo to RequestTracker methods #1640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: 4.x
Choose a base branch
from
Open

Conversation

ajweave
Copy link

@ajweave ajweave commented Jun 5, 2023

RequestTracker implementations may want to use the ExecutionInfo object e.g logging request/response size, etc.

@absurdfarce
Copy link
Contributor

Thanks for the PR @ajweave!

Have you signed the Contributor License Agreement for contributions to DataStax open source projects? If not you can find it at https://cla.datastax.com/. Thanks!

@absurdfarce
Copy link
Contributor

@ajweave Just so I have this straight in my own head, here's how I understand the relationship between this PR and the ask on JAVA-3046. This PR clearly extends the callbacks for request success and failure to include the extra ExecutionInfo object while JAVA-3046 asks for new callbacks all together for the start of a request. Do I have the contours basically right there?

@ajweave
Copy link
Author

ajweave commented Jun 6, 2023

@absurdfarce the current behavior of RequestTracker providing success/failure after the fact is sufficient for my needs. The only additional thing I require is the ExecutionInfo. Making RequestTracker support finer-grained lifecycle events is ok with me as long as I can get my hands on ExecutionInfo in onSuccess.

@ajweave
Copy link
Author

ajweave commented Jun 7, 2023

@absurdfarce I've signed the CLA.

Copy link
Contributor

@absurdfarce absurdfarce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! Want to make sure we don't miss onNodeError() here but otherwise this appears to be ready to go.

@NonNull String requestLogPrefix,
@NonNull ExecutionInfo executionInfo) {
// delegate call to the old method
onNodeSuccess(request, latencyNanos, executionProfile, node, requestLogPrefix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like onSuccess(), onError() and onNodeSuccess() are updated here. We should probably update onNodeError() as well in order to keep the API internally consistent. I don't know that it's actually used in existing driver code beyond MultiplexingRequestTracker but it's still a good idea to keep the API internally consistent.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, from what I could tell, in most error cases no ExecutionInfo is present (but I'm still new to the driver code). I've started going through it again and I think CqlRequestHandler can be refactored to pass it in for some cases. I'll add the new arg without the NonNull annotation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CqlRequestHandler is a bit tangled - there's a NodeResponseCallback inner class with a trackNodeError method and on the containing class, there's a setFinalError method where the ExecutionInfo is created.
Ideally, setFinalError could call RequestTracker.setNodeError with the ExecutionInfo, but that's already being done in NodeResponseCallback.trackNodeError so I'm thinking it's appropriate to move the trackNodeError method to the containing class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that CqlRequestHandler does get a little tangled; async code certainly will do that if you don't keep an eye on it :).

My comment wasn't connected to the current implementation of CqlRequestHandler, though. Individual request handlers are created by the various request processors. These processors vary widely in implementation, and that's just for the ones we've written; they're defined by an interface so users are always free to write their own! With that in mind I'd say it's important to think about RequestTracker as a general interface which may (or may not) be used by any given handler for any given processor.

With that in mind, I'd argue it's important for RequestTracker to present a coherent interface that's consistent across all the on*() handler methods unless there's a very good reason to diverge. There doesn't appear to be any such reason in this case, thus my recommendation to update onNodeError() to keep things consistent.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree on keeping the interface consistent.

I've been going through the handlers and for the most part, calls to onNodeError with a null ExecutionInfo are feasible, so that's a good first step. Should a larger refactoring effort to route the ExecutionInfo to those methods be postponed? I suppose a good argument against the refactoring work is that the node-level events in the interface can be considered "in-progress" requests and the final ExecutionInfo instance will be delivered in onSuccess.

The only place I see where we can't honor the NonNull contract for onSuccess without significant changes is ContinuousRequestHandlerBase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out a solution in ContinuousRequestHandlerBase.
https://github.com/datastax/java-driver/pull/1640/files#r1247216557

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this in CqlRequestHandler.NodeResponseCallback.trackNodeError?

    private void trackNodeError(Node node, Throwable error, long nodeResponseTimeNanos) {
      if (requestTracker instanceof NoopRequestTracker) {
        return;
      }
      if (nodeResponseTimeNanos == NANOTIME_NOT_MEASURED_YET) {
        nodeResponseTimeNanos = System.nanoTime();
      }
      long latencyNanos = nodeResponseTimeNanos - this.nodeStartTimeNanos;
      ExecutionInfo executionInfo =
              new DefaultExecutionInfo(
                      statement,
                      node,
                      startedSpeculativeExecutionsCount.get(),
                      execution,
                      errors,
                      null,
                      null,
                      true,
                      session,
                      context,
                      executionProfile);
      requestTracker.onNodeError(
          statement, error, latencyNanos, executionProfile, node, logPrefix, executionInfo);
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also explored how to pass in ExecutionInfo from ContinuousRequestHandlerBase, see here. How do you think of this approach?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to ContinuousRequestHandlerBase merged. Thank you.

Copy link
Contributor

@hhughes hhughes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places in the code and tests which are still calling the now deprecated methods which could result in callbacks being missed for those who override the new methods (happy to discuss if there was a good reason for that, and if so consider adding a comment above those invocations and interface methods) and a few quibbles about annotation inconsistency.

@@ -94,9 +98,12 @@ public void onError(
long latencyNanos,
@NonNull DriverExecutionProfile executionProfile,
@Nullable Node node,
@NonNull String logPrefix) {
@NonNull String logPrefix,
@NonNull ExecutionInfo executionInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here executionInfo is marked @NonNull but RequestTracker#onError does not have that annotation (arguably it should be @Nullable since it looks like CqlRequestHandler may pass it as null)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should be @nullable, fixed.

@@ -51,7 +53,8 @@ public void onError(
long latencyNanos,
@NonNull DriverExecutionProfile executionProfile,
Node node,
@NonNull String requestPrefix) {
@NonNull String requestPrefix,
@NonNull ExecutionInfo executionInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to with MultiplexingRequestTracker, this is not marked @NonNull in the interface

session,
context,
executionProfile));
executionInfo =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we only build and pass along the executionInfo for DriverException errors?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reasons unknown to me, the existing code was only creating a new ExecutionInfo for DriverErrors. I suppose this is because DriverException has a setter for it. Now I suppose we have another use for the ExecutionInfo so I can change it to create one for all Throwables.

session,
context,
executionProfile);
this.executionInfo =
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an instance variable here for the ExecutionInfo. It gets populated and later passed to the RequestTracker in completeResultSetFuture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will always work correctly - the executionInfo field on the NodeResponseCallback is populated when the response is processed in processResultResponse, but before being used when firing the request tracker methods in completeResultSetFuture, responses may be queued. If another request comes in before the earlier response is removed from the queue the executionInfo field will be overwritten. When the earlier response is later removed from the queue, the execution info provided to the request tracker could be the one from the newer request.

Instead of a field attached to the NodeResponseCallback, perhaps you could use the execution info stored on the resultSet object in processResultResponse? - it's a shame there isn't a common interface (perhaps a useful improvement to file for another time) but it looks like both ContinuousAsyncResultSet and AsyncGraphResultSet have getters for ExecutionInfo objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further thought, since you only have an Object at this stage, I don't know how you would get an execution info for the request tracker call if you end up with a something that isn't a ContinuousAsyncResultSet or AsyncGraphResultSet so perhaps the solution I suggested above isn't going to work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for calling out details you no doubt already know off the top of your head, it's just for my own benefit being new to the code.

This is a bit of a pickle. ResultSetT is just an Object and it looks like just sharpening up the definition of that type parameter is impossible because there's not a unified type structure for the different kinds of results in the internal API.

I could add a type check and cast for both AsyncPagingIterable and AsyncGraphResultSet which both provide a getExecutionInfo method.

If that works, then we might want to go a bit further and refactor a bit to add a marking interface or something to prevent other types from being passed in, but that seems like an extensive change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the ExecutionInfo instance variable and added type checks and casts for both AsyncPagingIterable and AsyncGraphResultSet to get the ExecutionInfo from there.

I attempted to run the ContinuousPagingIT test with DSE to validate, but I'm having trouble with CCM. The CCM node fails to start with this error:

Cannot start node if snitch's data center (dc1) differs from previous data center (Cassandra)

I'm setting these Java options:

-Dccm.dse=true -Dccm.version=6.8.38

Any tips for running the integration test locally are appreciated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of work to set up. I ran all the tests against this branch, and it passed all the tests.

Copy link
Contributor

@hhughes hhughes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the earlier feedback! I have some concerns about the way you're saving the executionInfo in the continous request handler, and one more potential clash between @NonNull and @Nullable but otherwise looks good to me!

[edit]

[ERROR] Failed to execute goal com.coveo:fmt-maven-plugin:2.9:check (default) on project java-driver-integration-tests: Found 1 non-complying files, failing build -> [Help 1]

Oh and can you run the coveo format plugin before your next push? Travis CI is complaining about some formatting issues with this changeset

session,
context,
executionProfile);
this.executionInfo =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will always work correctly - the executionInfo field on the NodeResponseCallback is populated when the response is processed in processResultResponse, but before being used when firing the request tracker methods in completeResultSetFuture, responses may be queued. If another request comes in before the earlier response is removed from the queue the executionInfo field will be overwritten. When the earlier response is later removed from the queue, the execution info provided to the request tracker could be the one from the newer request.

Instead of a field attached to the NodeResponseCallback, perhaps you could use the execution info stored on the resultSet object in processResultResponse? - it's a shame there isn't a common interface (perhaps a useful improvement to file for another time) but it looks like both ContinuousAsyncResultSet and AsyncGraphResultSet have getters for ExecutionInfo objects.

@ajweave
Copy link
Author

ajweave commented Nov 8, 2023

Now that the donation to Apache is complete, I'd like to move forward with this change. @hhughes , @absurdfarce , please take a look.

@ajweave
Copy link
Author

ajweave commented Dec 4, 2023

@hhughes , @absurdfarce , gentle reminder to review these changes.

@vanditsramblings
Copy link

+1 , this would be a great addition , allowing more granular access to execution metrics.

@absurdfarce absurdfarce removed the request for review from hhughes April 30, 2024 20:56
@absurdfarce
Copy link
Contributor

Ping @SiyaoIsHiding for (hopefully) another review

@SiyaoIsHiding
Copy link
Contributor

May you please merge apache/4.x to your branch?

@ajweave
Copy link
Author

ajweave commented May 8, 2024

Sorry for the delay. I will get this done tomorrow. I appreciate the attention here.

@NonNull String requestLogPrefix) {
// If client doesn't override onSuccess with requestLogPrefix delegate call to the old method
onSuccess(request, latencyNanos, executionProfile, node);
@NonNull String requestLogPrefix,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small request: would it be possible to keep the log prefix as the last parameter? This has been kind of a tacit convention so far.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to make requestLogPrefix the last arg in all the methods.

@lukasz-antoniak
Copy link
Member

lukasz-antoniak commented May 28, 2024

Good work @ajweave, sorry to keep you waiting. Two things to consider from my end. Feel free to disagree since both are more of coding style category.

In request handlers (e.g. CqlRequestHandler), we very frequently do two things with same arguments:

trackNodeError(node, error, NANOTIME_NOT_MEASURED_YET);
setFinalError(statement, error, node, execution);

Both methods create their own instance of DefaultExecutionInfo internally. Shall we create the ExecutionInfo outside and pass it as parameter? Some of the other parameters like statement or node, could be taken from ExecutionInfo and not passed individually. When we use NoopRequestTracker, we do not need to create DefaultExecutionInfo inside trackNodeError, but still it would be needed in setFinalError, so we shall not degrade performance in this case.

All of this can be done, because we decide to construct ExecutionInfo for all exceptional executions, not only once resulting in DriverException. Anyway, most of the time we would see DriverException being raised.

DefaultExecutionInfo has few optional parameters, but from what I see, we use the constructor in different contexts. What do you think about making this constructor private and expose create(...) methods to indicate what type of DefaultExecutionInfo we are going to pass? Also, would move pagingState calculation inside (currently code copied across different request handlers).

  /** Use when driver failed to send the request to the server, or we did not receive any response. */
  public static DefaultExecutionInfo create(Request request, Node coordinator, int speculativeExecutionCount,
                                            int successfulExecutionIndex, List<Map.Entry<Node, Throwable>> errors,
                                            DefaultSession session, InternalDriverContext context,
                                            DriverExecutionProfile executionProfile) {
    return create(request, coordinator, speculativeExecutionCount, successfulExecutionIndex, errors,
            null, null, true, session, context, executionProfile);
  }

  /** Use when driver received the response, but operation result remains unknown. */
  public static DefaultExecutionInfo create(Request request, Node coordinator, int speculativeExecutionCount,
                                            int successfulExecutionIndex, List<Map.Entry<Node, Throwable>> errors,
                                            Frame frame, DefaultSession session, InternalDriverContext context,
                                            DriverExecutionProfile executionProfile) {
    return create(request, coordinator, speculativeExecutionCount, successfulExecutionIndex, errors,
            null, frame, true, session, context, executionProfile);
  }

  public static DefaultExecutionInfo create(Request request, Node coordinator, int speculativeExecutionCount,
                                            int successfulExecutionIndex, List<Map.Entry<Node, Throwable>> errors,
                                            Result resultMessage, Frame frame, boolean schemaInAgreement,
                                            DefaultSession session, InternalDriverContext context,
                                            DriverExecutionProfile executionProfile) {
    final ByteBuffer pagingState =
            (resultMessage instanceof Rows) ? ((Rows) resultMessage).getMetadata().pagingState : null;
    return new DefaultExecutionInfo(request, coordinator, speculativeExecutionCount, successfulExecutionIndex, errors,
            pagingState, frame, schemaInAgreement, session, context, executionProfile);
  }

I was even thinking of changing the three create(...) methods into different names like, of, ofClientError, ofServerError. Thoughts?

Edit:
Yet another option would be to introduce builder pattern for DefaultExecutionInfo class and might be a nicest solution.

public static Builder builder(
    Request request,
    Node coordinator,
    int speculativeExecutionCount,
    int successfulExecutionIndex,
    List<Map.Entry<Node, Throwable>> errors,
    DefaultSession session,
    InternalDriverContext context,
    DriverExecutionProfile executionProfile) {
  return new Builder(
      request,
      coordinator,
      speculativeExecutionCount,
      successfulExecutionIndex,
      errors,
      session,
      context,
      executionProfile);
}

public static class Builder {
  private final Request request;
  private final Node coordinator;
  private final int speculativeExecutionCount;
  private final int successfulExecutionIndex;
  private final List<Map.Entry<Node, Throwable>> errors;
  private final DefaultSession session;
  private final InternalDriverContext context;
  private final DriverExecutionProfile executionProfile;

  private Result response;
  private Frame frame;
  private boolean schemaInAgreement = true;

  public Builder(
      Request request,
      Node coordinator,
      int speculativeExecutionCount,
      int successfulExecutionIndex,
      List<Map.Entry<Node, Throwable>> errors,
      DefaultSession session,
      InternalDriverContext context,
      DriverExecutionProfile executionProfile) {
    this.request = request;
    this.coordinator = coordinator;
    this.speculativeExecutionCount = speculativeExecutionCount;
    this.successfulExecutionIndex = successfulExecutionIndex;
    this.errors = errors;
    this.session = session;
    this.context = context;
    this.executionProfile = executionProfile;
  }

  public Builder withServerResponse(Result response, Frame frame) {
    this.response = response;
    this.frame = frame;
    return this;
  }

  public Builder withServerResponse(Frame frame) {
    return withServerResponse(null, frame);
  }

  public Builder withSchemaInAgreement(boolean schemaInAgreement) {
    this.schemaInAgreement = schemaInAgreement;
    return this;
  }

  public DefaultExecutionInfo build() {
    final ByteBuffer pagingState =
        (response instanceof Rows) ? ((Rows) response).getMetadata().pagingState : null;
    return new DefaultExecutionInfo(
        request,
        coordinator,
        speculativeExecutionCount,
        successfulExecutionIndex,
        errors,
        pagingState,
        frame,
        schemaInAgreement,
        session,
        context,
        executionProfile);
  }
}

@absurdfarce
Copy link
Contributor

A quick status update here: there's some overlap between the changes in this PR and some changes being proposed in @chibenwa's PR to minimize the number of resolutions of ExecutionProfile in CqlRequestHandler. Since that PR is quite a bit smaller and simpler I'd like to get it in first and then rebase this PR against those changes; doing it the other way around seems like a lot more work :). I've mentioned this relationship to the other committers on ASF Slack so hopefully we can get the other PR in quickly... we'll see how that goes.

@NonNull String requestLogPrefix) {
// If client doesn't override onSuccess with requestLogPrefix delegate call to the old method
onSuccess(request, latencyNanos, executionProfile, node);
// If client doesn't override onSuccess with executionInfo delegate call to the old method
Copy link
Contributor

@SiyaoIsHiding SiyaoIsHiding Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasz-antoniak and I thought we don't need the parameter request, executionProfile, and node here, because they are all contained in executionInfo object. So we can change it to:

  default void onSuccess(
      long latencyNanos,
      @NonNull ExecutionInfo executionInfo,
      @NonNull String requestLogPrefix) {
    onSuccess(executionInfo.getRequest(), latencyNanos, executionInfo.getExecutionProfile(), executionInfo.getCoordinator(), requestLogPrefix);
  }

If we leave it like it is, the user may be confused if the ones in the parameters and the ones in exeuctionInfo Object are not consistent.
@absurdfarce @adutra How do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. As long as we retain and deprecate the old method signatures for backwards-compatability, I'm good.

Copy link
Contributor

@tolbertam tolbertam Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I like the concise method signature as well!

Seeing onSuccess(long latencyNanos, @NonNull ExecutionInfo executionInfo, @NonNull String requestLogPrefix) makes me wonder why we couldn't just put latency on ExecutionInfo? That seems like it could be useful for those consuming ExecutionInfo on their result set.

I think it's fine as proposed though 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may consider putting requestLogPrefix into ExecutionInfo, too.

Copy link
Contributor

@tolbertam tolbertam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding my support for the change, being as there are a few outstanding things (including getting #1623 merged in and this method signature discussion), I'll wait before formally +1ing but eager to give my +1 when it is all set 👍 .

@NonNull String requestLogPrefix) {
// If client doesn't override onSuccess with requestLogPrefix delegate call to the old method
onSuccess(request, latencyNanos, executionProfile, node);
// If client doesn't override onSuccess with executionInfo delegate call to the old method
Copy link
Contributor

@tolbertam tolbertam Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I like the concise method signature as well!

Seeing onSuccess(long latencyNanos, @NonNull ExecutionInfo executionInfo, @NonNull String requestLogPrefix) makes me wonder why we couldn't just put latency on ExecutionInfo? That seems like it could be useful for those consuming ExecutionInfo on their result set.

I think it's fine as proposed though 👍

@SiyaoIsHiding
Copy link
Contributor

We realize that RequestTracker.getErrors was used for "errors from previous coordinators". We want to discuss whether we should still stick to that. I started the conversation in the ASF slack in the "#cassandra-drivers" channel. @ajweave you are welcome to join the conversation.

@lukasz-antoniak
Copy link
Member

@ajweave, @tolbertam, I have tried to solve my comments and submitted PR to Andrew's fork: ajweave#1. Could you please review the changes and comment? I am not enforcing any of those, just trying cleanup already existing code a little.

@ajweave
Copy link
Author

ajweave commented Jun 26, 2024

@ajweave, @tolbertam, I have tried to solve my comments and submitted PR to Andrew's fork: ajweave#1. Could you please review the changes and comment? I am not enforcing any of those, just trying cleanup already existing code a little.

Thank you for pitching in. I will take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants