Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion(instrumentation-aws-sdk): SQS receive according to semantic conventions #707

Open
blumamir opened this issue Oct 19, 2021 · 5 comments · May be fixed by #2345
Open

Discussion(instrumentation-aws-sdk): SQS receive according to semantic conventions #707

blumamir opened this issue Oct 19, 2021 · 5 comments · May be fixed by #2345

Comments

@blumamir
Copy link
Member

This issue is to document the implementation of aws SQS receive operation according to the current semantic conventions for messaging systems (Oct 2021). there is an active SIG working on messaging systems specification which will probably change the specification and how to handle these situations when it's not possible to accurately extract perfect context.

receiveMessage

  • Messaging Attributes are added by this instrumentation according to the spec.
  • Additional "processing spans" are created for each message received by the application.
    If an application invoked receiveMessage, and received a 10 messages batch, a single messaging.operation = receive span will be created for the receiveMessage operation, and 10 messaging.operation = process spans will be created, one for each message.
    Those processing spans are created by the library. This behavior is partially implemented, See discussion below.
  • Sets the inter process context correctly, so that additional spans created through the process will be linked to parent spans correctly.
    This behavior is partially implemented, See discussion below.
  • Extract trace context from SQS MessageAttributes, and set span's parent and links correctly according to the spec.

Processing Spans

According to OpenTelemetry specification (and to reasonable expectation for trace structure), user of this instrumentation library would expect to see one span for the operation of receiving messages batch from SQS, and then, for each message, a span with it's own sub-tree for the processing of this specific message.

For example, if a receiveMessages returned 2 messages:

  • msg1 resulting in storing something to a DB.
  • msg2 resulting in calling an external HTTP endpoint.

This will result in a creating a DB span that would be the child of msg1 process span, and an HTTP span that would be the child of msg2 process span (in opposed to mixing all those operations under the single receive span, or start a new trace for each of them).

Unfortunately, this is not so easy to implement in JS:

  1. The SDK is calling a single callback for the messages batch, and it's not straightforward to understand when each individual message processing starts and ends (and set the context correctly for cascading spans).
  2. If async/await is used, context can be lost when returning data from async functions, for example:
async function asyncRecv() {
  const data = await sqs.receiveMessage(recvParams).promise();
  // context of receiveMessage is set here
  return data;
}

async function poll() {
  const result = await asyncRecv();
  // context is lost when asyncRecv returns. following spans are created with root context.
  await Promise.all(
    result.Messages.map((message) => this.processMessage(message))
  );
}

Current implementation partially solves this issue by patching the map \ forEach \ Filter functions on the Messages array of receiveMessage result. This handles issues like the one above, but will not handle situations where the processing is done in other patterns (multiple map\forEach calls, index access to the array, other array operations, etc). This is currently an open issue in the instrumentation.

@github-actions
Copy link
Contributor

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Dec 20, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Jan 3, 2022

This issue was closed because it has been stale for 14 days with no activity.

@github-actions github-actions bot closed this as completed Jan 3, 2022
@dyladan dyladan added never-stale and removed stale labels Jan 3, 2022
@dyladan dyladan reopened this Jan 3, 2022
@aadharsh-rengarajan
Copy link
Contributor

aadharsh-rengarajan commented Aug 5, 2022

Hi @dyladan @blumamir does this issue still occur if we repeatedly poll using setTimeout?

I have an issue with the instrumentation where there are lot of nested spans. I am using forEach and inside it, using a setTimeout. Its causing the spans to be nested with one root span having 10000+ spans for receiving messages.

Quoting the above code, I wrote a similar code.

async function asyncRecv() {
  const data = await sqs.receiveMessage(recvParams).promise();
  return data;
}

async function poll() {
  const result = await asyncRecv();
  result.Messages.forEach((message, i) => {
    this.processMessage(message);
    setTimeout(() => {
      this._poll();
    }, this.pollInterval * i);
  });
}

trentm added a commit that referenced this issue Dec 11, 2023
…sqs ReceiveMessage context handling (#1847)

In `@aws-sdk/client-sqs` v3.316 the SQS client methods became async. This breaks
the `utils.bindPromise` usage that attempts to propagate the SQS ReceiveMessage
span context to the user's handler for the method's returned promise.
Fixing that propagation is for #707 or another issue.

This change is a workaround that skips the testing of that span context
propagation feature.

Fixes: #1477
Refs: #707
Co-authored-by: Marc Pichler <[email protected]>
@trentm
Copy link
Contributor

trentm commented Feb 7, 2024

This comment #1477 (comment)
has a section "Discussion: do we want to support this?" which argues for dropping some of the special handling for "SQS ReceiveMessage" requests -- specifically dropping the attempts to automatically create "processing" spans when iterating over received messages. IIUC, the semantic conventions have since changed to not longer mention "processing" spans.

@seemk
Copy link
Contributor

seemk commented Feb 28, 2024

I'm for dropping the process spans. In this case should the instrumentation just start a new receive span for every unique producer span context?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants