Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Add OTEL tracing information as attribute on log messages emitted using Workflow Logger #1425

Open
lucaskatayama opened this issue May 21, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@lucaskatayama
Copy link

lucaskatayama commented May 21, 2024

What are you really trying to do?

Get OpenTelemetry trace_id and span_id fields on Workflow logs.

Describe the bug

The tracing fields from OpenTelemetry (trace_id and span_id) is not appearing on Workflow logs specifically.

It works on Client and Activities.

Calling worflow.log.error:

{"label":"workflow","level":"error","message":"hello","namespace":"default","runId":"add5d842-ca36-4fee-8e51-624d7b2bb58c","sdkComponent":"workflow","taskQueue":"interceptors-opentelemetry-example","timestamp":1716328238734,"workflowId":"otel-example-0","workflowType":"example"}

Calling activity.log.error

{"activityId":"1","activityType":"greet","attempt":1,"isLocal":false,"label":"activity","level":"error","message":"alskjdbnkajsdbnkjandsjk","namespace":"default","sdkComponent":"activity","span_id":"10036d093b04a315","taskQueue":"interceptors-opentelemetry-example","taskToken":"CiRjYmI4ODI1My0wOWMyLTRiMzMtOWI3YS05MTQ5ODFhMzdjM2USDm90ZWwtZXhhbXBsZS0wGiRhZGQ1ZDg0Mi1jYTM2LTRmZWUtOGU1MS02MjRkN2IyYmI1OGMgBSgBMgExQgVncmVldEoICAEQxo5AGAE=","timestamp":1716328238737,"trace_flags":"01","trace_id":"f2731bb716c062624923e55b29f8fcec","workflowId":"otel-example-0","workflowRunId":"add5d842-ca36-4fee-8e51-624d7b2bb58c","workflowType":"example"}

Which gives "span_id":"10036d093b04a315" and "trace_id":"f2731bb716c062624923e55b29f8fcec".

Minimal Reproduction

  1. From interceptors-opentelemetry sample: https://github.com/temporalio/samples-typescript/tree/main/interceptors-opentelemetry
  2. Enable OpenTelemetry Auto Instrumentation for JS for winston (https://opentelemetry.io/docs/languages/js/automatic/)
export OTEL_NODE_RESOURCE_DETECTORS="env,host,os,serviceinstance"
export OTEL_SERVICE_NAME="client"
export NODE_OPTIONS="--require @opentelemetry/auto-instrumentations-node/register"
export OTEL_NODE_ENABLED_INSTRUMENTATIONS="winston,http"
  1. Enable custom-logger using https://github.com/temporalio/samples-typescript/tree/main/custom-logger
  2. Log on Workflow
import * as workflow from "@temporalio/workflow"

// A workflow that simply calls an activity
export async function example(name: string): Promise<string> {
  workflow.log.error("hello");
  return await greet(name);
}
  1. Log on Activity
import * as activity from "@temporalio/activity"

export async function greet(name: string): Promise<string> {
  activity.log.error("alskjdbnkajsdbnkjandsjk")
  return `Hello, ${name}!`;
}
  1. Call client to execute workflow (npm run workflow)

Environment/Versions

  • OS and processor: M2 Mac
  • Temporal Version:
❯ temporal --version
temporal version 0.12.0 (server 1.23.0) (ui 2.26.2)
  • Are you using Docker or Kubernetes or building Temporal from source? No

Additional context

  • The trace_id and span_id are generated but they doesn't appear on logs and are being reported to newrelic correctly, because they appear on distributed tracing.

From ConsoleSpanExporter

{
  resource: { attributes: { 'service.name': 'interceptors-sample-worker' } },
  traceId: 'd498bbf00762a58fc78c022804644789', <<<<<<<<<<
  parentId: '38b7d260146c73e6', <<<<<<<<
  traceState: undefined,
  name: 'RunWorkflow:example',
  id: 'aaa81665fd7d1198',
  kind: 0,
  timestamp: 1716328235467000,
  duration: 46000,
  attributes: {
...
{
  resource: { attributes: { 'service.name': 'interceptors-sample-worker' } },
  traceId: 'd498bbf00762a58fc78c022804644789',
  parentId: 'aaa81665fd7d1198',
  traceState: undefined,
  name: 'StartActivity:greet',
  id: 'aec76af986522d29',
  kind: 0,

@lucaskatayama lucaskatayama added the bug Something isn't working label May 21, 2024
@mjameswh
Copy link
Contributor

mjameswh commented May 27, 2024

This is not a bug. It just can't work that way.

In the TS SDK, Workflows execute in a sandboxed environment. Among other things, that implies that Workflow Code can't do any I/O, and therefore, can't interact with a regular Node.js logger.

Instead, log messages emitted using the Workflow context logger are funneled through a Workflow Sink to the Runtime logger. Now, Sinks are an advanced feature of the TS SDK, but essentially, that means that messages are queued internally in the Workflow context for the duration of a Workflow Activation; at the end of an Activation, queued Sink Calls are passed from the Workflow Worker Thread, through a V8 MessagePort, back to the main SDK thread, where they get processed (in this case, passing them to the Runtime logger, which in your case would be an instance of Winston's Logger class).

Note that, at the time that the call is actually passed to the Winston's Logger, execution is no longer contained by the OpenTelemetry call scope, and therefore, the OpenTelemetry Auto Instrumentation can't attach these calls to the parent context.

Note also that the @opentelemetry/auto-instrumentations-node plugin can't reach into the Workflow sandbox, or get loaded there, as that plugin would bring in dependencies on various non-deterministic-safe modules.

The solution

What you can do instead is to register a WorkflowOutboundInterceptor on your Workflow Worker that intercepts the getLogAttributes function and injects the trace_id and span_id attributes from there. I think the following should work (untested):

export class OpenTelemetryTraceInLogsWorkflowOutboundInterceptor implements WorkflowOutboundCallsInterceptor {
  public getLogAttributes(
    input: GetLogAttributesInput,
    next: Next<WorkflowOutboundCallsInterceptor, 'getLogAttributes'>
  ): Record<string, unknown> {
    const span = otel.trace.getSpan(otel.context.active());
    const spanContext = span?.spanContext();
    if (spanContext && otel.isSpanContextValid(spanContext)) {
      return next({
        input,
        trace_id: spanContext.traceId,
        span_id: spanContext.spanId,
        trace_flags: `0${spanContext.traceFlags.toString(16)}`,
      });
    } else {
      return next(input);
    }
  }
}

I certainly agree that it would make sense to be done out of the box by our @temporalio/interceptors-opentelemetry Workflow Interceptor. I'll therefore keep this ticket open as a feature request.

@mjameswh mjameswh changed the title [Bug] Workflow tracing information is not being populated on logs [Feature Request] Add OTEL tracing information as attribute on log messages emitted using Workflow Logger May 27, 2024
@mjameswh mjameswh added enhancement New feature or request and removed bug Something isn't working labels May 27, 2024
@lucaskatayama
Copy link
Author

This is not a bug. It just can't work that way.

In the TS SDK, Workflows execute in a sandboxed environment. Among other things, that implies that Workflow Code can't do any I/O, and therefore, can't interact with a regular Node.js logger.

Instead, log messages emitted using the Workflow context logger are funneled through a Workflow Sink to the Runtime logger. Now, Sinks are an advanced feature of the TS SDK, but essentially, that means that messages are queued internally in the Workflow context for the duration of a Workflow Activation; at the end of an Activation, queued Sink Calls are passed from the Workflow Worker Thread, through a V8 MessagePort, back to the main SDK thread, where they get processed (in this case, passing them to the Runtime logger, which in your case would be an instance of Winston's Logger class).

Note that, at the time that the call is actually passed to the Winston's Logger, execution is no longer contained by the OpenTelemetry call scope, and therefore, the OpenTelemetry Auto Instrumentation can't attach these calls to the parent context.

Note also that the @opentelemetry/auto-instrumentations-node plugin can't reach into the Workflow sandbox, or get loaded there, as that plugin would bring in dependencies on various non-deterministic-safe modules.

The solution

What you can do instead is to register a WorkflowOutboundInterceptor on your Workflow Worker that intercepts the getLogAttributes function and injects the trace_id and span_id attributes from there. I think the following should work (untested):

export class OpenTelemetryTraceInLogsWorkflowOutboundInterceptor implements WorkflowOutboundCallsInterceptor {
  public getLogAttributes(
    input: GetLogAttributesInput,
    next: Next<WorkflowOutboundCallsInterceptor, 'getLogAttributes'>
  ): Record<string, unknown> {
    const span = otel.trace.getSpan(otel.context.active());
    const spanContext = span?.spanContext();
    if (spanContext && otel.isSpanContextValid(spanContext)) {
      return next({
        input,
        trace_id: spanContext.traceId,
        span_id: spanContext.spanId,
        trace_flags: `0${spanContext.traceFlags.toString(16)}`,
      });
    } else {
      return next(input);
    }
  }
}

I certainly agree that it would make sense to be done out of the box by our @temporalio/interceptors-opentelemetry Workflow Interceptor. I'll therefore keep this ticket open as a feature request.

Yep... That WORKED @mjameswh !

I think that piece of code could be in @temporalio/interceptors-opentelemetry ... Or at least documented ...

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants