Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Bug Report: incompatibilities with LLM semantics #1455

Open
1 task done
codefromthecrypt opened this issue Jul 5, 2024 · 5 comments
Open
1 task done

🐛 Bug Report: incompatibilities with LLM semantics #1455

codefromthecrypt opened this issue Jul 5, 2024 · 5 comments

Comments

@codefromthecrypt
Copy link

codefromthecrypt commented Jul 5, 2024

Which component is this bug for?

LLM Semantic Conventions

📜 Description

As a first timer, I tried the ollama instrumentation, and sent a trace to a local collector. Then I compared the output with llm semantics defined by otel. I noticed as many compatibilities as incompatibilities, and it made me concerned that other instrumentation may have other large glitches.

👟 Reproduction steps

use olllama-python with the instrumentation here. It doesn't matter if you use the traceloop-sdk or normal otel to initialize the instrumentation ( I checked both just in case)

👍 Expected behavior

otel specs should be a subset of openllmetry semantics, so no incompatible attributes.

👎 Actual Behavior with Screenshots

compatible:

  • kind=client
  • name=ollama.chat
  • attributes['gen_ai.system']='Ollama'
  • attributes['gen_ai.response.model']='codegemma:2b-code'
  • attributes['gen_ai.usage.completion_tokens']=11

Incompatible:

  • attributes['gen_ai.prompt.0.content']='prompt_text' otel semantics declare this as a non-indexed attribute 'gen_ai.prompt'
  • attributes['gen_ai.completion.0.role']='assistant' otel semantics declare this as a non-indexed attribute 'gen_ai.request.model.role'

not yet defined in the standard:

  • attributes['llm.request.type']='chat'
  • attributes['llm.is_streaming']=false
  • attributes['llm.usage.total_tokens']=11

🤖 Python Version

3.12

📃 Provide any additional context for the Bug.

partially addressed by @gyliu513 in #884

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find similar issue

Are you willing to submit PR?

None

@codefromthecrypt
Copy link
Author

what seems similar to the spec is if you splat out json decoded events into attributes, the gen_ai.prompt is correlated with the attributes set here, just if that's the case, basically I would expect some explanation in the spec or in the README here about special casing. The main goal is to be able to analyze the data coherently, so these points are important to understand even if the spec is just missing some info.

It is also possible that there are some implicit understandings of how to interpret the spec that I'm lacking, so feel free to correct me.

@nirga
Copy link
Member

nirga commented Jul 5, 2024

Thanks for this @codefromthecrypt. OpenLLMetry was released on Oct 2023 and pre-dates the semantic conventions defined by otel. The semantic convention work is actually the result of this project, OpenLLMetry, and is very much still work in progress.

When we started the OSS, there were no semantic conventions for LLMs and so we decided to add attributes to things that we think would be important to users. These became the basis for the discussions we've done in the otel working group where some attributes were officially adopted, some were changed slightly (for example, we've decided to change the prefix llm to gen_ai) and some are still in discussions (for example how to log prompts). OpenLLMetry keeps up to date with all the agreements we're making in the otel working group while keeping the older conventions in place (like prompts) so that our users can still get the visibility they need.

The incompatibilities you mentioned are just things we haven't gotten the chance to formalize in the otel working group but will be adopted soon.

@nirga nirga closed this as not planned Won't fix, can't repro, duplicate, stale Jul 5, 2024
@codefromthecrypt
Copy link
Author

can you please link to issues upstream about "The incompatibilities you mentioned are just things we haven't gotten the chance to formalize in the otel working group but will be adopted soon."? because that's easier to track

@nirga
Copy link
Member

nirga commented Jul 5, 2024

Sure:
open-telemetry/semantic-conventions#834
open-telemetry/semantic-conventions#930
open-telemetry/semantic-conventions#1170

@codefromthecrypt
Copy link
Author

This is an update based on latest OpenLLMetry which now includes metrics based on the following sample code and what the semantics pending release 1.27.0 will define.

Sample Code

import os
from openai import OpenAI
from traceloop.sdk import Traceloop

# Set the service name such that it is different from other experiments
app_name = "openllmetry-python-ollama-traceloop"
# Default the SDK endpoint ENV variable to localhost
api_endpoint = os.getenv("TRACELOOP_BASE_URL", "http://localhost:4318")
# Don't batch spans, as this is a demo
Traceloop.init(app_name=app_name, api_endpoint=api_endpoint, disable_batch=True)

def main():
    ollama_host = os.getenv('OLLAMA_HOST', 'localhost')
    # Use the OpenAI endpoint, not the Ollama API.
    base_url = 'http://' + ollama_host + ':11434/v1'
    client = OpenAI(base_url=base_url, api_key='unused')
    messages = [
      {
        'role': 'user',
        'content': '<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>',
      },
    ]
    chat_completion = client.chat.completions.create(model='codegemma:2b-code', messages=messages)
    print(chat_completion.choices[0].message.content)

if __name__ == "__main__":
    main()

Spans

Semantic evaluation on spans.

compatible:

  • kind=client
  • attributes['gen_ai.request.model']='codegemma:2b-code'
  • attributes['gen_ai.response.model']='codegemma:2b-code'

missing required fields:

  • attributes['gen_ai.operation.name']

deprecated fields:

  • attributes['gen_ai.usage.completion_tokens']=8 (deprecated for gen_ai.usage.output_tokens)
  • attributes['gen_ai.usage.prompt_tokens']=24 (deprecated for gen_ai.usage.input_tokens)

incompatible:

  • name=openai.chat (should be 'chat codegemma:2b-code')
  • attributes['gen_ai.system']='OpenAI' (should be lowercase 'openai')
  • attributes['gen_ai.prompt.0.content']='<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>' (should be 'content' in the event attribute gen_ai.prompt)
  • attributes['gen_ai.prompt.0.role']='user' (should be 'role' in the event attribute gen_ai.prompt)
  • attributes['gen_ai.completion.0.role']='assistant' (should be 'role' in the event attribute gen_ai.completion)
  • attributes['gen_ai.completion.0.finish_reason']='stop' (should be 'gen_ai.response.finish_reasons' and an array)
  • attributes['gen_ai.completion.0.content']='\ndef' (should be 'content' in the event attribute gen_ai.completion)

not yet defined in the standard:

  • attributes['gen_ai.openai.api_base']='http://ollama:11434/v1/'
  • attributes['gen_ai.openai.system_fingerprint']='fp_ollama'
  • attributes['llm.request.type']='chat'
  • attributes['llm.headers]=''
  • attributes['llm.is_streaming']=false
  • attributes['llm.usage.total_tokens']=11

Metrics

Semantic evaluation on
metrics:

'gen_ai.client.token.usage'

input

compatible:

  • gen_ai.token.type='input'
  • gen_ai.operation.name='chat'
  • gen_ai.system='openai'
  • gen_ai.response.model='codegemma:2b-code'
  • server.address='http://ollama:11434/v1/'

missing:

  • gen_ai.request.model
  • server.port (required with 'server.address')

not yet defined in the standard:

  • stream=False

output

compatible:

  • gen_ai.token.type='output'
  • gen_ai.operation.name='chat'
  • gen_ai.system='openai'
  • gen_ai.response.model='codegemma:2b-code'
  • server.address='http://ollama:11434/v1/'

missing:

  • gen_ai.request.model
  • server.port (required with 'server.address')

not yet defined in the standard:

  • stream=False

'gen_ai.client.operation.duration'

compatible:

  • gen_ai.operation.name='chat'
  • gen_ai.system='openai'
  • gen_ai.response.model='codegemma:2b-code'
  • server.address='http://ollama:11434/v1/'

missing:

  • gen_ai.request.model
  • server.port (required with 'server.address')

not yet defined in the standard:

  • stream=False

'gen_ai.client.generation.choices' (custom)

This is not defined in the spec

Example collector log

otel-collector      | 2024-07-24T11:56:28.143Z  info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
otel-collector      | 2024-07-24T11:56:28.143Z  info    ResourceSpans #0
otel-collector      | Resource SchemaURL: 
otel-collector      | Resource attributes:
otel-collector      |      -> service.name: Str(openllmetry-python-ollama-traceloop)
otel-collector      | ScopeSpans #0
otel-collector      | ScopeSpans SchemaURL: 
otel-collector      | InstrumentationScope opentelemetry.instrumentation.openai.v1 0.25.6
otel-collector      | Span #0
otel-collector      |     Trace ID       : 49f431d2e410223004d9ad58ceb92f5c
otel-collector      |     Parent ID      : 
otel-collector      |     ID             : e0e141e9455028a9
otel-collector      |     Name           : openai.chat
otel-collector      |     Kind           : Client
otel-collector      |     Start time     : 2024-07-24 11:56:25.481104949 +0000 UTC
otel-collector      |     End time       : 2024-07-24 11:56:28.142227082 +0000 UTC
otel-collector      |     Status code    : Unset
otel-collector      |     Status message : 
otel-collector      | Attributes:
otel-collector      |      -> llm.request.type: Str(chat)
otel-collector      |      -> gen_ai.system: Str(OpenAI)
otel-collector      |      -> gen_ai.request.model: Str(codegemma:2b-code)
otel-collector      |      -> llm.headers: Str(None)
otel-collector      |      -> llm.is_streaming: Bool(false)
otel-collector      |      -> gen_ai.openai.api_base: Str(http://ollama:11434/v1/)
otel-collector      |      -> gen_ai.prompt.0.role: Str(user)
otel-collector      |      -> gen_ai.prompt.0.content: Str(<|fim_prefix|>def hello_world():<|fim_suffix|><|fim_middle|>)
otel-collector      |      -> gen_ai.response.model: Str(codegemma:2b-code)
otel-collector      |      -> gen_ai.openai.system_fingerprint: Str(fp_ollama)
otel-collector      |      -> llm.usage.total_tokens: Int(32)
otel-collector      |      -> gen_ai.usage.completion_tokens: Int(8)
otel-collector      |      -> gen_ai.usage.prompt_tokens: Int(24)
otel-collector      |      -> gen_ai.completion.0.finish_reason: Str(stop)
otel-collector      |      -> gen_ai.completion.0.role: Str(assistant)
otel-collector      |      -> gen_ai.completion.0.content: Str(
otel-collector      | def)
otel-collector      |   {"kind": "exporter", "data_type": "traces", "name": "debug"}
otel-collector      | 2024-07-24T11:56:28.146Z  info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 3, "data points": 5}
otel-collector      | 2024-07-24T11:56:28.146Z  info    ResourceMetrics #0
otel-collector      | Resource SchemaURL: 
otel-collector      | Resource attributes:
otel-collector      |      -> telemetry.sdk.language: Str(python)
otel-collector      |      -> telemetry.sdk.name: Str(opentelemetry)
otel-collector      |      -> telemetry.sdk.version: Str(1.25.0)
otel-collector      |      -> service.name: Str(openllmetry-python-ollama-traceloop)
otel-collector      | ScopeMetrics #0
otel-collector      | ScopeMetrics SchemaURL: 
otel-collector      | InstrumentationScope opentelemetry.instrumentation.openai.v1 0.25.6
otel-collector      | Metric #0
otel-collector      | Descriptor:
otel-collector      |      -> Name: gen_ai.client.token.usage
otel-collector      |      -> Description: Measures number of input and output tokens used
otel-collector      |      -> Unit: token
otel-collector      |      -> DataType: Histogram
otel-collector      |      -> AggregationTemporality: Cumulative
otel-collector      | HistogramDataPoints #0
otel-collector      | Data point attributes:
otel-collector      |      -> gen_ai.system: Str(openai)
otel-collector      |      -> gen_ai.response.model: Str(codegemma:2b-code)
otel-collector      |      -> gen_ai.operation.name: Str(chat)
otel-collector      |      -> server.address: Str(http://ollama:11434/v1/)
otel-collector      |      -> stream: Bool(false)
otel-collector      |      -> gen_ai.token.type: Str(output)
otel-collector      | StartTimestamp: 2024-07-24 11:56:28.141678077 +0000 UTC
otel-collector      | Timestamp: 2024-07-24 11:56:28.144710521 +0000 UTC
otel-collector      | Count: 1
otel-collector      | Sum: 8.000000
otel-collector      | Min: 8.000000
otel-collector      | Max: 8.000000
otel-collector      | ExplicitBounds #0: 0.010000
otel-collector      | ExplicitBounds #1: 0.020000
otel-collector      | ExplicitBounds #2: 0.040000
otel-collector      | ExplicitBounds #3: 0.080000
otel-collector      | ExplicitBounds #4: 0.160000
otel-collector      | ExplicitBounds #5: 0.320000
otel-collector      | ExplicitBounds #6: 0.640000
otel-collector      | ExplicitBounds #7: 1.280000
otel-collector      | ExplicitBounds #8: 2.560000
otel-collector      | ExplicitBounds #9: 5.120000
otel-collector      | ExplicitBounds #10: 10.240000
otel-collector      | ExplicitBounds #11: 20.480000
otel-collector      | ExplicitBounds #12: 40.960000
otel-collector      | ExplicitBounds #13: 81.920000
otel-collector      | Buckets #0, Count: 0
otel-collector      | Buckets #1, Count: 0
otel-collector      | Buckets #2, Count: 0
otel-collector      | Buckets #3, Count: 0
otel-collector      | Buckets #4, Count: 0
otel-collector      | Buckets #5, Count: 0
otel-collector      | Buckets #6, Count: 0
otel-collector      | Buckets #7, Count: 0
otel-collector      | Buckets #8, Count: 0
otel-collector      | Buckets #9, Count: 0
otel-collector      | Buckets #10, Count: 1
otel-collector      | Buckets #11, Count: 0
otel-collector      | Buckets #12, Count: 0
otel-collector      | Buckets #13, Count: 0
otel-collector      | Buckets #14, Count: 0
otel-collector      | HistogramDataPoints #1
otel-collector      | Data point attributes:
otel-collector      |      -> gen_ai.system: Str(openai)
otel-collector      |      -> gen_ai.response.model: Str(codegemma:2b-code)
otel-collector      |      -> gen_ai.operation.name: Str(chat)
otel-collector      |      -> server.address: Str(http://ollama:11434/v1/)
otel-collector      |      -> stream: Bool(false)
otel-collector      |      -> gen_ai.token.type: Str(input)
otel-collector      | StartTimestamp: 2024-07-24 11:56:28.141678077 +0000 UTC
otel-collector      | Timestamp: 2024-07-24 11:56:28.144710521 +0000 UTC
otel-collector      | Count: 1
otel-collector      | Sum: 24.000000
otel-collector      | Min: 24.000000
otel-collector      | Max: 24.000000
otel-collector      | ExplicitBounds #0: 0.010000
otel-collector      | ExplicitBounds #1: 0.020000
otel-collector      | ExplicitBounds #2: 0.040000
otel-collector      | ExplicitBounds #3: 0.080000
otel-collector      | ExplicitBounds #4: 0.160000
otel-collector      | ExplicitBounds #5: 0.320000
otel-collector      | ExplicitBounds #6: 0.640000
otel-collector      | ExplicitBounds #7: 1.280000
otel-collector      | ExplicitBounds #8: 2.560000
otel-collector      | ExplicitBounds #9: 5.120000
otel-collector      | ExplicitBounds #10: 10.240000
otel-collector      | ExplicitBounds #11: 20.480000
otel-collector      | ExplicitBounds #12: 40.960000
otel-collector      | ExplicitBounds #13: 81.920000
otel-collector      | Buckets #0, Count: 0
otel-collector      | Buckets #1, Count: 0
otel-collector      | Buckets #2, Count: 0
otel-collector      | Buckets #3, Count: 0
otel-collector      | Buckets #4, Count: 0
otel-collector      | Buckets #5, Count: 0
otel-collector      | Buckets #6, Count: 0
otel-collector      | Buckets #7, Count: 0
otel-collector      | Buckets #8, Count: 0
otel-collector      | Buckets #9, Count: 0
otel-collector      | Buckets #10, Count: 0
otel-collector      | Buckets #11, Count: 0
otel-collector      | Buckets #12, Count: 1
otel-collector      | Buckets #13, Count: 0
otel-collector      | Buckets #14, Count: 0
otel-collector      | Metric #1
otel-collector      | Descriptor:
otel-collector      |      -> Name: gen_ai.client.generation.choices
otel-collector      |      -> Description: Number of choices returned by chat completions call
otel-collector      |      -> Unit: choice
otel-collector      |      -> DataType: Sum
otel-collector      |      -> IsMonotonic: true
otel-collector      |      -> AggregationTemporality: Cumulative
otel-collector      | NumberDataPoints #0
otel-collector      | Data point attributes:
otel-collector      |      -> gen_ai.system: Str(openai)
otel-collector      |      -> gen_ai.response.model: Str(codegemma:2b-code)
otel-collector      |      -> gen_ai.operation.name: Str(chat)
otel-collector      |      -> server.address: Str(http://ollama:11434/v1/)
otel-collector      |      -> stream: Bool(false)
otel-collector      | StartTimestamp: 2024-07-24 11:56:28.141839995 +0000 UTC
otel-collector      | Timestamp: 2024-07-24 11:56:28.144710521 +0000 UTC
otel-collector      | Value: 1
otel-collector      | NumberDataPoints #1
otel-collector      | Data point attributes:
otel-collector      |      -> gen_ai.system: Str(openai)
otel-collector      |      -> gen_ai.response.model: Str(codegemma:2b-code)
otel-collector      |      -> gen_ai.operation.name: Str(chat)
otel-collector      |      -> server.address: Str(http://ollama:11434/v1/)
otel-collector      |      -> stream: Bool(false)
otel-collector      |      -> llm.response.finish_reason: Str(stop)
otel-collector      | StartTimestamp: 2024-07-24 11:56:28.141839995 +0000 UTC
otel-collector      | Timestamp: 2024-07-24 11:56:28.144710521 +0000 UTC
otel-collector      | Value: 1
otel-collector      | Metric #2
otel-collector      | Descriptor:
otel-collector      |      -> Name: gen_ai.client.operation.duration
otel-collector      |      -> Description: GenAI operation duration
otel-collector      |      -> Unit: s
otel-collector      |      -> DataType: Histogram
otel-collector      |      -> AggregationTemporality: Cumulative
otel-collector      | HistogramDataPoints #0
otel-collector      | Data point attributes:
otel-collector      |      -> gen_ai.system: Str(openai)
otel-collector      |      -> gen_ai.response.model: Str(codegemma:2b-code)
otel-collector      |      -> gen_ai.operation.name: Str(chat)
otel-collector      |      -> server.address: Str(http://ollama:11434/v1/)
otel-collector      |      -> stream: Bool(false)
otel-collector      | StartTimestamp: 2024-07-24 11:56:28.141857453 +0000 UTC
otel-collector      | Timestamp: 2024-07-24 11:56:28.144710521 +0000 UTC
otel-collector      | Count: 1
otel-collector      | Sum: 2.655250
otel-collector      | Min: 2.655250
otel-collector      | Max: 2.655250
otel-collector      | ExplicitBounds #0: 1.000000
otel-collector      | ExplicitBounds #1: 4.000000
otel-collector      | ExplicitBounds #2: 16.000000
otel-collector      | ExplicitBounds #3: 64.000000
otel-collector      | ExplicitBounds #4: 256.000000
otel-collector      | ExplicitBounds #5: 1024.000000
otel-collector      | ExplicitBounds #6: 4096.000000
otel-collector      | ExplicitBounds #7: 16384.000000
otel-collector      | ExplicitBounds #8: 65536.000000
otel-collector      | ExplicitBounds #9: 262144.000000
otel-collector      | ExplicitBounds #10: 1048576.000000
otel-collector      | ExplicitBounds #11: 4194304.000000
otel-collector      | ExplicitBounds #12: 16777216.000000
otel-collector      | ExplicitBounds #13: 67108864.000000
otel-collector      | Buckets #0, Count: 0
otel-collector      | Buckets #1, Count: 1
otel-collector      | Buckets #2, Count: 0
otel-collector      | Buckets #3, Count: 0
otel-collector      | Buckets #4, Count: 0
otel-collector      | Buckets #5, Count: 0
otel-collector      | Buckets #6, Count: 0
otel-collector      | Buckets #7, Count: 0
otel-collector      | Buckets #8, Count: 0
otel-collector      | Buckets #9, Count: 0
otel-collector      | Buckets #10, Count: 0
otel-collector      | Buckets #11, Count: 0
otel-collector      | Buckets #12, Count: 0
otel-collector      | Buckets #13, Count: 0
otel-collector      | Buckets #14, Count: 0
otel-collector      |   {"kind": "exporter", "data_type": "metrics", "name": "debug"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants