Summary
OpenTelemetry officially merged MCP semantic conventions on January 12, 2026 (PR #2083). ToolHive should align its telemetry implementation with these standards for better observability tool compatibility and ecosystem alignment.
Standard References
- Main Documentation: docs/gen-ai/mcp.md
- Attribute Registry: model/mcp/registry.yaml
- Metrics Definitions: model/mcp/metrics.yaml
- W3C Trace Context: https://www.w3.org/TR/trace-context/
Current State
ToolHive has a solid telemetry foundation, but it predates the official conventions:
- Middleware: pkg/telemetry/middleware.go - spans, attributes, metrics for MCP proxy
- vMCP: pkg/vmcp/server/telemetry.go - backend and workflow telemetry
- Parser: pkg/mcp/parser.go - already extracts the _meta field (lines 228-233)
Core Implementation Tasks
1. Update Attributes and Span Naming
File: pkg/telemetry/middleware.go
Attribute Renames (for standard compliance):
- mcp.method → mcp.method.name (line 222)
- mcp.request.id → jsonrpc.request.id (line 229)
- mcp.tool.name → gen_ai.tool.name (line 263)
- mcp.tool.arguments → gen_ai.tool.call.arguments (line 267, opt-in)
- mcp.prompt.name → gen_ai.prompt.name (line 279)
- mcp.transport → network.transport with value mapping:
  - stdio → pipe
  - sse, streamable-http → tcp
Add Missing Required Attributes:
- mcp.protocol.version - MCP spec version (e.g., "2025-11-25")
- mcp.session.id - Session identifier
- jsonrpc.protocol.version - When not "2.0"
- error.type - On failures (JSON-RPC error code or "tool_error")
- rpc.response.status_code - When response contains error
- gen_ai.operation.name - "execute_tool" for tool calls
- network.protocol.name - "http" for SSE/streamable-http
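The transport value mapping and the network.protocol.name rule can be sketched as one small helper. The function name transportAttrs and the map-based return type are hypothetical, chosen for illustration only; the real middleware would presumably build OpenTelemetry attribute values instead.

```go
package main

// transportAttrs maps a ToolHive transport name to the standard
// network.* attribute values listed in this issue.
// The helper name and return type are illustrative assumptions.
func transportAttrs(transport string) map[string]string {
	attrs := map[string]string{}
	switch transport {
	case "stdio":
		attrs["network.transport"] = "pipe"
	case "sse", "streamable-http":
		attrs["network.transport"] = "tcp"
		attrs["network.protocol.name"] = "http"
	}
	// Unknown transports yield no network.* attributes rather than a guess.
	return attrs
}
```

Returning nothing for unrecognized transports is a design choice of this sketch; an implementation could equally log or pass the raw value through.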
Span Naming (lines 161-170):
- Current: mcp.tools/call
- Standard: tools/call get_weather (include target when available)
- Format: {mcp.method.name} {target} where target is tool/prompt name
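A minimal sketch of the span name format, assuming the target is passed as an empty string when no tool or prompt name is available. The helper name spanName is hypothetical.

```go
package main

// spanName builds "{mcp.method.name} {target}" and falls back to the
// method alone when no tool or prompt name is available.
func spanName(method, target string) string {
	if target == "" {
		return method
	}
	return method + " " + target
}
```

With this helper, a tools/call request for get_weather is named "tools/call get_weather", while tools/list stays "tools/list".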
2. Add Standard Metrics
File: pkg/telemetry/middleware.go
New Standard Metrics (alongside existing):
- mcp.client.operation.duration (histogram, seconds)
- mcp.server.operation.duration (histogram, seconds)
- Use recommended buckets: [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]
Keep existing toolhive_mcp_* metrics for backward compatibility.
3. Implement W3C Trace Context Propagation
Critical Feature: Enable distributed tracing across MCP boundaries.
3a. Context Injection (vMCP → Backends)
Files:
- New: pkg/telemetry/propagation.go - W3C Trace Context helpers
- pkg/vmcp/client/client.go - Inject before backend calls
Implementation: Inject traceparent and tracestate into params._meta:
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get-weather",
    "_meta": {
      "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
      "tracestate": "rojo=00f067aa0ba902b7"
    }
  }
}
Code Structure:
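// propagation.go
package telemetry
import (
	"context"
	"go.opentelemetry.io/otel"
)
// InjectTraceContext writes the active trace context into params._meta.
func InjectTraceContext(ctx context.Context, params map[string]interface{}) {
	meta := getOrCreateMeta(params)
	carrier := &MetaCarrier{meta: meta}
	otel.GetTextMapPropagator().Inject(ctx, carrier)
}
// MetaCarrier adapts params._meta to propagation.TextMapCarrier.
type MetaCarrier struct {
	meta map[string]interface{}
}
// Implement TextMapCarrier interface (Get, Set, Keys)
3b. Context Extraction (Clients → ToolHive)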
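Both injection (3a) and extraction rely on MetaCarrier providing the Get, Set, and Keys methods of OpenTelemetry's TextMapCarrier. The following is a minimal sketch of those methods and of getOrCreateMeta, not the final implementation. To stay runnable without the OpenTelemetry module, it declares a local textMapCarrier interface that mirrors the method set; that local interface and the behavior for non-string values are assumptions.

```go
package main

// textMapCarrier mirrors the method set of OpenTelemetry's
// propagation.TextMapCarrier so this sketch compiles without the SDK.
type textMapCarrier interface {
	Get(key string) string
	Set(key, value string)
	Keys() []string
}

// MetaCarrier adapts an MCP params._meta map to the carrier interface.
type MetaCarrier struct {
	meta map[string]interface{}
}

// Compile-time check that *MetaCarrier satisfies the interface.
var _ textMapCarrier = (*MetaCarrier)(nil)

// Get returns the string stored under key, or "" when the key is
// missing or holds a non-string value.
func (c *MetaCarrier) Get(key string) string {
	if v, ok := c.meta[key].(string); ok {
		return v
	}
	return ""
}

// Set stores value under key. The meta map must be non-nil.
func (c *MetaCarrier) Set(key, value string) {
	c.meta[key] = value
}

// Keys lists every key present in _meta, in no particular order.
func (c *MetaCarrier) Keys() []string {
	keys := make([]string, 0, len(c.meta))
	for k := range c.meta {
		keys = append(keys, k)
	}
	return keys
}

// getOrCreateMeta returns params["_meta"], creating it when it is
// absent or not a JSON object.
func getOrCreateMeta(params map[string]interface{}) map[string]interface{} {
	if m, ok := params["_meta"].(map[string]interface{}); ok {
		return m
	}
	m := map[string]interface{}{}
	params["_meta"] = m
	return m
}
```

Ignoring non-string values in Get keeps a malformed _meta entry from being treated as trace context; a client that sends a non-string traceparent simply starts a new trace.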
File: pkg/telemetry/middleware.go (around line 114)
Implementation: Extract trace context from incoming params._meta and use as parent for server span:
if parsedMCP := mcpparser.GetParsedMCPRequest(ctx); parsedMCP != nil && parsedMCP.Meta != nil {
	carrier := &MetaCarrier{meta: parsedMCP.Meta}
	ctx = otel.GetTextMapPropagator().Extract(ctx, carrier)
}
4. Add Client-Side Spans for vMCP
File: pkg/vmcp/client/client.go
Current: Only SERVER spans when serving requests
Needed: CLIENT spans when vMCP calls backend MCP servers
Operations to Instrument:
- initialize - Protocol handshake
- tools/list, tools/call
- resources/list, resources/read
- prompts/list, prompts/get
Span Kind: Use trace.SpanKindClient for these operations.
5. Add Session Duration Metrics
Files:
- pkg/vmcp/server/session_adapter.go - Track session lifecycle
- Proxy components - Track session termination
Metrics:
- mcp.client.session.duration (histogram, seconds)
- mcp.server.session.duration (histogram, seconds)
Attributes:
- mcp.protocol.version
- network.protocol.name
- network.transport
- error.type (if session terminated with error)
Backward Compatibility
Approach: Emit both legacy and standard names during transition period.
Configuration: Add optional flag:
telemetry:
  useLegacyAttributes: false # default: standard only
CLI Flag: --otel-use-legacy-attributes (enables dual emission)
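Dual emission could look roughly like the sketch below: standard attributes are always produced, and legacy aliases are added only when the flag is on. The names legacyNames and withLegacy are hypothetical, and attributes are modeled as a plain string map for brevity. mcp.transport is left out because its values change as well as its name, so it is not a pure alias.

```go
package main

// legacyNames maps each standard attribute name to the legacy ToolHive
// name it replaces, taken from the rename list in task 1.
var legacyNames = map[string]string{
	"mcp.method.name":            "mcp.method",
	"jsonrpc.request.id":         "mcp.request.id",
	"gen_ai.tool.name":           "mcp.tool.name",
	"gen_ai.tool.call.arguments": "mcp.tool.arguments",
	"gen_ai.prompt.name":         "mcp.prompt.name",
}

// withLegacy returns attrs unchanged when useLegacy is false. When it
// is true, it returns a copy that also carries each legacy alias, so
// the caller's map is never mutated.
func withLegacy(attrs map[string]string, useLegacy bool) map[string]string {
	if !useLegacy {
		return attrs
	}
	out := make(map[string]string, len(attrs)*2)
	for name, value := range attrs {
		out[name] = value
		if legacy, ok := legacyNames[name]; ok {
			out[legacy] = value
		}
	}
	return out
}
```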
Timeline:
- Ship standard-compliant attributes/metrics immediately
- Announce deprecation after 6 months
- Remove legacy support in v2.0
Components Affected
- pkg/telemetry/middleware.go - MCP proxy telemetry (spans, metrics, attributes)
- pkg/telemetry/propagation.go - New file for trace context helpers
- pkg/vmcp/client/client.go - CLIENT spans and context injection
- pkg/vmcp/server/session_adapter.go - Session duration tracking
- pkg/telemetry/config.go - Backward compatibility configuration
- cmd/thv-operator/api/v1alpha1/*_types.go - CRD telemetry specs
- docs/observability.md - Update documentation
- Test files: Update assertions for new attribute names
Testing Requirements
- Update test expectations in pkg/telemetry/middleware_test.go
- Update E2E tests in test/e2e/telemetry_middleware_e2e_test.go
- Add trace propagation E2E test (vMCP → backend → vMCP chain)
- Validate span hierarchy (CLIENT/SERVER relationship)
- Test session duration tracking
- Verify histogram buckets
Success Criteria
- All required attributes emitted per standard
- Span names follow {method} {target} format
- Standard metrics recorded with correct units/attributes
- W3C Trace Context propagates through params._meta
- CLIENT spans created for vMCP backend calls
- Session duration metrics tracked
- Network transport values mapped correctly (stdio→pipe, http→tcp)
- Documentation updated
- Backward compatibility maintained (with flag)
- No performance regression
References
- OTel MCP PR: MCP semantic conventions open-telemetry/semantic-conventions#2083
- Implementation Examples:
- Related Issue: Mentioned in team discussion about enhancing ToolHive telemetry