
RA-8279: Add OpenTelemetry distributed tracing and ctutils logging to CTFE #3

Open
himaschal wants to merge 58 commits into master from RA-8279_improve_application_visibility_timing_logs


@himaschal commented Dec 8, 2025

ℹ️ Release Coordination (Downstream of ctutils)

This feature depends on digicert/ctutils (RA-8279).
Status: Ready for review (ensure ctutils v1.0.0 tag is available).
Plan:

  1. Wait for digicert/ctutils PR 1 merge & v1.0.0 tag.
  2. Update go.mod in this PR to use digicert/ctutils v1.0.0.
  3. Merge this PR.

Summary

Integrates OpenTelemetry distributed tracing and standardized logging into the Certificate Transparency Frontend (CTFE), leveraging the shared digicert/ctutils library. This enables end-to-end observability from HTTP requests down to the Trillian backend.

Key Features

  • Trace Propagation: CTFE extracts incoming trace context (if any) and propagates it to Trillian Log Server/Signer via gRPC.
  • Request Logging: Semantic logging for HTTP requests (method, status, duration).
  • Configuration: Standard OTEL_* env vars + LOG_LEVEL.

Configuration

See trillian/README.md for details.

| Variable | Description | Default |
| --- | --- | --- |
| OTEL_ENABLED | Enable tracing | false |
| OTEL_EXPORTER | otlp, stdout | stdout |
| OTEL_COLLECTOR_ENDPOINT | OTLP endpoint (gRPC) | localhost:4317 |
| LOG_LEVEL | DEBUG, INFO, WARN, ERROR | INFO |
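
As a rough illustration of how these variables and defaults could be read, here is a minimal standard-library sketch. It is not the `InitLogging()` code from this PR: the `Config` struct, `getenv` and `loadConfig` are hypothetical, and the strict validation of exporter and level values is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the four variables in the table; the struct is hypothetical.
type Config struct {
	OTELEnabled       bool
	Exporter          string
	CollectorEndpoint string
	LogLevel          string
}

// getenv returns the variable's value, or def when it is unset or empty.
func getenv(lookup func(string) (string, bool), key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// loadConfig applies the documented defaults and rejects unknown values.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{
		Exporter:          getenv(lookup, "OTEL_EXPORTER", "stdout"),
		CollectorEndpoint: getenv(lookup, "OTEL_COLLECTOR_ENDPOINT", "localhost:4317"),
		LogLevel:          getenv(lookup, "LOG_LEVEL", "INFO"),
	}
	enabled, err := strconv.ParseBool(getenv(lookup, "OTEL_ENABLED", "false"))
	if err != nil {
		return Config{}, fmt.Errorf("OTEL_ENABLED: %w", err)
	}
	cfg.OTELEnabled = enabled

	switch cfg.Exporter {
	case "otlp", "stdout":
	default:
		return Config{}, fmt.Errorf("OTEL_EXPORTER: unsupported value %q", cfg.Exporter)
	}
	switch cfg.LogLevel {
	case "DEBUG", "INFO", "WARN", "ERROR":
	default:
		return Config{}, fmt.Errorf("LOG_LEVEL: unsupported value %q", cfg.LogLevel)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the parsing testable without touching the process environment.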

How It Works

  1. HTTP Handlers: Wrapped with otelhttp.NewHandler for automatic span creation on incoming requests.
  2. gRPC Clients: Use chained interceptors (ChainedGRPCClientInterceptor) to propagate trace context to backends.
  3. Flow: HTTP Request → CTFE → gRPC → Trillian (Log Server/Signer).

Related PRs

Testing

  • Integration: Verified in Clean Run 9.0 environment. Traces successfully appear in Jaeger for /ct/v1/get-sth and other endpoints.

See full e2e testing here

himaschal and others added 19 commits December 8, 2025 15:59
… to CTFE

This change integrates the digicert/ctutils shared logging library to enable
OpenTelemetry-compliant distributed tracing in the Certificate Transparency
Frontend (CTFE).

Key changes:
- Add trillian/ctfe/config/config.go with InitLogging() for OTEL configuration
- Update ct_server/main.go to initialize logging and wrap HTTP handlers with otelhttp
- Add chained gRPC client interceptors for trace context propagation to Trillian
- Add Dockerfile.unified with SSH access for private ctutils dependency
- Update go.mod/go.sum for ctutils v0.1.6 and OTEL dependencies

The logging configuration is driven by environment variables:
- OTEL_ENABLED: Enable/disable OpenTelemetry (default: false)
- OTEL_EXPORTER: Exporter type ('otlp' or 'stdout')
- OTEL_COLLECTOR_ENDPOINT: OTLP collector URL
- OTEL_SERVICE_NAME: Service name for traces
- OTEL_SAMPLE_RATIO: Sampling ratio (0.0-1.0)

HTTP handlers are wrapped with otelhttp.NewHandler for automatic span creation,
and gRPC clients use chained interceptors to propagate trace context to Trillian
backends. This enables end-to-end request tracing across the CT infrastructure.

Refs: RA-8279

Copilot AI left a comment


Pull request overview

This PR integrates OpenTelemetry distributed tracing into the Certificate Transparency Frontend (CTFE) using the digicert/ctutils shared logging library. It enables end-to-end request tracing from HTTP requests through gRPC calls to Trillian backends, with configurable trace exporters and sampling.

Changes:

  • Added OpenTelemetry support with environment-based configuration for tracing
  • Integrated ctutils library for shared logging functionality
  • Updated CI/CD workflows to handle private ctutils repository authentication

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| trillian/examples/deployment/docker/ctfe/Dockerfile.unified | New unified Dockerfile with support for private ctutils dependency |
| trillian/examples/deployment/docker/ctfe/Dockerfile | Added GitHub token authentication for private module access |
| trillian/docs/ManualDeployment.md | Added OpenTelemetry distributed tracing documentation section |
| trillian/README.md | Added Observability section with OTEL configuration reference |
| trillian/ctfe/instance.go | Wrapped scheduled tasks with span tracing |
| trillian/ctfe/handlers.go | Minor comment adjustments (commented import) |
| trillian/ctfe/ct_server/main.go | Initialized OTEL logging, wrapped HTTP handlers, added gRPC interceptors |
| trillian/ctfe/config/config.go | New centralized logging configuration with OpenTelemetry support |
| go.mod | Added ctutils v0.1.13-test and updated OTEL dependencies |
| go.sum | Updated checksums for new and upgraded dependencies |
| .gitignore | Added entry for ctfe_server binary |
| .github/workflows/update-ctutils.yaml | New workflow for automated ctutils dependency updates |
| .github/workflows/govulncheck.yml | Added ctutils authentication |
| .github/workflows/golangci-lint.yml | Added ctutils authentication |
| .github/workflows/codeql.yml | Added ctutils authentication |


"github.com/google/certificate-transparency-go/asn1"
"github.com/google/certificate-transparency-go/tls"

//"github.com/google/certificate-transparency-go/trillian/ctfe/logging"

Copilot AI Jan 13, 2026


This commented-out import statement should be removed. Leaving commented code in the codebase creates confusion and reduces maintainability. If this import is not needed, it should be deleted entirely.

Suggested change
//"github.com/google/certificate-transparency-go/trillian/ctfe/logging"


Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 11 comments.



@himaschal changed the title from "feat(otel): Add OpenTelemetry distributed tracing and ctutils logging to CTFE" to "RA-8279: Add OpenTelemetry distributed tracing and ctutils logging to CTFE" on Jan 23, 2026
…s` module across workflows and enhance the `update-ctutils` workflow to explicitly resolve and report the latest version.
…ne is not double-wrapping but serve different purposes.
@sadhana-angara

QA Evidence for CT Log :

Test 1: Deploy Jaeger and Configure OTLP Export
Status : PASS
Evidence :

Jaeger pod deployed successfully -
Screenshot 2026-02-05 at 10 49 43 AM

Services configured with OTLP export -
Screenshot 2026-02-05 at 10 49 43 AM (1)

Port forwarding established -
Screenshot 2026-02-09 at 12 21 40 PM

Jaeger UI accessible at http://localhost:16686
Screenshot 2026-02-09 at 12 23 16 PM

CTFE API accessible at http://localhost:6962
Screenshot 2026-02-09 at 11 50 50 AM

========================================================================================

Test 2: Verify Distributed Tracing in Jaeger UI
Status : PASS
Evidence :

  • Multiple traces visible in search results (6+ traces) : Yes
  • Traces show both ctfe and trillian-logserver services : Yes
  • Span hierarchy shows parent-child relationships : Yes
  • Timing data present for each span (typically 3-25ms) : Yes
  • Operation names visible (e.g., ctfe_internal_sth_force) : Yes
Screenshot 2026-02-09 at 12 32 48 PM Screenshot 2026-02-09 at 12 31 19 PM

========================================================================================

Test 3: W3C Trace Context Propagation
Status : PASS
Evidence :

  • Request succeeds (200 OK) : Yes
  • Custom trace_id appears in Jaeger UI : Yes
  • Trace shows in search results within 10 seconds : Yes
Screenshot 2026-02-05 at 3 03 32 PM Screenshot 2026-02-05 at 3 13 29 PM Screenshot 2026-02-10 at 10 08 23 AM

========================================================================================

Test 4: Cross-Service Trace Propagation
Status : PASS
Evidence :

  • Same trace_id found in CTFE logs : Yes
  • Same trace_id found in Trillian logserver logs : Yes
  • Logs show parent-child relationship via span_id : Yes
Screenshot 2026-02-05 at 3 53 46 PM Screenshot 2026-02-05 at 4 36 30 PM

========================================================================================

Test 5: Structured Logging with Trace Context
Status : PASS
Evidence :

  • Logs in JSON format : Yes
  • trace_id field present (32-char hex) : Yes
  • span_id field present (16-char hex) : Yes
  • parent_source field present (values: client_header, grpc_metadata, or system_generated) : Yes
  • elapsed_ms field present (timing data) : Yes
Screenshot 2026-02-09 at 12 47 57 PM
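
The field checks listed above could be automated along the following lines. This is a sketch under assumptions: only the four field names come from the evidence, the `logEntry` struct and `validateLine` are hypothetical, lowercase hex is assumed for the IDs, and the sample log line in `main` is made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
)

// logEntry holds only the trace-related fields named in the QA checklist.
type logEntry struct {
	TraceID      string   `json:"trace_id"`
	SpanID       string   `json:"span_id"`
	ParentSource string   `json:"parent_source"`
	ElapsedMS    *float64 `json:"elapsed_ms"` // pointer so "missing" differs from 0
}

var (
	traceIDRe = regexp.MustCompile(`^[0-9a-f]{32}$`)
	spanIDRe  = regexp.MustCompile(`^[0-9a-f]{16}$`)
)

// validateLine checks one JSON log line against the checklist.
func validateLine(line []byte) error {
	var e logEntry
	if err := json.Unmarshal(line, &e); err != nil {
		return fmt.Errorf("not valid JSON: %w", err)
	}
	if !traceIDRe.MatchString(e.TraceID) {
		return fmt.Errorf("trace_id %q is not 32 hex characters", e.TraceID)
	}
	if !spanIDRe.MatchString(e.SpanID) {
		return fmt.Errorf("span_id %q is not 16 hex characters", e.SpanID)
	}
	switch e.ParentSource {
	case "client_header", "grpc_metadata", "system_generated":
	default:
		return fmt.Errorf("unexpected parent_source %q", e.ParentSource)
	}
	if e.ElapsedMS == nil {
		return errors.New("elapsed_ms is missing")
	}
	return nil
}

func main() {
	// Made-up sample line; real CTFE logs contain additional fields.
	line := []byte(`{"trace_id":"4dd4c18687c1d0f1b54bba5f7223efd7","span_id":"00f067aa0ba902b7","parent_source":"client_header","elapsed_ms":5.47}`)
	if err := validateLine(line); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("log line passes all checks")
}
```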

========================================================================================

Test 6: Multiple Requests with Shared Trace
Status : PASS
Evidence :

- All 3 requests succeed :

Screenshot 2026-02-06 at 11 24 20 AM

- Logs show 3+ entries with same trace_id :

Screenshot 2026-02-06 at 11 27 56 AM

- Each entry has different span_id :

Screenshot 2026-02-06 at 11 28 53 AM Screenshot 2026-02-06 at 11 29 10 AM
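
The two log conditions in this test (one shared trace_id, a different span_id per entry) can be expressed as a small check. The sketch below assumes the log lines have already been parsed into `entry` values; the type, the `checkSharedTrace` function and the sample span IDs are hypothetical.

```go
package main

import "fmt"

// entry is the subset of a parsed log line needed for this check.
type entry struct {
	TraceID string
	SpanID  string
}

// checkSharedTrace returns an error unless at least want entries carry
// traceID and every one of them has a distinct span ID.
func checkSharedTrace(entries []entry, traceID string, want int) error {
	seen := make(map[string]bool)
	for _, e := range entries {
		if e.TraceID != traceID {
			continue // belongs to another trace
		}
		if seen[e.SpanID] {
			return fmt.Errorf("span_id %s appears more than once", e.SpanID)
		}
		seen[e.SpanID] = true
	}
	if len(seen) < want {
		return fmt.Errorf("found %d entries for trace %s, want at least %d", len(seen), traceID, want)
	}
	return nil
}

func main() {
	const trace = "4dd4c18687c1d0f1b54bba5f7223efd7"
	entries := []entry{
		{TraceID: trace, SpanID: "0000000000000001"},
		{TraceID: trace, SpanID: "0000000000000002"},
		{TraceID: trace, SpanID: "0000000000000003"},
	}
	if err := checkSharedTrace(entries, trace, 3); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS")
}
```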

========================================================================================

Test 7: Performance and Timing Analysis
Status : PASS
Evidence : In Jaeger UI, examine 5-10 different traces

1. Trace ID : 4dd4c18687c1d0f1b54bba5f7223efd7 :
Typical get-sth operation:  5.47ms
gRPC call to Trillian: 3.39ms on server side
CTFE overhead : 5.1 - 3.39 = 1.71ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 10 54 12 AM

2. Trace ID : 089ff8e936e3df81afbfb9fe416f3194 :
Typical get-sth operation:  3.26ms
gRPC call to Trillian: 2.2ms on server side
CTFE overhead : 3.04 - 2.2 = 0.84ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 10 58 09 AM

3. Trace ID : b3aec802d23c219702cc36f7d7a3b1e6 :
Typical get-sth operation:  5.1ms
gRPC call to Trillian: 3.37ms on server side
CTFE overhead : 4.75 - 3.37 = 1.38ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 01 43 AM

4. Trace ID : 9a4723af4335407b39318948fb92d9b6 :
Typical get-sth operation:  4.52ms
gRPC call to Trillian: 2.94ms on server side
CTFE overhead : 4.52 - 2.94 = 1.58ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 05 40 AM

5. Trace ID : bea9a080ada3f6739735ab113f1ad7c2 :
Typical get-sth operation:  7.95ms
gRPC call to Trillian: 3.78ms on server side
CTFE overhead : 7.09 - 3.78 = 3.31ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 18 05 AM

6. Trace ID : 4ae35982188663fee3d8eeb622866356 :
Typical get-sth operation:  8.76ms
gRPC call to Trillian: 4.69ms on server side
CTFE overhead : 7.83 - 4.69 = 3.14ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 20 55 AM

7. Trace ID : 8018b3ff51e0b343206a6b984802c18b :
Typical get-sth operation:  6.14ms
gRPC call to Trillian: 4.21ms on server side
CTFE overhead : 6.14 - 4.21 = 1.93ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 23 07 AM

8. Trace ID : a3699b3aef49a80fc73aae7bb5bd6e5f :
Typical get-sth operation:  5.2ms
gRPC call to Trillian: 3.61ms on server side
CTFE overhead : 4.97 - 3.61 = 1.36ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 29 11 AM

9. Trace ID : 4a4f82d3b182303ed0b773c0460b9169 :
Typical get-sth operation:  7.65ms
gRPC call to Trillian: 4.19ms on server side
CTFE overhead : 7.24 - 4.19 = 3.05ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 31 50 AM

10. Trace ID : 26a99abfc5ad08be508a00b301abb032 :
Typical get-sth operation:  5.08ms
gRPC call to Trillian: 2.26ms on server side
CTFE overhead : 4.59 - 2.26 = 2.33ms
No traces showing errors or timeouts.

Screenshot 2026-02-09 at 11 34 23 AM

========================================================================================

Test 8: Service Dependencies Visualization
Status : PASS
Evidence :

  • Clear call chain: ctfe → trillian-logserver : Yes
  • gRPC communication visible in spans : Pending
  • No unexpected service dependencies : Pending
Screenshot 2026-02-06 at 11 48 09 AM Screenshot 2026-02-06 at 11 48 35 AM

@sadhana-angara self-requested a review February 11, 2026 15:33