dify/api/enterprise/telemetry
GareArc 83f5850d0a
refactor(telemetry): add resolved_parent_context property and fix edge cases
- Add resolved_parent_context property to BaseTraceInfo for reusable parent context extraction
- Refactor enterprise_trace.py to use property instead of duplicated dict plucking (~19 lines eliminated)
- Fix UUID validation in exporter.py with specific error logging for invalid trace correlation IDs
- Add error isolation in event_handlers.py to prevent telemetry failures from breaking user operations
- Replace pickle-based payload_fallback with JSON storage rehydration for security
- Update TelemetryEnvelope to use Pydantic v2 ConfigDict with extra='forbid'
- Update tests to reflect contract changes and new error handling behavior
2026-03-01 19:33:59 -08:00
entities feat(telemetry): unify token metric label structure with Pydantic enforcement 2026-02-06 03:10:20 -08:00
__init__.py feat(telemetry): add enterprise OTEL telemetry with gateway, traces, metrics, and logs 2026-02-05 23:10:30 -08:00
contracts.py refactor(telemetry): add resolved_parent_context property and fix edge cases 2026-03-01 19:33:59 -08:00
DATA_DICTIONARY.md feat(telemetry): add model provider and name tags to all trace metrics 2026-02-28 00:06:44 -08:00
draft_trace.py feat(telemetry): add enterprise OTEL telemetry with gateway, traces, metrics, and logs 2026-02-05 23:10:30 -08:00
enterprise_trace.py refactor(telemetry): add resolved_parent_context property and fix edge cases 2026-03-01 19:33:59 -08:00
event_handlers.py refactor(telemetry): add resolved_parent_context property and fix edge cases 2026-03-01 19:33:59 -08:00
exporter.py refactor(telemetry): add resolved_parent_context property and fix edge cases 2026-03-01 19:33:59 -08:00
id_generator.py feat(telemetry): add enterprise OTEL telemetry with gateway, traces, metrics, and logs 2026-02-05 23:10:30 -08:00
metric_handler.py refactor(telemetry): add resolved_parent_context property and fix edge cases 2026-03-01 19:33:59 -08:00
README.md docs(enterprise): split telemetry docs into README and data dictionary 2026-02-27 12:32:48 -08:00
telemetry_log.py feat: add dedicated app event counters and convert event names to StrEnum 2026-02-06 02:38:19 -08:00

Dify Enterprise Telemetry

This document provides an overview of the Dify Enterprise OpenTelemetry (OTEL) exporter and how to configure it for integration with observability stacks like Prometheus, Grafana, Jaeger, or Honeycomb.

Overview

Dify Enterprise uses a "slim span + rich companion log" architecture to provide high-fidelity observability without overwhelming trace storage.

  • Traces (Spans): Capture the structure, identity, and timing of high-level operations (Workflows and Nodes).
  • Structured Logs: Provide deep context (inputs, outputs, metadata) for every event, correlated to spans via trace_id and span_id.
  • Metrics: Provide counters and histograms for usage, performance, and error tracking. Metrics are never sampled, so counts reflect 100% of events regardless of the trace sampling rate.

Signal Architecture

graph TD
    A[Workflow Run] -->|Span| B(dify.workflow.run)
    A -->|Log| C(dify.workflow.run detail)
    B ---|trace_id| C
    
    D[Node Execution] -->|Span| E(dify.node.execution)
    D -->|Log| F(dify.node.execution detail)
    E ---|span_id| F
    
    G[Message/Tool/etc] -->|Log| H(dify.* event)
    G -->|Metric| I(dify.* counter/histogram)
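
The split between a slim span and its companion log can be illustrated with plain dictionaries. This is only a sketch: the field names in SPAN_FIELDS and the split_event helper are hypothetical, and the real attribute set is the one listed in DATA_DICTIONARY.md.

```python
from typing import Any

# Hypothetical field names; the actual attributes live in DATA_DICTIONARY.md.
SPAN_FIELDS = ("name", "start_time", "end_time", "status")


def split_event(
    event: dict[str, Any], trace_id: str, span_id: str
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one event into a slim span and a rich companion log."""
    # The span keeps only structure, identity, and timing.
    span = {key: event[key] for key in SPAN_FIELDS if key in event}
    span.update(trace_id=trace_id, span_id=span_id)

    # The log carries everything else (inputs, outputs, metadata) and
    # repeats the IDs so a backend can join it to the span.
    log = {key: value for key, value in event.items() if key not in SPAN_FIELDS}
    log.update(event_name=event.get("name"), trace_id=trace_id, span_id=span_id)
    return span, log
```

Keeping payloads out of the span is what keeps trace storage small; the shared trace_id and span_id are what let a backend join the two signals again.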

Configuration

The Enterprise OTEL exporter is configured via environment variables.

| Variable | Description | Default |
| --- | --- | --- |
| ENTERPRISE_ENABLED | Master switch for all enterprise features. | false |
| ENTERPRISE_TELEMETRY_ENABLED | Master switch for enterprise telemetry. | false |
| ENTERPRISE_OTLP_ENDPOINT | OTLP collector endpoint (e.g., http://otel-collector:4318). | - |
| ENTERPRISE_OTLP_HEADERS | Custom headers for OTLP requests (e.g., x-scope-orgid=tenant1). | - |
| ENTERPRISE_OTLP_PROTOCOL | OTLP transport protocol (http or grpc). | http |
| ENTERPRISE_OTLP_API_KEY | Bearer token for authentication. | - |
| ENTERPRISE_INCLUDE_CONTENT | Whether to include sensitive content (inputs/outputs) in logs. | true |
| ENTERPRISE_SERVICE_NAME | Service name reported to OTEL. | dify |
| ENTERPRISE_OTEL_SAMPLING_RATE | Sampling rate for traces (0.0 to 1.0). Metrics are always 100%. | 1.0 |
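
As a rough illustration of how these variables fit together, the sketch below reads them into a single settings object using the defaults from the table. It is not the exporter's actual loader. TelemetrySettings and load_settings are hypothetical names, and three behaviours are assumptions: that telemetry needs both switches on, that headers are comma-separated key=value pairs, and that the API key is sent as an Authorization: Bearer header.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Optional


def _parse_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _parse_headers(raw: str) -> dict[str, str]:
    # Assumption: "k1=v1,k2=v2" format; malformed pairs are skipped.
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            headers[key.strip()] = value.strip()
    return headers


@dataclass(frozen=True)
class TelemetrySettings:  # hypothetical container, not a Dify class
    enabled: bool
    endpoint: str
    protocol: str
    service_name: str
    include_content: bool
    sampling_rate: float
    headers: dict[str, str] = field(default_factory=dict)


def load_settings(env: Optional[Mapping[str, str]] = None) -> TelemetrySettings:
    env = os.environ if env is None else env

    headers = _parse_headers(env.get("ENTERPRISE_OTLP_HEADERS", ""))
    api_key = env.get("ENTERPRISE_OTLP_API_KEY", "")
    if api_key:
        # Assumption: the bearer token travels in the Authorization header.
        headers["Authorization"] = f"Bearer {api_key}"

    protocol = env.get("ENTERPRISE_OTLP_PROTOCOL", "http")
    if protocol not in {"http", "grpc"}:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")

    rate = float(env.get("ENTERPRISE_OTEL_SAMPLING_RATE", "1.0"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError("sampling rate must be between 0.0 and 1.0")

    return TelemetrySettings(
        # Assumption: telemetry is active only when both switches are on.
        enabled=_parse_bool(env.get("ENTERPRISE_ENABLED", "false"))
        and _parse_bool(env.get("ENTERPRISE_TELEMETRY_ENABLED", "false")),
        endpoint=env.get("ENTERPRISE_OTLP_ENDPOINT", ""),
        protocol=protocol,
        service_name=env.get("ENTERPRISE_SERVICE_NAME", "dify"),
        include_content=_parse_bool(env.get("ENTERPRISE_INCLUDE_CONTENT", "true")),
        sampling_rate=rate,
        headers=headers,
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests.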

Correlation Model

Dify uses deterministic ID generation to ensure signals are correlated across different services and asynchronous tasks.

ID Generation Rules

  • trace_id: Derived from the correlation ID (workflow_run_id, or node_execution_id for draft executions) using int(UUID(correlation_id)).
  • span_id: Derived from the source ID using SHA256(source_id)[:8].
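
The two rules above can be sketched with the standard library alone. The function names are hypothetical, and the span_id details are assumptions: the source ID is encoded as UTF-8, "[:8]" is read as the first 8 bytes of the raw digest (matching OTEL's 64-bit span IDs), and the bytes are interpreted as a big-endian integer. The real id_generator.py may differ on any of these points.

```python
import hashlib
import uuid


def derive_trace_id(correlation_id: str) -> int:
    """Return the 128-bit trace ID for a UUID-formatted correlation ID.

    Raises ValueError if correlation_id is not a valid UUID string.
    """
    return int(uuid.UUID(correlation_id))


def derive_span_id(source_id: str) -> int:
    """Return a 64-bit span ID from the SHA-256 digest of source_id."""
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    # Assumption: first 8 digest bytes, big-endian.
    return int.from_bytes(digest[:8], "big")


def as_hex(trace_id: int, span_id: int) -> tuple[str, str]:
    """Format IDs as the 32- and 16-character hex strings backends show."""
    return format(trace_id, "032x"), format(span_id, "016x")
```

Because both functions are pure, any service or asynchronous task that knows the same workflow_run_id or node execution ID arrives at the same IDs without passing span context around.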

Scenario A: Simple Workflow

A single workflow run with multiple nodes. All spans and logs share the same trace_id (derived from workflow_run_id).

trace_id = UUID(workflow_run_id)
├── [root span] dify.workflow.run (span_id = hash(workflow_run_id))
│   ├── [child] dify.node.execution - "Start" (span_id = hash(node_exec_id_1))
│   ├── [child] dify.node.execution - "LLM" (span_id = hash(node_exec_id_2))
│   └── [child] dify.node.execution - "End" (span_id = hash(node_exec_id_3))

Scenario B: Nested Sub-Workflow

A workflow calling another workflow via a Tool or Sub-workflow node. The child workflow's spans are linked to the parent via parent_span_id. Both workflows share the same trace_id.

trace_id = UUID(outer_workflow_run_id)     ← shared across both workflows
├── [root] dify.workflow.run (outer) (span_id = hash(outer_workflow_run_id))
│   ├── dify.node.execution - "Start Node"
│   ├── dify.node.execution - "Tool Node" (triggers sub-workflow)
│   │   └── [child] dify.workflow.run (inner) (span_id = hash(inner_workflow_run_id))
│   │       ├── dify.node.execution - "Inner Start"
│   │       └── dify.node.execution - "Inner End"
│   └── dify.node.execution - "End Node"

Key attributes for nested workflows:

  • Inner workflow's dify.parent.trace_id = outer workflow_run_id
  • Inner workflow's dify.parent.node.execution_id = tool node's execution_id
  • Inner workflow's dify.parent.workflow.run_id = outer workflow_run_id
  • Inner workflow's dify.parent.app.id = outer app_id

Scenario C: Draft Node Execution

A single node run in isolation (debugger/preview mode). It creates its own trace where the node span is the root.

trace_id = UUID(node_execution_id)   ← own trace, NOT part of any workflow
└── dify.node.execution.draft (span_id = hash(node_execution_id))

Key difference: Draft executions use node_execution_id as the correlation_id, so they are NOT children of any workflow trace.

Content Gating

When ENTERPRISE_INCLUDE_CONTENT is set to false, sensitive content attributes (inputs, outputs, queries) are replaced with reference strings (e.g., ref:workflow_run_id=...) to prevent data leakage to the OTEL collector.

Reference String Format:

ref:{id_type}={uuid}

Examples:

ref:workflow_run_id=550e8400-e29b-41d4-a716-446655440000
ref:node_execution_id=660e8400-e29b-41d4-a716-446655440001
ref:message_id=770e8400-e29b-41d4-a716-446655440002

To retrieve actual content when gating is enabled, query the Dify database using the provided UUID.
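
A minimal sketch of the gating rule and of parsing a reference string back into its parts, assuming the ref:{id_type}={uuid} format shown above. gate_content and parse_reference are hypothetical helpers, not functions from the exporter, and the database lookup is left out because the schema is not described here.

```python
import re
import uuid
from typing import Any, Optional

_REF_PATTERN = re.compile(r"ref:([a-z_]+)=(.+)")


def gate_content(value: Any, *, include_content: bool, id_type: str, record_id: str) -> Any:
    """Return value unchanged, or a reference string when content is gated."""
    if include_content:
        return value
    return f"ref:{id_type}={record_id}"


def parse_reference(ref: str) -> Optional[tuple[str, str]]:
    """Return (id_type, uuid) for a valid reference string, else None."""
    match = _REF_PATTERN.fullmatch(ref)
    if match is None:
        return None
    id_type, raw_id = match.groups()
    try:
        # Normalise and validate the UUID before it is used in a lookup.
        return id_type, str(uuid.UUID(raw_id))
    except ValueError:
        return None
```

Validating the UUID before querying guards against malformed references that may appear in exported logs.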

Reference

For a complete list of telemetry signals, attributes, and data structures, see DATA_DICTIONARY.md.