opensource/dify

mirror of https://github.com/langgenius/dify.git synced 2026-03-14 22:02:15 +08:00

Author	SHA1	Message	Date
GareArc	05cf2336ac	docs(telemetry): add token consumption query patterns to data dictionary Add token hierarchy diagram, common PromQL queries (totals, drill-down, rates), and app name lookup via trace query.	2026-03-02 01:19:00 -08:00
GareArc	83f5850d0a	refactor(telemetry): add resolved_parent_context property and fix edge cases - Add resolved_parent_context property to BaseTraceInfo for reusable parent context extraction - Refactor enterprise_trace.py to use property instead of duplicated dict plucking (~19 lines eliminated) - Fix UUID validation in exporter.py with specific error logging for invalid trace correlation IDs - Add error isolation in event_handlers.py to prevent telemetry failures from breaking user operations - Replace pickle-based payload_fallback with JSON storage rehydration for security - Update TelemetryEnvelope to use Pydantic v2 ConfigDict with extra='forbid' - Update tests to reflect contract changes and new error handling behavior	2026-03-01 19:33:59 -08:00
yunlu.wen	7a92c1764f	fix token label	2026-03-02 10:10:01 +08:00
GareArc	9952a17fed	fix(telemetry): use URL scheme instead of API key for gRPC TLS detection - Change insecure parameter from API key-based to URL scheme-based detection - https:// endpoints now correctly use TLS (insecure=False) - All other endpoints (http://, no scheme) use insecure=True - Update tests to reflect URL scheme-based logic - Remove incorrect documentation claiming API key controls TLS	2026-03-01 02:24:25 -08:00
GareArc	ff877ee39c	fix(telemetry): add resolved_trace_id property to eliminate trace_id inconsistencies Add computed property to BaseTraceInfo that provides intelligent fallback: 1. External trace_id (from X-Trace-Id header) 2. workflow_run_id (for workflow-related traces) 3. message_id (as final fallback) This ensures attribute dify.trace_id always matches log-level trace_id, eliminating inconsistencies where attribute was null but log-level had value. Changes: - Add resolved_trace_id property to BaseTraceInfo (trace_entity.py) - Replace 4 direct trace_id attribute assignments with resolved_trace_id - Add trace_id_source parameter to 5 emit_metric_only_event calls Fixes trace_id inconsistency found in MESSAGE_RUN, TOOL_EXECUTION, MODERATION_CHECK, SUGGESTED_QUESTION_GENERATION, GENERATE_NAME_EXECUTION, DATASET_RETRIEVAL, and PROMPT_GENERATION_EXECUTION events. All 78 telemetry tests passing.	2026-02-28 20:32:15 -08:00
GareArc	abcf14a571	refactor(telemetry): move gateway to core as stateless module-level functions Move routing table, emit(), and is_enterprise_telemetry_enabled() from enterprise/telemetry/gateway.py into core/telemetry/gateway.py so both CE and EE share one code path. The ce_eligible flag in CASE_ROUTING controls which events flow in CE — flipping it is the only change needed to enable an event in community edition. - Delete enterprise/telemetry/gateway.py (class-based singleton) - Create core/telemetry/gateway.py (stateless functions, no shared state) - Simplify core/telemetry/__init__.py to thin facade over gateway - Remove TelemetryGateway class and get_gateway() from ext_enterprise_telemetry - Single-source is_enterprise_telemetry_enabled in core.telemetry.gateway - Fix pre-existing test bugs (missing dify.event.id in metric handler tests) - Update all imports and mock paths across 7 test files	2026-02-28 19:27:24 -08:00
GareArc	5e57f73598	feat(telemetry): add model provider and name tags to all trace metrics Add comprehensive model tracking across all OTEL metrics and logs: - Node execution metrics now include model_name for LLM operations - Suggested question metrics include model_provider and model_name - Dataset retrieval captures both embedding and rerank model info - Updated DATA_DICTIONARY.md with complete metric label documentation This enables granular cost tracking, performance analysis, and usage monitoring per model across all operation types.	2026-02-28 00:06:44 -08:00
GareArc	62592be60b	docs(enterprise): split telemetry docs into README and data dictionary Separate background/configuration instructions from the data dictionary: - README.md: Overview, configuration, correlation model, content gating - DATA_DICTIONARY.md: Pure reference format with signals and attributes The data dictionary is now concise (465 lines vs 911) and focuses on attribute types and relationships without verbose explanations.	2026-02-27 12:32:48 -08:00
GareArc	262b7d4d08	docs(enterprise): add telemetry data dictionary for OTEL signals - Comprehensive reference for all enterprise telemetry signals - Documents 3 span types, 10 counters, 6 histograms, 13 log events - Includes trace correlation model with ASCII diagrams - Configuration reference for all 8 ENTERPRISE_* variables - Per-emission-site label tables for metrics - Full JSON schemas for structured log events - Content gating behavior and token double-counting warnings	2026-02-10 19:51:14 -08:00
GareArc	b5dbabf5d0	feat(telemetry): add missing ID fields for name attributes - Add dify.credential.id to node execution events - Add dify.event.id to all telemetry events (APP_CREATED, APP_UPDATED, APP_DELETED, FEEDBACK_CREATED) This ensures all .name fields have corresponding .id fields for reliable aggregation and deduplication.	2026-02-10 00:09:41 -08:00
GareArc	ffa8aedc48	feat(enterprise-telemetry): wire bearer token auth and configurable insecure flag into OTEL exporter	2026-02-09 01:44:21 -08:00
GareArc	1b3a21e6f8	feat(telemetry): unify token metric label structure with Pydantic enforcement - Add TokenMetricLabels BaseModel to enforce consistent label structure - All dify.token.* metrics now use identical 6-label structure: * tenant_id, app_id, operation_type, model_provider, model_name, node_type - Pydantic validation ensures runtime enforcement (extra='forbid', frozen=True) - Enables filtering by operation_type to avoid double-counting: * workflow: aggregated workflow-level tokens * node_execution: individual node-level tokens * message: direct message tokens * rule_generate/code_generate: prompt generation tokens Previously, inconsistent label cardinality made aggregation impossible: - WORKFLOW: 3 labels - NODE_EXECUTION: 6 labels - MESSAGE: 5 labels - PROMPT_GENERATION: 5 labels Now all use the same 6-label structure for consistent querying.	2026-02-06 03:10:20 -08:00
GareArc	11c74d741a	feat: add dedicated app event counters and convert event names to StrEnum - Add APP_CREATED, APP_UPDATED, APP_DELETED counters to EnterpriseTelemetryCounter - Create EnterpriseTelemetryEvent StrEnum for type-safe event names - Update metric_handler to use new app-specific counters with labels (tenant_id, app_id, mode) - Convert all event_name strings to EnterpriseTelemetryEvent enum values - Update exporter to create OTEL meters for new app counters (dify.app.created.total, etc.) - Update tests to verify new counter behavior and enum usage	2026-02-06 02:38:19 -08:00
GareArc	ea9081f22d	feat(telemetry): add operation_type labels for token metrics Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-02-06 01:06:07 -08:00
GareArc	91a6fe25d1	feat(telemetry): add enterprise OTEL telemetry with gateway, traces, metrics, and logs	2026-02-05 23:10:30 -08:00
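
Commit 9952a17fed describes deriving the gRPC `insecure` flag from the endpoint's URL scheme: `https://` endpoints use TLS, and everything else (`http://` or no scheme) is treated as insecure. A minimal sketch of that rule follows. The function name `is_insecure_endpoint` is hypothetical and is not taken from the repository; the actual exporter code may be structured differently.

```python
def is_insecure_endpoint(endpoint: str) -> bool:
    """Return True when the exporter should connect without TLS.

    Hypothetical helper illustrating the rule stated in the commit message:
    only an explicit https:// scheme enables TLS; http:// and scheme-less
    endpoints fall back to an insecure channel.
    """
    normalized = endpoint.strip().lower()
    return not normalized.startswith("https://")


if __name__ == "__main__":
    for url in ("https://otel.example.com:4317",
                "http://otel.example.com:4317",
                "otel.example.com:4317"):
        print(url, "-> insecure =", is_insecure_endpoint(url))
```

A plain prefix check is used here instead of `urllib.parse.urlsplit`, because a scheme-less `host:port` string can be parsed with the host name in the scheme position.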

15 Commits
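
Commit ff877ee39c describes a `resolved_trace_id` property with a three-step fallback: the external trace ID, then `workflow_run_id`, then `message_id`. The sketch below illustrates that order only. The class `TraceInfoSketch` and its field layout are assumptions made for illustration; the real `BaseTraceInfo` in `trace_entity.py` may store these values differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceInfoSketch:
    """Hypothetical stand-in for BaseTraceInfo; field names are assumed."""

    trace_id: Optional[str] = None          # external ID, e.g. from X-Trace-Id
    workflow_run_id: Optional[str] = None   # set for workflow-related traces
    message_id: Optional[str] = None        # final fallback

    @property
    def resolved_trace_id(self) -> Optional[str]:
        # Walk the fallback chain in the order given in the commit message.
        # Empty strings are treated the same as missing values.
        return self.trace_id or self.workflow_run_id or self.message_id


if __name__ == "__main__":
    info = TraceInfoSketch(workflow_run_id="run-1", message_id="msg-1")
    print(info.resolved_trace_id)
```

Computing the value in one property means every emission site reads the same ID, which is the inconsistency the commit message says it removes.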