opensource/dify

mirror of https://github.com/langgenius/dify.git synced 2026-03-14 13:51:33 +08:00

Author	SHA1	Message	Date
GareArc	5d54c198c0	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 20:01:15 -08:00
GareArc	6536489195	fix(telemetry): restore TRACE_TASK_TO_CASE lookup broken by CE safety refactor The CE safety commit (`8a3485454a`) converted module-level dicts to lazy functions but forgot to update __init__.py, which still imported the now-deleted TRACE_TASK_TO_CASE constant causing an ImportError at startup. Add get_trace_task_to_case() to gateway.py as a lazy public wrapper (inverse of _get_case_to_trace_task) and update __init__.py to call it.	2026-03-02 19:59:20 -08:00
GareArc	8f1d2455f4	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 18:50:39 -08:00
GareArc	8a3485454a	fix(telemetry): ensure CE safety for enterprise-only imports and DB lookups - Move enqueue_draft_node_execution_trace import inside call site in workflow_service.py - Make gateway.py enterprise type imports lazy (routing dicts built on first call) - Restore typed ModelConfig in llm_generator method signatures (revert dict regression) - Fix generate_structured_output using wrong key model_parameters -> completion_params - Replace unsafe cast(str, msg.content) with get_text_content() across llm_generator - Remove duplicated payload classes from generator.py, import from core.llm_generator.entities - Gate _lookup_app_and_workspace_names and credential lookups in ops_trace_manager behind is_enterprise_telemetry_enabled()	2026-03-02 18:45:33 -08:00
GareArc	cf15f0d681	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 15:56:52 -08:00
GareArc	d6de27a25a	feat(telemetry): promote gen_ai scalar fields from log-only to span attributes Move gen_ai.usage.*, gen_ai.request.model, gen_ai.provider.name, and gen_ai.user.id from companion-log-only to span attributes on workflow and node execution spans. These are small scalars with no size risk. Having them on spans enables filtering and grouping in trace UIs (Tempo, Jaeger, Datadog) without requiring a cross-signal join to companion logs. Data dictionary updated: span tables gain the new fields; companion log 'additional attributes' tables trimmed to only list fields not already covered by 'All span attributes'.	2026-03-02 15:55:10 -08:00
GareArc	11ab67c8cb	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 04:20:06 -08:00
GareArc	fe741140d5	fix(telemetry): fix zero-value message and workflow duration histograms Workflow RT: replace float(info.workflow_run_elapsed_time) with (end_time - start_time).total_seconds() using workflow_run.created_at and workflow_run.finished_at. The elapsed_time DB field defaults to 0 and can be stale if the workflow_storage Celery task has not committed yet when the trace fires. Wall-clock timestamps are more reliable; elapsed_time is kept as fallback. Message RT: change end_time from created_at + provider_response_latency to message.updated_at when updated_at > created_at. The pipeline explicitly sets message.updated_at = naive_utc_now() at the moment the LLM response is complete, making it the canonical response-complete timestamp. Falls back to the latency-based calculation for error/aborted messages.	2026-03-02 04:14:57 -08:00
GareArc	9b5b355a4e	fix(telemetry): gate ObservabilityLayer content attrs behind ENTERPRISE_INCLUDE_CONTENT Add should_include_content() helper to extensions/otel/parser/base.py that returns True in CE (no behaviour change) and respects ENTERPRISE_INCLUDE_CONTENT in EE. Gate all content-bearing span attributes in LLM, retrieval, tool, and default node parsers so that gen_ai.completion, gen_ai.prompt, retrieval.document, tool call arguments/results, and node input/output values are suppressed when ENTERPRISE_ENABLED=True and ENTERPRISE_INCLUDE_CONTENT=False.	2026-03-02 04:04:26 -08:00
GareArc	ff35f1bfaa	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 02:28:30 -08:00
GareArc	3364003f90	fix(telemetry): add credential_name lookup with async-safe fallback	2026-03-02 02:27:31 -08:00
GareArc	e387d0205b	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 01:54:55 -08:00
GareArc	6df00c83ae	fix(telemetry): populate LLM credential info in node execution traces - Add _lookup_llm_credential_info() to query Provider/ProviderModel tables - Lookup LLM credentials when tool credential_id is null - Fall back to provider-level credential if no model-specific credential	2026-03-02 01:47:39 -08:00
GareArc	05cf2336ac	docs(telemetry): add token consumption query patterns to data dictionary Add token hierarchy diagram, common PromQL queries (totals, drill-down, rates), and app name lookup via trace query.	2026-03-02 01:19:00 -08:00
GareArc	b710c9ad59	fix(telemetry): populate missing fields in node execution trace - Extract model_provider/model_name from process_data (LLM nodes store model info there, not in execution_metadata) - Add invoke_from to node execution trace metadata dict - Add credential_id to node execution trace metadata dict - Add conversation_id to metadata after message_id lookup - Add tool_name to tool_info dict in tool node	2026-03-02 01:18:59 -08:00
GareArc	a2a5b02a53	docs(telemetry): add token consumption query patterns to data dictionary Add token hierarchy diagram, common PromQL queries (totals, drill-down, rates), and app name lookup via trace query.	2026-03-02 01:07:18 -08:00
GareArc	1fcb05432d	fix(telemetry): populate missing fields in node execution trace - Extract model_provider/model_name from process_data (LLM nodes store model info there, not in execution_metadata) - Add invoke_from to node execution trace metadata dict - Add credential_id to node execution trace metadata dict - Add conversation_id to metadata after message_id lookup - Add tool_name to tool_info dict in tool node	2026-03-02 01:07:10 -08:00
L1nSn0w	9c148218fc	Merge branch 'deploy/enterprise' of https://github.com/langgenius/dify into deploy/enterprise	2026-03-02 16:58:01 +08:00
L1nSn0w	02ab3a34b4	Merge branch 'release/e-1.12.1' into deploy/enterprise	2026-03-02 16:57:31 +08:00
L1nSn0w	58524fd7fd	feat(enterprise): auto-join newly registered accounts to the default workspace (#32308) Co-authored-by: Yunlu Wen <yunlu.wen@dify.ai>	2026-03-02 16:38:43 +08:00
GareArc	aa7f648712	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-01 22:30:09 -08:00
GareArc	9d4b2715e8	fix(celery): register enterprise_telemetry_task in worker imports Fixes Celery worker error where process_enterprise_telemetry task was unregistered despite being dispatched from the app. Added conditional import when ENTERPRISE_TELEMETRY_ENABLED=true to ensure the task is available in the worker process. Resolves: KeyError 'tasks.enterprise_telemetry_task.process_enterprise_telemetry'	2026-03-01 22:27:44 -08:00
GareArc	e2fc3417be	Merge branch 'fix/otel-upgrade-e-1.12.1' into deploy/enterprise	2026-03-01 21:48:37 -08:00
GareArc	2d7bffcc11	fix: upgrade OpenTelemetry packages from 0.48b0 to 0.49b0 Fixes "Failed to detach context" error in production by upgrading to OTEL 0.49b0, which includes None token guards in Celery instrumentor (PR opentelemetry-python-contrib#2927). Package Updates: - OTEL instrumentation: 0.48b0 → 0.49b0 - OTEL SDK/API: 1.27.0 → 1.28.0 - protobuf: 4.25.8 → 5.29.6 (required by opentelemetry-proto 1.28.0) - Google Cloud packages upgraded for protobuf 5.x compatibility: - google-api-core: 2.18.0 → 2.19.1+ - google-auth: 2.29.0 → 2.47.0+ - google-cloud-aiplatform: 1.49.0 → 1.123.0+ - googleapis-common-protos: 1.63.0 → 1.65.0+ - google-cloud-storage: 2.16.0 → 3.0.0+ - httpx: 0.27.0 → 0.28.0 (required by google-genai 1.37+) Also removed duplicate opentelemetry-instrumentation-httpx entry in pyproject.toml.	2026-03-01 21:47:51 -08:00
GareArc	eb1b1eb09c	Merge 1.12.1-otel-ee into deploy/enterprise	2026-03-01 19:37:06 -08:00
GareArc	83f5850d0a	refactor(telemetry): add resolved_parent_context property and fix edge cases - Add resolved_parent_context property to BaseTraceInfo for reusable parent context extraction - Refactor enterprise_trace.py to use property instead of duplicated dict plucking (~19 lines eliminated) - Fix UUID validation in exporter.py with specific error logging for invalid trace correlation IDs - Add error isolation in event_handlers.py to prevent telemetry failures from breaking user operations - Replace pickle-based payload_fallback with JSON storage rehydration for security - Update TelemetryEnvelope to use Pydantic v2 ConfigDict with extra='forbid' - Update tests to reflect contract changes and new error handling behavior	2026-03-01 19:33:59 -08:00
yunlu.wen	3368d4cf02	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-02 10:10:28 +08:00
yunlu.wen	7a92c1764f	fix token label	2026-03-02 10:10:01 +08:00
yunlu.wen	5617d69ca7	try to fix exception logging	2026-03-02 09:53:11 +08:00
GareArc	1a6aded8e0	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-03-01 02:25:23 -08:00
GareArc	9952a17fed	fix(telemetry): use URL scheme instead of API key for gRPC TLS detection - Change insecure parameter from API key-based to URL scheme-based detection - https:// endpoints now correctly use TLS (insecure=False) - All other endpoints (http://, no scheme) use insecure=True - Update tests to reflect URL scheme-based logic - Remove incorrect documentation claiming API key controls TLS	2026-03-01 02:24:25 -08:00
GareArc	36ff9b447d	Merge origin/release/e-1.12.1 into 1.12.1-otel-ee Sync enterprise 1.12.1 changes: - feat: implement heartbeat mechanism for database migration lock - refactor: replace AutoRenewRedisLock with DbMigrationAutoRenewLock - fix: improve logging for database migration lock release - fix: make flask upgrade-db fail on error - fix: include sso_verified in access_mode validation - fix: inherit web app permission from original app - fix: make e-1.12.1 enterprise migrations database-agnostic - fix: get_message_event_type return wrong message type - refactor: document_indexing_sync_task split db session - fix: trigger output schema miss - test: remove unrelated enterprise service test Conflict resolution: - Combined OTEL telemetry imports with tool signature import in easy_ui_based_generate_task_pipeline.py	2026-03-01 00:18:46 -08:00
GareArc	1fa1960201	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-02-28 20:34:15 -08:00
GareArc	ff877ee39c	fix(telemetry): add resolved_trace_id property to eliminate trace_id inconsistencies Add computed property to BaseTraceInfo that provides intelligent fallback: 1. External trace_id (from X-Trace-Id header) 2. workflow_run_id (for workflow-related traces) 3. message_id (as final fallback) This ensures attribute dify.trace_id always matches log-level trace_id, eliminating inconsistencies where attribute was null but log-level had value. Changes: - Add resolved_trace_id property to BaseTraceInfo (trace_entity.py) - Replace 4 direct trace_id attribute assignments with resolved_trace_id - Add trace_id_source parameter to 5 emit_metric_only_event calls Fixes trace_id inconsistency found in MESSAGE_RUN, TOOL_EXECUTION, MODERATION_CHECK, SUGGESTED_QUESTION_GENERATION, GENERATE_NAME_EXECUTION, DATASET_RETRIEVAL, and PROMPT_GENERATION_EXECUTION events. All 78 telemetry tests passing.	2026-02-28 20:32:15 -08:00
GareArc	370e1fa5e2	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-02-28 19:30:49 -08:00
GareArc	abcf14a571	refactor(telemetry): move gateway to core as stateless module-level functions Move routing table, emit(), and is_enterprise_telemetry_enabled() from enterprise/telemetry/gateway.py into core/telemetry/gateway.py so both CE and EE share one code path. The ce_eligible flag in CASE_ROUTING controls which events flow in CE — flipping it is the only change needed to enable an event in community edition. - Delete enterprise/telemetry/gateway.py (class-based singleton) - Create core/telemetry/gateway.py (stateless functions, no shared state) - Simplify core/telemetry/__init__.py to thin facade over gateway - Remove TelemetryGateway class and get_gateway() from ext_enterprise_telemetry - Single-source is_enterprise_telemetry_enabled in core.telemetry.gateway - Fix pre-existing test bugs (missing dify.event.id in metric handler tests) - Update all imports and mock paths across 7 test files	2026-02-28 19:27:24 -08:00
GareArc	9bd938b4e1	Merge branch '1.12.1-otel-ee' into deploy/enterprise	2026-02-28 17:41:17 -08:00
GareArc	5e57f73598	feat(telemetry): add model provider and name tags to all trace metrics Add comprehensive model tracking across all OTEL metrics and logs: - Node execution metrics now include model_name for LLM operations - Suggested question metrics include model_provider and model_name - Dataset retrieval captures both embedding and rerank model info - Updated DATA_DICTIONARY.md with complete metric label documentation This enables granular cost tracking, performance analysis, and usage monitoring per model across all operation types.	2026-02-28 00:06:44 -08:00
GareArc	62592be60b	docs(enterprise): split telemetry docs into README and data dictionary Separate background/configuration instructions from the data dictionary: - README.md: Overview, configuration, correlation model, content gating - DATA_DICTIONARY.md: Pure reference format with signals and attributes The data dictionary is now concise (465 lines vs 911) and focuses on attribute types and relationships without verbose explanations.	2026-02-27 12:32:48 -08:00
L1nSn0w	7a8c96b4b7	Merge branch 'release/e-1.12.1' into deploy/enterprise	2026-02-14 17:00:06 +08:00
L1nSn0w	5025e29220	test: remove unrelated enterprise service test Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-14 16:34:49 +08:00
L1nSn0w	3cdc9c119e	refactor(api): enhance DbMigrationAutoRenewLock acquisition logic - Added a check to prevent double acquisition of the DB migration lock, raising an error if an attempt is made to acquire it while already held. - Implemented logic to reuse the lock object if it has already been created, improving efficiency and clarity in lock management. - Reset the lock object to None upon release to ensure proper state management. (cherry picked from commit d4b102d3c8a473c4fd6409dba7c198289bb5f921)	2026-02-14 16:28:38 +08:00
L1nSn0w	18ba367b11	refactor(api): improve DbMigrationAutoRenewLock configuration and logging - Introduced constants for minimum and maximum join timeout values, enhancing clarity and maintainability. - Updated the renewal interval calculation to use defined constants for better readability. - Improved logging messages to include context information, making it easier to trace issues during lock operations. (cherry picked from commit 1471b77bf5156a95417bde148753702d44221929)	2026-02-14 16:28:38 +08:00
autofix-ci[bot]	d0bd74fccb	[autofix.ci] apply automated fixes (cherry picked from commit 907e63cdc57f8006017837a74c2da2fbe274dcfb)	2026-02-14 16:28:38 +08:00
L1nSn0w	5ccbc00eb9	refactor(api): replace AutoRenewRedisLock with DbMigrationAutoRenewLock - Updated the database migration locking mechanism to use DbMigrationAutoRenewLock for improved clarity and functionality. - Removed the AutoRenewRedisLock implementation and its associated tests. - Adjusted integration and unit tests to reflect the new locking class and its usage in the upgrade_db command. (cherry picked from commit c812ad9ff26bed3eb59862bd7a5179b7ee83f11f)	2026-02-14 16:28:38 +08:00
L1nSn0w	94603b5408	refactor(api): replace heartbeat mechanism with AutoRenewRedisLock for database migration - Removed the manual heartbeat function for renewing the Redis lock during database migrations. - Integrated AutoRenewRedisLock to handle lock renewal automatically, simplifying the upgrade_db command. - Updated unit tests to reflect changes in lock handling and error management during migrations. (cherry picked from commit 8814256eb5fa20b29e554264f3b659b027bc4c9a)	2026-02-14 16:28:38 +08:00
L1nSn0w	8d4bd5636b	refactor(tests): replace hardcoded wait time with constant for clarity - Introduced HEARTBEAT_WAIT_TIMEOUT_SECONDS constant to improve readability and maintainability of test code. - Updated test assertions to use the new constant instead of a hardcoded value. (cherry picked from commit 0d53743d83b03ae0e68fad143711ffa5f6354093)	2026-02-14 16:28:38 +08:00
autofix-ci[bot]	ee0c4a8852	[autofix.ci] apply automated fixes (cherry picked from commit 326cffa553ffac1bcd39a051c899c35b0ebe997d)	2026-02-14 16:28:38 +08:00
L1nSn0w	6032c598b0	fix(api): improve logging for database migration lock release - Added a migration_succeeded flag to track the success of database migrations. - Enhanced logging messages to indicate the status of the migration when releasing the lock, providing clearer context for potential issues. (cherry picked from commit e74be0392995d16d288eed2175c51148c9e5b9c0)	2026-02-14 16:28:38 +08:00
L1nSn0w	afdd5b6c86	feat(api): implement heartbeat mechanism for database migration lock - Added a heartbeat function to renew the Redis lock during database migrations, preventing long blockages from crashed processes. - Updated the upgrade_db command to utilize the new locking mechanism with a configurable TTL. - Removed the deprecated MIGRATION_LOCK_TTL from DeploymentConfig and related files. - Enhanced unit tests to cover the new lock renewal behavior and error handling during migrations. (cherry picked from commit a3331c622435f9f215b95f6b0261f43ae56a9d9c)	2026-02-14 16:28:38 +08:00
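
Note on commit 9952a17fed: the message states that `https://` endpoints use TLS (insecure=False) and every other endpoint (`http://`, or no scheme) uses insecure=True. The rule can be sketched as below. This is only an illustration of the stated behaviour, not the code from the commit: the helper name `grpc_insecure_for` and the example endpoints are hypothetical, and the whitespace and case normalisation is an assumption the commit message does not mention.

```python
def grpc_insecure_for(endpoint: str) -> bool:
    """Return the value to pass as `insecure` for a gRPC exporter endpoint.

    Hypothetical helper: only an explicit https:// scheme selects TLS;
    http:// and scheme-less endpoints fall back to an insecure channel.
    """
    # Normalisation is an assumption added for this sketch.
    normalized = endpoint.strip().lower()
    return not normalized.startswith("https://")


# Example endpoints (made up for illustration).
for url in ("https://otel.example.com:4317", "http://collector:4317", "collector:4317"):
    print(url, "insecure =", grpc_insecure_for(url))
```

Keying the decision on the URL scheme means the TLS setting is independent of whether an API key is configured, which is the coupling the commit message says it removed.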

9016 Commits