Commit Graph

9040 Commits

Author SHA1 Message Date
yunlu.wen
e1adf73ad5 feat: add enterprise pre uninstall hook for plugin 2026-03-09 15:46:16 +08:00
GareArc
825765231b
fix: remove extra exempts 2026-03-05 01:10:59 -08:00
GareArc
4e35fbbff4
Merge branch 'fix/enterprise-api-error-handling' into deploy/enterprise 2026-03-05 00:27:55 -08:00
GareArc
d81684d8d1
fix: expose license status to unauthenticated /system-features callers
After force-logout due to license expiry, the login page calls
/system-features without auth. The license block was gated behind
is_authenticated, so the frontend always saw status='none' instead
of the actual expiry status. Split the guard so license.status and
expired_at are always returned while workspace usage details remain
auth-gated.
2026-03-05 00:27:47 -08:00
GareArc
11f657019a
Squash merge fix/enterprise-api-error-handling into deploy/enterprise 2026-03-04 22:31:21 -08:00
GareArc
5afb24f461
Merge branch 'release/e-1.12.1' into fix/enterprise-api-error-handling 2026-03-04 22:30:09 -08:00
GareArc
eaea4ad6dd
fix: use payload.id instead of undefined args in set_default_provider 2026-03-04 22:28:35 -08:00
GareArc
808002fbbd
fix: use payload.id instead of undefined args in set_default_provider 2026-03-04 22:28:30 -08:00
GareArc
757fabda1e
fix: exempt console bootstrap APIs from license check to prevent infinite reload loop 2026-03-04 22:13:25 -08:00
GareArc
858ccd8746
feat: add Redis caching for enterprise license status
Cache license status for 10 minutes to reduce HTTP calls to enterprise API.
Only caches license status, not full system features.

Changes:
- Add EnterpriseService.get_cached_license_status() method
- Cache key: enterprise:license:status
- TTL: 600 seconds (10 minutes)
- Graceful degradation: falls back to API call if Redis fails

Performance improvement:
- Before: HTTP call (~50-200ms) on every API request
- After: Redis lookup (~1ms) on cached requests
- Reduces load on enterprise service by ~99%
2026-03-04 21:28:11 -08:00
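The caching behaviour described in commit 858ccd8746 can be sketched roughly as follows. This is a minimal illustration, not the actual EnterpriseService code: the cache key and TTL come from the commit message, while `InMemoryCache`, `BrokenCache`, and the `fetch_status` callable are hypothetical stand-ins for the Redis client and the enterprise API call.

```python
from typing import Callable, Optional

CACHE_KEY = "enterprise:license:status"  # key named in the commit message
CACHE_TTL_SECONDS = 600                  # 10 minutes, as stated in the commit message


class InMemoryCache:
    """Hypothetical stand-in exposing the get/setex subset of a Redis client."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def setex(self, key: str, ttl: int, value: str) -> None:
        # The TTL is accepted but not enforced in this stand-in.
        self.store[key] = value


class BrokenCache:
    """Hypothetical cache that simulates a Redis outage: every call raises."""

    def get(self, key: str) -> Optional[str]:
        raise ConnectionError("redis unavailable")

    def setex(self, key: str, ttl: int, value: str) -> None:
        raise ConnectionError("redis unavailable")


def get_cached_license_status(cache, fetch_status: Callable[[], str]) -> str:
    """Return the license status, preferring the cache over the API call."""
    try:
        cached = cache.get(CACHE_KEY)
        if cached is not None:
            return cached
    except Exception:
        pass  # cache read failed: fall back to the API call

    status = fetch_status()
    try:
        cache.setex(CACHE_KEY, CACHE_TTL_SECONDS, status)
    except Exception:
        pass  # cache write failed: still return the fresh status
    return status
```

With a working cache, only the first call reaches `fetch_status`. With `BrokenCache`, every call does, which is the graceful degradation the commit describes.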
GareArc
ea35ee0a3e
feat: extend license enforcement to webapp API endpoints
Extend license middleware to also block webapp API (/api/*) when
enterprise license is expired/inactive/lost.

Changes:
- Check both /console/api and /api endpoints
- Add webapp-specific exempt paths:
  - /api/passport (webapp authentication)
  - /api/login, /api/logout, /api/oauth
  - /api/forgot-password
  - /api/system-features (webapp needs this to check license status)

This ensures both console users and webapp users are blocked when
license expires, maintaining consistent enforcement across all APIs.
2026-03-04 20:38:03 -08:00
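The path matching described in commit ea35ee0a3e might look something like the sketch below. The webapp exempt paths and the blocked statuses are taken from the commit messages; the function names and the `/console/api/login` entry are assumptions made for illustration, since the log does not list the exact console auth paths.

```python
# Statuses that block API access, per the commit messages.
BLOCKED_STATUSES = {"expired", "inactive", "lost"}

# Webapp exemptions listed in the commit message.
WEBAPP_EXEMPT_PREFIXES = (
    "/api/passport",
    "/api/login",
    "/api/logout",
    "/api/oauth",
    "/api/forgot-password",
    "/api/system-features",
)

# Console exemptions: /console/api/features is named in the log;
# /console/api/login is a hypothetical example of an auth endpoint.
CONSOLE_EXEMPT_PREFIXES = (
    "/console/api/login",
    "/console/api/features",
)


def _matches(path: str, prefix: str) -> bool:
    """True if path equals prefix or is nested under it."""
    return path == prefix or path.startswith(prefix + "/")


def is_blocked_by_license(path: str, license_status: str) -> bool:
    """Decide whether a request path should be rejected for this license status."""
    if license_status not in BLOCKED_STATUSES:
        return False
    if _matches(path, "/console/api"):
        exempt = CONSOLE_EXEMPT_PREFIXES
    elif _matches(path, "/api"):
        exempt = WEBAPP_EXEMPT_PREFIXES
    else:
        return False  # not an API path covered by the check
    return not any(_matches(path, prefix) for prefix in exempt)
```

Matching on whole path segments rather than a bare `startswith` avoids accidentally exempting or blocking unrelated paths that merely share a string prefix.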
GareArc
0e9dc86f3b
fix: use UnauthorizedAndForceLogout to trigger frontend logout on license expiry
Change license check to raise UnauthorizedAndForceLogout exception instead
of returning generic JSON response. This ensures proper frontend handling:

Frontend behavior (service/base.ts line 588):
- Checks if code === 'unauthorized_and_force_logout'
- Executes globalThis.location.reload()
- Forces user logout and redirect to login page
- Login page displays license expiration UI (already exists)

Response format:
- HTTP 401 (not 403)
- code: "unauthorized_and_force_logout"
- Triggers frontend reload which clears auth state

This completes the license enforcement flow:
1. Backend blocks all business APIs when license expires
2. Backend returns proper error code to trigger logout
3. Frontend reloads and redirects to login
4. Login page shows license expiration message
2026-03-04 20:30:53 -08:00
GareArc
0ed39d81e9
feat: add global license check middleware to block API access on expiry
Add before_request middleware that validates enterprise license status
for all /console/api endpoints when ENTERPRISE_ENABLED is true.

Behavior:
- Checks license status before each console API request
- Returns 403 with clear error message when license is expired/inactive/lost
- Exempts auth endpoints (login, oauth, forgot-password, etc.)
- Exempts /console/api/features so frontend can fetch license status
- Gracefully handles errors to avoid service disruption

This ensures all business APIs are blocked when license expires,
addressing the issue where APIs remained callable after expiry.
2026-03-04 20:10:42 -08:00
GareArc
7007aa3c61
Merge branch 'fix/enterprise-api-error-handling' into deploy/enterprise 2026-03-04 19:54:13 -08:00
GareArc
2b739b9544
fix: handle enterprise API errors properly to prevent KeyError crashes
When the enterprise API returns 403/404, the response contains error JSON
instead of the expected data structure. The code accessed fields directly,
causing a KeyError and a 500 Internal Server Error.

Changes:
- Add enterprise-specific error classes (EnterpriseAPIError, etc.)
- Implement centralized error validation in EnterpriseRequest.send_request()
- Extract error messages from API responses (message/error/detail fields)
- Raise domain-specific errors based on HTTP status codes
- Preserve backward compatibility with raise_for_status parameter

This prevents KeyError crashes and returns proper HTTP error codes
(403/404) instead of 500 errors.
2026-03-04 19:53:43 -08:00
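The centralized validation described in commit 2b739b9544 can be illustrated with the sketch below. Only `EnterpriseAPIError` and the `message`/`error`/`detail` fields are named in the commit message; the subclass names, `extract_error_message`, and `validate_response` are hypothetical and do not claim to match EnterpriseRequest.send_request().

```python
from typing import Any, Mapping


class EnterpriseAPIError(Exception):
    """Base error for failed enterprise API calls (name from the commit message)."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class EnterpriseAPIForbiddenError(EnterpriseAPIError):
    """Hypothetical subclass for HTTP 403."""


class EnterpriseAPINotFoundError(EnterpriseAPIError):
    """Hypothetical subclass for HTTP 404."""


_ERRORS_BY_STATUS = {
    403: EnterpriseAPIForbiddenError,
    404: EnterpriseAPINotFoundError,
}


def extract_error_message(body: Any, default: str) -> str:
    """Pull a readable message from the message/error/detail fields, if present."""
    if isinstance(body, Mapping):
        for field in ("message", "error", "detail"):
            value = body.get(field)
            if isinstance(value, str) and value:
                return value
    return default


def validate_response(status_code: int, body: Any) -> Any:
    """Return the body on success; raise a domain-specific error otherwise."""
    if status_code < 400:
        return body
    message = extract_error_message(body, f"enterprise API returned HTTP {status_code}")
    error_cls = _ERRORS_BY_STATUS.get(status_code, EnterpriseAPIError)
    raise error_cls(message, status_code)
```

Callers then never index into an error payload as if it were data, so a 403 or 404 surfaces as a typed exception rather than a KeyError.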
GareArc
22e82297c5
fix(api): restore reg(ModelConfig) for Swagger schema generation 2026-03-04 17:34:08 -08:00
GareArc
8049c90a38
Merge remote-tracking branch 'origin/release/e-1.12.1' into deploy/enterprise 2026-03-04 17:32:33 -08:00
GareArc
3f771544b1
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-04 17:31:51 -08:00
GareArc
ee13650e3d
fix(api): restore missing reg(ModelConfig) from 1.12.1 refactor 2026-03-04 17:31:19 -08:00
GareArc
7ef139cadd
Squash merge 1.12.1-otel-ee into release/e-1.12.1 2026-03-04 16:59:37 -08:00
GareArc
9fa8f6235e
Merge branch 'release/e-1.12.1' into 1.12.1-otel-ee 2026-03-04 16:59:21 -08:00
L1nSn0w
bf5a327156 fix(api): ensure enterprise workspace join occurs on account registration failure 2026-03-04 14:56:21 +08:00
L1nSn0w
d94af41f07 fix(api): ensure default workspace join occurs even if personal workspace creation fails 2026-03-04 14:56:21 +08:00
GareArc
5d54c198c0
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-02 20:01:15 -08:00
GareArc
6536489195
fix(telemetry): restore TRACE_TASK_TO_CASE lookup broken by CE safety refactor
The CE safety commit (8a3485454a) converted module-level dicts to lazy
functions but forgot to update __init__.py, which still imported the
now-deleted TRACE_TASK_TO_CASE constant, causing an ImportError at startup.

Add get_trace_task_to_case() to gateway.py as a lazy public wrapper
(inverse of _get_case_to_trace_task) and update __init__.py to call it.
2026-03-02 19:59:20 -08:00
GareArc
8f1d2455f4
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-02 18:50:39 -08:00
GareArc
8a3485454a
fix(telemetry): ensure CE safety for enterprise-only imports and DB lookups
- Move enqueue_draft_node_execution_trace import inside call site in workflow_service.py
- Make gateway.py enterprise type imports lazy (routing dicts built on first call)
- Restore typed ModelConfig in llm_generator method signatures (revert dict regression)
- Fix generate_structured_output using wrong key model_parameters -> completion_params
- Replace unsafe cast(str, msg.content) with get_text_content() across llm_generator
- Remove duplicated payload classes from generator.py, import from core.llm_generator.entities
- Gate _lookup_app_and_workspace_names and credential lookups in ops_trace_manager behind is_enterprise_telemetry_enabled()
2026-03-02 18:45:33 -08:00
GareArc
8d8552cbb9
Merge branch 'fix/otel-upgrade-e-1.12.1' into release/e-1.12.1 2026-03-02 17:21:39 -08:00
GareArc
cf15f0d681
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-02 15:56:52 -08:00
GareArc
d6de27a25a
feat(telemetry): promote gen_ai scalar fields from log-only to span attributes
Move gen_ai.usage.*, gen_ai.request.model, gen_ai.provider.name, and
gen_ai.user.id from companion-log-only to span attributes on workflow
and node execution spans.

These are small scalars with no size risk. Having them on spans enables
filtering and grouping in trace UIs (Tempo, Jaeger, Datadog) without
requiring a cross-signal join to companion logs.

Data dictionary updated: span tables gain the new fields; companion log
'additional attributes' tables trimmed to only list fields not already
covered by 'All span attributes'.
2026-03-02 15:55:10 -08:00
GareArc
11ab67c8cb
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-02 04:20:06 -08:00
GareArc
fe741140d5
fix(telemetry): fix zero-value message and workflow duration histograms
Workflow RT: replace float(info.workflow_run_elapsed_time) with
(end_time - start_time).total_seconds() using workflow_run.created_at and
workflow_run.finished_at. The elapsed_time DB field defaults to 0 and can
be stale if the workflow_storage Celery task has not committed yet when the
trace fires. Wall-clock timestamps are more reliable; elapsed_time is kept
as fallback.

Message RT: change end_time from created_at + provider_response_latency to
message.updated_at when updated_at > created_at. The pipeline explicitly
sets message.updated_at = naive_utc_now() at the moment the LLM response
is complete, making it the canonical response-complete timestamp.
Falls back to the latency-based calculation for error/aborted messages.
2026-03-02 04:14:57 -08:00
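The two duration fixes in commit fe741140d5 reduce to a pair of small calculations, sketched here. The field names mirror those in the commit message, but the function names and the exact fallback conditions (a missing or non-later timestamp) are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def workflow_duration_seconds(
    created_at: datetime,
    finished_at: Optional[datetime],
    elapsed_time: float,
) -> float:
    """Prefer wall-clock timestamps; fall back to the stored elapsed_time."""
    if finished_at is not None and finished_at > created_at:
        return (finished_at - created_at).total_seconds()
    # elapsed_time defaults to 0 and may be stale, so it is only a fallback.
    return float(elapsed_time)


def message_end_time(
    created_at: datetime,
    updated_at: Optional[datetime],
    provider_response_latency: float,
) -> datetime:
    """Use updated_at when it is later than created_at, else derive from latency."""
    if updated_at is not None and updated_at > created_at:
        return updated_at
    # Error/aborted messages: latency-based calculation.
    return created_at + timedelta(seconds=provider_response_latency)
```

For example, a workflow created at 04:00:00 and finished at 04:00:02.5 yields 2.5 seconds even if the stored `elapsed_time` is still 0.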
GareArc
9b5b355a4e
fix(telemetry): gate ObservabilityLayer content attrs behind ENTERPRISE_INCLUDE_CONTENT
Add should_include_content() helper to extensions/otel/parser/base.py that
returns True in CE (no behaviour change) and respects ENTERPRISE_INCLUDE_CONTENT
in EE. Gate all content-bearing span attributes in LLM, retrieval, tool, and
default node parsers so that gen_ai.completion, gen_ai.prompt, retrieval.document,
tool call arguments/results, and node input/output values are suppressed when
ENTERPRISE_ENABLED=True and ENTERPRISE_INCLUDE_CONTENT=False.
2026-03-02 04:04:26 -08:00
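The gating rule in commit 9b5b355a4e can be expressed as a small truth table, sketched below. The helper name `should_include_content` and the attribute keys come from the commit messages; passing the two flags as parameters (rather than reading configuration) and the `build_llm_span_attributes` function are assumptions made to keep the example self-contained.

```python
def should_include_content(enterprise_enabled: bool, enterprise_include_content: bool) -> bool:
    """CE always includes content; EE defers to ENTERPRISE_INCLUDE_CONTENT."""
    if not enterprise_enabled:
        return True
    return enterprise_include_content


def build_llm_span_attributes(
    prompt: str,
    completion: str,
    model: str,
    *,
    enterprise_enabled: bool,
    enterprise_include_content: bool,
) -> dict[str, str]:
    """Hypothetical parser step: scalar attributes always, content only if allowed."""
    attrs = {"gen_ai.request.model": model}
    if should_include_content(enterprise_enabled, enterprise_include_content):
        attrs["gen_ai.prompt"] = prompt
        attrs["gen_ai.completion"] = completion
    return attrs
```

Only the combination ENTERPRISE_ENABLED=True with ENTERPRISE_INCLUDE_CONTENT=False suppresses the content-bearing attributes; every other combination leaves behaviour unchanged.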
GareArc
ff35f1bfaa
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-02 02:28:30 -08:00
GareArc
3364003f90
fix(telemetry): add credential_name lookup with async-safe fallback 2026-03-02 02:27:31 -08:00
GareArc
e387d0205b
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-02 01:54:55 -08:00
GareArc
6df00c83ae
fix(telemetry): populate LLM credential info in node execution traces
- Add _lookup_llm_credential_info() to query Provider/ProviderModel tables
- Lookup LLM credentials when tool credential_id is null
- Fall back to provider-level credential if no model-specific credential
2026-03-02 01:47:39 -08:00
GareArc
05cf2336ac
docs(telemetry): add token consumption query patterns to data dictionary
Add token hierarchy diagram, common PromQL queries (totals, drill-down,
rates), and app name lookup via trace query.
2026-03-02 01:19:00 -08:00
GareArc
b710c9ad59
fix(telemetry): populate missing fields in node execution trace
- Extract model_provider/model_name from process_data (LLM nodes store
  model info there, not in execution_metadata)
- Add invoke_from to node execution trace metadata dict
- Add credential_id to node execution trace metadata dict
- Add conversation_id to metadata after message_id lookup
- Add tool_name to tool_info dict in tool node
2026-03-02 01:18:59 -08:00
GareArc
a2a5b02a53
docs(telemetry): add token consumption query patterns to data dictionary
Add token hierarchy diagram, common PromQL queries (totals, drill-down,
rates), and app name lookup via trace query.
2026-03-02 01:07:18 -08:00
GareArc
1fcb05432d
fix(telemetry): populate missing fields in node execution trace
- Extract model_provider/model_name from process_data (LLM nodes store
  model info there, not in execution_metadata)
- Add invoke_from to node execution trace metadata dict
- Add credential_id to node execution trace metadata dict
- Add conversation_id to metadata after message_id lookup
- Add tool_name to tool_info dict in tool node
2026-03-02 01:07:10 -08:00
L1nSn0w
9c148218fc Merge branch 'deploy/enterprise' of https://github.com/langgenius/dify into deploy/enterprise 2026-03-02 16:58:01 +08:00
L1nSn0w
02ab3a34b4 Merge branch 'release/e-1.12.1' into deploy/enterprise 2026-03-02 16:57:31 +08:00
L1nSn0w
58524fd7fd feat(enterprise): auto-join newly registered accounts to the default workspace (#32308)
Co-authored-by: Yunlu Wen <yunlu.wen@dify.ai>
2026-03-02 16:38:43 +08:00
GareArc
aa7f648712
Merge branch '1.12.1-otel-ee' into deploy/enterprise 2026-03-01 22:30:09 -08:00
GareArc
9d4b2715e8
fix(celery): register enterprise_telemetry_task in worker imports
Fixes a Celery worker error where the process_enterprise_telemetry task
was unregistered despite being dispatched from the app.

Added conditional import when ENTERPRISE_TELEMETRY_ENABLED=true
to ensure the task is available in the worker process.

Resolves: KeyError 'tasks.enterprise_telemetry_task.process_enterprise_telemetry'
2026-03-01 22:27:44 -08:00
GareArc
e2fc3417be
Merge branch 'fix/otel-upgrade-e-1.12.1' into deploy/enterprise 2026-03-01 21:48:37 -08:00
GareArc
2d7bffcc11
fix: upgrade OpenTelemetry packages from 0.48b0 to 0.49b0
Fixes "Failed to detach context" error in production by upgrading to OTEL 0.49b0,
which includes None token guards in Celery instrumentor (PR opentelemetry-python-contrib#2927).

Package Updates:
- OTEL instrumentation: 0.48b0 → 0.49b0
- OTEL SDK/API: 1.27.0 → 1.28.0
- protobuf: 4.25.8 → 5.29.6 (required by opentelemetry-proto 1.28.0)
- Google Cloud packages upgraded for protobuf 5.x compatibility:
  - google-api-core: 2.18.0 → 2.19.1+
  - google-auth: 2.29.0 → 2.47.0+
  - google-cloud-aiplatform: 1.49.0 → 1.123.0+
  - googleapis-common-protos: 1.63.0 → 1.65.0+
  - google-cloud-storage: 2.16.0 → 3.0.0+
- httpx: 0.27.0 → 0.28.0 (required by google-genai 1.37+)

Also removed duplicate opentelemetry-instrumentation-httpx entry in pyproject.toml.
2026-03-01 21:47:51 -08:00
GareArc
eb1b1eb09c
Merge 1.12.1-otel-ee into deploy/enterprise 2026-03-01 19:37:06 -08:00
GareArc
83f5850d0a
refactor(telemetry): add resolved_parent_context property and fix edge cases
- Add resolved_parent_context property to BaseTraceInfo for reusable parent context extraction
- Refactor enterprise_trace.py to use property instead of duplicated dict plucking (~19 lines eliminated)
- Fix UUID validation in exporter.py with specific error logging for invalid trace correlation IDs
- Add error isolation in event_handlers.py to prevent telemetry failures from breaking user operations
- Replace pickle-based payload_fallback with JSON storage rehydration for security
- Update TelemetryEnvelope to use Pydantic v2 ConfigDict with extra='forbid'
- Update tests to reflect contract changes and new error handling behavior
2026-03-01 19:33:59 -08:00