From 32c9f9f88af30cd4a27c47e4d0c9b9a4f401140b Mon Sep 17 00:00:00 2001 From: -LAN- Date: Wed, 22 Apr 2026 13:15:24 +0800 Subject: [PATCH] docs(api): add llm quota identity design spec --- .../2026-04-22-llm-quota-identity-design.md | 370 ++++++++++++++++++ 1 file changed, 370 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md diff --git a/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md b/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md new file mode 100644 index 0000000000..ac38407cd0 --- /dev/null +++ b/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md @@ -0,0 +1,370 @@ +# LLM Quota Identity API Design + +Date: 2026-04-22 + +## Summary + +Refactor workflow quota handling so `LLMQuotaLayer` no longer depends on +`ModelInstance`. + +The new design narrows quota APIs around the actual billing identity: + +- `tenant_id` +- `user_id` +- `provider` +- `model` +- `usage` for post-run deduction + +`LLMQuotaLayer` will be initialized with graph-scoped `tenant_id` and `user_id` +when the engine is built. It will read `provider` and `model` from public node +data before execution and from Graphon result events after execution. + +Existing `ModelInstance`-based quota helpers will remain temporarily as thin +deprecated wrappers so non-workflow callers do not need to move in the same +change. + +## Problem + +The current workflow quota path has the wrong dependency shape. + +`LLMQuotaLayer` naturally has model identity plus graph-scoped run context, but +the quota helpers currently require a full `ModelInstance`. That forces the +layer to depend on runtime assembly details or reconstruct a rich object only to +answer a billing question. + +This is unfriendly Python for two reasons: + +1. The caller has to provide a much larger object than the callee actually + needs. +2. The layer boundary becomes coupled to workflow internals instead of public + data. + +## Goals + +- Remove `ModelInstance` from `LLMQuotaLayer` entirely. +- Keep pre-run quota checks and post-run quota deduction behavior unchanged. +- Make the post-run billing API explicit and identity-based. +- Pass graph-scoped `tenant_id` and `user_id` into the quota layer at + construction time. +- Mark `ModelInstance`-based quota helpers as deprecated. + +## Non-Goals + +- Do not change provider quota semantics. +- Do not remove the deprecated helpers in this change. +- Do not migrate every existing non-workflow quota caller in this change. +- Do not redesign GraphEngine event ordering. + +## Approved Direction + +The workflow quota layer should not depend on `ModelInstance` at all. + +For this workflow path: + +- graph-scoped `tenant_id` and `user_id` are stable for the whole graph run +- Graphon success events provide `model_provider` and `model_name` +- pre-run checks must still happen in `on_node_run_start`, before any event is + emitted + +Because `on_node_run_start(node)` runs before `NodeRunStartedEvent` exists, the +layer will use public node configuration for pre-run model identity and public +event inputs for post-run model identity. + +## Target Call Site + +```python +layer = LLMQuotaLayer( + tenant_id=run_context.tenant_id, + user_id=run_context.user_id, +) + +ensure_llm_quota_available_for_model( + tenant_id=self.tenant_id, + user_id=self.user_id, + provider=provider, + model=model, +) + +deduct_llm_quota_for_model( + tenant_id=self.tenant_id, + user_id=self.user_id, + provider=result_event.node_run_result.inputs["model_provider"], + model=result_event.node_run_result.inputs["model_name"], + usage=result_event.node_run_result.llm_usage, +) +``` + +This is the desired public shape because it matches the real caller knowledge +without reconstructing a runtime object. + +## API Changes + +In `api/core/app/llm/quota.py`, add two narrow helpers: + +```python +def ensure_llm_quota_available_for_model( + *, + tenant_id: str, + user_id: str | None, + provider: str, + model: str, +) -> None: + ... + + +def deduct_llm_quota_for_model( + *, + tenant_id: str, + user_id: str | None, + provider: str, + model: str, + usage: LLMUsage, +) -> None: + ... +``` + +These functions become the real implementation points for model-based quota +logic. + +### Deprecated Wrappers + +Keep the existing wrappers temporarily: + +```python +def ensure_llm_quota_available(*, model_instance: ModelInstance) -> None: + ... + + +def deduct_llm_quota( + *, + tenant_id: str, + model_instance: ModelInstance, + usage: LLMUsage, +) -> None: + ... +``` + +Their behavior: + +- emit `DeprecationWarning` +- delegate immediately to the new identity-based helpers +- contain no quota logic of their own +- pass `user_id=None` because `ModelInstance` does not carry caller scope today + +Recommended warning shape: + +```python +warnings.warn( + "ensure_llm_quota_available(model_instance=...) is deprecated; " + "use ensure_llm_quota_available_for_model(...) instead.", + DeprecationWarning, + stacklevel=2, +) +``` + +The same pattern applies to `deduct_llm_quota(...)`. + +## LLMQuotaLayer Design + +### Constructor + +Change the layer constructor from: + +```python +LLMQuotaLayer() +``` + +to: + +```python +LLMQuotaLayer(tenant_id: str, user_id: str | None) +``` + +The layer stores graph-scoped run context directly and no longer fetches it +during execution. + +### Pre-Run Check + +`on_node_run_start(node)` will: + +1. check whether the node type is one of: + - `BuiltinNodeTypes.LLM` + - `BuiltinNodeTypes.PARAMETER_EXTRACTOR` + - `BuiltinNodeTypes.QUESTION_CLASSIFIER` +2. extract `(provider, model)` from public node configuration +3. call `ensure_llm_quota_available_for_model(...)` + +The layer should not read any `ModelInstance`, wrapped runtime object, or +private attribute in this path. + +The preferred source is the node's public data model, not `node.model_instance`. +The intended helper shape is: + +```python +def _extract_model_identity_from_node(node: Node) -> tuple[str, str] | None: + ... +``` + +The helper reads the node's public model config, such as `node.data.model`. + +### Post-Run Deduction + +`on_node_run_end(node, error, result_event)` will: + +1. ignore non-success events +2. extract `(provider, model)` from + `result_event.node_run_result.inputs["model_provider"]` and + `result_event.node_run_result.inputs["model_name"]` +3. call `deduct_llm_quota_for_model(...)` + +The intended helper shape is: + +```python +def _extract_model_identity_from_result_event( + result_event: NodeRunSucceededEvent, +) -> tuple[str, str] | None: + ... +``` + +This path depends only on public event payloads and graph-scoped run context. + +## Quota Resolution Logic + +The new narrow helpers will preserve the existing rules. + +For pre-check: + +- resolve provider configuration for the given tenant and model identity +- return early for non-system providers +- raise `QuotaExceededError` if the resolved system provider model is already in + `QUOTA_EXCEEDED` + +For deduction: + +- resolve provider configuration for the given tenant and model identity +- return early for non-system providers +- compute used quota exactly as the current implementation does +- apply the same trial, paid, and free quota branches + +The `user_id` parameter is included because it belongs to graph-scoped identity +and keeps the new API stable if provider resolution needs caller scope for +plugin-backed lookups. If the current implementation does not need `user_id`, +the helper should still accept it and ignore it for now rather than forcing +another signature change later. + +When the deprecated `ModelInstance` wrappers delegate, they will pass +`user_id=None`. That preserves current behavior for existing callers while +keeping the narrow API stable for the workflow path, which does have a +graph-scoped `user_id`. + +## Engine Assembly Changes + +Every workflow engine builder that constructs `LLMQuotaLayer` must pass +`tenant_id` and `user_id` explicitly. + +This includes normal workflow entry and child engine creation paths that inherit +the same run context. + +The layer should begin execution fully initialized: + +- no lazy tenant lookup +- no node-scoped context probing +- no hidden capture on first use + +## Error Handling + +Behavior should remain the same where quota is actually exceeded. + +### Pre-Check + +On `QuotaExceededError`: + +- set the stop event +- send an abort command +- log a warning + +### Post-Run + +On `QuotaExceededError`: + +- set the stop event +- send an abort command +- log a warning + +### Missing Public Identity + +If the layer cannot extract public model identity: + +- log once with the node id +- skip quota work for that node +- do not reconstruct identity from private state + +This preserves the new boundary. Missing identity should be treated as a public +contract problem, not as a reason to fall back to hidden runtime internals. + +## Testing Plan + +Update tests to match the new public boundary. + +### `api/core/app/llm/quota.py` + +Add unit tests for: + +- `ensure_llm_quota_available_for_model(...)` +- `deduct_llm_quota_for_model(...)` +- deprecated wrappers delegate correctly +- deprecated wrappers emit `DeprecationWarning` + +### `api/core/app/workflow/layers/llm_quota.py` + +Update layer tests to assert: + +- the layer is initialized with `tenant_id` and `user_id` +- pre-run checks use public node model identity +- post-run deduction uses event input model identity +- no `ModelInstance` reconstruction is involved +- abort behavior on quota-exceeded remains unchanged + +### Workflow Assembly + +Update workflow entry tests to assert: + +- `LLMQuotaLayer(tenant_id=..., user_id=...)` is constructed explicitly +- child engine paths pass the same graph-scoped run context into the layer + +## Migration Plan + +1. Add the new identity-based quota helpers. +2. Convert `LLMQuotaLayer` to the new constructor and helper API. +3. Keep the old `ModelInstance` wrappers as deprecated delegators. +4. Update tests to enforce the new public boundary. +5. Migrate remaining non-workflow callers in later follow-up changes. +6. Remove deprecated wrappers once all callers have moved. + +## Alternatives Considered + +### Keep `ModelInstance` in the Layer + +Rejected because it preserves the same over-wide dependency and forces runtime +object reconstruction for a simple billing operation. + +### Reconstruct `ModelInstance` Inside the Layer + +Rejected because it hides complexity instead of removing it. + +### Make the Layer Entirely Event-Driven + +Rejected for now because `on_node_run_start(node)` runs before Graphon emits the +node start event. That would require an engine-ordering change that is outside +the scope of this API refactor. + +## Open Removal Path + +This design intentionally leaves a clean removal path: + +- once non-workflow callers migrate, the deprecated wrappers can be deleted +- once Graphon event coverage is sufficient for any future pre-run design, the + pre-run identity source can evolve independently of the quota API + +The important boundary remains stable: quota logic consumes explicit model +identity, not `ModelInstance`.