docs(api): add llm quota identity design spec

# LLM Quota Identity API Design
Date: 2026-04-22
## Summary
Refactor workflow quota handling so `LLMQuotaLayer` no longer depends on
`ModelInstance`.
The new design narrows quota APIs around the actual billing identity:
- `tenant_id`
- `user_id`
- `provider`
- `model`
- `usage` for post-run deduction
`LLMQuotaLayer` will be initialized with graph-scoped `tenant_id` and `user_id`
when the engine is built. It will read `provider` and `model` from public node
data before execution and from Graphon result events after execution.
Existing `ModelInstance`-based quota helpers will remain temporarily as thin
deprecated wrappers so non-workflow callers do not need to move in the same
change.
## Problem
The current workflow quota path has the wrong dependency shape.
`LLMQuotaLayer` naturally has model identity plus graph-scoped run context, but
the quota helpers currently require a full `ModelInstance`. That forces the
layer to depend on runtime assembly details or reconstruct a rich object only to
answer a billing question.
This is unfriendly Python for two reasons:
1. The caller has to provide a much larger object than the callee actually
needs.
2. The layer boundary becomes coupled to workflow internals instead of public
data.
## Goals
- Remove `ModelInstance` from `LLMQuotaLayer` entirely.
- Keep pre-run quota checks and post-run quota deduction behavior unchanged.
- Make the post-run billing API explicit and identity-based.
- Pass graph-scoped `tenant_id` and `user_id` into the quota layer at
construction time.
- Mark `ModelInstance`-based quota helpers as deprecated.
## Non-Goals
- Do not change provider quota semantics.
- Do not remove the deprecated helpers in this change.
- Do not migrate every existing non-workflow quota caller in this change.
- Do not redesign GraphEngine event ordering.
## Approved Direction
The workflow quota layer should not depend on `ModelInstance` at all.
For this workflow path:
- graph-scoped `tenant_id` and `user_id` are stable for the whole graph run
- Graphon success events provide `model_provider` and `model_name`
- pre-run checks must still happen in `on_node_run_start`, before any event is
emitted
Because `on_node_run_start(node)` runs before `NodeRunStartedEvent` exists, the
layer will use public node configuration for pre-run model identity and public
event inputs for post-run model identity.
## Target Call Site
```python
# Construction at engine assembly time (graph-scoped run context):
layer = LLMQuotaLayer(
    tenant_id=run_context.tenant_id,
    user_id=run_context.user_id,
)

# Pre-run check inside the layer:
ensure_llm_quota_available_for_model(
    tenant_id=self.tenant_id,
    user_id=self.user_id,
    provider=provider,
    model=model,
)

# Post-run deduction inside the layer:
deduct_llm_quota_for_model(
    tenant_id=self.tenant_id,
    user_id=self.user_id,
    provider=result_event.node_run_result.inputs["model_provider"],
    model=result_event.node_run_result.inputs["model_name"],
    usage=result_event.node_run_result.llm_usage,
)
```
This is the desired public shape because it matches what each caller actually
knows without reconstructing a runtime object.
## API Changes
In `api/core/app/llm/quota.py`, add two narrow helpers:
```python
def ensure_llm_quota_available_for_model(
    *,
    tenant_id: str,
    user_id: str | None,
    provider: str,
    model: str,
) -> None:
    ...


def deduct_llm_quota_for_model(
    *,
    tenant_id: str,
    user_id: str | None,
    provider: str,
    model: str,
    usage: LLMUsage,
) -> None:
    ...
```
These functions become the real implementation points for model-based quota
logic.
### Deprecated Wrappers
Keep the existing wrappers temporarily:
```python
def ensure_llm_quota_available(*, model_instance: ModelInstance) -> None:
    ...


def deduct_llm_quota(
    *,
    tenant_id: str,
    model_instance: ModelInstance,
    usage: LLMUsage,
) -> None:
    ...
```
Their behavior:
- emit `DeprecationWarning`
- delegate immediately to the new identity-based helpers
- contain no quota logic of their own
- pass `user_id=None` because `ModelInstance` does not carry caller scope today
Recommended warning shape:
```python
warnings.warn(
    "ensure_llm_quota_available(model_instance=...) is deprecated; "
    "use ensure_llm_quota_available_for_model(...) instead.",
    DeprecationWarning,
    stacklevel=2,
)
```
The same pattern applies to `deduct_llm_quota(...)`.
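A minimal sketch of the full delegation for `deduct_llm_quota(...)`, assuming
`ModelInstance` exposes its identity through `provider` and `model` attributes
(the attribute names are assumptions, not confirmed here):
```python
import warnings


def deduct_llm_quota(
    *,
    tenant_id: str,
    model_instance: ModelInstance,
    usage: LLMUsage,
) -> None:
    # Deprecated thin delegator: warn, then forward to the identity-based helper.
    warnings.warn(
        "deduct_llm_quota(model_instance=...) is deprecated; "
        "use deduct_llm_quota_for_model(...) instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    deduct_llm_quota_for_model(
        tenant_id=tenant_id,
        user_id=None,  # ModelInstance does not carry caller scope today
        provider=model_instance.provider,  # assumed public attribute
        model=model_instance.model,        # assumed public attribute
        usage=usage,
    )
```
The wrapper keeps no quota logic of its own, so removing it later is a pure
call-site migration.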
## LLMQuotaLayer Design
### Constructor
Change the layer constructor from:
```python
LLMQuotaLayer()
```
to:
```python
LLMQuotaLayer(tenant_id: str, user_id: str | None)
```
The layer stores graph-scoped run context directly and no longer fetches it
during execution.
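A minimal sketch of the stored state, with the existing layer base class
omitted for brevity:
```python
class LLMQuotaLayer:  # the real class keeps its existing layer base class
    def __init__(self, tenant_id: str, user_id: str | None) -> None:
        # Graph-scoped run context is captured once at construction time;
        # nothing is fetched lazily during node execution.
        self.tenant_id = tenant_id
        self.user_id = user_id
```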
### Pre-Run Check
`on_node_run_start(node)` will:
1. check whether the node type is one of:
- `BuiltinNodeTypes.LLM`
- `BuiltinNodeTypes.PARAMETER_EXTRACTOR`
- `BuiltinNodeTypes.QUESTION_CLASSIFIER`
2. extract `(provider, model)` from public node configuration
3. call `ensure_llm_quota_available_for_model(...)`
The layer should not read any `ModelInstance`, wrapped runtime object, or
private attribute in this path.
The preferred source is the node's public data model, not `node.model_instance`.
The intended helper shape is:
```python
def _extract_model_identity_from_node(node: Node) -> tuple[str, str] | None:
    ...
```
The helper reads the node's public model config, such as `node.data.model`.
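A sketch of the extraction path under those rules; the attribute names
(`node.node_type`, `provider`, `name`) are illustrative and should follow
whatever the public node data model actually exposes:
```python
_QUOTA_NODE_TYPES = frozenset({
    BuiltinNodeTypes.LLM,
    BuiltinNodeTypes.PARAMETER_EXTRACTOR,
    BuiltinNodeTypes.QUESTION_CLASSIFIER,
})


def _extract_model_identity_from_node(node: Node) -> tuple[str, str] | None:
    # Read only the node's public data model; never touch node.model_instance.
    if node.node_type not in _QUOTA_NODE_TYPES:
        return None
    model_config = getattr(node.data, "model", None)  # illustrative attribute path
    if model_config is None:
        return None
    return model_config.provider, model_config.name  # illustrative field names
```
`on_node_run_start` then becomes a straight-line call: extract identity, return
early if it is missing, and otherwise call
`ensure_llm_quota_available_for_model(...)` with the stored `tenant_id` and
`user_id`.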
### Post-Run Deduction
`on_node_run_end(node, error, result_event)` will:
1. ignore non-success events
2. extract `(provider, model)` from
`result_event.node_run_result.inputs["model_provider"]` and
`result_event.node_run_result.inputs["model_name"]`
3. call `deduct_llm_quota_for_model(...)`
The intended helper shape is:
```python
def _extract_model_identity_from_result_event(
    result_event: NodeRunSucceededEvent,
) -> tuple[str, str] | None:
    ...
```
This path depends only on public event payloads and graph-scoped run context.
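A sketch of the post-run path, assuming the success event type is
`NodeRunSucceededEvent` and that a failed lookup simply skips deduction (the
hook is shown outside its class for brevity):
```python
def on_node_run_end(self, node, error, result_event) -> None:
    # Only successful runs carry billable usage; ignore everything else.
    if not isinstance(result_event, NodeRunSucceededEvent):
        return
    identity = _extract_model_identity_from_result_event(result_event)
    if identity is None:
        return
    provider, model = identity
    deduct_llm_quota_for_model(
        tenant_id=self.tenant_id,
        user_id=self.user_id,
        provider=provider,
        model=model,
        usage=result_event.node_run_result.llm_usage,
    )


def _extract_model_identity_from_result_event(
    result_event: NodeRunSucceededEvent,
) -> tuple[str, str] | None:
    # Public event payload only: no ModelInstance, no private state.
    inputs = result_event.node_run_result.inputs or {}
    provider = inputs.get("model_provider")
    model = inputs.get("model_name")
    if not provider or not model:
        return None
    return provider, model
```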
## Quota Resolution Logic
The new narrow helpers will preserve the existing rules.
For pre-check:
- resolve provider configuration for the given tenant and model identity
- return early for non-system providers
- raise `QuotaExceededError` if the resolved system provider model is already in
`QUOTA_EXCEEDED`
For deduction:
- resolve provider configuration for the given tenant and model identity
- return early for non-system providers
- compute used quota exactly as the current implementation does
- apply the same trial, paid, and free quota branches
The `user_id` parameter is included because it belongs to graph-scoped identity
and keeps the new API stable if provider resolution needs caller scope for
plugin-backed lookups. If the current implementation does not need `user_id`,
the helper should still accept it and ignore it for now rather than forcing
another signature change later.
When the deprecated `ModelInstance` wrappers delegate, they will pass
`user_id=None`. That preserves current behavior for existing callers while
keeping the narrow API stable for the workflow path, which does have a
graph-scoped `user_id`.
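A control-flow skeleton for the pre-check side of these rules;
`_resolve_system_provider_model` and its `quota_exceeded` flag are hypothetical
stand-ins for whatever the current implementation already uses to resolve
provider configuration and quota state:
```python
def ensure_llm_quota_available_for_model(
    *,
    tenant_id: str,
    user_id: str | None,  # accepted for graph-scoped identity; unused for now
    provider: str,
    model: str,
) -> None:
    # Resolve provider configuration for this tenant and model identity.
    resolved = _resolve_system_provider_model(tenant_id, provider, model)  # hypothetical helper
    if resolved is None:
        # Non-system providers are not quota-managed; return early.
        return
    if resolved.quota_exceeded:  # stands in for the existing QUOTA_EXCEEDED check
        raise QuotaExceededError()
```
The deduction helper follows the same skeleton, then computes used quota and
applies the existing trial, paid, and free branches unchanged.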
## Engine Assembly Changes
Every workflow engine builder that constructs `LLMQuotaLayer` must pass
`tenant_id` and `user_id` explicitly.
This includes normal workflow entry and child engine creation paths that inherit
the same run context.
The layer should begin execution fully initialized:
- no lazy tenant lookup
- no node-scoped context probing
- no hidden capture on first use
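An assembly-site sketch, with `layers` and `run_context` standing in for
whatever objects each builder already uses to collect layers and carry the
graph-scoped identity:
```python
# Inside each workflow engine builder that wires up layers:
layers.append(
    LLMQuotaLayer(
        tenant_id=run_context.tenant_id,
        user_id=run_context.user_id,
    )
)
```
Child engine creation paths pass the same values through, so the layer never
probes node-scoped context for identity.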
## Error Handling
Behavior should remain the same where quota is actually exceeded.
### Pre-Check
On `QuotaExceededError`:
- set the stop event
- send an abort command
- log a warning
### Post-Run
On `QuotaExceededError`:
- set the stop event
- send an abort command
- log a warning
### Missing Public Identity
If the layer cannot extract public model identity:
- log once with the node id
- skip quota work for that node
- do not reconstruct identity from private state
This preserves the new boundary. Missing identity should be treated as a public
contract problem, not as a reason to fall back to hidden runtime internals.
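Since the pre-check and post-run paths react identically to
`QuotaExceededError`, a small shared handler keeps them in sync; the
stop-event and abort-command names below are illustrative placeholders for the
layer's existing plumbing, and the method is shown outside its class for
brevity:
```python
def _abort_on_quota_exceeded(self, node, error: QuotaExceededError) -> None:
    # Shared handling for both the pre-check and the post-run deduction paths.
    self._stop_event.set()          # illustrative: existing stop-event plumbing
    self._send_abort_command()      # illustrative: existing abort-command plumbing
    logger.warning("LLM quota exceeded at node %s: %s", node.id, error)
```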
## Testing Plan
Update tests to match the new public boundary.
### `api/core/app/llm/quota.py`
Add unit tests for:
- `ensure_llm_quota_available_for_model(...)`
- `deduct_llm_quota_for_model(...)`
- deprecated wrappers delegate correctly
- deprecated wrappers emit `DeprecationWarning`
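A sketch covering the two wrapper bullets, assuming pytest, an import path
derived from `api/core/app/llm/quota.py`, and hypothetical
`fake_model_instance` / `fake_usage` fixtures:
```python
import pytest

from core.app.llm import quota  # assumed import path for api/core/app/llm/quota.py


def test_deduct_llm_quota_warns_and_delegates(monkeypatch):
    recorded = []
    # Patch the new identity-based helper inside the module under test.
    monkeypatch.setattr(
        quota, "deduct_llm_quota_for_model", lambda **kwargs: recorded.append(kwargs)
    )
    with pytest.warns(DeprecationWarning):
        quota.deduct_llm_quota(
            tenant_id="t-1",
            model_instance=fake_model_instance(provider="openai", model="gpt-4o"),
            usage=fake_usage(),
        )
    assert recorded and recorded[0]["user_id"] is None  # wrapper passes user_id=None
    assert recorded[0]["provider"] == "openai"
```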
### `api/core/app/workflow/layers/llm_quota.py`
Update layer tests to assert:
- the layer is initialized with `tenant_id` and `user_id`
- pre-run checks use public node model identity
- post-run deduction uses event input model identity
- no `ModelInstance` reconstruction is involved
- abort behavior on quota-exceeded remains unchanged
### Workflow Assembly
Update workflow entry tests to assert:
- `LLMQuotaLayer(tenant_id=..., user_id=...)` is constructed explicitly
- child engine paths pass the same graph-scoped run context into the layer
## Migration Plan
1. Add the new identity-based quota helpers.
2. Convert `LLMQuotaLayer` to the new constructor and helper API.
3. Keep the old `ModelInstance` wrappers as deprecated delegators.
4. Update tests to enforce the new public boundary.
5. Migrate remaining non-workflow callers in later follow-up changes.
6. Remove deprecated wrappers once all callers have moved.
## Alternatives Considered
### Keep `ModelInstance` in the Layer
Rejected because it preserves the same over-wide dependency and forces runtime
object reconstruction for a simple billing operation.
### Reconstruct `ModelInstance` Inside the Layer
Rejected because it hides complexity instead of removing it.
### Make the Layer Entirely Event-Driven
Rejected for now because `on_node_run_start(node)` runs before Graphon emits the
node start event. That would require an engine-ordering change that is outside
the scope of this API refactor.
## Open Removal Path
This design intentionally leaves a clean removal path:
- once non-workflow callers migrate, the deprecated wrappers can be deleted
- once Graphon event coverage is sufficient for any future pre-run design, the
pre-run identity source can evolve independently of the quota API
The important boundary remains stable: quota logic consumes explicit model
identity, not `ModelInstance`.