From 4839fcc4f8a0c5508065c28d194527f7a024a665 Mon Sep 17 00:00:00 2001
From: -LAN-
Date: Wed, 22 Apr 2026 13:19:47 +0800
Subject: [PATCH] docs(api): narrow llm quota spec to tenant-scoped identity

---
 .../2026-04-22-llm-quota-identity-design.md | 57 ++++++++-----------
 1 file changed, 23 insertions(+), 34 deletions(-)

diff --git a/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md b/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md
index ac38407cd0..2158edc95e 100644
--- a/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md
+++ b/docs/superpowers/specs/2026-04-22-llm-quota-identity-design.md
@@ -10,14 +10,13 @@ Refactor workflow quota handling so `LLMQuotaLayer` no longer depends on
 The new design narrows quota APIs around the actual billing identity:
 
 - `tenant_id`
-- `user_id`
 - `provider`
 - `model`
 - `usage` for post-run deduction
 
-`LLMQuotaLayer` will be initialized with graph-scoped `tenant_id` and `user_id`
-when the engine is built. It will read `provider` and `model` from public node
-data before execution and from Graphon result events after execution.
+`LLMQuotaLayer` will be initialized with graph-scoped `tenant_id` when the
+engine is built. It will read `provider` and `model` from public node data
+before execution and from Graphon result events after execution.
 
 Existing `ModelInstance`-based quota helpers will remain temporarily as thin
 deprecated wrappers so non-workflow callers do not need to move in the same
@@ -27,10 +26,10 @@ change.
 
 The current workflow quota path has the wrong dependency shape.
 
-`LLMQuotaLayer` naturally has model identity plus graph-scoped run context, but
-the quota helpers currently require a full `ModelInstance`. That forces the
-layer to depend on runtime assembly details or reconstruct a rich object only to
-answer a billing question.
+`LLMQuotaLayer` naturally has model identity plus graph-scoped tenant context,
+but the quota helpers currently require a full `ModelInstance`.
That forces the
+layer to depend on runtime assembly details or reconstruct a rich object only
+to answer a billing question.
 
 This is unfriendly Python for two reasons:
 
@@ -44,8 +43,7 @@ This is unfriendly Python for two reasons:
 - Remove `ModelInstance` from `LLMQuotaLayer` entirely.
 - Keep pre-run quota checks and post-run quota deduction behavior unchanged.
 - Make the post-run billing API explicit and identity-based.
-- Pass graph-scoped `tenant_id` and `user_id` into the quota layer at
-  construction time.
+- Pass graph-scoped `tenant_id` into the quota layer at construction time.
 - Mark `ModelInstance`-based quota helpers as deprecated.
 
 ## Non-Goals
@@ -61,7 +59,7 @@ The workflow quota layer should not depend on `ModelInstance` at all.
 
 For this workflow path:
 
-- graph-scoped `tenant_id` and `user_id` are stable for the whole graph run
+- graph-scoped `tenant_id` is stable for the whole graph run
 - Graphon success events provide `model_provider` and `model_name`
 - pre-run checks must still happen in `on_node_run_start`, before any event is
   emitted
 
@@ -75,19 +73,16 @@ event inputs for post-run model identity.
 ```python
 layer = LLMQuotaLayer(
     tenant_id=run_context.tenant_id,
-    user_id=run_context.user_id,
 )
 
 ensure_llm_quota_available_for_model(
     tenant_id=self.tenant_id,
-    user_id=self.user_id,
     provider=provider,
     model=model,
 )
 
 deduct_llm_quota_for_model(
     tenant_id=self.tenant_id,
-    user_id=self.user_id,
     provider=result_event.node_run_result.inputs["model_provider"],
     model=result_event.node_run_result.inputs["model_name"],
     usage=result_event.node_run_result.llm_usage,
@@ -105,7 +100,6 @@ In `api/core/app/llm/quota.py`, add two narrow helpers:
 def ensure_llm_quota_available_for_model(
     *,
     tenant_id: str,
-    user_id: str | None,
     provider: str,
     model: str,
 ) -> None:
@@ -115,7 +109,6 @@
 def deduct_llm_quota_for_model(
     *,
     tenant_id: str,
-    user_id: str | None,
     provider: str,
     model: str,
     usage: LLMUsage,
@@ -149,7 +142,6 @@ Their behavior:
 
 - emit `DeprecationWarning`
 - delegate immediately to the new identity-based helpers
 - contain no quota logic of their own
-- pass `user_id=None` because `ModelInstance` does not carry caller scope today
 
 Recommended warning shape:
@@ -177,10 +169,10 @@
 LLMQuotaLayer() to:
 
 ```python
-LLMQuotaLayer(tenant_id: str, user_id: str | None)
+LLMQuotaLayer(tenant_id: str)
 ```
 
-The layer stores graph-scoped run context directly and no longer fetches it
+The layer stores graph-scoped tenant context directly and no longer fetches it
 during execution.
 
 ### Pre-Run Check
@@ -226,7 +218,7 @@ def _extract_model_identity_from_result_event(
     ...
 ```
 
-This path depends only on public event payloads and graph-scoped run context.
+This path depends only on public event payloads and graph-scoped tenant context.
 
 ## Quota Resolution Logic
 
@@ -246,24 +238,21 @@ For deduction:
 
 - compute used quota exactly as the current implementation does
 - apply the same trial, paid, and free quota branches
 
-The `user_id` parameter is included because it belongs to graph-scoped identity
-and keeps the new API stable if provider resolution needs caller scope for
-plugin-backed lookups. If the current implementation does not need `user_id`,
-the helper should still accept it and ignore it for now rather than forcing
-another signature change later.
+The narrow quota API intentionally excludes `user_id`.
 
-When the deprecated `ModelInstance` wrappers delegate, they will pass
-`user_id=None`. That preserves current behavior for existing callers while
-keeping the narrow API stable for the workflow path, which does have a
-graph-scoped `user_id`.
+Model and credential resolution for quota is tenant-scoped in the current code:
+provider configurations are cached and resolved by `tenant_id`, and current
+credentials are selected from tenant-bound provider configuration. Request
+`user_id` still matters in other request-scoped model runtime flows, but it is
+not needed for quota lookup or billing.
 
 ## Engine Assembly Changes
 
 Every workflow engine builder that constructs `LLMQuotaLayer` must pass
-`tenant_id` and `user_id` explicitly.
+`tenant_id` explicitly.
 
 This includes normal workflow entry and child engine creation paths that inherit
-the same run context.
+the same graph-scoped tenant context.
 
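As a sketch of the assembly contract above, the following minimal example shows both entry paths constructing the layer from the same graph-scoped tenant context. `RunContext` and `build_layers` are illustrative names assumed for this sketch, not the real builder API:

```python
from dataclasses import dataclass


@dataclass
class LLMQuotaLayer:
    # Graph-scoped billing identity, fixed for the whole run.
    tenant_id: str


@dataclass
class RunContext:
    tenant_id: str


def build_layers(run_context: RunContext) -> list[LLMQuotaLayer]:
    # Normal workflow entry and child engine creation both pass the
    # graph-scoped tenant_id explicitly at construction time.
    return [LLMQuotaLayer(tenant_id=run_context.tenant_id)]


layers = build_layers(RunContext(tenant_id="tenant-123"))
assert layers[0].tenant_id == "tenant-123"
```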
 The layer should begin execution fully initialized:
 
@@ -319,7 +308,7 @@ Add unit tests for:
 
 Update layer tests to assert:
 
-- the layer is initialized with `tenant_id` and `user_id`
+- the layer is initialized with `tenant_id`
 - pre-run checks use public node model identity
 - post-run deduction uses event input model identity
 - no `ModelInstance` reconstruction is involved
@@ -329,8 +318,8 @@
 
 Update workflow entry tests to assert:
 
-- `LLMQuotaLayer(tenant_id=..., user_id=...)` is constructed explicitly
-- child engine paths pass the same graph-scoped run context into the layer
+- `LLMQuotaLayer(tenant_id=...)` is constructed explicitly
+- child engine paths pass the same graph-scoped tenant context into the layer
 
 ## Migration Plan