VirtualWorkflowSynthesizer._build_features() now extracts ALL legacy
app features from AppModelConfig into the synthesized workflow.features:
- opening_statement + suggested_questions
- sensitive_word_avoidance (keywords/API moderation)
- more_like_this
- speech_to_text / text_to_speech
- retriever_resource
Previously workflow.features was hardcoded to "{}", losing all these
features during transparent upgrade. Now AdvancedChatAppRunner's
moderation, opening text, and other feature layers work correctly
for transparently upgraded old apps.
Made-with: Cursor
Root cause: _WORKDIR was hardcoded to "/home/user" which doesn't exist
in AWS AgentCore Code Interpreter environment (actual pwd is
/opt/amazon/genesis1p-tools/var). Every command was prefixed with
"cd /home/user && ..." which failed silently, producing empty stdout.
Fix:
- Default _WORKDIR to "/tmp" (universally available)
- Auto-detect actual working directory via "pwd" during
_construct_environment and override _WORKDIR dynamically
Verified: echo, python3, uname all return correct stdout.
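The fallback-plus-detection logic above can be sketched as follows; `run_command` is a hypothetical stand-in for the Code Interpreter invoke call, not the actual client API:

```python
# Sketch of the workdir auto-detection described above. `run_command`
# is an illustrative callable that executes a shell command in the
# sandbox and returns its stdout; the real AgentCore client differs.

DEFAULT_WORKDIR = "/tmp"  # universally available fallback

def detect_workdir(run_command) -> str:
    """Return the sandbox's actual working directory, or /tmp on failure."""
    try:
        stdout = run_command("pwd").strip()
    except Exception:
        return DEFAULT_WORKDIR
    # Only trust an absolute path; anything else means pwd failed silently.
    return stdout if stdout.startswith("/") else DEFAULT_WORKDIR
```

With this in place, a hardcoded path that does not exist in the environment can never silently break every subsequent `cd <workdir> && ...` prefix.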
Made-with: Cursor
Full pipeline working: Agent V2 node → Docker container creation →
CLI binary upload (linux/arm64) → dify init (fetch tools from API) →
dify execute (tool callback via CLI API) → result returned.
Fixes:
- Use sandbox.id (not vm.metadata.id) for CLI paths
- Upload CLI binary to container during sandbox creation
- Resolve linux binary separately for Docker containers on macOS
- Save Docker provider config via SandboxProviderService (proper
encryption) instead of raw DB insert
- Add verbose logging for sandbox tool execution path
- Fix NameError: binary not defined
Made-with: Cursor
- Skip Local sandbox provider under gevent worker (subprocess pipes
cause a cooperative-threading deadlock with Celery's gevent pool).
- Add non-blocking sandbox readiness check before tool execution.
- Add gevent timeout wrapper for sandbox bash session.
- Fix CLI binary resolution: add SANDBOX_DIFY_CLI_ROOT config field.
- Fix ExecutionContext.node_id propagation.
- Fix SkillInitializer to gracefully handle missing skill bundles.
- Update _invoke_tool_in_sandbox to use correct `dify execute` CLI
subcommand format (not `invoke-tool`).
The full sandbox-in-agent pipeline works end-to-end for network-based
providers (Docker, E2B, SSH). Local provider is skipped under gevent
but works in non-gevent contexts.
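The non-blocking readiness check mentioned above amounts to a bounded poll loop; this is an illustrative sketch, with `is_ready` standing in for whatever probe the sandbox exposes:

```python
import time

def wait_until_ready(is_ready, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll a readiness probe without blocking the tool-execution path
    for longer than `timeout` seconds. `is_ready` is a hypothetical
    callable returning True once the sandbox accepts commands."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if is_ready():
                return True
        except Exception:
            pass  # sandbox not up yet; keep polling
        time.sleep(interval)
    return False
```

A loop like this yields between probes, so it cooperates with gevent's scheduler instead of pinning a green thread on a blocking pipe read.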
Made-with: Cursor
Close 3 integration gaps between the ported Sandbox system and Agent V2:
1. Fix _invoke_tool_in_sandbox to use SandboxBashSession context manager
API correctly (keyword args, bash_tool, ToolReference), with graceful
fallback to direct invocation when DifyCli binary is unavailable.
2. Inject sandbox into run_context via _resolve_sandbox_context() in
WorkflowBasedAppRunner — automatically creates a sandbox when a
tenant has an active sandbox provider configured.
3. Register SandboxLayer in both advanced_chat and workflow app runners
for proper sandbox lifecycle cleanup on graph end.
Also: make SkillInitializer non-fatal when no skill bundle exists,
add node_id to ExecutionContext for sandbox session scoping.
Made-with: Cursor
- workflow_execute_task: add AppMode.CHAT/AGENT_CHAT/COMPLETION to the
AdvancedChatAppGenerator routing branch so transparently upgraded old
apps can execute through the workflow engine.
- app_generate_service: use app_model.mode (not hardcoded AppMode.AGENT)
for SSE event subscription channel, ensuring the subscriber and
Celery publisher use the same Redis channel key.
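The channel-key mismatch can be illustrated with a sketch; the key format below is hypothetical, the point is that publisher and subscriber must both derive it from `app_model.mode`:

```python
def sse_channel(tenant_id: str, app_mode: str, task_id: str) -> str:
    """Illustrative Redis channel key. If the subscriber hardcodes one
    mode (e.g. 'agent') while the Celery publisher uses app_model.mode,
    the two sides listen on different keys and events are lost."""
    return f"sse:{tenant_id}:{app_mode}:{task_id}"
```

Deriving both sides from the same `app_model.mode` value guarantees the keys match for chat, agent-chat, and completion apps alike.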
Made-with: Cursor
- Auto-resolve parent_message_id when not provided by client,
querying the latest message in the conversation to maintain
the thread chain that extract_thread_messages() relies on.
- Add AppMode.AGENT to TokenBufferMemory mode checks so file
attachments in memory are handled via the workflow branch.
- Add debug logging for memory injection in node_factory and node.
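The parent-message fallback can be sketched as a pure function; the `Message` shape here is illustrative, not the actual ORM model:

```python
# Sketch of the fallback described above: when the client omits
# parent_message_id, anchor the thread on the newest message in the
# conversation so extract_thread_messages() can walk the chain.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

@dataclass
class Message:
    id: str
    created_at: datetime

def resolve_parent_message_id(
    explicit_id: Optional[str], conversation_messages: Sequence[Message]
) -> Optional[str]:
    if explicit_id:
        return explicit_id  # a client-provided value always wins
    if not conversation_messages:
        return None  # first turn: no parent exists yet
    # Latest message anchors the thread chain
    return max(conversation_messages, key=lambda m: m.created_at).id
```

In the real code this query runs against the conversation's messages in the DB; the selection logic is the same.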
Made-with: Cursor
1. DSL import fix: change self._session.commit() to self._session.flush()
in app_dsl_service.py _create_or_update_app() to avoid a "closed transaction"
error. DSL import now works: export agent app -> import -> new app created.
2. Memory loading attempt: added _load_memory_messages() to AgentV2Node,
which loads TokenBufferMemory from conversation history. However, the
chatflow engine manages conversations differently from easy-UI (the
conversation may not be in the DB at query time, or it uses
ConversationVariablePersistenceLayer instead of the Message table).
Memory needs further investigation.
Test results:
- Multi-turn memory: Turn 1 OK, Turn 2 LLM doesn't see history (needs deeper fix)
- Service API with API Key: PASSED (answer="Sixteen" for 8+8)
- DSL Import: PASSED (status=completed, new app created)
- Token aggregation: PASSED (node=49, workflow=49)
Known: memory in multi-turn chatflow needs to use graphon's built-in
memory mechanism (MemoryConfig on the node + ConversationVariablePersistenceLayer)
rather than a direct DB query.
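The commit-vs-flush distinction behind fix 1 can be shown with a minimal fake session (not SQLAlchemy itself, just an illustration of the transaction lifecycle):

```python
# Illustration of why flush() is safer than commit() inside a helper
# that runs under a caller-managed transaction. This is a toy class,
# not SQLAlchemy; the semantics it models are the relevant part.
class FakeSession:
    def __init__(self):
        self.in_transaction = True
        self.pending = []
        self.persisted = []

    def add(self, obj):
        self.pending.append(obj)

    def flush(self):
        # Pushes pending rows to the DB but keeps the transaction open,
        # so the caller can still commit or roll back everything at once.
        self.persisted.extend(self.pending)
        self.pending.clear()

    def commit(self):
        # Ends the transaction; further work in the same unit then fails
        # with a "closed transaction" style error, as seen in DSL import.
        self.flush()
        self.in_transaction = False
```

A helper such as _create_or_update_app() should only flush; committing is the job of whoever opened the transaction.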
Made-with: Cursor
1. Fix workflow-level total_tokens=0:
Call graph_runtime_state.add_tokens(usage.total_tokens) in both
_run_without_tools and _run_with_tools paths after node execution.
Previously only graphon's internal ModelInvokeCompletedEvent handler
called add_tokens, which agent-v2 doesn't emit.
2. Fix Turn 2 SSE empty response:
Set PUBSUB_REDIS_CHANNEL_TYPE=streams in .env. Redis Streams
provides durable event delivery (consumers can replay past events),
solving the pub/sub at-most-once timing issue.
3. Skill -> Agent runtime integration:
SandboxBuilder.build() now auto-includes SkillInitializer if not
already present. This ensures sandbox.attrs has the skill bundle
loaded for downstream consumers (tool execution in sandbox).
4. LegacyResponseAdapter:
New module at core/app/apps/common/legacy_response_adapter.py.
Filters workflow-specific SSE events (workflow_started, node_started,
node_finished, workflow_finished) from the stream, passing through
only message/message_end/agent_log/error/ping events that old
clients expect.
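The filtering in item 4 reduces to a small generator; the event names follow the commit text, while the wrapper itself is an illustrative sketch rather than the actual module:

```python
# Sketch of the LegacyResponseAdapter filtering described above.
WORKFLOW_EVENTS = {"workflow_started", "node_started",
                   "node_finished", "workflow_finished"}
LEGACY_EVENTS = {"message", "message_end", "agent_log", "error", "ping"}

def adapt_legacy_stream(events):
    """Drop workflow-specific SSE events; pass through only the events
    old (easy-UI) clients expect."""
    for event in events:
        if event.get("event") in LEGACY_EVENTS:
            yield event
        # workflow_* events are silently dropped for legacy clients
```

Old clients thus see the same event vocabulary they always did, even though a full workflow runs underneath.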
46 unit tests pass.
Made-with: Cursor
The EventAdapter was converting every LLMResultChunk from the agent
strategy into StreamChunkEvent. Combined with the answer node's
{{#agent.text#}} variable output, this caused the final answer to
appear twice (e.g., "It is 2026-04-09 04:27:45.It is 2026-04-09 04:27:45.").
Now LLMResultChunk from strategy output is silently consumed (text still
accumulates in AgentResult.text via the strategy). Only AgentLogEvent
(thought/tool_call/round) is forwarded to the pipeline.
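The consume-but-accumulate behavior can be sketched as follows; the event tuples are an illustrative encoding, not the real EventAdapter types:

```python
def forward_agent_events(events):
    """Consume LLM chunks (accumulating their text) and forward only
    agent log events. Chunks are NOT forwarded, because the downstream
    answer node already emits the same text via {{#agent.text#}} --
    forwarding both paths is what produced the doubled answer."""
    text_parts = []
    forwarded = []
    for kind, payload in events:
        if kind == "llm_chunk":
            text_parts.append(payload)   # accumulate only
        elif kind == "agent_log":
            forwarded.append(payload)    # thought/tool_call/round pass through
    return "".join(text_parts), forwarded
```

The accumulated text still reaches the client, but through exactly one path.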
Known remaining issues:
- workflow/message level total_tokens=0 (node level is correct at 33)
because pipeline aggregation doesn't include agent-v2 node tokens
- Turn 2 SSE delivery timing with Redis pubsub (celery executes OK)
Made-with: Cursor
FunctionCallStrategy and ReActStrategy were passing user=self.context.user_id
to ModelInstance.invoke_llm() which doesn't accept that parameter.
This caused tool-using agent runs to fail with:
"ModelInstance.invoke_llm() got an unexpected keyword argument 'user'"
Verified: Agent V2 with current_time tool now works end-to-end:
ROUND 1: LLM thought -> CALL current_time -> got time
ROUND 2: LLM generates answer with time info
Made-with: Cursor
1. Remove StreamChunkEvent from AgentV2Node._run_without_tools():
The agent-v2 node was yielding StreamChunkEvent during LLM streaming,
AND the downstream answer node was outputting the same text via
{{#agent.text#}} variable reference, causing "FourFour" duplication.
Now text only flows through outputs.text -> answer node (single path).
2. Map inputs to query for completion app transparent upgrade:
Completion apps send {inputs: {query: "..."}} not {query: "..."}.
VirtualWorkflowSynthesizer route now extracts query from inputs
when the top-level query is missing.
Verified:
- Old chat app: "What is 2+2?" -> "Four" (was "FourFour")
- Old completion app: {inputs: {query: "What is 3+3?"}} -> "3 + 3 = 6" (was failing)
- Old agent-chat app: still works
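The query extraction in fix 2 is a two-step lookup; this sketch uses the field names from the commit text, with the function itself as an illustrative stand-in for the synthesizer route:

```python
from typing import Optional

def extract_query(payload: dict) -> Optional[str]:
    """Chat apps send a top-level "query"; completion apps send
    {"inputs": {"query": ...}}. Prefer the top-level key, then
    fall back to inputs."""
    query = payload.get("query")
    if not query:
        query = (payload.get("inputs") or {}).get("query")
    return query
```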
Made-with: Cursor
VirtualWorkflowSynthesizer.ensure_workflow() creates a real draft
workflow on first call for a legacy app, persisting it to the database.
On subsequent calls, returns the existing draft.
This is needed because AdvancedChatAppGenerator's worker thread looks
up workflows from the database by ID. Instead of hacking the generator
to skip DB lookups, we treat this as a lazy one-time upgrade: the old
app gets a real workflow that can also be edited in the workflow editor.
Verified: old chat app created on main branch ("What is 2+2?" -> "Four")
and old agent-chat app ("Say hello" -> "Hello!") both successfully
execute through the Agent V2 engine with AGENT_V2_TRANSPARENT_UPGRADE=true.
Made-with: Cursor
Add two feature-flag-controlled upgrade paths that allow existing apps
and LLM nodes to transparently run through the Agent V2 engine without
any database migration:
1. AGENT_V2_TRANSPARENT_UPGRADE (default: off):
When enabled, old apps (chat/completion/agent-chat) bypass legacy
Easy-UI runners. VirtualWorkflowSynthesizer converts AppModelConfig
to an in-memory Workflow (start -> agent-v2 -> answer) at runtime,
then executes via AdvancedChatAppGenerator. Falls back to legacy
path on any synthesis error.
VirtualWorkflowSynthesizer maps:
- model JSON -> ModelConfig
- pre_prompt/chat_prompt_config -> prompt_template
- agent_mode.tools -> ToolMetadata[]
- agent_mode.strategy -> agent_strategy
- dataset_configs -> context
- file_upload -> vision
2. AGENT_V2_REPLACES_LLM (default: off):
When enabled, DifyNodeFactory.create_node() transparently remaps
nodes with type="llm" to type="agent-v2" before class resolution.
Since AgentV2NodeData is a strict superset of LLMNodeData, the
mapping is lossless. With tools=[], Agent V2 behaves identically
to LLM Node.
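The remap in flag 2 is conceptually a one-line substitution before class resolution; this sketch is illustrative (the flag lives in config and defaults to False in the real code):

```python
# Sketch of the llm -> agent-v2 remap described above.
AGENT_V2_REPLACES_LLM = True  # feature flag; defaults to False in config

def resolve_node_type(node_config: dict) -> str:
    node_type = node_config.get("type", "")
    if AGENT_V2_REPLACES_LLM and node_type == "llm":
        # AgentV2NodeData is a strict superset of LLMNodeData, so the
        # remap is lossless; with tools=[] behavior matches the LLM node.
        return "agent-v2"
    return node_type
```

Because only the type string changes before class resolution, turning the flag off restores the original mapping instantly.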
Both flags default to False for safety. Turn off = instant rollback.
46 existing tests pass. Flask starts successfully.
Made-with: Cursor
Replace the hardcoded FunctionCallAgentRunner / CotChatAgentRunner /
CotCompletionAgentRunner selection in AgentChatAppRunner with the new
AgentAppRunner class that uses StrategyFactory from Phase 1.
Before: AgentChatAppRunner manually selects FC/CoT runner class based on
model features and LLM mode, then instantiates it directly.
After: AgentChatAppRunner instantiates AgentAppRunner (from sandbox branch),
which internally uses StrategyFactory.create_strategy() to auto-select
the right strategy, and uses ToolInvokeHook for proper agent_invoke
with file handling and thought persistence.
This unifies the agent execution engine: both the new Agent V2 workflow
node and the legacy agent-chat app now use the same StrategyFactory
and AgentPattern implementations.
Also fix: the command and file_upload nodes now use a string node_type
instead of BuiltinNodeTypes.COMMAND/FILE_UPLOAD, which do not exist in
the current graphon version.
46 tests pass. Flask starts successfully.
Made-with: Cursor
Integrate the ported sandbox system with Agent V2 node:
- Add DIFY_SANDBOX_CONTEXT_KEY to app_invoke_entities for passing
sandbox through run_context without modifying graphon
- DifyNodeFactory._resolve_sandbox() extracts sandbox from run_context
and passes it to AgentV2Node constructor
- AgentV2Node accepts optional sandbox parameter
- AgentV2ToolManager supports dual execution paths:
- _invoke_tool_directly(): standard ToolEngine.generic_invoke (no sandbox)
- _invoke_tool_in_sandbox(): delegates to SandboxBashSession.run_tool()
which uses DifyCli to call back to Dify API from inside the sandbox
- Graceful fallback: if sandbox execution fails, logs warning and returns
error message (does not crash the agent loop)
To enable sandbox for an Agent workflow:
1. Create a Sandbox via SandboxBuilder
2. Add it to run_context under DIFY_SANDBOX_CONTEXT_KEY
3. Agent V2 nodes will automatically use sandbox for tool execution
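Steps 1-3 above can be sketched as follows; other than DIFY_SANDBOX_CONTEXT_KEY, the names here are hypothetical stand-ins for the real builder and factory code:

```python
# Illustrative wiring of the sandbox through run_context.
DIFY_SANDBOX_CONTEXT_KEY = "dify_sandbox"

def resolve_sandbox(run_context: dict):
    """What DifyNodeFactory._resolve_sandbox() does conceptually:
    fetch the sandbox from run_context, or None when not configured."""
    return run_context.get(DIFY_SANDBOX_CONTEXT_KEY)

run_context = {DIFY_SANDBOX_CONTEXT_KEY: object()}  # step 2: sandbox attached
sandbox = resolve_sandbox(run_context)              # step 3: node picks it up
```

Passing the sandbox via run_context keeps graphon itself unmodified: nodes that don't look for the key are unaffected.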
46 existing tests still pass.
Made-with: Cursor
Fixes discovered during end-to-end testing of Agent workflow execution:
1. ModelManager instantiation: use ModelManager.for_tenant() instead of
ModelManager() which requires a ProviderManager argument
2. Variable template resolution: use VariableTemplateParser(template).format()
instead of non-existent resolve_template() static method
3. invoke_llm() signature: remove unsupported 'user' keyword argument
4. Event dispatch: remove ModelInvokeCompletedEvent from _run() yield
(graphon base Node._dispatch doesn't support it via singledispatch)
5. NodeRunResult metadata: use WorkflowNodeExecutionMetadataKey enum keys
(TOTAL_TOKENS, TOTAL_PRICE, CURRENCY) instead of arbitrary string keys
6. SSE topic mismatch: use AppMode.AGENT (not ADVANCED_CHAT) in
retrieve_events() so publisher and subscriber share the same channel
7. Celery task routing: add AppMode.AGENT to workflow_execute_task._run_app()
alongside ADVANCED_CHAT
All issues verified fixed: Agent V2 node successfully invokes LLM and
returns "Hello there!" through the full SSE streaming pipeline.
Made-with: Cursor
Add AppMode.AGENT to the validate_features_structure() match case
alongside ADVANCED_CHAT, fixing the 'Invalid app mode: agent' error
when creating Agent apps (which auto-generate a workflow draft).
Discovered during E2E testing of the full create -> draft -> publish flow.
Made-with: Cursor
Ensure new Agent apps (AppMode.AGENT) can access all workflow-related
APIs and Service API chat endpoints:
- Add AppMode.AGENT to 13 workflow controller mode checks
- Add AppMode.AGENT to 4 workflow_run controller mode checks
- Add AppMode.AGENT to workflow_draft_variable controller
- Add AppMode.AGENT to Service API chat, conversation, message endpoints
- Add AgentV2Node.get_default_config() with prompt templates and strategy defaults
- 46 unit tests all passing (8 new Phase 7 tests)
Old agent/agent-chat paths remain completely unchanged.
Made-with: Cursor