Vibe3DScene
Vision-Aware 3D Agentic System for Scene Building

Vibe3DScene: Create Your 3D Scene with Words

Vibe3DScene connects FastAPI, LangGraph, MCP tools, Blender runtimes, and external services into a single headless-first system. It supports direct execution, planner-driven scene construction, and multi-worker deployment without forcing users into a local Blender GUI.

Runtime Modes: Single-Agent / Dual-Agent
Execution: Headless or Local Blender
Key Control: Router + Verify + Evaluator
Overview

A layered runtime, not just a Blender bot

The current system separates entry surfaces, request orchestration, tool mediation, Blender execution, and persistence so the same core runtime can drive web generation, Blender-side chat, and service-oriented deployment.

Runtime orchestration

FastAPI receives the request, resolves thread ownership and VLM configuration, then reuses or rebuilds the LangGraph instance for that thread before execution starts.
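The reuse-or-rebuild policy can be sketched as a per-thread cache keyed on the VLM configuration; a minimal sketch with illustrative names and signatures (the repository's actual `get_agent` may differ):

```python
# Sketch of per-thread graph reuse: rebuild only when the cached graph's
# VLM configuration no longer matches the request. Names are illustrative.
from dataclasses import dataclass


@dataclass
class CachedAgent:
    graph: object        # compiled LangGraph for this thread
    vlm_model: str       # VLM the graph was built against


_agents: dict[str, CachedAgent] = {}


def get_agent(thread_id: str, vlm_model: str, build) -> object:
    """Return the thread's graph, rebuilding if the VLM config changed."""
    cached = _agents.get(thread_id)
    if cached is None or cached.vlm_model != vlm_model:
        cached = CachedAgent(graph=build(vlm_model), vlm_model=vlm_model)
        _agents[thread_id] = cached
    return cached.graph
```

A matching request on the same thread skips the rebuild entirely, which is what keeps repeated turns cheap.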

Tool surface

MCP exposes the agent-callable tool registry while a small set of verification-only Blender commands remain internal to the graph for observation and geometry checks.
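One way to picture the split between agent-callable MCP tools and graph-internal verification commands is a registry with an exposure flag; a minimal sketch with invented names (the real registry shape is not shown here):

```python
# Illustrative tool registry: only entries marked agent_callable are
# exported through MCP; verification-only commands stay graph-internal.
_REGISTRY: dict[str, dict] = {}


def register(name: str, fn, agent_callable: bool = True) -> None:
    _REGISTRY[name] = {"fn": fn, "agent_callable": agent_callable}


def mcp_exposed() -> list[str]:
    """Names the agent may call through the MCP tool surface."""
    return [n for n, t in _REGISTRY.items() if t["agent_callable"]]


def internal_only() -> list[str]:
    """Commands reserved for graph nodes such as verify."""
    return [n for n, t in _REGISTRY.items() if not t["agent_callable"]]


register("get_scene_info", lambda: {})
register("penetration_check", lambda: {}, agent_callable=False)
```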

Durability and recovery

Redis ownership metadata, persisted `.blend` files, and stored image assets let sessions recover after restarts and keep long-running threads stable across workers.
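The ownership side of this can be as simple as a claim-if-absent record per thread. Below is a sketch using an in-memory stand-in for the Redis control plane; in production the claim would be an atomic Redis `SET` with `NX` and a TTL, and all names here are illustrative:

```python
import time


class OwnershipStore:
    """In-memory stand-in for Redis thread-ownership records."""

    def __init__(self):
        self._owners: dict[str, tuple[str, float]] = {}

    def claim(self, thread_id: str, worker_id: str, ttl: float = 300.0) -> str:
        """Claim the thread if unowned or expired; return the owning worker."""
        now = time.monotonic()
        owner = self._owners.get(thread_id)
        if owner is None or owner[1] < now:          # free, or lease expired
            self._owners[thread_id] = (worker_id, now + ttl)
            return worker_id
        return owner[0]                              # someone else owns it
```

A worker executes locally when `claim(...)` returns its own id and proxies to the returned owner otherwise, which is the claim-or-proxy routing shown in the deployment diagram.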

Blender

Blender-integrated chat generation

A Blender-native creation loop with the same agent runtime and tool orchestration behind it.

Architecture

Platform architecture

How requests move through the platform, how sessions stay stable, and how core tooling connects to external services.

End-to-End Runtime Layers

Client -> API -> Graph -> MCP -> Blender / Services -> Storage
```mermaid
flowchart LR
    U[Web UI / CLI / Blender client] --> API[FastAPI]
    API --> G[LangGraph runtime]
    G --> MCP[MCP server and tool registry]
    MCP --> B[Blender socket or headless Blender session]
    MCP --> T[External tool services]
    API --> S[(Redis + persisted storage)]
    B --> O[Scene / Render / Export]
    T --> O
```

Deployment and session routing

Single-worker simplicity with multi-worker scaling when needed
```mermaid
flowchart LR
    C[Client] --> GW[Gateway / Nginx]
    GW --> W1[API Worker 1]
    GW --> W2[API Worker 2]
    W1 --> R[(Redis control plane)]
    W2 --> R
    W1 --> O1{Owns thread?}
    W2 --> O2{Owns thread?}
    O1 -->|Yes| E1[Execute locally]
    O1 -->|No| P1[Proxy to owner]
    O2 -->|Yes| E2[Execute locally]
    O2 -->|No| P2[Proxy to owner]
    E1 --> HS1[Headless session manager]
    E2 --> HS2[Headless session manager]
    HS1 --> FS[(Shared session and image storage)]
    HS2 --> FS
```

Core runtime and connected services

Core orchestration stays lightweight while heavier services remain modular
```mermaid
flowchart TB
    subgraph CORE[This repository]
        API[FastAPI + runtime APIs]
        GRAPH[LangGraph nodes]
        MCP[MCP runtime + tool registry]
        BL[Blender-facing tools]
        RT[Runtime-only Blender checks]
    end

    subgraph EXT[Sibling tool-service stack]
        TRE[TRELLIS2]
        RET[Retrieval backends]
        SAM[SAM reconstruction]
        SS[SceneSmith compatibility APIs]
        PCG[PCG services]
    end

    API --> GRAPH
    GRAPH --> MCP
    GRAPH --> RT
    MCP --> BL
    MCP --> TRE
    MCP --> RET
    MCP --> SAM
    MCP --> SS
    MCP --> PCG
```
Workflow

How a request turns into a finished scene

Each request is initialized, routed into the right execution mode, verified with fresh evidence, and driven toward completion through evaluator-controlled loops.

Request intake and graph setup

Per-thread graph reuse, rebuild, and request-state injection
```mermaid
flowchart LR
    A["Client / Web / CLI"] --> B["chat or chat-stream request"]
    B --> C["claim_or_proxy_request"]
    C --> D["resolve_thread_vlm_for_chat"]
    D --> E["get_agent(thread_id)"]
    E --> F{"Graph exists and VLM matches?"}
    F -->|No| G["create_agent_graph"]
    G --> H["get_blender_tools + bind_tools"]
    H --> I["build_agent_state_graph + compile"]
    F -->|Yes| J["Reuse graph"]
    I --> J
    B --> K["Build initial request state"]
    K --> L["messages + images + topology + fast_mode"]
    J --> M["agent.ainvoke / agent.astream"]
    L --> M
```

Initialization chain

  • `initialize_request` resolves workflow topology, memory profile, counters, and `fast_mode`.
  • `sync_reference_catalog` refreshes the compact thread-level image catalog from persisted assets.
  • `prepare_reference_context` computes the active image subset for this turn instead of replaying all history.
  • `router` selects direct mode or plan mode, with `fast_mode=true` forcing direct execution.
  • `plan_node` only runs when planning is needed and seeds append-only todo history.
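The chain above amounts to assembling one request-state dict before the router runs; a sketch with assumed field names (the repository's actual state schema may differ):

```python
def initialize_request(messages, images, *, topology="single_agent",
                       fast_mode=False) -> dict:
    """Illustrative request state built before routing."""
    return {
        "messages": messages,
        "images": images,               # active reference subset, not full history
        "topology": topology,           # single_agent or dual_agent
        "fast_mode": fast_mode,         # forces direct mode, skips verify
        "todo_history": [],             # append-only, seeded by plan_node
        "counters": {"turns": 0, "tool_calls": 0},
    }


def route(state: dict) -> str:
    """fast_mode forces direct execution; otherwise the router decides."""
    return "direct" if state["fast_mode"] else "router_decides"
```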

Default execution loop

Tool execution, observation, verification, and finalization
```mermaid
flowchart LR
    A["router / plan_node"] --> B["agent"]
    B --> C["turn_dispatch"]
    C --> D{"assistant_turn_kind"}
    D -->|has_calls| E["tools"]
    E --> F["update_memory"]
    F --> G["scene_observe"]
    G --> H["verify"]
    H --> I["evaluator"]
    D -->|no_calls| I
    I -->|continue| B
    I -->|finalize| J["finalize"]
    I -->|pure_qa| K["END"]
    J --> K
```

Current control rules

  • The evaluator owns todo state transitions and convergence for both single-agent and dual-agent topologies.
  • `verify` can merge VLM-based judgment with an internal Blender penetration check when enabled.
  • `fast_mode` skips the normal observe-and-verify cycle for latency-sensitive requests.
  • Streaming hides internal node chatter such as router, planner, verify, and evaluator tokens from the user-facing SSE stream.
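The evaluator's three-way decision in the loop above can be sketched as a single routing function; the state field names (`pure_qa`, `todos_done`, `verified`) are assumptions, not the repository's actual schema:

```python
def evaluator_route(state: dict) -> str:
    """Decide the next edge: loop back, finalize, or end for pure Q&A."""
    if state.get("pure_qa"):                       # no scene mutation this turn
        return "END"
    if state.get("todos_done") and state.get("verified"):
        return "finalize"                          # fresh evidence confirms scene
    return "agent"                                 # continue the execution loop
```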
Tools

Supported tools and capabilities

The platform combines scene inspection, rendering, editing, retrieval, generation, reconstruction, and runtime control in one coordinated tool layer.

Scene inspection and evidence

Observation tools provide the fresh evidence that the evaluator expects before a mutated scene can finalize.

get_scene_info get_object_info observe_scene_global get_viewport_screenshot

Camera, render, and verification

Camera-oriented tools support both normal observation loops and verification-time evidence gathering.

camera_observe camera_act camera_set_pose render_from_camera render_from_objects

Scene editing and runtime control

The runtime supports direct scene mutation, imported models, retries, rollback, and manual object edits.

execute_blender_code import_glb_model undo_last_snapshot @ object refs primitive add transform / delete
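The retry-and-rollback behavior around `execute_blender_code` and `undo_last_snapshot` can be pictured as a snapshot stack; a minimal in-memory sketch (the real runtime persists `.blend` state rather than a dict):

```python
class SnapshotStack:
    """Illustrative undo support: snapshot before each mutation,
    roll back automatically when execution fails."""

    def __init__(self, scene: dict):
        self.scene = scene
        self._snaps: list[dict] = []

    def execute(self, mutate) -> bool:
        self._snaps.append(dict(self.scene))       # snapshot before mutating
        try:
            mutate(self.scene)
            return True
        except Exception:
            self.scene.clear()
            self.scene.update(self._snaps.pop())   # roll back on failure
            return False

    def undo_last_snapshot(self) -> None:
        if self._snaps:
            self.scene.clear()
            self.scene.update(self._snaps.pop())
```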

Retrieval backends

Retrieval tools can search, preview, and import scene assets or materials through external service adapters.

HSSD retrieval AmbientCG materials SceneSmith APIs PolyHaven Sketchfab

3D generation and reconstruction

The generation layer spans mesh synthesis, reconstruction, and imported generated assets.

TRELLIS2 Tripo3D SAM3D Hyper3D / Rodin Hunyuan3D

Planning, memory, and topology hints

Request-scoped knobs decide how much planning, memory carryover, and verification the runtime should use.

direct_mode plan_mode fast_mode single_agent dual_agent image bindings
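These knobs typically travel with the request itself; below is a hypothetical chat request body showing how they might be bound per turn (every field name here is an assumption for illustration):

```python
import json

# Hypothetical chat payload: topology and mode hints are request-scoped.
request_body = {
    "thread_id": "thread-42",
    "message": "Build a cozy reading corner with a lamp",
    "fast_mode": False,            # True skips the observe-and-verify cycle
    "topology": "single_agent",    # or "dual_agent"
    "mode_hint": "plan_mode",      # router may still choose direct_mode
    "images": ["ref-chair.png"],   # reference images bound to this turn
}
payload = json.dumps(request_body)
```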
Changelog

Latest updates in v0.2.0

Highlights across runtime behavior, model support, editing controls, tool integrations, and frontend experience.

v0.2.0 · 2026-04-12

Agent runtime and execution policy

  • Introduced routing and budgeting logic to decide when planning and verification are actually needed.
  • Improved image memory and automatic reference reuse across the workflow.
  • Added both single-agent and dual-agent topologies, with dual-agent still under active debugging.
Models and observability

Provider support is broader

  • Added Qwen support alongside Gemini as the main cost-performance model options.
  • Added end-to-end token usage display across the full workflow.
Editing and interaction

Scene control is more direct

  • Manual edits for vibe-generated objects now include transforms and deletion.
  • Users can add basic primitives and keep those edits synchronized back to Blender.
  • Object-specific edits can now be targeted via @ references.
Tools and frontend

Tooling and UI both moved forward

  • Added SAM3D, Tripo3D, HSSD retrieval, and AmbientCG material retrieval.
  • Improved code-execution robustness with validation and rollback.
  • Upgraded the frontend chat, lighting and rendering presentation, and separated scene persistence from runtime persistence.