Session MCP Server

The Session MCP Server exposes the Automation API over the Model Context Protocol (MCP). Connect any MCP-compatible AI agent to a live proxied browser session — your agent can open URLs, read the page, fill in forms, click buttons, select text, and take screenshots, all without modifying the underlying website.

Use Cases

Third-party agents — connect Cognigy, ElevenLabs, or similar platforms to a live browser session without modifying the underlying website.
Agent orchestration — use MCP-capable orchestrators (LangGraph, AutoGen, CrewAI) to build agentic workflows on top of any proxied web page.
Agent development — iterate on prompts and tool calls interactively against a live session from Cursor, VS Code, or any MCP client.

Configuration

Create a Space

Create a Space in Webfuse Studio. See Getting Started for a step-by-step guide.

Generate a Space Automation API Key

In Webfuse Studio, open the newly created Space, navigate to Settings → API Keys, and generate a new Space Automation API key (prefixed ak_). This token grants full remote control over the session. Treat it as a secret: do not expose it in client-side code, logs, or URLs.

Enable the Automation App

Open a Session, toggle the Session Editor bar, and open the Apps tab. Find the Automation app and install it. See Apps for more details.

Configure Available Tools (optional)

By default, all automation tools are available. To restrict which tools agents can use, open the Automation app settings and click Configure tools. Uncheck any tools you want to disable for this Space; the change applies to all sessions in the Space. To disable automation entirely, uninstall the Automation app instead.

Restart the Session

The Automation app takes effect after a session restart. Close the current Session and start a new one.

Connect your MCP Client

Configure your MCP client to connect to the Session MCP Server endpoint for your domain:

https://session-mcp.HOSTNAME/mcp

Authenticate with the Space Automation API key as a Bearer token:

Authorization: Bearer <your-space-automation-key>

Dynamic tool discovery

By default, the server returns all tools on the first tools/list request and every tool requires a session_id parameter.

If you append ?dynamic=true to the endpoint URL, the server starts with only the connectToSession tool. Call it with a session_id to bind the connection to a session — the server then registers the full tool set and sends a notifications/tools/list_changed notification so the client can re-fetch the tool list. Calling connectToSession with a different session_id rebinds the connection without reconnecting.

https://session-mcp.HOSTNAME/mcp?dynamic=true
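
The endpoint, header, and dynamic-discovery handshake described above can be sketched with a few helpers. This is a minimal illustration, not part of any Webfuse SDK: the function names are hypothetical, `example.com` stands in for your hostname, and the message shapes follow the generic JSON-RPC 2.0 framing that MCP uses for `tools/call` and notifications.

```python
import json


def mcp_endpoint(hostname: str, dynamic: bool = False) -> str:
    """Build the Session MCP Server URL, optionally with dynamic tool discovery."""
    url = f"https://session-mcp.{hostname}/mcp"
    return url + "?dynamic=true" if dynamic else url


def auth_headers(api_key: str) -> dict[str, str]:
    """Bearer-token header carrying the Space Automation API key."""
    return {"Authorization": f"Bearer {api_key}"}


def connect_to_session_request(request_id: int, session_id: str) -> dict:
    """JSON-RPC request a client sends to bind the connection to a session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "connectToSession",
            "arguments": {"session_id": session_id},
        },
    }


# Notification the server sends once the full tool set is registered;
# the client should respond by issuing a fresh tools/list request.
TOOLS_LIST_CHANGED = {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}

if __name__ == "__main__":
    print(mcp_endpoint("example.com", dynamic=True))
    print(json.dumps(connect_to_session_request(1, "<session-id>"), indent=2))
```

In practice an MCP client library builds these messages for you; the sketch only makes the order of operations explicit.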

For VS Code, add the server to .vscode/mcp.json in your workspace, or to your user settings:

{
  "servers": {
    "webfuse-session": {
      "type": "http",
      "url": "https://session-mcp.HOSTNAME/mcp",
      "headers": {
        "Authorization": "Bearer ${input:automation_key}"
      }
    }
  },
  "inputs": [
    {
      "type": "promptString",
      "id": "automation_key",
      "description": "Space Automation API Key",
      "password": true
    }
  ]
}

For Cursor, add the server to .cursor/mcp.json:

{
  "mcpServers": {
    "webfuse-session": {
      "type": "http",
      "url": "https://session-mcp.HOSTNAME/mcp",
      "headers": {
        "Authorization": "Bearer <your-space-automation-key>"
      }
    }
  }
}

For Claude Desktop, add the server to claude_desktop_config.json:

{
  "mcpServers": {
    "webfuse-session": {
      "type": "http",
      "url": "https://session-mcp.HOSTNAME/mcp",
      "headers": {
        "Authorization": "Bearer <your-space-automation-key>"
      }
    }
  }
}

Or use the Claude CLI:

claude mcp add-json webfuse-session '{"type":"http","url":"https://session-mcp.HOSTNAME/mcp","headers":{"Authorization":"Bearer <your-space-automation-key>"}}'

To call a tool directly from Python, use the MCP SDK:

# To install the required library: pip install mcp

import asyncio
import httpx
from mcp import ClientSession
from mcp.client.streamable_http import streamable_http_client

async def main():
    async with streamable_http_client(
        "https://session-mcp.HOSTNAME/mcp",
        http_client=httpx.AsyncClient(
            headers={"Authorization": "Bearer <your-space-automation-key>"},
            timeout=httpx.Timeout(timeout=None, connect=10.0),
        ),
    ) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "see_domSnapshot", {"session_id": "<session-id>"}
            )

            print(result)

if __name__ == "__main__":
    asyncio.run(main())

Use the MCP Inspector to explore and call tools interactively before setting up a full agent:

npx @modelcontextprotocol/inspector

Set Connection Type to Streamable HTTP, enter the MCP server URL, and authenticate with your Space Automation API key.

Try it

Start a fresh session, then ask your agent:

“In Webfuse session, open https://webfuse.com and describe what you see.”

The agent will ask for a session ID, then use navigate to load the page and see_domSnapshot or see_guiSnapshot to read it. From there you can ask it to interact with the page in natural language — click a link, fill in a form, or summarise the content.
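
The navigate-then-read sequence described above might look like the following sketch. It assumes only that the client session object offers an async `call_tool(name, arguments)` method, as the Python example earlier on this page does; `RecordingSession` is a hypothetical stand-in that records calls so the flow can be tried without a live session. The snapshot options are the ones recommended in the System Instructions section.

```python
import asyncio

# Snapshot options recommended for agents in the System Instructions section.
SNAPSHOT_OPTIONS = {
    "webfuseIDs": True,
    "quality": 0.1,
    "crossFrame": True,
    "crossShadow": True,
}


async def open_and_read(session, session_id: str, url: str):
    """Navigate the active tab, then take a DOM snapshot of the loaded page."""
    await session.call_tool("navigate", {"session_id": session_id, "url": url})
    return await session.call_tool(
        "see_domSnapshot",
        {"session_id": session_id, "options": dict(SNAPSHOT_OPTIONS)},
    )


class RecordingSession:
    """Hypothetical stand-in for a real client session; records calls only."""

    def __init__(self):
        self.calls = []

    async def call_tool(self, name, arguments):
        self.calls.append((name, arguments))
        return {"tool": name}


if __name__ == "__main__":
    fake = RecordingSession()
    asyncio.run(open_and_read(fake, "<session-id>", "https://webfuse.com"))
    for name, arguments in fake.calls:
        print(name, arguments)
```

Passing a real `ClientSession` in place of `RecordingSession` would send the same two tool calls to the server.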

Limits

Limit	Value	Notes
Tool call timeout	15s	Maximum time allowed for a single tool call, including network transfer in both directions and tool execution in the browser. If the round-trip doesn’t complete within this window the MCP server returns a timeout error to the agent.
Tool call input size	16KiB	Maximum decompressed size of the tool-call arguments sent to the server.
Tool call response size	10MiB	Maximum decompressed size of a tool result. Larger responses are rejected and the agent receives an error instead.
MCP Client connection duration	3min	MCP client connections are automatically closed after 3 minutes. This is a hard limit — clients must reconnect after this period to continue making tool calls.
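
A client can guard against two of these limits before it ever hits the server. The sketch below is an assumption-laden illustration: it measures the input size as the UTF-8 length of compact JSON (the server may count bytes differently), and the `ConnectionClock` class with its 10-second safety margin is a hypothetical helper, not part of any SDK.

```python
import json
import time

MAX_INPUT_BYTES = 16 * 1024      # tool call input size limit (16 KiB)
MAX_CONNECTION_SECONDS = 3 * 60  # MCP client connection duration limit (3 min)


def arguments_fit(arguments: dict) -> bool:
    """Approximate check that tool-call arguments stay under the input limit."""
    encoded = json.dumps(arguments, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= MAX_INPUT_BYTES


class ConnectionClock:
    """Tracks connection age so the client can reconnect before the hard limit."""

    def __init__(self, safety_margin: float = 10.0, now=time.monotonic):
        self._now = now
        self._opened_at = now()
        self._budget = MAX_CONNECTION_SECONDS - safety_margin

    def should_reconnect(self) -> bool:
        """True once the connection is older than the limit minus the margin."""
        return self._now() - self._opened_at >= self._budget


if __name__ == "__main__":
    clock = ConnectionClock()
    print(arguments_fit({"session_id": "<session-id>", "text": "hello"}))
    print(clock.should_reconnect())
```

Checking `should_reconnect()` before each tool call lets a long-running agent open a fresh connection proactively instead of handling a dropped one mid-call.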

Tools

All tools require a session_id identifying the target session.

Finding your session ID: It appears in the session URL as the path segment after the hostname, for example:

https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw

For programmatic access, you can also retrieve it from a REST API response or a webhook payload.
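
Extracting the session ID from a session URL can be sketched as follows, assuming the ID is always the first path segment as in the example above. The function name is illustrative.

```python
from urllib.parse import urlparse


def session_id_from_url(session_url: str) -> str:
    """Return the first path segment of a session URL (the session ID)."""
    path = urlparse(session_url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no session ID found in URL: {session_url!r}")
    return segments[0]


if __name__ == "__main__":
    print(session_id_from_url("https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw"))
```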

Most actuation tools also accept a target — a CSS selector, Webfuse ID, or [x,y] coordinates.
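
The three target forms are all passed as strings. The helpers below are a rough sketch for building and telling them apart on the client side; the wf-id pattern is an assumption inferred from the examples in the System Instructions section ("123", "1-23"), and anything that matches neither pattern is simply treated as a CSS selector.

```python
import re

# Assumed shapes: wf-ids look like "123" or "1-23"; coordinates like "[120,340]".
_WF_ID = re.compile(r"\d+(?:-\d+)*")
_POINT = re.compile(r"\[\s*-?\d+(?:\.\d+)?\s*,\s*-?\d+(?:\.\d+)?\s*\]")


def point_target(x: int, y: int) -> str:
    """Format page coordinates as the '[x,y]' target string."""
    return f"[{x},{y}]"


def target_kind(target: str) -> str:
    """Classify a target string as coordinates, wf-id, or CSS selector."""
    if _POINT.fullmatch(target):
        return "coordinates"
    if _WF_ID.fullmatch(target):
        return "wf-id"
    return "css-selector"


if __name__ == "__main__":
    for target in (point_target(120, 340), "1-23", 'button[type="submit"]'):
        print(target, "->", target_kind(target))
```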

Execution context: All commands are executed on the active tab of the session, on the tab owner’s browser. If the active tab or its tab owner is not present when a tool call arrives, the call fails with an error.

act_click

Click a target element with the specified mouse button. Use for buttons, links, checkboxes, and any other interactive element. Defaults to a left-button click.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`options.button`	`string`		Mouse button to use: ‘left’ (default), ‘middle’, or ‘right’.
`options.moveMouse`	`boolean`		Move the virtual mouse pointer to the target center before clicking. When false, the click is sent directly without moving the pointer (default: true).
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).
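
As a sketch of how the parameters in this table map onto a tool call, the helper below builds an act_click arguments object. It assumes that the `options.*` rows denote keys nested inside an `options` object, which is how the System Instructions section describes the argument shape; the helper name and its keyword arguments are illustrative.

```python
def click_arguments(
    session_id: str,
    target: str,
    *,
    button: str = "left",
    move_mouse: bool = True,
    wait_for_target: bool = False,
) -> dict:
    """Build the arguments object for an act_click tool call."""
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported mouse button: {button!r}")
    return {
        "session_id": session_id,
        "target": target,
        # Option keys are nested under "options", not hoisted to the top level.
        "options": {
            "button": button,
            "moveMouse": move_mouse,
            "waitForTarget": wait_for_target,
        },
    }


if __name__ == "__main__":
    print(click_arguments("<session-id>", "#submit", wait_for_target=True))
```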

act_keyPress

Press a single key on a target element, with optional modifier keys. Key events are dispatched directly to the page, not to the operating system. OS-level shortcuts such as Ctrl+C (copy) or Ctrl+V (paste) will NOT work unless the page has explicitly implemented them. Standard editing keys (Enter, Backspace, Delete) and page-handled shortcuts work as expected.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`key`	`string`	✓	Key to press using the KeyboardEvent.key name (e.g. ‘Enter’, ‘ArrowUp’, ‘a’, ‘B’, ‘F5’).
`options.altKey`	`boolean`		Hold the Alt key while pressing the key. Only effective if the page handles the resulting combination (default: false).
`options.ctrlKey`	`boolean`		Hold the Control key while pressing the key. Only effective if the page handles the resulting combination - OS shortcuts like Ctrl+C will not work (default: false).
`options.metaKey`	`boolean`		Hold the Meta (Cmd on macOS, Win on Windows) key while pressing the key (default: false).
`options.shiftKey`	`boolean`		Hold the Shift key while pressing the key (default: false).
`options.moveMouse`	`boolean`		Move the virtual mouse pointer to the target center before pressing the key. When false, the action is sent directly without moving the pointer (default: true).
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).
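
To illustrate how modifier keys map onto the options above, the following sketch turns a shortcut written as "Ctrl+Shift+K" into act_keyPress arguments. The shortcut notation and the helper are hypothetical conveniences, not something the server understands; the server only sees the resulting `key` and boolean flags, and the combination still only works if the page handles it.

```python
# Hypothetical spellings accepted by this helper, mapped to the option flags.
_MODIFIERS = {
    "ctrl": "ctrlKey",
    "control": "ctrlKey",
    "alt": "altKey",
    "shift": "shiftKey",
    "meta": "metaKey",
    "cmd": "metaKey",
}


def key_press_arguments(session_id: str, target: str, shortcut: str) -> dict:
    """Build act_keyPress arguments from a 'Modifier+Key' style string."""
    *modifier_names, key = shortcut.split("+")
    options = {flag: False for flag in ("altKey", "ctrlKey", "metaKey", "shiftKey")}
    for name in modifier_names:
        flag = _MODIFIERS.get(name.strip().lower())
        if flag is None:
            raise ValueError(f"unknown modifier: {name!r}")
        options[flag] = True
    return {"session_id": session_id, "target": target, "key": key, "options": options}


if __name__ == "__main__":
    print(key_press_arguments("<session-id>", "#editor", "Ctrl+Shift+K"))
    print(key_press_arguments("<session-id>", "#search", "Enter"))
```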

act_mouseMove

Move the virtual mouse pointer to a target element or coordinates without clicking. Use to trigger hover states, tooltips, or drop-down menus that require mouse proximity. Can optionally keep the pointer visible on screen after the move.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`options.persistent`	`boolean`		Keep the pointer visible on screen indefinitely after the move. When false (default), the pointer fades out automatically after a short delay.
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).

act_scroll

Scroll a target element or the page by a given number of pixels. Use to bring off-screen content into view or to navigate long pages. Positive amounts scroll down or right; negative amounts scroll up or left. When scrolling the full page rather than a specific element, use ‘html’ as the target - ‘body’ often does not respond to scrolling.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`amount`	`number`	✓	Number of pixels to scroll. Positive scrolls down or right; negative scrolls up or left.
`options.direction`	`string`		Axis to scroll along: ‘vertical’ (up/down, default) or ‘horizontal’ (left/right).
`options.moveMouse`	`boolean`		Move the virtual mouse pointer to the target center before scrolling. When false, the action is sent directly without moving the pointer (default: true).
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).
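
The sign convention and the 'html' default for full-page scrolling can be captured in a small helper. This is a sketch under the same assumption as elsewhere on this page, namely that `options.*` keys are nested inside an `options` object; the helper name is illustrative.

```python
def scroll_arguments(
    session_id: str,
    pixels: int,
    *,
    direction: str = "vertical",
    target: str = "html",
) -> dict:
    """Build act_scroll arguments; positive scrolls down/right, negative up/left."""
    if direction not in ("vertical", "horizontal"):
        raise ValueError(f"unsupported direction: {direction!r}")
    return {
        "session_id": session_id,
        "target": target,  # 'html' scrolls the full page; 'body' often does not
        "amount": pixels,
        "options": {"direction": direction},
    }


if __name__ == "__main__":
    print(scroll_arguments("<session-id>", 600))        # down by 600 px
    print(scroll_arguments("<session-id>", -300))       # back up by 300 px
    print(scroll_arguments("<session-id>", 200, direction="horizontal", target="#carousel"))
```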

act_select

Select an option in a <select> dropdown element by matching its value attribute. Use this instead of act_click when interacting with native HTML dropdowns.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`value`	`string`	✓	The value attribute of the option to select, not the visible display text.
`options.moveMouse`	`boolean`		Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true).
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).

act_textSelect

Select a continuous run of text within a container element by matching its content. Use to highlight text before copying, replacing, or applying formatting. Pass an empty string as text to clear the current selection.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`text`	`string`	✓	Exact text string to find and highlight within the target element. Pass an empty string to clear any existing selection.
`options.occurrence`	`number`		Which occurrence to select when the text appears more than once in the element. 1 selects the first match (1-based index, default: 1).
`options.moveMouse`	`boolean`		Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true).
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).

act_type

Type text into a target input element. Short inputs are typed character by character; longer inputs are pasted directly. Use for text fields, search boxes, and any editable element. By default, overwrites existing content. If the target resolves to a non-editable wrapper (error: ‘Target must resolve to editable element’), re-take a DOM snapshot with webfuseIDs: true and target the inner input element directly using its wf-id.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`target`	`string`	✓	Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
`text`	`string`	✓	Text to type into the target element.
`options.followFocus`	`boolean`		Continue typing into whichever element holds focus, even if focus moved away from the original target. Set to false to strictly type into the target (default: true).
`options.overwrite`	`boolean`		Replace the existing content of the input before typing. Set to false to append or insert at the current cursor position (default: true).
`options.moveMouse`	`boolean`		Move the virtual mouse pointer to the target center before typing. When false, the action is sent directly without moving the pointer (default: true).
`options.waitForTarget`	`boolean`		Wait for the target in case it does not (yet) exist (with a timeout of 5s) (default: false).

connectToSession

Bind this MCP connection to a Webfuse session. Must be called before any other tool — until then, only this tool is exposed. After binding, the server registers the session’s tool set and notifies the client via notifications/tools/list_changed; re-fetch the tool list with tools/list. Call again with a different session_id to rebind without reconnecting.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID to bind this connection to

navigate

Navigate the current browser tab to a new URL. Use to open a page before interacting with it. After navigation completes, take a snapshot to confirm the page loaded as expected.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`url`	`string`	✓	URL to navigate to. Supports absolute URLs (e.g. ‘https://example.com/page’) and relative URLs (e.g. ‘/page’), which are resolved against the current tab’s URL.

pageInfo

Retrieve information about the currently active web page, including URL and title.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID

see_accessibilityTree

Capture the accessibility tree of the current page as a structured JSON object. Use to understand page semantics - roles, names, ARIA states (checked, expanded, disabled, …) - without parsing raw HTML. Each node includes a wf-id by default (see webfuseIDs option) that can be passed as a string directly as the target to actuation tools. Prefer see_domSnapshot when you need the full HTML structure.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`options.root`	`string`		CSS selector scoping the accessibility tree to a specific subtree instead of the full page. Use to reduce output size when the area of interest is known (default: body).
`options.quality`	`number`		Snapshot completeness as a float between 0 (lowest) and 1 (highest, default). Values below 1 downsample the underlying DOM before computing the tree, reducing output size at the cost of some fidelity.
`options.webfuseIDs`	`boolean`		Associate each node with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable — iframes, duplicate ids, or generated markup (default: true).

see_domSnapshot

Capture a structured text representation of the current page’s DOM. Use to read element text, attributes, and hierarchy before deciding which element to interact with. Prefer this over see_guiSnapshot when you need precise element targeting or the page is mostly text-based.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`options.crossFrame`	`boolean`		Include content inside <iframe> elements. Enable when the target element is inside a frame (default: false).
`options.crossShadow`	`boolean`		Include content inside shadow DOM roots. Disable only if shadow DOM content is not needed (default: true).
`options.interactiveOnly`	`boolean`		Omit non-interactive elements and return only buttons, inputs, links, and similar controls. Reduces snapshot size when you only need to find actionable elements (default: false).
`options.quality`	`number`		DOM snapshot completeness as a float between 0 (lowest) and 1 (highest). At 1, the DOM is returned as-is - all elements present with full structure and context. Below 1, the snapshot is downsampled: output is smaller but the DOM is structurally altered - elements may be merged, reordered, or dropped, causing loss of context. Exception: if webfuseIDs=true, wf-id attributes survive downsampling and element targeting remains precise. Default for agent use: 0.1 with webfuseIDs=true — keeps snapshots small while preserving reliable element targeting via wf-ids.
`options.maxTokens`	`number`		Limit the DOM snapshot size by a specified LLM input token count (1 token = 4 bytes). This reduces quality adaptively, in addition to the specified quality (default: infinite).
`options.revealMaskedElements`	`boolean`		Include elements that have been masked by the Webfuse Masking App. Masked elements are hidden from the snapshot by default to protect sensitive content. Enable only when you explicitly need to interact with masked elements (default: false).
`options.root`	`string`		CSS selector scoping the snapshot to a specific subtree instead of the full page. Use to reduce snapshot size when the area of interest is known (default: body).
`options.webfuseIDs`	`boolean`		Annotate each element with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable — iframes, duplicate ids, or generated markup (default: false).
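
A possible way to assemble see_domSnapshot arguments is sketched below. It starts from the agent defaults given in the System Instructions section (quality 0.1 with wf-ids, frames, and shadow DOM included) and adds `maxTokens` or `root` only on request. The helper names are illustrative, and the byte estimate simply applies the 1 token = 4 bytes conversion stated in the table.

```python
from typing import Optional

BYTES_PER_TOKEN = 4  # conversion stated in the maxTokens description


def snapshot_arguments(
    session_id: str,
    *,
    max_tokens: Optional[int] = None,
    root: Optional[str] = None,
) -> dict:
    """Build see_domSnapshot arguments using the recommended agent defaults."""
    options = {
        "webfuseIDs": True,
        "quality": 0.1,
        "crossFrame": True,
        "crossShadow": True,
    }
    if max_tokens is not None:
        options["maxTokens"] = max_tokens
    if root is not None:
        options["root"] = root  # scope to a subtree to shrink the snapshot
    return {"session_id": session_id, "options": options}


def approximate_bytes(max_tokens: int) -> int:
    """Rough snapshot size implied by a token budget."""
    return max_tokens * BYTES_PER_TOKEN


if __name__ == "__main__":
    print(snapshot_arguments("<session-id>", max_tokens=2000, root="main"))
    print(approximate_bytes(2000))
```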

see_guiSnapshot

Capture a screenshot of the current page as an image. Use when rendered visual appearance matters - images, charts, canvas, or verifying layout. Coordinates visible in the screenshot can be passed as [x,y] to action tools, but prefer see_domSnapshot for reliable element targeting or text extraction.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID
`quality`	`number`		Image compression level as a float between 0 (lowest quality, smallest size) and 1 (highest quality, largest size). Lower values reduce image detail. Default is 0.6.
`maxTokens`	`number`		Limit GUI snapshot size by a specified LLM input token count (1 token = 4 bytes). This reduces quality adaptively, in addition to the specified quality (default: infinite).

see_textSelection

Read the text that is currently selected (highlighted) on the page. Use to verify the result of act_textSelect, or to capture text the user has already selected before acting on it.

Parameter	Type	Required	Description
`session_id`	`string`	✓	Webfuse Session ID

System Instructions

These instructions are sent to the model automatically when it connects. They describe the available tools and guide the agent’s behaviour. Use this as a starting point and customise it for your use case.

You are an intelligent browser agent which helps users to perform various tasks on the web.
You have access to a set of tools that allow you to interact with web pages, extract information, and perform actions.
Use these tools to accomplish the user's goals effectively.

This server controls an active Webfuse browser session - use it to interact with web pages (clicking, typing, navigating, observing). For creating or configuring sessions and spaces, or searching documentation, use the Webfuse API & Docs MCP server instead.

## Custom tools
Some tools in this session are extension-provided — you can identify them because their description starts with [Custom Tool].

- When a custom tool matches the user's goal, use it instead of assembling the equivalent sequence of standard tools yourself. Custom tools are purpose-built for the current page or workflow and handle the full interaction internally.
- You do NOT need to call `see_domSnapshot` before using a custom tool — call it directly if it looks like a suitable solution for the task.

## Session
- Every tool requires a "session_id" to identify the Webfuse session. Always include the correct "session_id". Ask the user to provide it if you don't have it.

## Observing the page
- When using built-in action tools (`act_click`, `act_type`, etc.), your FIRST action on any page must be `see_domSnapshot` — never act on a page you have not read. You do not know what elements exist until you call it. Custom tools are exempt: call them directly without a prior snapshot.
- Do NOT assume page content from memory or training. The page on screen is the only ground truth. Guessing element selectors without a snapshot will fail.
- After every navigation or page transition, take a new snapshot before acting.
- Always call `see_domSnapshot` with EXACTLY these options:
  `{"session_id": "<session_id>", "options": {"webfuseIDs": true, "quality": 0.1, "crossFrame": true, "crossShadow": true}}`
  Do NOT omit `crossFrame` or `crossShadow`. Do NOT add `root` unless you receive a size-cap error.
- If you receive a size-cap error, add `"root": "<selector>"` inside `options` to scope the snapshot to the relevant section.
- Use `see_guiSnapshot` only when the page is visually complex (images, charts, canvas) and the DOM snapshot is insufficient.
- Use `see_accessibilityTree` to understand page semantics — roles, ARIA states (checked, expanded, disabled), and element names — without parsing raw HTML. Unlike `see_domSnapshot`, it includes wf-ids by default.
- Use `pageInfo` to retrieve the current URL and title instantly, without the cost of a full snapshot.

## Targeting elements
- The `target` argument for `act_click`, `act_type`, `act_mouseMove`, etc. accepts ONLY one of:
  1. A standard CSS selector
  2. A wf-id string from a recent snapshot (numeric, e.g. `"123"` or `"1-23"`, found in the `wf-id="..."` attribute)
  3. Coordinates as `"[x,y]"`
- **Standard CSS only** — Playwright/jQuery extensions are NOT supported. Do NOT use `:has-text(...)`, `:has(...)`, `:contains(...)`, `:visible`, `:nth-match`, `text=...`, or similar pseudo-classes. They will fail.
- To click an element identified by its visible text, the reliable path is:
  1. Call `see_domSnapshot` to get a snapshot; each element has a wf-id.
  2. Pass the wf-id as the `"target"` string (no selector syntax around it).
- For elements with stable HTML attributes, regular CSS works: `'#submit'`, `'button[type="submit"]'`, `'input[name="q"]'`, etc.
- When a CSS selector is unreliable — elements inside iframes, duplicate HTML ids, or deeply generated markup — use wf-ids instead.
- If `act_type` fails with "Target must resolve to editable element", re-take a DOM snapshot with `webfuseIDs: true` and target the inner input wf-id directly.

## Argument shape
- Parameter names must match the schema exactly — use the exact casing declared (e.g. `webfuseIDs`, `crossFrame`, NOT `webfuseids`, `crossframe`).
- A parameter whose schema type is `"object"` must be sent as a JSON object, NOT as a JSON-encoded string. Write the object body directly inside the parameter; do not wrap it in quotes.
- Nested keys belong inside their parent object. Do not hoist them to the top level. If the schema declares `"options"` as an object containing `"webfuseIDs"` and `"root"`, those keys go inside `options`, not at the top level.

## Performing actions
- Dismiss any consent dialog (cookie banners, GDPR notices, newsletter popups, or other modals) immediately before interacting with page content. Do not try to click navigation or form elements until the overlay is gone.
- `"moveMouse"` defaults to true in all action tools — do not explicitly disable it unless the element is already focused and mouse movement is undesirable.
- For native HTML `<select>` dropdowns, use `act_select` (not `act_click`). Pass the option's value attribute, not its display text.
- For keyboard shortcuts: key events are dispatched to the page, not the OS. Standard editing keys (Enter, Backspace, Delete) work as expected. OS-level shortcuts such as Ctrl+C or Ctrl+V will NOT work unless the page has explicitly implemented them.

## Error handling
- Tool results contain an `isError` field. If true, read the `content` field for error details, analyze the cause, and adjust your strategy before retrying.
- Do not change argument shape in response to a tool error. Errors are about content (wrong selector, missing element) — the next call uses the same parameter names and the same nesting.

If unsure — ask the user for clarification or additional information.