Session MCP Server

The Session MCP Server exposes the Automation API over the Model Context Protocol (MCP). Connect any MCP-compatible AI agent to a live proxied browser session — your agent can open URLs, read the page, fill in forms, click buttons, select text, and take screenshots, all without modifying the underlying website.

  • Third-party agents: connect Cognigy, ElevenLabs, or similar platforms to a live browser session without modifying the underlying website.
  • Agent orchestration: use MCP-capable orchestrators (LangGraph, AutoGen, CrewAI) to build agentic workflows on top of any proxied web page.
  • Agent development: iterate on prompts and tool calls interactively against a live session from Cursor, VS Code, or any MCP client.

  1. Create a Space in Webfuse Studio. See Getting Started for a step-by-step guide.

  2. In Webfuse Studio, open the newly created Space, navigate to Settings → API Keys, and generate a new Space Automation API key (prefixed ak_). This token grants full remote control over the session. Treat it as a secret: do not expose it in client-side code, logs, or URLs.

  3. Open a Session, toggle the Session Editor bar, and open the Apps tab. Find the Automation app and install it. See Apps for more details.

  4. By default all automation tools are available. To restrict which tools agents can use, open the Automation app settings and click Configure tools. Uncheck any tools you want to disable for this Space — the change applies to all sessions in the Space. To disable automation entirely, uninstall the Automation app instead.

  5. The Automation app takes effect after a session restart. Close the current Session and start a new one.

Configure your MCP client to connect to the Session MCP Server endpoint for your domain:

https://session-mcp.HOSTNAME/mcp

Authenticate with the Space Automation API key as a Bearer token:

Authorization: Bearer <your-space-automation-key>
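
As a minimal sketch of the two pieces above, the helpers below assemble the endpoint URL and the Authorization header. The function names, the hostname `webfuse.example`, and the key `ak_example` are hypothetical placeholders, not values from Webfuse.

```python
def session_mcp_endpoint(hostname: str) -> str:
    """Build the Session MCP Server URL for a Webfuse hostname."""
    return f"https://session-mcp.{hostname}/mcp"


def auth_headers(automation_key: str) -> dict:
    """Send the Space Automation API key as a Bearer token."""
    return {"Authorization": f"Bearer {automation_key}"}


# Hypothetical placeholder values, for illustration only.
endpoint = session_mcp_endpoint("webfuse.example")
headers = auth_headers("ak_example")
```

Keeping the key out of the URL and passing it only in the header matches the advice to keep it out of logs and URLs.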

By default, the server returns all tools on the first tools/list request and every tool requires a session_id parameter.

If you append ?dynamic=true to the endpoint URL, the server starts with only the connectToSession tool. Call it with a session_id to bind the connection to a session — the server then registers the full tool set and sends a notifications/tools/list_changed notification so the client can re-fetch the tool list. Calling connectToSession with a different session_id rebinds the connection without reconnecting.

https://session-mcp.HOSTNAME/mcp?dynamic=true
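
The dynamic flow can be sketched as the JSON-RPC messages a client would send after the usual MCP initialization, which is omitted here. The helper names and request IDs are illustrative assumptions. Only the tool name `connectToSession`, the `tools/list` method, and the `notifications/tools/list_changed` notification come from the text above.

```python
def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def dynamic_bind_sequence(session_id: str) -> list:
    """Requests sent in dynamic mode: bind first, then re-fetch the tool list."""
    return [
        tool_call(1, "connectToSession", {"session_id": session_id}),
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def is_tool_list_changed(message: dict) -> bool:
    """True when the server signals that the tool list should be re-fetched."""
    return message.get("method") == "notifications/tools/list_changed"
```

In practice an MCP client library handles this exchange. The sketch only shows the order of operations.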
Install in VS Code

To configure VS Code manually instead, add the following to .vscode/mcp.json in your workspace, or to your user settings:

{
  "servers": {
    "webfuse-session": {
      "type": "http",
      "url": "https://session-mcp.HOSTNAME/mcp",
      "headers": {
        "Authorization": "Bearer ${input:automation_key}"
      }
    }
  },
  "inputs": [
    {
      "type": "promptString",
      "id": "automation_key",
      "description": "Space Automation API Key",
      "password": true
    }
  ]
}

Start a fresh session, then ask your agent:

“In Webfuse session, open https://webfuse.com and describe what you see.”

The agent will ask for a session ID, then use navigate to load the page and see_domSnapshot or see_guiSnapshot to read it. From there you can ask it to interact with the page in natural language — click a link, fill in a form, or summarise the content.

Limit | Value | Notes
--- | --- | ---
Tool call timeout | 15 s | Maximum time allowed for a single tool call, including network transfer in both directions and tool execution in the browser. If the round trip doesn’t complete within this window, the MCP server returns a timeout error to the agent.
Tool call input size | 16 KiB | Maximum decompressed size of the tool-call arguments sent to the server.
Tool call response size | 10 MiB | Maximum decompressed size of a tool result. Larger responses are rejected and the agent receives an error instead.
MCP client connection duration | 3 min | MCP client connections are automatically closed after 3 minutes. This is a hard limit: clients must reconnect after this period to continue making tool calls.
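
A client can guard against these limits before sending a call. The sketch below is one possible approach. It assumes the serialized JSON length is a fair approximation of how the server measures argument size, which the table does not specify, and the reconnect margin is an arbitrary choice.

```python
import json

MAX_INPUT_BYTES = 16 * 1024        # tool call input size: 16 KiB
MAX_CONNECTION_SECONDS = 3 * 60    # connection duration: 3 min
TOOL_CALL_TIMEOUT_SECONDS = 15     # tool call timeout: 15 s


def arguments_size(arguments: dict) -> int:
    """Approximate the argument size as UTF-8 encoded JSON (an assumption)."""
    return len(json.dumps(arguments).encode("utf-8"))


def fits_input_limit(arguments: dict) -> bool:
    """Check the arguments against the 16 KiB input limit."""
    return arguments_size(arguments) <= MAX_INPUT_BYTES


def should_reconnect(connected_at: float, now: float, margin: float = 10.0) -> bool:
    """Reconnect slightly before the 3-minute hard limit is reached."""
    return now - connected_at >= MAX_CONNECTION_SECONDS - margin
```

Reconnecting a few seconds early avoids starting a tool call that the connection limit would cut off.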

All tools require a session_id identifying the target session.

Finding your session ID: It appears in the session URL as the path segment after the hostname, for example:

https://HOSTNAME/sGpUNaFXihCSxCUfb3zezgaCw

For programmatic access, you can also retrieve it from a REST API response or a webhook payload.
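
If only the session URL is at hand, the ID can be read from its path. This is a small sketch that assumes the ID is always the first path segment, as in the example URL above. The function name is hypothetical.

```python
from urllib.parse import urlparse


def session_id_from_url(session_url: str) -> str:
    """Return the first path segment of a session URL as the session ID."""
    segments = [part for part in urlparse(session_url).path.split("/") if part]
    if not segments:
        raise ValueError("URL has no path segment to use as a session ID")
    return segments[0]
```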

Most actuation tools also accept a target — a CSS selector, Webfuse ID, or [x,y] coordinates.
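
The three target forms are all passed as strings. The sketch below formats coordinates and guesses which form a given string is. The classification is a client-side heuristic based on the wf-id examples in the server instructions further down (such as "123" or "1-23"), not the server's actual parsing rules.

```python
import re

COORDINATES = re.compile(r"\[\s*-?\d+(?:\.\d+)?\s*,\s*-?\d+(?:\.\d+)?\s*\]")
WF_ID = re.compile(r"\d+(?:-\d+)*")


def coordinate_target(x: int, y: int) -> str:
    """Format page coordinates as the '[x,y]' target string."""
    return f"[{x},{y}]"


def classify_target(target: str) -> str:
    """Heuristically label a target as 'coordinates', 'wf-id', or 'css'."""
    if COORDINATES.fullmatch(target):
        return "coordinates"
    if WF_ID.fullmatch(target):
        return "wf-id"
    return "css"
```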

Execution context: All commands are executed on the active tab of the session, on the tab owner’s browser. If the active tab or its tab owner is not present when a tool call arrives, the call fails with an error.

Click a target element with the specified mouse button. Use for buttons, links, checkboxes, and any other interactive element. Defaults to a left-button click.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
options.button | string | | Mouse button to use: ‘left’ (default), ‘middle’, or ‘right’.
options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before clicking. When false, the click is sent directly without moving the pointer (default: true).
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).
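
As an illustration, a right-click that waits for its target could be expressed as the following MCP `tools/call` request. The tool name `act_click` is the one used elsewhere on this page. The helper name, request ID, and example values are hypothetical.

```python
def act_click_call(request_id: int, session_id: str, target: str,
                   button: str = "left", wait_for_target: bool = False) -> dict:
    """Build a tools/call request for act_click with nested options."""
    if button not in ("left", "middle", "right"):
        raise ValueError(f"unsupported mouse button: {button!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "act_click",
            "arguments": {
                "session_id": session_id,
                "target": target,
                # Option keys live inside "options", not at the top level.
                "options": {"button": button, "waitForTarget": wait_for_target},
            },
        },
    }


request = act_click_call(1, "SESSION_ID", 'button[type="submit"]',
                         button="right", wait_for_target=True)
```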

Press a single key on a target element, with optional modifier keys. Key events are dispatched directly to the page, not to the operating system. OS-level shortcuts such as Ctrl+C (copy) or Ctrl+V (paste) will NOT work unless the page has explicitly implemented them. Standard editing keys (Enter, Backspace, Delete) and page-handled shortcuts work as expected.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
key | string | | Key to press using the KeyboardEvent.key name (e.g. ‘Enter’, ‘ArrowUp’, ‘a’, ‘B’, ‘F5’).
options.altKey | boolean | | Hold the Alt key while pressing the key. Only effective if the page handles the resulting combination (default: false).
options.ctrlKey | boolean | | Hold the Control key while pressing the key. Only effective if the page handles the resulting combination; OS shortcuts like Ctrl+C will not work (default: false).
options.metaKey | boolean | | Hold the Meta key (Cmd on macOS, Win on Windows) while pressing the key (default: false).
options.shiftKey | boolean | | Hold the Shift key while pressing the key (default: false).
options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before pressing the key. When false, the action is sent directly without moving the pointer (default: true).
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).
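
The modifier flags map naturally onto a shortcut written as text. The sketch below turns a string such as "Ctrl+Enter" into the arguments object for this tool. The combo syntax and the helper name are assumptions made for illustration. Only the argument names come from the table above, and the tool name is left out because it is not shown in this section.

```python
MODIFIER_OPTIONS = {
    "alt": "altKey",
    "ctrl": "ctrlKey",
    "meta": "metaKey",
    "shift": "shiftKey",
}


def key_press_arguments(session_id: str, target: str, combo: str) -> dict:
    """Translate 'Ctrl+Shift+K' style text into key and modifier options."""
    *modifiers, key = combo.split("+")
    options = {}
    for modifier in modifiers:
        option_name = MODIFIER_OPTIONS.get(modifier.lower())
        if option_name is None:
            raise ValueError(f"unknown modifier: {modifier!r}")
        options[option_name] = True
    arguments = {"session_id": session_id, "target": target, "key": key}
    if options:
        arguments["options"] = options
    return arguments
```

The page still has to handle the resulting combination for it to have any effect.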

Move the virtual mouse pointer to a target element or coordinates without clicking. Use to trigger hover states, tooltips, or drop-down menus that require mouse proximity. Can optionally keep the pointer visible on screen after the move.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
options.persistent | boolean | | Keep the pointer visible on screen indefinitely after the move. When false (default), the pointer fades out automatically after a short delay.
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).

Scroll a target element or the page by a given number of pixels. Use to bring off-screen content into view or to navigate long pages. Positive amounts scroll down or right; negative amounts scroll up or left. When scrolling the full page rather than a specific element, use ‘html’ as the target, because ‘body’ often does not respond to scrolling.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
amount | number | | Number of pixels to scroll. Positive scrolls down or right; negative scrolls up or left.
options.direction | string | | Axis to scroll along: ‘vertical’ (up/down, default) or ‘horizontal’ (left/right).
options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before scrolling. When false, the action is sent directly without moving the pointer (default: true).
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).
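
The sign and axis conventions are easy to mix up, so a client may prefer to express scrolling as "up", "down", "left", or "right". The sketch below converts such a word into `amount` and `options.direction`. The helper and its `towards` parameter are hypothetical, and the tool name is left out because it is not shown in this section.

```python
# Map a human-friendly direction to (sign, axis) per the table above.
SCROLL_DIRECTIONS = {
    "down": (1, "vertical"),
    "up": (-1, "vertical"),
    "right": (1, "horizontal"),
    "left": (-1, "horizontal"),
}


def scroll_arguments(session_id: str, pixels: int, towards: str,
                     target: str = "html") -> dict:
    """Build scroll arguments; 'html' targets the full page by default."""
    sign, axis = SCROLL_DIRECTIONS[towards]
    return {
        "session_id": session_id,
        "target": target,
        "amount": sign * abs(pixels),
        "options": {"direction": axis},
    }
```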

Select an option in a <select> dropdown element by matching its value attribute. Use this instead of act_click when interacting with native HTML dropdowns.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
value | string | | The value attribute of the option to select, not the visible display text.
options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true).
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).
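
Because the tool matches on the value attribute, a client that knows only the visible label needs a lookup step first. The sketch below assumes a label-to-value mapping that the caller has already extracted from a snapshot. The mapping, selector, and helper name are hypothetical.

```python
def act_select_arguments(session_id: str, target: str, label: str,
                         values_by_label: dict) -> dict:
    """Resolve a visible label to its value attribute, then build arguments."""
    if label not in values_by_label:
        raise KeyError(f"no option labelled {label!r}")
    return {
        "session_id": session_id,
        "target": target,
        "value": values_by_label[label],  # the value attribute, not the label
    }


# Hypothetical options read from a snapshot of a country dropdown.
countries = {"Germany": "DE", "France": "FR"}
arguments = act_select_arguments("SESSION_ID", "select#country", "Germany", countries)
```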

Select a continuous run of text within a container element by matching its content. Use to highlight text before copying, replacing, or applying formatting. Pass an empty string as text to clear the current selection.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
text | string | | Exact text string to find and highlight within the target element. Pass an empty string to clear any existing selection.
options.occurrence | number | | Which occurrence to select when the text appears more than once in the element. The index is 1-based: 1 selects the first match (default: 1).
options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before selecting. When false, the action is sent directly without moving the pointer (default: true).
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).
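
The 1-based `occurrence` index and the empty-string convention can be captured in two small helpers. This is a sketch with hypothetical helper names. The argument names follow the table above.

```python
def text_select_arguments(session_id: str, target: str, text: str,
                          occurrence: int = 1) -> dict:
    """Build arguments that select the n-th match of text (1-based)."""
    if occurrence < 1:
        raise ValueError("occurrence is 1-based and must be at least 1")
    arguments = {"session_id": session_id, "target": target, "text": text}
    if text and occurrence != 1:
        arguments["options"] = {"occurrence": occurrence}
    return arguments


def clear_selection_arguments(session_id: str, target: str) -> dict:
    """An empty text string clears the current selection."""
    return text_select_arguments(session_id, target, "")
```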

Type text into a target input element. Short inputs are typed character by character; longer inputs are pasted directly. Use for text fields, search boxes, and any editable element. By default, overwrites existing content. If the target resolves to a non-editable wrapper (error: ‘Target must resolve to editable element’), re-take a DOM snapshot with webfuseIDs: true and target the inner input element directly using its wf-id.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
target | string | | Target element: a CSS selector, a wf-id string from a snapshot, or coordinates as ‘[x,y]’
text | string | | Text to type into the target element.
options.followFocus | boolean | | Continue typing into whichever element holds focus, even if focus moved away from the original target. Set to false to strictly type into the target (default: true).
options.overwrite | boolean | | Replace the existing content of the input before typing. Set to false to append or insert at the current cursor position (default: true).
options.moveMouse | boolean | | Move the virtual mouse pointer to the target center before typing. When false, the action is sent directly without moving the pointer (default: true).
options.waitForTarget | boolean | | Wait up to 5 s for the target to appear if it does not exist yet (default: false).
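
The two typing-specific options change behaviour only when set to false, so a client can leave them out otherwise. The sketch below builds `act_type` arguments this way. The `append` and `strict_target` parameter names are hypothetical conveniences, not part of the tool schema.

```python
def act_type_arguments(session_id: str, target: str, text: str,
                       append: bool = False, strict_target: bool = False) -> dict:
    """Build act_type arguments, overriding defaults only when needed."""
    options = {}
    if append:
        options["overwrite"] = False    # keep existing content
    if strict_target:
        options["followFocus"] = False  # never follow focus elsewhere
    arguments = {"session_id": session_id, "target": target, "text": text}
    if options:
        arguments["options"] = options
    return arguments
```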

Bind this MCP connection to a Webfuse session. Must be called before any other tool — until then, only this tool is exposed. After binding, the server registers the session’s tool set and notifies the client via notifications/tools/list_changed; re-fetch the tool list with tools/list. Call again with a different session_id to rebind without reconnecting.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID to bind this connection to

Navigate the current browser tab to a new URL. Use to open a page before interacting with it. After navigation completes, take a snapshot to confirm the page loaded as expected.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
url | string | | URL to navigate to. Supports absolute URLs (e.g. ‘https://example.com/page’) and relative URLs (e.g. ‘/page’), which are resolved against the current tab’s URL.
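
To preview where a relative URL is likely to lead, the sketch below applies standard URL resolution with Python's `urljoin`. It assumes the server resolves relative URLs the way browsers do. The description above does not spell out the exact rules, so treat the result as an estimate.

```python
from urllib.parse import urljoin


def preview_navigation(current_tab_url: str, url: str) -> str:
    """Estimate the final URL using standard relative-URL resolution."""
    return urljoin(current_tab_url, url)


def navigate_arguments(session_id: str, url: str) -> dict:
    """Build arguments for the navigate tool."""
    return {"session_id": session_id, "url": url}
```

For example, previewing ‘/page’ against ‘https://example.com/docs/intro’ gives ‘https://example.com/page’, while an absolute URL is returned unchanged.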

Retrieve information about the currently active web page, including URL and title.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID

Capture the accessibility tree of the current page as a structured JSON object. Use to understand page semantics (roles, names, and ARIA states such as checked, expanded, or disabled) without parsing raw HTML. Each node includes a wf-id by default (see the webfuseIDs option), which can be passed as a string directly as the target of actuation tools. Prefer see_domSnapshot when you need the full HTML structure.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
options.root | string | | CSS selector scoping the accessibility tree to a specific subtree instead of the full page. Use to reduce output size when the area of interest is known (default: body).
options.quality | number | | Snapshot completeness as a float between 0 (lowest) and 1 (highest, default). Values below 1 downsample the underlying DOM before computing the tree, reducing output size at the cost of some fidelity.
options.webfuseIDs | boolean | | Associate each node with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable, for example with iframes, duplicate ids, or generated markup (default: true).

Capture a structured text representation of the current page’s DOM. Use to read element text, attributes, and hierarchy before deciding which element to interact with. Prefer this over see_guiSnapshot when you need precise element targeting or the page is mostly text-based.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
options.crossFrame | boolean | | Include content inside <iframe> elements. Enable when the target element is inside a frame (default: false).
options.crossShadow | boolean | | Include content inside shadow DOM roots. Disable only if shadow DOM content is not needed (default: true).
options.interactiveOnly | boolean | | Omit non-interactive elements and return only buttons, inputs, links, and similar controls. Reduces snapshot size when you only need to find actionable elements (default: false).
options.quality | number | | DOM snapshot completeness as a float between 0 (lowest) and 1 (highest). At 1, the DOM is returned as-is, with all elements present and full structure and context. Below 1, the snapshot is downsampled: output is smaller but the DOM is structurally altered, so elements may be merged, reordered, or dropped, causing loss of context. Exception: if webfuseIDs=true, wf-id attributes survive downsampling and element targeting remains precise. Default for agent use: 0.1 with webfuseIDs=true, which keeps snapshots small while preserving reliable element targeting via wf-ids.
options.maxTokens | number | | Limit the DOM snapshot size to a given LLM input token count (1 token = 4 bytes). Quality is reduced adaptively to fit, in addition to the specified quality (default: no limit).
options.revealMaskedElements | boolean | | Include elements that have been masked by the Webfuse Masking App. Masked elements are hidden from the snapshot by default to protect sensitive content. Enable only when you explicitly need to interact with masked elements (default: false).
options.root | string | | CSS selector scoping the snapshot to a specific subtree instead of the full page. Use to reduce snapshot size when the area of interest is known (default: body).
options.webfuseIDs | boolean | | Annotate each element with a unique wf-id string for unambiguous targeting. Pass the wf-id directly as the target to other tools. Especially useful when CSS selectors are unreliable, for example with iframes, duplicate ids, or generated markup (default: false).
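
As a sketch, the helper below builds `see_domSnapshot` arguments using the option set that the server instructions further down ask agents to use, and converts a token budget to bytes with the stated ratio of 1 token = 4 bytes. The helper names and example values are hypothetical.

```python
BYTES_PER_TOKEN = 4  # ratio stated for maxTokens


def dom_snapshot_arguments(session_id: str, max_tokens=None, root=None) -> dict:
    """Build see_domSnapshot arguments with agent-oriented options."""
    options = {
        "webfuseIDs": True,   # wf-ids survive downsampling
        "quality": 0.1,
        "crossFrame": True,
        "crossShadow": True,
    }
    if max_tokens is not None:
        options["maxTokens"] = max_tokens
    if root is not None:
        options["root"] = root  # scope to a subtree to shrink the output
    return {"session_id": session_id, "options": options}


def approx_snapshot_bytes(max_tokens: int) -> int:
    """Approximate byte size that a token budget corresponds to."""
    return max_tokens * BYTES_PER_TOKEN
```

A budget of 2000 tokens therefore corresponds to roughly 8000 bytes of snapshot text.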

Capture a screenshot of the current page as an image. Use when rendered visual appearance matters, for example for images, charts, canvas content, or verifying layout. Coordinates visible in the screenshot can be passed as [x,y] to action tools, but prefer see_domSnapshot for reliable element targeting or text extraction.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID
quality | number | | Image compression level as a float between 0 (lowest quality, smallest size) and 1 (highest quality, largest size). Lower values reduce image detail (default: 0.6).
maxTokens | number | | Limit the GUI snapshot size to a given LLM input token count (1 token = 4 bytes). Quality is reduced adaptively to fit, in addition to the specified quality (default: no limit).

Read the text that is currently selected (highlighted) on the page. Use to verify the result of act_textSelect, or to capture text the user has already selected before acting on it.

Parameter | Type | Required | Description
--- | --- | --- | ---
session_id | string | | Webfuse Session ID

These instructions are sent to the model automatically when it connects. They describe the available tools and guide the agent’s behaviour. Use this as a starting point and customise it for your use case.

You are an intelligent browser agent which helps users to perform various tasks on the web.
You have access to a set of tools that allow you to interact with web pages, extract information, and perform actions.
Use these tools to accomplish the user's goals effectively.
This server controls an active Webfuse browser session - use it to interact with web pages (clicking, typing, navigating, observing). For creating or configuring sessions and spaces, or searching documentation, use the Webfuse API & Docs MCP server instead.
## Custom tools
Some tools in this session are extension-provided — you can identify them because their description starts with [Custom Tool].
- When a custom tool matches the user's goal, use it instead of assembling the equivalent sequence of standard tools yourself. Custom tools are purpose-built for the current page or workflow and handle the full interaction internally.
- You do NOT need to call `see_domSnapshot` before using a custom tool — call it directly if it looks like a suitable solution for the task.
## Session
- Every tool requires a "session_id" to identify the Webfuse session. Always include the correct "session_id". Ask the user to provide it if you don't have it.
## Observing the page
- When using built-in action tools (`act_click`, `act_type`, etc.), your FIRST action on any page must be `see_domSnapshot` — never act on a page you have not read. You do not know what elements exist until you call it. Custom tools are exempt: call them directly without a prior snapshot.
- Do NOT assume page content from memory or training. The page on screen is the only ground truth. Guessing element selectors without a snapshot will fail.
- After every navigation or page transition, take a new snapshot before acting.
- Always call `see_domSnapshot` with EXACTLY these options:
`{"session_id": "<session_id>", "options": {"webfuseIDs": true, "quality": 0.1, "crossFrame": true, "crossShadow": true}}`
Do NOT omit `crossFrame` or `crossShadow`. Do NOT add `root` unless you receive a size-cap error.
- If you receive a size-cap error, add `"root": "<selector>"` inside `options` to scope the snapshot to the relevant section.
- Use `see_guiSnapshot` only when the page is visually complex (images, charts, canvas) and the DOM snapshot is insufficient.
- Use `see_accessibilityTree` to understand page semantics — roles, ARIA states (checked, expanded, disabled), and element names — without parsing raw HTML. Unlike `see_domSnapshot`, it includes wf-ids by default.
- Use `pageInfo` to retrieve the current URL and title instantly, without the cost of a full snapshot.
## Targeting elements
- The `target` argument for `act_click`, `act_type`, `act_mouseMove`, etc. accepts ONLY one of:
1. A standard CSS selector
2. A wf-id string from a recent snapshot (numeric, e.g. `"123"` or `"1-23"`, found in the `wf-id="..."` attribute)
3. Coordinates as `"[x,y]"`
- **Standard CSS only** — Playwright/jQuery extensions are NOT supported. Do NOT use `:has-text(...)`, `:has(...)`, `:contains(...)`, `:visible`, `:nth-match`, `text=...`, or similar pseudo-classes. They will fail.
- To click an element identified by its visible text, the reliable path is:
1. Call `see_domSnapshot` to get a snapshot; each element has a wf-id.
2. Pass the wf-id as the `"target"` string (no selector syntax around it).
- For elements with stable HTML attributes, regular CSS works: `'#submit'`, `'button[type="submit"]'`, `'input[name="q"]'`, etc.
- When a CSS selector is unreliable — elements inside iframes, duplicate HTML ids, or deeply generated markup — use wf-ids instead.
- If `act_type` fails with "Target must resolve to editable element", re-take a DOM snapshot with `webfuseIDs: true` and target the inner input wf-id directly.
## Argument shape
- Parameter names must match the schema exactly — use the exact casing declared (e.g. `webfuseIDs`, `crossFrame`, NOT `webfuseids`, `crossframe`).
- A parameter whose schema type is `"object"` must be sent as a JSON object, NOT as a JSON-encoded string. Write the object body directly inside the parameter; do not wrap it in quotes.
- Nested keys belong inside their parent object. Do not hoist them to the top level. If the schema declares `"options"` as an object containing `"webfuseIDs"` and `"root"`, those keys go inside `options`, not at the top level.
## Performing actions
- Dismiss any consent dialog (cookie banners, GDPR notices, newsletter popups, or other modals) immediately before interacting with page content. Do not try to click navigation or form elements until the overlay is gone.
- `"moveMouse"` defaults to true in all action tools — do not explicitly disable it unless the element is already focused and mouse movement is undesirable.
- For native HTML `<select>` dropdowns, use `act_select` (not `act_click`). Pass the option's value attribute, not its display text.
- For keyboard shortcuts: key events are dispatched to the page, not the OS. Standard editing keys (Enter, Backspace, Delete) work as expected. OS-level shortcuts such as Ctrl+C or Ctrl+V will NOT work unless the page has explicitly implemented them.
## Error handling
- Tool results contain an `isError` field. If true, read the `content` field for error details, analyze the cause, and adjust your strategy before retrying.
- Do not change argument shape in response to a tool error. Errors are about content (wrong selector, missing element) — the next call uses the same parameter names and the same nesting.
If unsure — ask the user for clarification or additional information.
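
The server instructions end with the line above. Separately from them, the error-handling guidance can be mirrored on the client side. The sketch below extracts the error text from a tool result. It assumes the result follows the general MCP shape, with `content` as a list of typed items, and the sample result is invented for illustration.

```python
def error_text(result: dict):
    """Return the joined error text of a tool result, or None on success."""
    if not result.get("isError"):
        return None
    parts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(parts)


# Invented sample result, reusing the error message quoted for act_type.
sample = {
    "isError": True,
    "content": [{"type": "text", "text": "Target must resolve to editable element"}],
}
```

A client that logs this text can show why a retry with a different target was needed, while the argument shape stays the same.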