Automation API

The automation object is available under the browser.webfuseSession namespace in every content script of an extension. To use it from a background script or a popup, send a message to the content script and have the content script call the corresponding Automation API method.
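The relay from a background script or popup to the content script could be sketched as follows. The message shape ({ action, target }) and the helper createAutomationRelay are hypothetical names introduced for illustration; browser.runtime.onMessage and browser.tabs.sendMessage are the standard WebExtension messaging calls, and the wiring shown in the trailing comments is an assumption about how an extension might use them.

```typescript
// Target as defined by the Automation API: CSS selector or point coordinate.
type Target = string | [number, number];

// The subset of the automation object this sketch relies on.
interface ClickAutomation {
  left_click(target?: Target): Promise<void>;
}

// Hypothetical message shape; the Automation API does not prescribe one.
interface LeftClickMessage {
  action: "left_click";
  target?: Target;
}

function isLeftClickMessage(message: unknown): message is LeftClickMessage {
  return (
    typeof message === "object" &&
    message !== null &&
    (message as { action?: unknown }).action === "left_click"
  );
}

// Builds a listener that forwards recognized messages to the automation object.
function createAutomationRelay(automation: ClickAutomation) {
  return async (message: unknown): Promise<{ ok: boolean }> => {
    if (!isLeftClickMessage(message)) {
      return { ok: false }; // not a message this relay understands
    }
    await automation.left_click(message.target);
    return { ok: true };
  };
}

// Content script (assumed wiring):
//   browser.runtime.onMessage.addListener(
//     createAutomationRelay(browser.webfuseSession.automation)
//   );
// Background script or popup (tabId identifies the tab that runs the content script):
//   await browser.tabs.sendMessage(tabId, { action: "left_click", target: "#submit" });
```

Because the listener is an async function, it answers every message it receives, including ones it does not recognize. An extension with several listeners may prefer to return undefined for foreign messages instead.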

By default, automation actions operate on the element below the virtual mouse pointer. A target can optionally be specified in one of two ways: as a CSS selector (selector: string) or as absolute point coordinates ([x: number, y: number]).

type Target = string | [number, number]; // CSS selector or point coordinate

Move the virtual mouse pointer. The pointer can be used as an implicit target specifier for subsequent actions.

browser.webfuseSession.automation.mouse_move(target: Target): Promise<void>

target

  • Mouse pointer target.

A promise that resolves once the pointer has been moved.
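The implicit targeting described above might be used like this: move the pointer first, then click without passing a target so that the click lands on the element below the pointer. The helper name moveThenClick and the PointerAutomation interface are illustrative only; the two method signatures are taken from this page.

```typescript
// Target as defined by the Automation API: CSS selector or point coordinate.
type Target = string | [number, number];

// The subset of the automation object this sketch relies on.
interface PointerAutomation {
  mouse_move(target: Target): Promise<void>;
  left_click(target?: Target): Promise<void>;
}

// Moves the virtual pointer to the target, then clicks without a target,
// so the click applies to the element below the pointer.
async function moveThenClick(
  automation: PointerAutomation,
  target: Target
): Promise<void> {
  await automation.mouse_move(target);
  await automation.left_click();
}

// Assumed usage inside a content script:
//   await moveThenClick(browser.webfuseSession.automation, "#menu-button");
```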

Scroll the deepest scrollable element under the current position of the virtual mouse pointer by the given amount in the given direction. If a target is provided, the element that matches it is scrolled instead.

browser.webfuseSession.automation.scroll(
  direction: 'vertical' | 'horizontal',
  amount: number,
  target?: Target
): Promise<void>

direction

  • The direction to scroll.

amount

  • The amount of pixels to scroll.

[target]

  • Scroll(able) target.

A promise that resolves once scrolling has ended.

await browser.webfuseSession.automation.scroll('vertical', 100, '#scrollable');

Perform a left (primary) mouse button click.

browser.webfuseSession.automation.left_click(target?: Target): Promise<void>

[target]

  • Click target.

A promise that resolves once the click has been performed.

await browser.webfuseSession.automation.left_click([100, 250]);

Perform a middle (wheel) mouse button click.

browser.webfuseSession.automation.middle_click(target?: Target): Promise<void>

[target]

  • Click target.

A promise that resolves once the click has been performed.

Perform a right (secondary) mouse button click.

browser.webfuseSession.automation.right_click(target?: Target): Promise<void>

[target]

  • Click target.

A promise that resolves once the click has been performed.

browser.webfuseSession.automation.type(text: string, target?: Target): Promise<void>

Type text into an element. Typing is natural, i.e., as if a human pressed a sequence of keys.

text

  • Text to type.

[target]

  • Typing target.

A promise that resolves once the text has been typed.

browser.webfuseSession.automation.key_press(
  key: "a" | "b" | ... | "Y" | "Z",
  options?: {
    altKey?: boolean;
    ctrlKey?: boolean;
    metaKey?: boolean;
    shiftKey?: boolean;
  },
  target?: Target
): Promise<void>

Press a key on an element.

key

  • Key to press.

[options]

  • Booleans that select the modifier keys to hold down during the press: alt, ctrl, meta, or shift.

[target]

  • Key press target.

A promise that resolves once the key has been pressed.
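To illustrate how the key and options arguments fit together, the following sketch converts a shortcut string such as "ctrl+shift+k" into the two values that key_press expects. The helper parseShortcut and the shortcut string format are hypothetical and not part of the Automation API; only the four option names come from the signature above.

```typescript
// Option names as listed in the key_press signature.
interface KeyPressOptions {
  altKey?: boolean;
  ctrlKey?: boolean;
  metaKey?: boolean;
  shiftKey?: boolean;
}

// Hypothetical helper: parses a shortcut such as "ctrl+shift+k" into the
// key and options arguments for key_press.
function parseShortcut(shortcut: string): { key: string; options: KeyPressOptions } {
  const parts = shortcut.split("+").map((part) => part.trim());
  const key = parts.pop() ?? "";
  // The documented key range is a single letter, a-z or A-Z.
  if (!/^[a-zA-Z]$/.test(key)) {
    throw new Error(`Unsupported key: "${key}"`);
  }
  const options: KeyPressOptions = {};
  for (const part of parts) {
    switch (part.toLowerCase()) {
      case "alt":
        options.altKey = true;
        break;
      case "ctrl":
        options.ctrlKey = true;
        break;
      case "meta":
        options.metaKey = true;
        break;
      case "shift":
        options.shiftKey = true;
        break;
      default:
        throw new Error(`Unknown modifier: "${part}"`);
    }
  }
  return { key, options };
}

// Assumed usage inside a content script (key may need a cast to the
// API's literal key type):
//   const { key, options } = parseShortcut("ctrl+shift+k");
//   await browser.webfuseSession.automation.key_press(key, options, "#editor");
```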

browser.webfuseSession.automation.wait(ms: number): Promise<void>

ms

  • The amount of milliseconds to wait.

A promise that resolves once the given time has passed.
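One way wait might be combined with other actions is to pause between scroll steps, for example to give lazily loaded content time to render. The helper scrollInSteps and its parameters are illustrative; the step count and pause length are arbitrary choices, not recommendations from the API.

```typescript
// Target as defined by the Automation API: CSS selector or point coordinate.
type Target = string | [number, number];

// The subset of the automation object this sketch relies on.
interface ScrollAutomation {
  scroll(
    direction: "vertical" | "horizontal",
    amount: number,
    target?: Target
  ): Promise<void>;
  wait(ms: number): Promise<void>;
}

// Scrolls vertically in equal steps and waits after each step.
async function scrollInSteps(
  automation: ScrollAutomation,
  steps: number,
  amountPerStep: number,
  pauseMs: number,
  target?: Target
): Promise<void> {
  for (let step = 0; step < steps; step++) {
    await automation.scroll("vertical", amountPerStep, target);
    await automation.wait(pauseMs);
  }
}

// Assumed usage inside a content script:
//   await scrollInSteps(browser.webfuseSession.automation, 5, 400, 250, "#feed");
```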

browser.webfuseSession.automation.take_dom_snapshot(options?: {
  rootSelector?: string;
  crossframe?: boolean;
  revealMaskedElements?: boolean;
  modifier?: "downsample" | {
    name: string;
    params?: unknown[];
  };
}): Promise<void>

Serialize the DOM for various processing purposes, such as for LLM input.

[options]

  • DOM snapshot options:

  • [rootSelector] Selector of the element to designate as the snapshot root (documentElement by default).

  • [crossframe] Whether to include iframe subtrees (false by default).

  • [revealMaskedElements] Whether to include masked elements (false by default).

  • [modifier] Snapshot modifier (identity by default).

    • name Modifier name.
    • [params] Modifier parameter record.

A promise that resolves to the snapshot.

The 'downsample' modifier reduces the overall DOM to below 2^13 (8,192) estimated LLM input tokens. The resulting DOM can be considered a low-resolution variant that retains the majority of inherent UI features.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    rootSelector: '#app',
    modifier: 'downsample',
  });

The 'D2Snap' modifier applies the D2Snap algorithm to the DOM. This reduces its size while retaining a majority of UI features. The algorithm was developed to mitigate the token-size disadvantage of DOM snapshots as LLM input.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'D2Snap',
      params: {
        hierarchyRatio: 0.4, textRatio: 0.6, attributeRatio: 0.8
        // or
        // k: 0.4, l: 0.6, m: 0.8
      }
    }
  });

Applies the AdaptiveD2Snap algorithm to the DOM. This is an adaptive version of the D2Snap algorithm that does not require explicit parameters.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'AdaptiveD2Snap',
      params: {
        maxTokens: 32768,
        maxIterations: 3
      }
    }
  });
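As a rough sketch of how a caller might choose between the modifiers shown above, the following helper builds a take_dom_snapshot options object from an optional token budget. The helper buildSnapshotOptions and its selection rule are hypothetical. The option names come from the signature, while params is typed here as a record to match the examples rather than the unknown[] given in the signature, which is an assumption.

```typescript
// Option shape based on the take_dom_snapshot signature. Assumption: params
// is a record of numbers, as in the D2Snap and AdaptiveD2Snap examples.
interface DomSnapshotOptions {
  rootSelector?: string;
  crossframe?: boolean;
  revealMaskedElements?: boolean;
  modifier?: "downsample" | { name: string; params?: Record<string, number> };
}

// Hypothetical helper: without a token budget it falls back to 'downsample';
// with a budget it selects 'AdaptiveD2Snap' and passes the budget as maxTokens.
function buildSnapshotOptions(
  rootSelector: string,
  maxTokens?: number
): DomSnapshotOptions {
  if (maxTokens === undefined) {
    return { rootSelector, modifier: "downsample" };
  }
  return {
    rootSelector,
    modifier: {
      name: "AdaptiveD2Snap",
      // maxIterations mirrors the value used in the example above.
      params: { maxTokens, maxIterations: 3 },
    },
  };
}

// Assumed usage inside a content script:
//   const snapshot = await browser.webfuseSession.automation
//     .take_dom_snapshot(buildSnapshotOptions("#app", 16384));
```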

Serialize the GUI for various processing purposes, such as for LLM input.

browser.webfuseSession.automation.take_gui_snapshot(): Promise<ImageBitmap>