Automation API

The automation object is available in the browser.webfuseSession namespace in every content script of an extension. To use it from a background script or a popup, send a message to the content script and call the automation method there.
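The relay from a background script or popup to the content script could look roughly like the sketch below. It is an illustration under assumptions, not the documented pattern: the message shape `AutomationRequest`, the `"automation"` type tag and the helper names are hypothetical. Only the three click methods are taken from this page.

```typescript
// Hypothetical message shape; only the method names come from this page.
const CLICK_METHODS = ["left_click", "middle_click", "right_click"] as const;
type ClickMethod = (typeof CLICK_METHODS)[number];
type RelayTarget = string | [number, number];

interface AutomationRequest {
  type: "automation";
  method: ClickMethod;
  target: RelayTarget;
}

// Subset of the automation object that this sketch relies on.
type ClickAutomation = Record<
  ClickMethod,
  (target: RelayTarget, moveMouse?: boolean) => Promise<void>
>;

// Narrow an arbitrary incoming message to an AutomationRequest.
function isAutomationRequest(message: unknown): message is AutomationRequest {
  if (typeof message !== "object" || message === null) return false;
  const candidate = message as Record<string, unknown>;
  const target = candidate["target"];
  const validTarget =
    typeof target === "string" ||
    (Array.isArray(target) &&
      target.length === 2 &&
      target.every((n) => typeof n === "number"));
  return (
    candidate["type"] === "automation" &&
    (CLICK_METHODS as readonly unknown[]).indexOf(candidate["method"]) !== -1 &&
    validTarget
  );
}

// Content-script side: run the requested click and report the outcome.
async function handleAutomationRequest(
  automation: ClickAutomation,
  message: unknown
): Promise<{ ok: boolean; error?: string }> {
  if (!isAutomationRequest(message)) {
    return { ok: false, error: "not an automation request" };
  }
  try {
    await automation[message.method](message.target);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}
```

Assuming the standard WebExtension messaging API is available, the content script would register the handler with `browser.runtime.onMessage.addListener((message) => handleAutomationRequest(browser.webfuseSession.automation, message))`, and the background script would send `{ type: "automation", method: "left_click", target: "#submit" }` via `browser.tabs.sendMessage`. The selector `#submit` is a placeholder.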

By default, automation actions work on the element below the virtual mouse pointer. A target can optionally be specified in one of three ways: as an element reference, as a CSS selector (selector: string), or as absolute point coordinates ([x: number, y: number]).

type Target = Element | string | [number, number]; // Element reference, CSS selector or point coordinate
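The three target forms can be told apart at runtime with plain type checks. The sketch below is illustrative only: `describeTarget` is a hypothetical helper, and `ElementLike` is a stand-in for the DOM `Element` type so that the snippet also type-checks outside a browser.

```typescript
// Stand-in for the DOM Element type (assumption made for portability).
type ElementLike = { tagName: string };
type Target = ElementLike | string | [number, number];

// Return a short human-readable label for a target, e.g. for logging.
function describeTarget(target: Target): string {
  if (typeof target === "string") return `selector ${target}`;
  if (Array.isArray(target)) return `point (${target[0]}, ${target[1]})`;
  return `element <${target.tagName.toLowerCase()}>`;
}
```

Checking for a string first and an array second leaves the element reference as the remaining case, which avoids an `instanceof Element` check that would fail where no DOM is present.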

Move the virtual mouse pointer.

browser.webfuseSession.automation.mouse_move(
target: Target,
persistent: boolean = false
): Promise<void>

target

  • Mouse pointer target.

persistent

  • Whether to keep the mouse pointer on screen (hides after some time by default).

A promise that resolves once the mouse pointer has been moved.

Scrolls the deepest scrollable element under the target by the given amount in the given direction.

browser.webfuseSession.automation.scroll(
target: Target,
direction: 'vertical' | 'horizontal',
amount: number
): Promise<void>

target

  • Scroll(able) target.

direction

  • The direction to scroll.

amount

  • The number of pixels to scroll.

A promise that resolves once scrolling has ended.

await browser.webfuseSession.automation.scroll('#scrollable', 'vertical', 100);

Perform a left (primary) mouse button click.

browser.webfuseSession.automation.left_click(
target: Target,
moveMouse: boolean = false
): Promise<void>

target

  • Click target.

[moveMouse]

  • Whether to move the virtual mouse pointer to the target center before the click.

A promise that resolves once the click has been performed.

await browser.webfuseSession.automation.left_click([100, 250]);

Perform a middle (wheel) mouse button click.

browser.webfuseSession.automation.middle_click(
target: Target,
moveMouse: boolean = false
): Promise<void>

target

  • Click target.

[moveMouse]

  • Whether to move the virtual mouse pointer to the target center before the click.

A promise that resolves once the click has been performed.

Perform a right (secondary) mouse button click.

browser.webfuseSession.automation.right_click(
target: Target,
moveMouse: boolean = false
): Promise<void>

target

  • Click target.

[moveMouse]

  • Whether to move the virtual mouse pointer to the target center before the click.

A promise that resolves once the click has been performed.
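The click methods can be chained, since each returns a promise. The following sketch opens a context menu with a right click and then picks an entry with a left click, passing moveMouse = true both times. The helper name `chooseContextMenuItem` and the idea that the page shows a clickable menu entry are assumptions made for illustration.

```typescript
type ClickTarget = string | [number, number];

// Subset of the automation object used by this sketch.
interface ContextMenuAutomation {
  left_click(target: ClickTarget, moveMouse?: boolean): Promise<void>;
  right_click(target: ClickTarget, moveMouse?: boolean): Promise<void>;
}

// Right-click the anchor, then left-click the menu entry.
// Each step waits for the previous click to finish.
async function chooseContextMenuItem(
  automation: ContextMenuAutomation,
  anchor: ClickTarget,
  itemSelector: string
): Promise<void> {
  await automation.right_click(anchor, true);
  await automation.left_click(itemSelector, true);
}
```

In a content script this might be called as `chooseContextMenuItem(browser.webfuseSession.automation, [100, 250], '#copy-link')`, where `#copy-link` is a placeholder selector.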

browser.webfuseSession.automation.type(
target: Target,
text: string,
moveMouse: boolean = false,
overwrite: boolean = false,
timePerChar: number = 100
): Promise<void>

Type text into an element. Typing is natural, i.e. it mimics a human pressing a sequence of keys.

target

  • Typing target.

text

  • Text to type.

[moveMouse]

  • Whether to move the virtual mouse pointer to the center of the target before typing.

[overwrite]

  • Whether to overwrite the current value of the target element.

[timePerChar]

  • Average time to type a character in milliseconds.

A promise that resolves once the text has been typed.
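As a usage sketch, several fields can be filled in sequence with overwrite enabled. The helpers `fillFields` and `estimateTypingMs` are hypothetical. The duration is only a rough estimate derived from timePerChar being an average, so actual runs will differ.

```typescript
// Subset of the automation object used by this sketch.
interface TypeAutomation {
  type(
    target: string,
    text: string,
    moveMouse?: boolean,
    overwrite?: boolean,
    timePerChar?: number
  ): Promise<void>;
}

// Rough duration estimate: characters times the average time per character.
function estimateTypingMs(text: string, timePerChar: number = 100): number {
  return text.length * timePerChar;
}

// Replace the value of each field in order and return the estimated total time.
async function fillFields(
  automation: TypeAutomation,
  fields: Array<[string, string]>, // [selector, value] pairs
  timePerChar: number = 100
): Promise<number> {
  let estimatedMs = 0;
  for (const [selector, value] of fields) {
    // moveMouse = true, overwrite = true
    await automation.type(selector, value, true, true, timePerChar);
    estimatedMs += estimateTypingMs(value, timePerChar);
  }
  return estimatedMs;
}
```

A call such as `fillFields(browser.webfuseSession.automation, [['#name', 'Ada']])` would type into one field. The selector is a placeholder.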

browser.webfuseSession.automation.key_press(
target: Target,
key: "a" | "b" | ... | "Y" | "Z",
options?: {
altKey?: boolean;
ctrlKey?: boolean;
metaKey?: boolean;
shiftKey?: boolean;
}
): Promise<void>

Press a key on an element.

target

  • Key press target.

key

  • Key to press.

[options]

  • Booleans that hold down a modifier key during the press: alt, ctrl, meta, or shift.

A promise that resolves once the key has been pressed.
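The options object maps directly onto modifier names. The sketch below builds it from a list of modifiers and presses a key with them. `modifierOptions` and `pressWithModifiers` are hypothetical helpers, and the key is typed as a plain string here for brevity.

```typescript
interface KeyPressOptions {
  altKey?: boolean;
  ctrlKey?: boolean;
  metaKey?: boolean;
  shiftKey?: boolean;
}

// Subset of the automation object used by this sketch.
interface KeyAutomation {
  key_press(target: string, key: string, options?: KeyPressOptions): Promise<void>;
}

type Modifier = "alt" | "ctrl" | "meta" | "shift";

const MODIFIER_FLAGS: Record<Modifier, keyof KeyPressOptions> = {
  alt: "altKey",
  ctrl: "ctrlKey",
  meta: "metaKey",
  shift: "shiftKey",
};

// Set only the flags for the requested modifiers; the rest stay undefined.
function modifierOptions(modifiers: Modifier[]): KeyPressOptions {
  const options: KeyPressOptions = {};
  for (const modifier of modifiers) {
    options[MODIFIER_FLAGS[modifier]] = true;
  }
  return options;
}

// Press a key on the target while the given modifiers are held down.
async function pressWithModifiers(
  automation: KeyAutomation,
  target: string,
  key: string,
  modifiers: Modifier[]
): Promise<void> {
  await automation.key_press(target, key, modifierOptions(modifiers));
}
```

For example, `pressWithModifiers(browser.webfuseSession.automation, '#editor', 'a', ['ctrl'])` would send Ctrl+A to a placeholder `#editor` element. How the page reacts to that combination depends on the page.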

browser.webfuseSession.automation.wait(ms: number): Promise<void>

ms

  • The number of milliseconds to wait.

A promise that resolves once the given time has passed.
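One way to combine wait with scroll is to scroll a long distance in smaller steps with a pause after each one. This is a sketch only: `scrollSteps` and `scrollGradually` are hypothetical helpers, and the step size and pause are arbitrary defaults.

```typescript
// Subset of the automation object used by this sketch.
interface ScrollAutomation {
  scroll(
    target: string,
    direction: "vertical" | "horizontal",
    amount: number
  ): Promise<void>;
  wait(ms: number): Promise<void>;
}

// Split a total distance into chunks of at most `step` pixels.
function scrollSteps(total: number, step: number): number[] {
  if (step <= 0) throw new RangeError("step must be positive");
  const steps: number[] = [];
  for (let remaining = total; remaining > 0; remaining -= step) {
    steps.push(Math.min(step, remaining));
  }
  return steps;
}

// Scroll vertically in steps, pausing after each step.
async function scrollGradually(
  automation: ScrollAutomation,
  target: string,
  total: number,
  step: number = 100,
  pauseMs: number = 200
): Promise<void> {
  for (const amount of scrollSteps(total, step)) {
    await automation.scroll(target, "vertical", amount);
    await automation.wait(pauseMs);
  }
}
```

Keeping the step calculation in a pure function makes it easy to check without a browser: `scrollSteps(250, 100)` yields `[100, 100, 50]`.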

browser.webfuseSession.automation.take_dom_snapshot(options?: {
rootSelector?: string;
crossframe?: boolean;
revealMaskedElements?: boolean;
modifier?: "downsample" | {
name: string;
params?: unknown[];
};
}): Promise<void>

Serialize the DOM for various processing purposes, such as for LLM input.

[options]

  • DOM snapshot options:

  • [rootSelector] Selector of the element to designate as the snapshot root (documentElement by default).

  • [crossframe] Whether to include iframe subtrees (false by default).

  • [revealMaskedElements] Whether to include masked elements (false by default).

  • [modifier] Snapshot modifier (Identity by default).

    • name Modifier name.
    • [params] Modifier parameter record.

A promise that resolves to the snapshot.

Reduces the overall DOM to below approximately 32K (2^15) estimated LLM input tokens. The resulting DOM can be considered a low-resolution variant that retains the majority of the inherent UI features.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    rootSelector: '#app',
    modifier: 'downsample',
  });

Applies the D2Snap algorithm to the DOM. This will reduce its size, while retaining a majority of UI features. The algorithm was developed in order to mitigate the prevalent DOM token size disadvantage.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'D2Snap',
      params: {
        hierarchyRatio: 0.4, textRatio: 0.6, attributeRatio: 0.8,
        // or
        // k: 0.4, l: 0.6, m: 0.8
        options: {
          assignUniqueIDs: false,
          keepUnknownElements: false,
          skipMarkdownTranslation: false,
        },
      },
    },
  });

[modifier.params.options]

  • [assignUniqueIDs] Whether to add a unique data attribute data-uid to every element in the DOM in order to allow identification of equivalent elements across the original and the downsampled DOM. For example, <button class="btn btn-primary" data-uid="27">Click here!</button>.
  • [keepUnknownElements] Whether to keep unknown (custom) elements in the downsampled DOM.
  • [skipMarkdownTranslation] Whether to skip content HTML to Markdown translation.

Applies the AdaptiveD2Snap algorithm to the DOM. This is an adaptive version of the D2Snap algorithm that does not require explicit parameters.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'AdaptiveD2Snap',
      params: {
        maxTokens: 32768,
        maxIterations: 3,
        options: {
          assignUniqueIDs: false,
          skipMarkdownTranslation: false,
        },
      },
    },
  });
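If the token budget varies between calls, the options object can be built by a small helper. The sketch below is an assumption-laden illustration: `adaptiveSnapshotOptions` is a hypothetical helper, the parameter names are copied from the AdaptiveD2Snap example above, and params is typed as a record to match how the examples use it.

```typescript
// Shape of the options as used in the examples on this page.
interface DomSnapshotOptions {
  rootSelector?: string;
  crossframe?: boolean;
  revealMaskedElements?: boolean;
  modifier?: "downsample" | { name: string; params?: Record<string, unknown> };
}

// Build AdaptiveD2Snap options for a given token budget.
// maxIterations defaults to 3, the value shown in the example above.
function adaptiveSnapshotOptions(
  maxTokens: number,
  rootSelector?: string,
  maxIterations: number = 3
): DomSnapshotOptions {
  if (!(maxTokens > 0)) throw new RangeError("maxTokens must be positive");
  const options: DomSnapshotOptions = {
    modifier: {
      name: "AdaptiveD2Snap",
      params: { maxTokens, maxIterations },
    },
  };
  // Leave rootSelector unset so the documented default (documentElement) applies.
  if (rootSelector !== undefined) options.rootSelector = rootSelector;
  return options;
}
```

The result would be passed straight through, for example `take_dom_snapshot(adaptiveSnapshotOptions(8192, '#app'))`, where 8192 and `#app` are placeholder values.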

Serialize the GUI for various processing purposes, such as for LLM input.

browser.webfuseSession.automation.take_gui_snapshot(): Promise<ImageBitmap>