Automation API

The automation object is available in the browser.webfuseSession namespace (as browser.webfuseSession.automation) in every content script of an extension. To use it from a background script or a popup, send a message to a content script, which then calls the corresponding automation API method.
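
The content-script side of that message relay could look like the following minimal sketch. It assumes a hypothetical message shape ({ kind, method, args }) and assumes the extension environment supports the standard WebExtension messaging API (browser.runtime.onMessage on the receiving side, browser.tabs.sendMessage on the sending side). The allowlist keeps arbitrary messages from invoking methods you did not intend to expose.

```typescript
// Hypothetical message shape exchanged between a background script/popup and a content script.
interface AutomationMessage {
  kind: 'automation';
  method: string;  // name of an automation method, e.g. 'left_click'
  args: unknown[]; // positional arguments, forwarded unchanged
}

// Any object exposing async methods by name, e.g. browser.webfuseSession.automation.
type AutomationLike = Record<string, (...args: any[]) => Promise<unknown>>;

// Only these methods may be triggered through messages.
const ALLOWED_METHODS = new Set(['mouse_move', 'scroll', 'left_click', 'type', 'key_press']);

async function dispatchAutomation(automation: AutomationLike, msg: AutomationMessage): Promise<unknown> {
  if (msg.kind !== 'automation' || !ALLOWED_METHODS.has(msg.method)) {
    throw new Error(`Unsupported automation message: ${msg.method}`);
  }
  return automation[msg.method](...msg.args);
}

// Content-script wiring (assumes the WebExtension runtime messaging API is available).
declare const browser: any;
if (typeof browser !== 'undefined' && browser.webfuseSession) {
  browser.runtime.onMessage.addListener((msg: AutomationMessage) =>
    dispatchAutomation(browser.webfuseSession.automation, msg),
  );
}

// Sender side (background script or popup), under the same assumption:
//   browser.tabs.sendMessage(tabId, { kind: 'automation', method: 'left_click', args: ['#submit', true] });
```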

Targeting

By default, automation actions operate on the element under the virtual mouse pointer. A target can optionally be specified in one of three ways: as an element reference (Element), as a CSS selector (selector: string), or as absolute point coordinates ([x: number, y: number]).

type Target = Element | string | [number, number];  // Element reference, CSS selector or point coordinate
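
Because Target is a union, code that logs or forwards targets sometimes needs to tell the three forms apart. A minimal sketch with a hypothetical helper targetKind; it only narrows the type and does not call the automation API.

```typescript
type Target = Element | string | [number, number];

type TargetKind = 'selector' | 'point' | 'element';

// Narrow a Target to the form it was given in.
function targetKind(target: Target): TargetKind {
  if (typeof target === 'string') return 'selector'; // e.g. '#submit'
  if (Array.isArray(target)) return 'point';         // e.g. [120, 340], absolute [x, y]
  return 'element';                                  // e.g. document.querySelector('#submit')!
}
```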

mouse_move()

Move the virtual mouse pointer.

browser.webfuseSession.automation.mouse_move(
    target: Target,
    persistent: boolean = false
): Promise<void>

Parameters

target

Mouse pointer target.

persistent

Whether to keep the mouse pointer visible on screen. By default, the pointer hides after some time.

Returns

A promise that resolves once the mouse pointer has moved.
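
A sketch combining mouse_move() with a persistent pointer and a subsequent left_click(). PointerAutomation is a hypothetical interface describing only the subset of documented methods used here; in a content script you would pass browser.webfuseSession.automation.

```typescript
type Target = Element | string | [number, number];

// Subset of the automation API used in this sketch.
interface PointerAutomation {
  mouse_move(target: Target, persistent?: boolean): Promise<void>;
  left_click(target: Target, moveMouse?: boolean): Promise<void>;
}

// Move the pointer onto the target, keep it visible, then click the same target.
async function hoverThenClick(automation: PointerAutomation, target: Target): Promise<void> {
  await automation.mouse_move(target, true); // persistent: pointer stays on screen
  await automation.left_click(target);       // moveMouse defaults to false; the pointer was moved above
}
```

For example, `await hoverThenClick(browser.webfuseSession.automation, '#menu-toggle');`, where '#menu-toggle' is a placeholder selector.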

scroll()

Scrolls the deepest scrollable element under the target by the given amount in the given direction.

browser.webfuseSession.automation.scroll(
    target: Target,
    direction: 'vertical' | 'horizontal',
    amount: number
): Promise<void>

Parameters

target

Scroll target. The deepest scrollable element under it is scrolled.

direction

The direction to scroll.

amount

The number of pixels to scroll.

Returns

A promise that resolves once scrolling has ended.

Example

await browser.webfuseSession.automation.scroll('#scrollable', 'vertical', 100);
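
Scrolling a long distance in one call can jump past content. A minimal sketch, assuming you want to scroll vertically in several smaller steps with a pause (via wait()) between them; scrollInSteps and ScrollAutomation are hypothetical names. It only handles positive totals, since this section does not specify how negative amounts are interpreted.

```typescript
type Target = Element | string | [number, number];

// Subset of the automation API used in this sketch.
interface ScrollAutomation {
  scroll(target: Target, direction: 'vertical' | 'horizontal', amount: number): Promise<void>;
  wait(ms: number): Promise<void>;
}

// Scroll `total` pixels vertically in chunks of at most `step` pixels, pausing between chunks.
async function scrollInSteps(
  automation: ScrollAutomation,
  target: Target,
  total: number,
  step = 100,
  pauseMs = 200,
): Promise<void> {
  if (step <= 0) throw new Error('step must be positive');
  let remaining = total;
  while (remaining > 0) {
    const amount = Math.min(step, remaining);
    await automation.scroll(target, 'vertical', amount);
    remaining -= amount;
    if (remaining > 0) await automation.wait(pauseMs); // no pause after the last chunk
  }
}
```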

left_click()

Perform a left (primary) mouse button click.

browser.webfuseSession.automation.left_click(
    target: Target,
    moveMouse: boolean = false
): Promise<void>

Parameters

target

Click target.

[moveMouse]

Whether to move the virtual mouse pointer to the center of the target before the click.

Returns

A promise that resolves once the click has been performed.

Example

await browser.webfuseSession.automation.left_click([100, 250]);

middle_click()

Perform a middle (wheel) mouse button click.

browser.webfuseSession.automation.middle_click(
    target: Target,
    moveMouse: boolean = false
): Promise<void>

Parameters

target

Click target.

[moveMouse]

Whether to move the virtual mouse pointer to the center of the target before the click.

Returns

A promise that resolves once the click has been performed.

right_click()

Perform a right (secondary) mouse button click.

browser.webfuseSession.automation.right_click(
    target: Target,
    moveMouse: boolean = false
): Promise<void>

Parameters

target

Click target.

[moveMouse]

Whether to move the virtual mouse pointer to the center of the target before the click.

Returns

A promise that resolves once the click has been performed.
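
The three click methods share one signature, so a small dispatcher can choose the mouse button at runtime. A sketch with hypothetical names (click, ClickAutomation, CLICK_METHOD):

```typescript
type Target = Element | string | [number, number];
type Button = 'left' | 'middle' | 'right';

// Subset of the automation API used in this sketch.
interface ClickAutomation {
  left_click(target: Target, moveMouse?: boolean): Promise<void>;
  middle_click(target: Target, moveMouse?: boolean): Promise<void>;
  right_click(target: Target, moveMouse?: boolean): Promise<void>;
}

// Map each button to the method that performs its click.
const CLICK_METHOD = { left: 'left_click', middle: 'middle_click', right: 'right_click' } as const;

function click(automation: ClickAutomation, button: Button, target: Target, moveMouse = false): Promise<void> {
  return automation[CLICK_METHOD[button]](target, moveMouse);
}
```

For example, `await click(browser.webfuseSession.automation, 'right', '#file-row', true);` would open a context menu on a placeholder element.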

type()

browser.webfuseSession.automation.type(
    target: Target,
    text: string,
    moveMouse: boolean = false,
    overwrite: boolean = false,
    timePerChar: number = 100
): Promise<void>

Type text into an element. Typing is natural, i.e. it simulates a human pressing a sequence of keys.

Parameters

target

Typing target.

text

Text to type.

[moveMouse]

Whether to move the virtual mouse pointer to the center of the target before typing.

[overwrite]

Whether to overwrite the current value of the target element.

[timePerChar]

Average time to type a character in milliseconds.

Returns

A promise that resolves once the text has been typed.
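
A sketch of replacing a form field's value with type(), plus a rough duration estimate. Since timePerChar is documented as an average, the hypothetical estimateTypingMs helper is only an approximation; it counts UTF-16 code units via text.length. fillField and TypingAutomation are also hypothetical names.

```typescript
type Target = Element | string | [number, number];

// Subset of the automation API used in this sketch.
interface TypingAutomation {
  type(target: Target, text: string, moveMouse?: boolean, overwrite?: boolean, timePerChar?: number): Promise<void>;
}

// Approximate typing time; actual time varies around this because timePerChar is an average.
function estimateTypingMs(text: string, timePerChar = 100): number {
  return text.length * timePerChar;
}

// Replace a field's current value, moving the pointer to the field first.
async function fillField(automation: TypingAutomation, target: Target, text: string, timePerChar = 60): Promise<void> {
  await automation.type(target, text, /* moveMouse */ true, /* overwrite */ true, timePerChar);
}
```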

key_press()

browser.webfuseSession.automation.key_press(
    target: Target,
    key: "a" | "b" | ... | "Y" | "Z",
    options?: {
        altKey?: boolean;
        ctrlKey?: boolean;
        metaKey?: boolean;
        shiftKey?: boolean;
    }
): Promise<void>

Press a key on an element.

Parameters

target

Key press target.

key

Key to press.

[options]

Modifier keys to hold down during the press: altKey, ctrlKey, metaKey, or shiftKey (each a boolean).

Returns

A promise that resolves once the key has been pressed.
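
key_press() with modifier options can trigger keyboard shortcuts. A minimal sketch of a hypothetical selectAll helper that presses Cmd+A on macOS and Ctrl+A elsewhere. The caller decides the platform, and whether the page reacts to the shortcut depends on the page itself. The key parameter is typed as string here for brevity, while the documented type is a union of letters.

```typescript
type Target = Element | string | [number, number];

interface KeyOptions {
  altKey?: boolean;
  ctrlKey?: boolean;
  metaKey?: boolean;
  shiftKey?: boolean;
}

// Subset of the automation API used in this sketch (key loosened to string).
interface KeyAutomation {
  key_press(target: Target, key: string, options?: KeyOptions): Promise<void>;
}

// "Select all" shortcut: Cmd+A on macOS, Ctrl+A elsewhere.
function selectAll(automation: KeyAutomation, target: Target, isMac: boolean): Promise<void> {
  const options: KeyOptions = isMac ? { metaKey: true } : { ctrlKey: true };
  return automation.key_press(target, 'a', options);
}
```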

wait()

browser.webfuseSession.automation.wait(ms: number): Promise<void>

Wait for the given number of milliseconds.

Parameters

ms

The number of milliseconds to wait.

Returns

A promise that resolves once the given time has passed.
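
wait() can serve as the delay in a simple polling loop, for example to wait until an element appears after a click. A sketch with a hypothetical waitUntil helper; the condition is any synchronous check, such as a document.querySelector call in the content script.

```typescript
// Subset of the automation API used in this sketch.
interface WaitAutomation {
  wait(ms: number): Promise<void>;
}

// Check `condition` up to `maxAttempts` times, waiting `intervalMs` between checks.
// Resolves to true as soon as the condition holds, or false if it never does.
async function waitUntil(
  automation: WaitAutomation,
  condition: () => boolean,
  intervalMs = 250,
  maxAttempts = 20,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (condition()) return true;
    if (attempt < maxAttempts - 1) await automation.wait(intervalMs); // no wait after the last check
  }
  return false;
}
```

For example, `await waitUntil(browser.webfuseSession.automation, () => document.querySelector('#results') !== null);`, where '#results' is a placeholder selector.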

take_dom_snapshot()

browser.webfuseSession.automation.take_dom_snapshot(options?: {
    rootSelector?: string;
    crossframe?: boolean;
    revealMaskedElements?: boolean;
    modifier?: "downsample" | {
        name: string;
        params?: Record<string, unknown>;
    };
}): Promise<string>

Serialize the DOM for various processing purposes, such as LLM input.

Parameters

[options]

DOM snapshot options:
[rootSelector] Selector of the element to use as the snapshot root (documentElement by default).
[crossframe] Whether to include iframe subtrees (false by default).
[revealMaskedElements] Whether to include masked elements (false by default).
[modifier] Snapshot modifier (identity, i.e. no modification, by default). Either the string "downsample" or an object with:
- name Modifier name.
- [params] Modifier parameter record.

Returns

A promise that resolves to the snapshot.

Modifiers

`downsample`

Reduces the DOM to below roughly 32K (2^15 = 32,768) estimated LLM input tokens. The result can be considered a low-resolution variant of the DOM that retains most of its inherent UI features.

const domSnapshot = await browser.webfuseSession
    .automation
    .take_dom_snapshot({
        rootSelector: '#app',
        modifier: 'downsample',
    })
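
The 32K budget is 2^15 = 32,768 tokens. To check a snapshot against a similar budget yourself, a common rule of thumb is about four characters per token for English text and markup. This is an assumption for illustration, not the estimator Webfuse uses, and it assumes the snapshot is available as a string; estimateTokens and fitsBudget are hypothetical helpers.

```typescript
// Budget targeted by the `downsample` modifier: 2^15 tokens.
const TOKEN_BUDGET = 2 ** 15; // 32768

// Rough token estimate using the ~4 characters per token heuristic (an assumption).
function estimateTokens(text: string, charsPerToken = 4): number {
  return Math.ceil(text.length / charsPerToken);
}

// True if the estimated token count stays within the budget.
function fitsBudget(snapshot: string, budget = TOKEN_BUDGET): boolean {
  return estimateTokens(snapshot) <= budget;
}
```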

`D2Snap`

Applies the D2Snap algorithm to the DOM. This reduces its size while retaining most UI features. The algorithm was developed to mitigate the large token footprint of raw DOM input.

const domSnapshot = await browser.webfuseSession
    .automation
    .take_dom_snapshot({
        modifier: {
            name: 'D2Snap',
            params: {
                hierarchyRatio: 0.4, textRatio: 0.6, attributeRatio: 0.8,
                // or
                // k: 0.4, l: 0.6, m: 0.8
                options: {
                    assignUniqueIDs: false,
                    keepUnknownElements: false,
                    skipMarkdownTranslation: false,
                }
            }
        }
    })

[modifier.params.options]

[assignUniqueIDs] Whether to add a unique data-uid attribute to every element in the DOM, so that equivalent elements can be identified across the original and the downsampled DOM. For example, <button class="btn btn-primary" data-uid="27">Click here!</button>.
[keepUnknownElements] Whether to keep unknown (custom) elements in the downsampled DOM.
[skipMarkdownTranslation] Whether to skip translating content HTML to Markdown.
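
With assignUniqueIDs enabled, the data-uid values in a snapshot can be collected and turned into CSS selectors. A sketch with hypothetical helpers extractUids and uidSelector. Using such a selector as an automation Target assumes the live page carries the same data-uid attributes, which this section implies but does not state explicitly; the regex also only finds uids that appear in element markup.

```typescript
// Collect data-uid values from a snapshot taken with assignUniqueIDs: true.
function extractUids(snapshot: string): string[] {
  const uids: string[] = [];
  const re = /data-uid="([^"]*)"/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(snapshot)) !== null) {
    uids.push(match[1]);
  }
  return uids;
}

// Build an attribute selector for a uid, e.g. for use as a Target.
function uidSelector(uid: string): string {
  return `[data-uid="${uid}"]`;
}
```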

`AdaptiveD2Snap`

Applies the AdaptiveD2Snap algorithm to the DOM. This is an adaptive version of the D2Snap algorithm that does not require explicit parameters.

const domSnapshot = await browser.webfuseSession
    .automation
    .take_dom_snapshot({
        modifier: {
            name: 'AdaptiveD2Snap',
            params: {
                maxTokens: 32768,
                maxIterations: 3,
                options: {
                    assignUniqueIDs: false,
                    skipMarkdownTranslation: false,
                }
            }
        }
    })

take_gui_snapshot()

Capture the rendered GUI as an image for various processing purposes, such as LLM input.

browser.webfuseSession.automation.take_gui_snapshot(): Promise<ImageBitmap>
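
To send the ImageBitmap to an LLM or store it, it usually has to be encoded first. A sketch that draws it onto an OffscreenCanvas and encodes it as PNG with convertToBlob(). The canvas factory is injectable so the function can also run outside a browser, and BitmapLike, CanvasLike, and bitmapToPng are hypothetical names.

```typescript
// Minimal structural types so the sketch does not depend on a specific canvas implementation.
interface BitmapLike { width: number; height: number; }
interface Canvas2DLike { drawImage(image: BitmapLike, dx: number, dy: number): void; }
interface CanvasLike {
  getContext(contextId: '2d'): Canvas2DLike | null;
  convertToBlob(options?: { type?: string }): Promise<Blob>;
}

// In a browser, OffscreenCanvas provides getContext('2d') and convertToBlob().
const defaultCanvas = (w: number, h: number): CanvasLike =>
  new OffscreenCanvas(w, h) as unknown as CanvasLike;

// Draw the bitmap onto a canvas of the same size and encode it as PNG.
async function bitmapToPng(
  bitmap: BitmapLike,
  makeCanvas: (w: number, h: number) => CanvasLike = defaultCanvas,
): Promise<Blob> {
  const canvas = makeCanvas(bitmap.width, bitmap.height);
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D context unavailable');
  ctx.drawImage(bitmap, 0, 0);
  return canvas.convertToBlob({ type: 'image/png' });
}
```

In a content script: `const png = await bitmapToPng(await browser.webfuseSession.automation.take_gui_snapshot());`. Calling close() on the ImageBitmap afterwards releases its resources.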