Automation API

The automation object is available in the browser.webfuseSession namespace in every content script of an extension. To call it from a background script or a popup, send a message to the content script, which then calls the corresponding Automation API method.
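The relay described above could be sketched as follows. The message shape `{ action, args }` and the helper `handleAutomationMessage` are hypothetical and not part of the Webfuse API; the wiring in the trailing comments relies on the standard WebExtension calls `browser.runtime.onMessage.addListener` and `browser.tabs.sendMessage`.

```typescript
// Hypothetical message shape; Webfuse does not prescribe one.
type AutomationMessage = { action: string; args: unknown[] };

// Loosely typed view of the automation object: method name -> async function.
type AutomationLike = Record<string, (...args: any[]) => Promise<unknown>>;

// Dispatches a relayed message to the matching automation method.
async function handleAutomationMessage(
  automation: AutomationLike,
  message: AutomationMessage
): Promise<unknown> {
  const method = automation[message.action];
  if (typeof method !== 'function') {
    throw new Error(`Unknown automation action: ${message.action}`);
  }
  return method.apply(automation, message.args);
}

// Content script (assumed wiring):
// browser.runtime.onMessage.addListener((msg) =>
//   handleAutomationMessage(browser.webfuseSession.automation, msg));
//
// Background script or popup (tabId is whichever tab hosts the session):
// await browser.tabs.sendMessage(tabId, { action: 'left_click', args: ['#submit'] });
```

In a real extension it would be prudent to restrict `action` to an allow-list of method names rather than dispatching arbitrary strings.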

By default, automation actions work on the element below the virtual mouse pointer. A target can optionally be specified in one of three ways: as an Element reference, as a CSS selector (selector: string), or as absolute point coordinates ([x: number, y: number]).

type Target = Element | string | [number, number]; // Element reference, CSS selector or point coordinate
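To illustrate how the three target forms can be told apart at runtime, here is a small sketch. `TargetLike` and `targetKind` are illustrative names, not part of the API, and the Element variant is typed as `object` so the snippet also compiles outside a DOM environment.

```typescript
// Local stand-in for the Target type above (Element typed as `object`).
type TargetLike = object | string | [number, number];

// Reports which of the three target forms a value uses.
function targetKind(target: TargetLike): 'element' | 'selector' | 'point' {
  if (typeof target === 'string') return 'selector'; // CSS selector
  if (Array.isArray(target)) return 'point';         // [x, y] coordinates
  return 'element';                                  // Element reference
}
```

For example, `targetKind('#scrollable')` yields `'selector'` and `targetKind([100, 250])` yields `'point'`.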

Move the virtual mouse pointer.

browser.webfuseSession.automation.mouse_move(
  target: Target,
  persistent: boolean = false
): Promise<void>

target

  • Mouse pointer target.

persistent

  • Whether to keep the mouse pointer on screen (hides after some time by default).

A promise that resolves once the mouse pointer has been moved.
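A minimal sketch of mouse_move in use, assuming the signature shown above. The interface `MouseAutomation` and the helper `tracePath` are illustrative names; Element targets are left out to keep the snippet free of DOM types.

```typescript
// Subset of the automation object used here, mirroring the signature above.
interface MouseAutomation {
  mouse_move(target: string | [number, number], persistent?: boolean): Promise<void>;
}

// Moves the pointer along a list of targets, keeping it visible (persistent)
// so that it does not hide between moves.
async function tracePath(
  automation: MouseAutomation,
  path: Array<string | [number, number]>
): Promise<void> {
  for (const target of path) {
    await automation.mouse_move(target, true);
  }
}
```

In a content script this would be called as `await tracePath(browser.webfuseSession.automation, ['#menu', [100, 250]])`.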

Scrolls the deepest scrollable element under the target by the given amount in the given direction.

browser.webfuseSession.automation.scroll(
  target: Target,
  direction: 'vertical' | 'horizontal',
  amount: number
): Promise<void>

target

  • Scroll(able) target.

direction

  • The direction to scroll.

amount

  • The number of pixels to scroll.

A promise that resolves once scrolling has ended.

await browser.webfuseSession.automation.scroll('#scrollable', 'vertical', 100);

Perform a left (primary) mouse button click.

browser.webfuseSession.automation.left_click(
  target: Target,
  moveMouse: boolean = false
): Promise<void>

target

  • Click target.

[moveMouse]

  • Whether to move the virtual mouse pointer to the target center before the click.

A promise that resolves once the click has been performed.

await browser.webfuseSession.automation.left_click([100, 250]);

Perform a middle (wheel) mouse button click.

browser.webfuseSession.automation.middle_click(
  target: Target,
  moveMouse: boolean = false
): Promise<void>

target

  • Click target.

[moveMouse]

  • Whether to move the virtual mouse pointer to the target center before the click.

A promise that resolves once the click has been performed.

Perform a right (secondary) mouse button click.

browser.webfuseSession.automation.right_click(
  target: Target,
  moveMouse: boolean = false
): Promise<void>

target

  • Click target.

[moveMouse]

  • Whether to move the virtual mouse pointer to the target center before the click.

A promise that resolves once the click has been performed.
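Since the three click methods share one signature, a thin wrapper can select the button by name. This is only a sketch; `MouseButton`, `ClickAutomation`, and `click` are illustrative names, and Element targets are omitted for brevity.

```typescript
type MouseButton = 'left' | 'middle' | 'right';

// Subset of the automation object used here, mirroring the signatures above.
interface ClickAutomation {
  left_click(target: string | [number, number], moveMouse?: boolean): Promise<void>;
  middle_click(target: string | [number, number], moveMouse?: boolean): Promise<void>;
  right_click(target: string | [number, number], moveMouse?: boolean): Promise<void>;
}

// Clicks with the given button, moving the virtual pointer to the target first.
function click(
  automation: ClickAutomation,
  button: MouseButton,
  target: string | [number, number]
): Promise<void> {
  switch (button) {
    case 'left':
      return automation.left_click(target, true);
    case 'middle':
      return automation.middle_click(target, true);
    case 'right':
      return automation.right_click(target, true);
  }
}
```

For instance, `await click(browser.webfuseSession.automation, 'right', '#item')` would perform a right click on the element matching `#item`.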

browser.webfuseSession.automation.type(
  target: Target,
  text: string,
  moveMouse: boolean = false,
  overwrite: boolean = false,
  timePerChar: number = 100
): Promise<void>

Type text into an element. Typing is natural, i.e., it simulates a human pressing a sequence of keys.

target

  • Typing target.

text

  • Text to type.

[moveMouse]

  • Whether to first move the virtual mouse pointer to the center of the target.

[overwrite]

  • Whether to overwrite the current value of the target element.

[timePerChar]

  • Average time to type a character in milliseconds.

A promise that resolves once the text has been typed.
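The following sketch shows one way to replace a field's value and to roughly estimate how long typing will take. It assumes the signature above; `TypeAutomation`, `estimateTypingMs`, and `replaceFieldValue` are illustrative names. Because timePerChar is an average, the estimate is only approximate.

```typescript
// Subset of the automation object used here, mirroring the signature above.
interface TypeAutomation {
  type(
    target: string | [number, number],
    text: string,
    moveMouse?: boolean,
    overwrite?: boolean,
    timePerChar?: number
  ): Promise<void>;
}

// Rough duration estimate: characters times the average time per character.
function estimateTypingMs(text: string, timePerChar: number = 100): number {
  return text.length * timePerChar;
}

// Moves the pointer to the field, overwrites its value, and types at
// about 50 ms per character.
async function replaceFieldValue(
  automation: TypeAutomation,
  selector: string,
  value: string
): Promise<void> {
  await automation.type(selector, value, true, true, 50);
}
```

With the default of 100 ms per character, a five-character string is estimated at about 500 ms.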

browser.webfuseSession.automation.key_press(
  target: Target,
  key: "a" | "b" | ... | "Y" | "Z",
  options?: {
    altKey?: boolean;
    ctrlKey?: boolean;
    metaKey?: boolean;
    shiftKey?: boolean;
  }
): Promise<void>

Press a key on an element.

target

  • Key press target.

key

  • Key to press.

[options]

  • Booleans that select the modifier keys to hold down during the press: altKey, ctrlKey, metaKey, or shiftKey.

A promise that resolves once the key has been pressed.
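As an illustration of the options argument, the sketch below presses "a" while holding Ctrl. `KeyAutomation` and `pressCtrlA` are illustrative names. Whether the page treats this combination as "select all" depends on the page and platform, so that effect is an assumption rather than a guarantee.

```typescript
// Modifier flags as listed in the signature above.
interface KeyPressOptions {
  altKey?: boolean;
  ctrlKey?: boolean;
  metaKey?: boolean;
  shiftKey?: boolean;
}

// Subset of the automation object used here; `key` is widened to string.
interface KeyAutomation {
  key_press(
    target: string | [number, number],
    key: string,
    options?: KeyPressOptions
  ): Promise<void>;
}

// Presses "a" on the target while holding the Ctrl modifier.
async function pressCtrlA(
  automation: KeyAutomation,
  target: string | [number, number]
): Promise<void> {
  await automation.key_press(target, 'a', { ctrlKey: true });
}
```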

browser.webfuseSession.automation.setSelection(target: Target, text: string, occurrence: number = 0): Promise<void>

target

  • Text content selection target.

text

  • Text to select (empty text also removes any existing selection).

[occurrence]

  • The index of occurrence if text repeats in the target (e.g., 1 for the second occurrence).

A promise that resolves once the selection was applied.

browser.webfuseSession.automation.getSelection(): Promise<string>

A promise that resolves with the currently selected text (or empty string if nothing is selected).
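The two selection methods can be combined to check that a selection was actually applied. This is a sketch under the signatures above; `SelectionAutomation` and `selectAndVerify` are illustrative names.

```typescript
// Subset of the automation object used here, mirroring the signatures above.
interface SelectionAutomation {
  setSelection(
    target: string | [number, number],
    text: string,
    occurrence?: number
  ): Promise<void>;
  getSelection(): Promise<string>;
}

// Selects the given occurrence of `text` inside the target and reports
// whether the current selection matches the requested text afterwards.
async function selectAndVerify(
  automation: SelectionAutomation,
  target: string | [number, number],
  text: string,
  occurrence: number = 0
): Promise<boolean> {
  await automation.setSelection(target, text, occurrence);
  const selected = await automation.getSelection();
  return selected === text;
}
```

Passing `1` as the occurrence would select the second match of the text within the target.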

browser.webfuseSession.automation.wait(ms: number): Promise<void>

ms

  • The number of milliseconds to wait.

A promise that resolves once the given time has passed.
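One plausible use of wait is pacing other actions. The sketch below scrolls a target in fixed steps with a pause between them, assuming the scroll and wait signatures above; `ScrollAutomation` and `scrollInSteps` are illustrative names.

```typescript
// Subset of the automation object used here, mirroring the signatures above.
interface ScrollAutomation {
  scroll(
    target: string | [number, number],
    direction: 'vertical' | 'horizontal',
    amount: number
  ): Promise<void>;
  wait(ms: number): Promise<void>;
}

// Scrolls vertically by totalPx in chunks of at most stepPx,
// waiting pauseMs between consecutive chunks.
async function scrollInSteps(
  automation: ScrollAutomation,
  target: string | [number, number],
  totalPx: number,
  stepPx: number,
  pauseMs: number
): Promise<void> {
  if (stepPx <= 0) throw new Error('stepPx must be positive');
  let remaining = totalPx;
  while (remaining > 0) {
    const amount = Math.min(stepPx, remaining);
    await automation.scroll(target, 'vertical', amount);
    remaining -= amount;
    if (remaining > 0) await automation.wait(pauseMs); // no pause after the last step
  }
}
```

Scrolling 250 pixels in steps of 100 results in three scroll calls (100, 100, 50) with two pauses in between.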

browser.webfuseSession.automation.take_dom_snapshot(options?: {
  rootSelector?: string;
  crossframe?: boolean;
  crossshadow?: boolean;
  revealMaskedElements?: boolean;
  modifier?: "downsample" | {
    name: string;
    params?: unknown[];
  };
}): Promise<void>

Serialize the DOM for various processing purposes, such as for LLM input.

[options]

  • DOM snapshot options:

    • [rootSelector] Selector of the element to designate as the snapshot root (documentElement by default).
    • [crossframe] (Webfuse exclusive) Whether to include iframe subtrees (false by default).
    • [crossshadow] (Webfuse exclusive) Whether to include shadow DOM subtrees (true by default).
    • [revealMaskedElements] Whether to include masked elements (false by default).
    • [modifier] Snapshot modifier (identity by default).

      • name Modifier name.
      • [params] Modifier parameter record.

A promise that resolves to the snapshot.

Translate the DOM to an accessibility tree representation.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'accessibility-tree'
    }
  })

Example input DOM:

<form
  role="form"
  aria-describedby="recipe-hint"
  aria-labelledby="recipe-form-title">
  <div
    role="group"
    aria-labelledby="checkbox-group">
    <h3 id="checkbox-group">Recipe Preferences</h3>
    <label for="notifications"
      aria-describedby="notifications-description">
      <input type="checkbox" id="notifications"
        name="notifications"
        aria-label="Enable recipe update notifications">
      Receive recipe updates
    </label>
    <p id="notifications-description">I would like to receive updates.</p>
  </div>
  <button type="button" onclick="PASTA.showRecipes()"
    aria-controls="recipe-results"
    aria-label="Show pasta recipes for selected type" role="button"
    aria-live="assertive">
    Show Recipes
  </button>
</form>

Resulting accessibility tree snapshot:

{
  "role": "RootWebArea",
  "source": "html",
  "children": [
    {
      "name": "Recipe Preferences",
      "properties": {
        "level": 3
      },
      "role": "heading",
      "source": "#checkbox-group"
    },
    {
      "children": [
        {
          "name": "Enable recipe update notifications",
          "properties": {
            "aria-label": "Enable recipe update notifications"
          },
          "role": "checkbox",
          "source": "#notifications",
          "states": {
            "checked": false
          }
        }
      ],
      "properties": {
        "aria-describedby": "notifications-description"
      },
      "role": "generic",
      "source": "html > body > section > form > div > label",
      "description": "I would like to receive updates."
    }
  ]
}

downsampled (recommended)

Reduces the overall DOM to below about 32K (2^15) estimated LLM input tokens. The resulting DOM can be considered a low-resolution variant that retains the majority of inherent UI features.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    rootSelector: '#app',
    modifier: 'downsample',
  })

Applies the D2Snap algorithm to the DOM. This reduces its size while retaining a majority of UI features. The algorithm was developed to mitigate the large token size that DOM snapshots typically have.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'D2Snap',
      params: {
        hierarchyRatio: 0.4, textRatio: 0.6, attributeRatio: 0.8,
        // or
        // k: 0.4, l: 0.6, m: 0.8
        options: {
          assignUniqueIDs: false,
          keepUnknownElements: false,
          skipMarkdownTranslation: false,
        }
      }
    }
  })

[modifier.params.options]

  • [assignUniqueIDs] Whether to add a unique data attribute data-uid to every element in the DOM in order to allow identification of equivalent elements across the original and the downsampled DOM. For example, <button class="btn btn-primary" data-uid="27">Click here!</button>.
  • [keepUnknownElements] Whether to keep unknown (custom) elements in the downsampled DOM.
  • [skipMarkdownTranslation] Whether to skip content HTML to Markdown translation.
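With assignUniqueIDs enabled, the data-uid values found in a downsampled snapshot could be turned back into CSS selectors and used as automation targets. The sketch below rests on two assumptions that are not confirmed here: that the snapshot is available as an HTML string, and that the same data-uid attributes are present on the live DOM elements. `extractUids` and `selectorForUid` are illustrative names.

```typescript
// Collects all data-uid attribute values that appear in a snapshot string.
function extractUids(snapshot: string): string[] {
  const uids: string[] = [];
  const pattern = /data-uid="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(snapshot)) !== null) {
    uids.push(match[1]);
  }
  return uids;
}

// Builds a CSS attribute selector for a given uid, usable as a target.
function selectorForUid(uid: string | number): string {
  return `[data-uid="${uid}"]`;
}
```

For the button in the example above, `selectorForUid(27)` produces `[data-uid="27"]`, which could then be passed to a method such as left_click.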

Example input DOM:

<section class="container" tabindex="3" required="true" type="example">
  <div class="mx-auto" data-topic="products" required="false">
    <h1>Our Pizza</h1>
    <div>
      <div class="shadow-lg">
        <h2>Margherita</h2>
        <p>
          A simple classic: mozzarella, tomatoes and basil.
          An everyday choice!
        </p>
        <button type="button">Add</button>
      </div>
      <div class="shadow-lg">
        <h2>Capricciosa</h2>
        <p>
          A rich taste: mozzarella, ham, mushrooms, artichokes, and olives.
          A true favourite!
        </p>
        <button type="button">Add</button>
      </div>
    </div>
  </div>
</section>

Downsampled result:

<!-- k = .4, l = .6, m = .8 -->
<section>
  # Our Pizza
  <div>
    ## Margherita
    A simple classic:
    <button>Add</button>
    ## Capricciosa
    A rich taste:
    <button>Add</button>
  </div>
</section>

Applies the AdaptiveD2Snap algorithm to the DOM. This is an adaptive version of the D2Snap algorithm that does not require explicit parameters.

const domSnapshot = await browser.webfuseSession
  .automation
  .take_dom_snapshot({
    modifier: {
      name: 'AdaptiveD2Snap',
      params: {
        maxTokens: 32768,
        maxIterations: 3,
        options: {
          assignUniqueIDs: false,
          skipMarkdownTranslation: false,
        }
      }
    }
  })

Serialize the GUI for various processing purposes, such as for LLM input. The serialized GUI corresponds to a screenshot; hence, this is an alias of webfuseSession.takeScreenshot().

browser.webfuseSession.automation.take_gui_snapshot(): Promise<ImageBitmap>