Automation Guide

Let’s set up a simple automation of a Wikipedia search flow: Navigate to the English Wikipedia landing page, then search for the term “Amsterdam”.

1. Create Manifest

In the first step, we define a manifest.json, which contains metadata about our automation Extension. We want to automate interaction with two different pages:

Wikipedia’s Landing Page: wikipedia.org, and
Wikipedia’s English Landing Page: en.wikipedia.org/wiki/Main_Page.

For this, we declare two content scripts that are injected only into the respective page:

{
    "name": "automation-example",
    "manifest_version": 3,
    "version": "1.0",
    "content_scripts": [
        {
            "js": [ "landing.content.js" ],
            "matches": [ "*://wikipedia.org/" ]
        },
        {
            "js": [ "search.content.js" ],
            "matches": [ "*://en.wikipedia.org/wiki/Main_Page" ]
        }
    ]
}
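To get a feeling for which URLs these two patterns cover, here is a simplified sketch of a match pattern check. The matchesPattern and escapeRegex helpers are hypothetical names used for illustration only. They handle just the pattern shapes used in this manifest (a "*" scheme wildcard, an exact host, and optional "*" wildcards in the path); the browser's real matcher supports more.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Simplified check of a URL against a pattern of the form
// <scheme>://<host><path>. Host wildcards such as "*.example.org"
// are NOT supported in this sketch.
function matchesPattern(pattern, url) {
    const parts = /^(\*|https?):\/\/([^/]+)(\/.*)$/.exec(pattern);
    if (!parts) return false;
    const [, scheme, host, path] = parts;

    const target = new URL(url);
    const targetScheme = target.protocol.slice(0, -1); // "https:" -> "https"

    // A "*" scheme stands for http or https only.
    const schemeOk = scheme === "*"
        ? ["http", "https"].includes(targetScheme)
        : scheme === targetScheme;

    // Each "*" in the path matches any run of characters.
    const pathRegex = new RegExp(
        "^" + path.split("*").map(escapeRegex).join(".*") + "$"
    );

    return schemeOk
        && host === target.hostname
        && pathRegex.test(target.pathname + target.search);
}
```

Under these assumptions, "*://wikipedia.org/" covers only the root page of that host, so an article URL such as https://wikipedia.org/wiki/Amsterdam would not receive landing.content.js.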

2. Implement Automation

In each content script, we register a document listener that waits for the page content to load. Within each listener callback, we define the automation calls. To give the human user time to catch up with the rendered page, we first wait three seconds after the page has loaded.

On Wikipedia’s generic landing page, we dispatch a left mouse button click on the English Wikipedia landing page button. This button – identified via Firefox’s Inspector – has the assigned ID js-link-box-en:

document
    .addEventListener('DOMContentLoaded', async () => {
        await browser.webfuseSession
            .automation.wait(3000);

        // Click on English language wiki
        await browser.webfuseSession
            .automation.left_click('#js-link-box-en');
    });

On Wikipedia’s English landing page, we first type the search term “Amsterdam” into the search field. Subsequently – note that we await the asynchronous type() call – we dispatch a left click on the search button:

document
    .addEventListener('DOMContentLoaded', async () => {
        await browser.webfuseSession
            .automation.wait(3000);

        // Type search term 'Amsterdam'
        await browser.webfuseSession
            .automation.type('Amsterdam', '#searchform input[name="search"]');

        // Click search button
        await browser.webfuseSession
            .automation.left_click('#searchform button');
    });
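One caveat about the listeners above: in standard browser extensions, content scripts are by default injected once the page is idle, at which point DOMContentLoaded may already have fired and a late listener would never run. Whether Webfuse injects content scripts at the same point is an assumption here. A defensive variant checks document.readyState first; onReady is a hypothetical helper name, not part of the Webfuse API.

```javascript
// Run `callback` once the DOM is parsed, whether or not DOMContentLoaded
// has already fired when this script starts executing.
// `doc` defaults to the page's document; it is a parameter only so the
// helper can be exercised outside a browser.
function onReady(callback, doc = document) {
    if (doc.readyState === "loading") {
        // Still parsing: the event has not fired yet, so listen for it.
        doc.addEventListener("DOMContentLoaded", () => callback(), { once: true });
    } else {
        // "interactive" or "complete": the DOM is already parsed, run right away.
        callback();
    }
}
```

With this helper, the listener registration in either content script could be written as onReady(async () => { /* automation calls */ }).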

3. See Automation in Action

In the final step, we create a Webfuse Space, and open a Session. In the Session, we open the Session Settings to define the generic Wikipedia landing page (https://wikipedia.org) as the Space’s start page.

[Figure: Webfuse Session Settings start page configuration view]

Open Session Settings -> Start Page -> Open a specific page or set of pages -> Add Start Page, and type the URL of the start page.

Now, we install our newly developed automation Extension in the Space through the Extensions tab. Opening a new tab, we can see our automation in action.

4. Utilize AI Capabilities

Simple, hardcoded action chains only scratch the surface of what the Automation API can do. Using LLMs, web pages can be analyzed automatically to obtain suggested actions, which are then carried out through the Automation API. Let’s briefly look at how this can be achieved in Webfuse with a few lines of code:

/* [...] */

const task = "Search for 'Amsterdam'";

// Take downsampled DOM snapshot of article contents
const domSnapshot = await browser.webfuseSession
    .automation.take_dom_snapshot({
        rootSelector: "#content",
        modifier: "downsample"
    });

// Prompt LLM with page contents state (DOM snapshot) with respect to a task
const llmSuggestions = await promptActionsFromGPT(task, domSnapshot);

// Drive suggested actions in the page, one after another
// (for...of, because await is not allowed in a plain forEach callback)
for (const suggestion of llmSuggestions) {
    await browser.webfuseSession
        .automation[suggestion.action](...suggestion.args);
}
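Since the suggestions come from a model, it may be worth checking them before they are dispatched. The following is a minimal sketch, assuming each suggestion has the shape { action, args } as in the snippet above. ALLOWED_ACTIONS and filterSuggestions are hypothetical names, and the allowlist simply contains the automation methods used in this guide.

```javascript
// Hypothetical allowlist: only the automation methods used in this guide.
const ALLOWED_ACTIONS = new Set(["wait", "left_click", "type"]);

// Keep only suggestions that name an allowed action and carry an args array.
// Anything else (unknown actions, missing args, non-objects) is dropped.
function filterSuggestions(suggestions, allowed = ALLOWED_ACTIONS) {
    if (!Array.isArray(suggestions)) return [];
    return suggestions.filter(s =>
        s !== null
        && typeof s === "object"
        && typeof s.action === "string"
        && allowed.has(s.action)
        && Array.isArray(s.args)
    );
}
```

The dispatch loop could then iterate over filterSuggestions(llmSuggestions) instead of the raw model output, so that an unexpected action name never reaches the automation object.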