Quick Start
Get a feel for Midscene in a few minutes. There are two ways to start, and you can do either one first:
- No code — install the Chrome extension and try Midscene on any web page, without setting up a project.
- Write a script — use the JavaScript SDK to build a repeatable automation.
Both need a multimodal model configured, so let's set that up first.
Configure a model
Midscene drives the UI with a multimodal model. Set these environment variables to get started. The example below uses Qwen3.x via OpenRouter — it is easy to obtain and a solid default:
Using a different model (Doubao, GLM, Gemini, GPT-5…)? See Configure your model for every supported provider.
You'll reuse these values in whichever path you pick below — paste them into the extension's settings, or set them as environment variables for the SDK.
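As a rough illustration of what those values look like, the sketch below keeps them in a plain object and renders them in the export KEY="value" form. The variable names (MIDSCENE_MODEL_BASE_URL, MIDSCENE_MODEL_API_KEY, MIDSCENE_MODEL_NAME, MIDSCENE_MODEL_FAMILY) and the placeholder values are assumptions rather than text from this page, and toExportLines is a hypothetical helper. Check Configure your model for the exact names and values for your provider.

```typescript
// Assumed variable names and placeholder values; confirm them in "Configure your model".
const modelEnv: Record<string, string> = {
  MIDSCENE_MODEL_BASE_URL: "https://openrouter.ai/api/v1", // OpenRouter's OpenAI-compatible endpoint
  MIDSCENE_MODEL_API_KEY: "<your-openrouter-api-key>",
  MIDSCENE_MODEL_NAME: "<qwen-model-id-on-openrouter>",
  MIDSCENE_MODEL_FAMILY: "<model-family>",
};

// Hypothetical helper: render a config object as `export KEY="value"` lines.
// Values are assumed not to contain double quotes.
function toExportLines(env: Record<string, string>): string {
  return Object.keys(env)
    .map((key) => `export ${key}="${env[key]}"`)
    .join("\n");
}

console.log(toExportLines(modelEnv));
```

The printed lines can be pasted into a shell profile or a settings field once the placeholders are replaced with real values.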
Try it with no code (Chrome extension)
Think of the extension as a playground for Midscene — much like an API Playground, it's an interactive sandbox where you can try natural-language prompts, preview the results immediately, and debug, all without writing or running any code. Because it shares the same core as the @midscene/web SDK, anything you validate here behaves the same once you script it.
- Install Midscene from the Chrome Web Store.
- Open the Midscene panel (it may be folded under the Chrome extensions icon). A sidebar appears on the right side of the browser.
- Click the settings (gear) icon and paste your model configuration. The extension accepts the same export KEY="value" format shown in Configure a model above.
- Open any web page and type an instruction (an action, a data query, or an assertion), then watch Midscene operate the page for you.
For the full walkthrough and troubleshooting, see Quick experience by Chrome extension.
Write your first script (SDK)
Prefer code? Build a repeatable automation with the JavaScript SDK. This example uses the browser (Playwright).
Step 1. Install dependencies
Step 2. Set the model environment variables
Set the values from Configure a model as environment variables, or put them in a .env file and load it with dotenv.
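A missing variable usually surfaces only once the first model call fails, so a small pre-flight check can save a run. The sketch below is one way to do that; the variable names are assumed (adjust them to whatever Configure a model lists), and missingModelVars and assertModelEnv are hypothetical helpers, not part of the SDK.

```typescript
// Assumed names; adjust to whatever "Configure a model" lists for your provider.
const REQUIRED_MODEL_VARS: string[] = [
  "MIDSCENE_MODEL_BASE_URL",
  "MIDSCENE_MODEL_API_KEY",
  "MIDSCENE_MODEL_NAME",
];

type Env = Record<string, string | undefined>;

// Hypothetical helper: return the required variables that are unset or blank.
function missingModelVars(env: Env): string[] {
  return REQUIRED_MODEL_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: fail with a readable message before any browser is launched.
function assertModelEnv(env: Env): void {
  const missing = missingModelVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing model environment variables: ${missing.join(", ")}`);
  }
}
```

In a script you would call assertModelEnv(process.env) near the top. If the values live in a .env file, putting import "dotenv/config" on the first line loads them before the check runs.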
Step 3. Write the script
Save the following as ./demo.ts. It opens eBay, searches for headphones, reads the result list, and asserts the page state — all described in natural language:
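What follows is a minimal sketch of that flow, not the original demo.ts. So that it can be exercised without a browser, the flow is written against two small structural interfaces, PageLike and AgentLike, both hypothetical. The method names aiAct, aiQuery, and aiAssert are assumed to match the agent exported by @midscene/web, so confirm them in the API reference. The prompts and the Item shape are illustrative only.

```typescript
// Hypothetical structural types: only the calls this flow needs.
interface PageLike {
  goto(url: string): Promise<unknown>;
}

interface AgentLike {
  aiAct(prompt: string): Promise<unknown>; // perform an action (assumed method name)
  aiQuery<T>(prompt: string): Promise<T>; // extract structured data from the page
  aiAssert(prompt: string): Promise<unknown>; // check a statement about the page
}

interface Item {
  itemTitle: string;
  price: number;
}

// Open eBay, search for headphones, read the result list, then assert the page state.
async function runDemo(page: PageLike, agent: AgentLike): Promise<Item[]> {
  await page.goto("https://www.ebay.com");

  await agent.aiAct('type "Headphones" in the search box, then press Enter');

  const items = await agent.aiQuery<Item[]>(
    "{itemTitle: string, price: number}[], the items in the result list with their prices"
  );

  await agent.aiAssert("There is a category filter on the left");

  return items;
}
```

With the real SDK you would launch a browser with Playwright, create a page, wrap it in the agent class (assumed here to be PlaywrightAgent from @midscene/web/playwright), and pass both to runDemo. Keeping the flow in its own function also lets you dry-run it against a stub agent before spending model calls.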
Step 4. Run it
Step 5. View the report
After a successful run, Midscene prints something like:
Open that HTML file in your browser to replay every action, query, and assertion step by step. The report is the tool most developers rely on to understand and debug what the AI did.
Next steps
- Use it in your tests: Integrate with Playwright to add Midscene to your existing suite.
- Other platforms: get started on Android, iOS, HarmonyOS, or desktop.
- Go further: improve results with Model strategy, or look up every method in the API reference.

