This directory contains a computer use agent that can operate a browser to complete user tasks. The agent uses Playwright to control a Chromium browser and can interact with web pages by taking screenshots, clicking, typing, and navigating.
This agent demonstrates how to use ComputerUseToolset.
The computer use agent consists of:
- agent.py: Main agent configuration using Google's gemini-2.5-computer-use-preview-10-2025 model
- playwright.py: Playwright-based computer implementation for browser automation
- requirements.txt: Python dependencies
Install the required Python packages from the requirements file:

    uv pip install -r contributing/samples/computer_use/requirements.txt

Install Playwright's system dependencies for Chromium:

    playwright install-deps chromium

Install the Chromium browser for Playwright:

    playwright install chromium

To start the computer use agent, run the following command from the project root:

    adk web contributing/samples

This will start the ADK web interface where you can interact with the computer_use agent.
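The three installation commands above can also be run from a small script. The following is a minimal sketch, not part of the sample itself: the function names `setup_commands` and `run_setup` are hypothetical, and it assumes `uv` and `playwright` are already on your PATH. It defaults to a dry run that only prints each command.

```python
import subprocess
from typing import List

# Path taken from the install instructions above, relative to the project root.
REQUIREMENTS = "contributing/samples/computer_use/requirements.txt"


def setup_commands(requirements: str = REQUIREMENTS) -> List[List[str]]:
    """Return the install commands from this README, in order, as argv lists."""
    return [
        ["uv", "pip", "install", "-r", requirements],
        ["playwright", "install-deps", "chromium"],
        ["playwright", "install", "chromium"],
    ]


def run_setup(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command and, unless dry_run is set, execute it.

    Stops at the first failing command because check=True raises
    subprocess.CalledProcessError on a non-zero exit code.
    """
    lines = []
    for argv in commands:
        line = " ".join(argv)
        lines.append(line)
        print(("would run: " if dry_run else "running: ") + line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return lines
```

Calling `run_setup(setup_commands())` prints the commands without executing them; pass `dry_run=False` from the project root to actually install.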
Once the agent is running, you can send queries like:
find me a flight from SF to Hawaii on next Monday, coming back on next Friday. start by navigating directly to flights.google.com
The agent will:
- Open a browser window
- Navigate to the specified website
- Interact with the page elements to complete your task
- Provide updates on its progress
Other tasks you can try:
- Book hotel reservations
- Search for products online
- Fill out forms
- Navigate complex websites
- Research information across multiple pages
Technical details:
- Model: Uses Google's gemini-2.5-computer-use-preview-10-2025 model for computer use capabilities
- Browser: Automated Chromium browser via Playwright
- Screen Size: Configured for 600x800 resolution
- Tools: Uses ComputerUseToolset for screen capture, clicking, typing, and scrolling
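To illustrate the kind of browser operations listed above (screen capture, clicking, typing), here is a minimal sketch using Playwright's synchronous Python API directly. It is not the code in playwright.py: the function names `viewport_from` and `screenshot_click_type` are hypothetical, and it assumes the configured screen size is given as (width, height), which the text above does not state.

```python
from typing import Dict, Tuple


def viewport_from(screen_size: Tuple[int, int]) -> Dict[str, int]:
    """Convert a (width, height) pair into Playwright's viewport mapping.

    Assumption: the pair is ordered width first, then height.
    """
    width, height = screen_size
    return {"width": width, "height": height}


def screenshot_click_type(
    url: str,
    x: int,
    y: int,
    text: str,
    screen_size: Tuple[int, int] = (600, 800),
) -> bytes:
    """Open a page, click at (x, y), type text, and return a PNG screenshot."""
    # Imported lazily so this module can be loaded before Playwright is installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page(viewport=viewport_from(screen_size))
            page.goto(url)
            page.mouse.click(x, y)    # click at pixel coordinates in the viewport
            page.keyboard.type(text)  # send keystrokes to the focused element
            return page.screenshot()  # PNG image as bytes
        finally:
            browser.close()
```

For example, `screenshot_click_type("https://example.com", 100, 100, "hello")` would return the screenshot bytes after the click and keystrokes, provided Chromium has been installed as described earlier.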
If you encounter issues:
- Playwright not found: Make sure you've run both playwright install-deps chromium and playwright install chromium
- Dependencies missing: Verify all packages from requirements.txt are installed
- Browser crashes: Check that your system supports Chromium and has sufficient resources
- Permission errors: Ensure your user has permission to run browser automation tools
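Some of the checks above can be automated before launching the agent. The sketch below is an assumption-laden helper, not part of the sample: the function names are hypothetical, and it only verifies that the `playwright` Python package is importable and that the `uv`, `playwright`, and `adk` commands are on your PATH. It cannot tell whether the Chromium browser binary itself has been downloaded.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_commands(names: Iterable[str]) -> List[str]:
    """Return the command names that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


def preflight_report() -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    for module in missing_modules(["playwright"]):
        problems.append(f"Python package not importable: {module}")
    for command in missing_commands(["uv", "playwright", "adk"]):
        problems.append(f"Command not found on PATH: {command}")
    return problems
```

Printing each entry of `preflight_report()` gives a quick list of what to install before you start troubleshooting browser crashes or permission errors.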
Notes:
- The agent operates in a controlled browser environment
- Screenshots are taken to help the agent understand the current state
- The agent will provide updates on its actions as it works
- Be patient as complex tasks may take some time to complete