Introducing computer use in Gemini 3.5 Flash

What happened

Google DeepMind announced that Gemini 3.5 Flash now supports computer use, allowing the model to interact directly with graphical user interfaces such as browsers and desktop applications. Instead of calling an application's API, the model observes screen content via screenshots and performs actions such as clicking buttons, typing text, and navigating menus. The capability is available through Google AI Studio and the Gemini API.

For developers building AI workflows, this means tasks involving legacy software, or software with no API at all, can be automated without writing custom integrations. The feature positions Gemini as a more versatile tool for building autonomous agents that handle complex, multi-step processes in real-world applications. According to Google DeepMind, computer use is designed for transparency and safety, with users able to see the model’s reasoning at each step. Early adopters have used it for data entry, web research, and software testing.

The release lowers the barrier to building practical AI agents: developers can have a model operate software through its visual interface, much as a human would.
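The observe-then-act cycle described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `FakeScreen`, `Action`, `scripted_model`, and `run_agent` are hypothetical names, the "screenshot" is a plain string, and the model is replaced by a scripted function. It does not use or represent the actual Gemini API; it only shows the general shape of a loop in which each step's reasoning is recorded so a user can inspect it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One GUI action plus the reasoning behind it (hypothetical schema)."""
    kind: str              # "type", "click", or "done"
    target: str = ""
    text: str = ""
    reasoning: str = ""


class FakeScreen:
    """Toy stand-in for a GUI: one text field and one submit button."""

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False

    def screenshot(self) -> str:
        # A real system would return image bytes; a string keeps this runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type" and action.target == "field":
            self.field += action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_model(goal: str, observation: str) -> Action:
    """Hypothetical policy standing in for a model call."""
    if "submitted=True" in observation:
        return Action("done", reasoning="form already submitted")
    if "field=''" in observation:
        return Action("type", "field", goal, "field is empty, enter the value")
    return Action("click", "submit", reasoning="field is filled, submit it")


def run_agent(
    goal: str,
    screen: FakeScreen,
    model: Callable[[str, str], Action],
    max_steps: int = 10,
) -> List[str]:
    """Observe, decide, act; return the reasoning logged at each step."""
    log: List[str] = []
    for _ in range(max_steps):  # bound the loop so a confused policy cannot run forever
        observation = screen.screenshot()
        action = model(goal, observation)
        log.append(action.reasoning)
        if action.kind == "done":
            break
        screen.apply(action)
    return log


if __name__ == "__main__":
    screen = FakeScreen()
    for step, reason in enumerate(run_agent("hello", screen, scripted_model), 1):
        print(step, reason)
```

The step limit and the per-step reasoning log are design choices of this sketch, not documented behavior of the feature; they simply mirror the article's point that each step should be visible to the user.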

Key takeaways

Gemini 3.5 Flash can now read screenshots of an interface and perform GUI actions such as clicking and typing.

The feature reduces the need for API integrations to automate software.

The feature is available to developers via Google AI Studio and the Gemini API.

Users can view the model's reasoning steps for transparency and safety.

Practical applications include data entry, web research, and software testing.
