tutorial

Speeding up agentic workflows with WebSockets in the Responses API

Builders of AI agents can cut latency by adopting persistent WebSocket connections and connection-scoped caching, which makes multi-step tasks and interactive use more responsive.

OpenAI Blog·April 22, 2026·1 min read

openai.com

What happened

OpenAI published a technical deep dive on optimizing agentic workflows with WebSockets in the Responses API. Using the Codex agent loop as its example, the post shows how replacing traditional request-response calls with a persistent WebSocket connection reduces API overhead and latency. Connection-scoped caching adds to this: the model can reuse context across turns instead of receiving the full history again each time, which speeds up iterative reasoning. For developers building autonomous agents or multi-step AI workflows, the result is less round-trip time per turn and more practical real-time interaction. The post also provides concrete implementation patterns for agent loops that need low-latency, stateful communication with OpenAI's models.
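
The overhead argument can be illustrated with a toy cost model. This is only a sketch: the names `CostModel`, `per_request_total`, and `persistent_total` are hypothetical, and the millisecond figures are made-up placeholders, not measurements from the post. It also assumes the worst case, where every request-response call pays the full connection setup cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical per-turn costs in milliseconds (illustrative, not measured)."""
    connect_ms: float = 120.0  # assumed cost of setting up a connection
    request_ms: float = 40.0   # assumed round trip for one message on an open connection


def per_request_total(turns: int, cost: CostModel) -> float:
    """Assumes every turn sets up a new connection before sending its request."""
    return turns * (cost.connect_ms + cost.request_ms)


def persistent_total(turns: int, cost: CostModel) -> float:
    """One connection is set up once and reused for every turn."""
    if turns == 0:
        return 0.0
    return cost.connect_ms + turns * cost.request_ms


if __name__ == "__main__":
    cost = CostModel()
    for turns in (1, 10, 50):
        print(turns, per_request_total(turns, cost), persistent_total(turns, cost))
```

Under these assumed numbers the gap grows linearly with the number of turns, which is why the effect matters most for long agent loops. Clients that already reuse HTTP connections would see a smaller difference than this model suggests.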

Key takeaways

OpenAI explains how WebSockets lower overhead compared with repeated request-response calls in agentic workflows.
Connection-scoped caching reduces model latency by avoiding redundant context transmission.
The Codex agent loop is used as a case study to demonstrate these optimizations.
The approach is directly applicable to building real-time, stateful AI agents.
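
To make the caching takeaway concrete, the following sketch compares how many bytes a client sends when it transmits the whole history on every turn versus when state is held per connection and only the new message is sent. It is an in-memory simulation, not the Responses API: `ConnectionSession`, `stateless_bytes`, and `cached_bytes` are hypothetical names, and the JSON payload shape is an assumption chosen for illustration.

```python
import json


def stateless_bytes(turns: list[str]) -> int:
    """Bytes sent when each request carries the entire history so far."""
    total = 0
    history: list[str] = []
    for message in turns:
        history.append(message)
        total += len(json.dumps(history).encode("utf-8"))
    return total


class ConnectionSession:
    """Hypothetical state tied to one open connection (assumption, not a real API)."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def receive(self, payload: bytes) -> int:
        """Append the new message to the stored history; return the turn count."""
        self.history.append(json.loads(payload))
        return len(self.history)


def cached_bytes(turns: list[str], session: ConnectionSession) -> int:
    """Bytes sent when only the new message goes over the wire each turn."""
    total = 0
    for message in turns:
        payload = json.dumps(message).encode("utf-8")
        session.receive(payload)
        total += len(payload)
    return total


if __name__ == "__main__":
    turns = ["plan the task", "run the tests", "fix the failure"]
    session = ConnectionSession()
    print(stateless_bytes(turns), cached_bytes(turns, session))
```

In this simulation the stateless total grows roughly quadratically with the number of turns, while the cached total grows linearly. The stored history lives only as long as the `ConnectionSession` object, which mirrors the idea of a cache scoped to a connection.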