Why we built our AI agents on WebSockets instead of HTTP

Building our AI Copilots product taught us where HTTP breaks down, and how WebSockets can keep agents in sync without extra layers of infrastructure.

When we started building AI Copilots at Liveblocks, we weren’t trying to invent a new protocol. We just needed a way to keep an AI agent in sync with the UI, across tabs, devices, and even after a refresh in the middle of a task.

Most AI agents default to HTTP streaming. And that made sense for ChatGPT and first-generation chat UIs. But the more we leaned into UI-first copilots with front-end tool calls, confirmation flows, realtime feedback, and resumable streams, the more things started to break.

So we turned to a solution we had years of experience scaling: a stable and persistent WebSocket stack with authentication, automatic reconnection, and reliable message delivery.

This post isn’t prescriptive, and WebSockets aren’t the right answer in every case. But they made our lives much easier in ways we didn’t anticipate. Here’s what we learned.

WebSockets made multi-tab persistence and real-time updates simple

HTTP’s request–response model works for basic interactions, but it struggles with long-running processes, page refreshes, or multiple tabs. Once a request ends, the connection is gone. To bridge that gap, teams usually add infrastructure like polling, pub-sub servers, or custom session logic. This becomes especially painful with LLMs, where responses can stream for extended periods and users may join or rejoin mid-process.

WebSockets provide a persistent, bidirectional connection instead. The server can push updates at any time, and clients can subscribe to an in-progress task and immediately receive its latest state. Updates are also broadcast to every connected tab, browser, and device, keeping state consistent without extra coordination logic.
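
To make that concrete, here's a minimal client-side sketch. The endpoint, message shapes, and reconnect logic are hypothetical, not Liveblocks' API; the point is that any tab opening the same channel receives the current state of an in-progress task, then every update the server pushes afterwards.

// Hypothetical sketch: every tab subscribes to the same agent task over a
// WebSocket. On (re)connect the server sends a snapshot of the current
// state, then streams deltas until the task finishes.
type AgentUpdate =
  | { type: "snapshot"; taskId: string; text: string } // latest state on connect
  | { type: "delta"; taskId: string; text: string }    // streamed tokens
  | { type: "done"; taskId: string };

function subscribeToTask(taskId: string, onUpdate: (update: AgentUpdate) => void) {
  const socket = new WebSocket(`wss://example.com/agent?taskId=${taskId}`);

  socket.addEventListener("message", (event) => {
    onUpdate(JSON.parse(event.data) as AgentUpdate);
  });

  // If the network drops or the laptop sleeps, reconnect and receive the
  // latest snapshot instead of losing the stream.
  socket.addEventListener("close", () => {
    setTimeout(() => subscribeToTask(taskId, onUpdate), 1000);
  });
}

// Every open tab calls this and stays consistent without polling.
subscribeToTask("task_123", (update) => console.log(update));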

Multiple streaming chats & live updates in our AI Chats example.

At Liveblocks, our sync layer was already built on WebSockets for multiplayer editing, so Copilots inherited persistence and multi-tab support without us adding new queues or background processes.

For teams starting fresh, the tradeoff is clear. You can patch around HTTP’s limitations with additional infrastructure, but WebSockets remove that class of problems entirely.

Tool calls with human confirmation work better on WebSockets

Copilots need to do more than return text. They should call tools, render UI, and give users control when manual confirmation is required. WebSockets are especially valuable here because every client stays in sync whenever a user acts.

With HTTP, a confirmation is scoped to a single tab, meaning that if a user has multiple tabs open or collaborators are working together, others will not see that the action was already confirmed or denied. This can cause duplicate or conflicting actions.

With WebSockets, the confirmation event is broadcast to all connected clients. As soon as someone clicks “Confirm”, every session updates in real time and the state stays consistent. An example of this is in our AI Dashboard demo, where the copilot can suggest inviting a new member, but the action only runs after the human approves it.

<AiTool name="invite-member">
  <AiTool.Confirmation
    confirm={async () => {
      await sendInvite(args.email);
      return { data: { invitedEmail: args.email } };
    }}
    cancel={() => {
      respond({ cancel: true });
    }}
  >
    Invite {args.email} to the team?
  </AiTool.Confirmation>
</AiTool>

The AI proposes the action, the human confirms, and the decision is streamed via WebSockets so all tabs and collaborators immediately see the same outcome.
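
The fan-out behind that is conceptually simple. Here's a hedged sketch using a bare ws server in Node, not how Liveblocks implements it, and with a made-up message shape: when one client confirms or cancels a tool call, the decision is rebroadcast to every connected client so no tab can act on it twice.

import { WebSocketServer, WebSocket } from "ws";

// Hypothetical standalone server: Liveblocks handles this for you;
// this only shows what the fan-out looks like in principle.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (data) => {
    const message = JSON.parse(data.toString());

    // A client confirmed or cancelled a tool call, e.g.
    // { type: "tool-confirmation", toolCallId: "...", confirmed: true }
    if (message.type === "tool-confirmation") {
      // Broadcast the decision to every connected tab and collaborator,
      // including the sender, so all UIs settle on the same state.
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) {
          client.send(JSON.stringify(message));
        }
      }
    }
  });
});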

Why teams choose HTTP, WebSockets, or both

There isn’t a single "right" way to connect AI to the client. Teams pick different approaches based on tradeoffs. HTTP streaming is straightforward and stateless, which makes it a natural fit for simple request–response interactions like text completions or image generation. WebSockets introduce persistent, bidirectional channels that are better for real-time feedback, multi-user sync, and long-running tasks. Many modern products blend the two.
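
As a rough sketch of that hybrid pattern (hypothetical endpoints, not any specific product's API): kick off the job with a plain HTTP request, then listen on a WebSocket for progress and results.

// Hypothetical hybrid pattern: HTTP to start a job, WebSocket for live updates.
async function generateWithLiveUpdates(prompt: string) {
  // 1. Simple, stateless kick-off over HTTP.
  const res = await fetch("https://example.com/api/jobs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const { jobId } = await res.json();

  // 2. Persistent channel for progress, partial output, and completion.
  const socket = new WebSocket(`wss://example.com/jobs/${jobId}`);
  socket.addEventListener("message", (event) => {
    const update = JSON.parse(event.data);
    if (update.type === "progress") console.log(update.percent);
    if (update.type === "done") socket.close();
  });
}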

Based on our research, here’s how some well-known products have approached the problem:

WebSocket-first

- Figma AI: Multiplayer editing with real-time AI suggestions
- Notion AI: Shared context across editors and copilots
- Runway: AI video editing synced across users
- Devin AI: AI software engineer with continuous tasks, realtime collaboration, and persistent context

HTTP-first

- ChatGPT API: Request/response completions streamed over HTTPS/SSE
- Midjourney / DALL·E: Image generation as one-off jobs
- Anthropic Claude API: Stateless text interactions
- Zapier AI Actions: Workflow triggers as HTTP calls

Hybrid

- Vercel v0: HTTP streaming pipeline + WebSockets for updates
- GitHub Copilot: HTTP for completions, sockets inside the IDE for streaming
- Replit Ghostwriter: HTTP for background analysis, WebSockets for in-editor suggestions
- Some ChatGPT multiplayer wrappers: Add sockets for shared sessions

You can also see where HTTP-only setups start to break down: long-running agent tasks fail on refresh, multi-tab sessions collide, and developers bolt on polling or pub-sub to compensate. WebSockets handle these cases natively.

For us, leaning on the WebSocket stack we had for multiplayer editing meant Copilots inherited persistence, real-time delivery, and shared state without extra infrastructure.

Final thoughts

WebSockets are not always the right choice. But for copilots that need persistence, cross-tab consistency, and human-in-the-loop flows, they solved problems that HTTP would have forced us to patch with extra infrastructure.

If you're building similar experiences, a persistent connection may save you more time than you expect.


FAQs

Should I use WebSockets or HTTP for AI applications?

Use WebSockets when you need real-time updates, multi-tab synchronization, or long-running processes. Use HTTP for simple request-response interactions like text completions or one-off image generation.

When is HTTP streaming still a good choice over WebSockets?

HTTP streaming remains ideal for simple, stateless interactions such as text completions or one-off image generations. It is straightforward to implement and fits request–response patterns where persistent, bidirectional connections are unnecessary.

Can I use both HTTP and WebSockets in the same application?

Yes. Many applications use HTTP for simple interactions and WebSockets for real-time features. This hybrid approach gives you the benefits of both protocols.

How do WebSockets improve multi-tab persistence and real-time updates for AI applications?

Unlike HTTP’s request–response model, WebSockets keep a persistent connection so the server can push updates anytime. This allows multiple tabs, browsers, or devices to stay in sync instantly, even if a user joins an in-progress task or refreshes mid-stream, without adding complex coordination logic.
