When we started building AI Copilots at Liveblocks, we weren’t
trying to invent a new protocol. We just needed a way to keep an AI agent in
sync with the UI, across tabs, devices, and even after a refresh in the middle
of a task.
Most AI agents default to HTTP streaming. And that made sense for ChatGPT and
first-generation chat UIs. But the more we leaned into UI-first copilots with
front-end tool calls, confirmation flows, realtime feedback, and resumable
streams, the more things started to break.
So we turned to a solution we had years of experience scaling: a stable and
persistent WebSocket stack with authentication, automatic
reconnection, and reliable message delivery.
This post isn’t prescriptive, and WebSockets aren’t the right answer in every
case. But they made our lives much easier in ways we didn’t anticipate. Here’s
what we learned.
HTTP’s request–response model works for basic interactions, but it struggles
with long-running processes, page refreshes, or multiple tabs. Once a request
ends, the connection is gone. To bridge that gap, teams usually add
infrastructure like polling, pub-sub servers, or custom session logic. This
becomes especially painful with LLMs, where responses can stream for extended
periods and users may join or rejoin mid-process.
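To make that concrete, here's roughly what the polling workaround tends to look like. This is a hypothetical sketch: the /api/ai-tasks endpoint and the snapshot shape are invented for illustration, not part of any real API.

type TaskSnapshot = { status: "running" | "done"; partialText: string };

async function pollTask(taskId: string, onUpdate: (s: TaskSnapshot) => void) {
  // Keep asking the server for the latest state of a long-running AI task.
  while (true) {
    const res = await fetch(`/api/ai-tasks/${taskId}`);
    const snapshot: TaskSnapshot = await res.json();
    onUpdate(snapshot);
    if (snapshot.status === "done") return;
    // Wait before the next round trip; every open tab repeats this work.
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}

Every tab runs its own loop, and a refresh loses anything that wasn't captured in the last snapshot.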
WebSockets provide a persistent, bidirectional connection instead. The server
can push updates at any time, and clients can subscribe to an in-progress task
and immediately receive the latest state. Updates are also broadcast to all
connected tabs, browsers, and devices, which keeps state consistent without
extra coordination logic.
Multiple streaming chats & live updates in our AI Chats example.
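Stripped of Liveblocks specifics, the core pattern looks something like the sketch below, using the raw browser WebSocket API. The URL and message shapes are assumptions for illustration, not our actual protocol.

const socket = new WebSocket("wss://example.com/ai-tasks");

socket.addEventListener("open", () => {
  // Subscribe to an in-progress task, e.g. after a refresh or in a new tab.
  socket.send(JSON.stringify({ type: "subscribe", taskId: "task-123" }));
});

socket.addEventListener("message", (event) => {
  // The server pushes the current state immediately, then every new update.
  const update = JSON.parse(event.data);
  console.log("latest task state", update);
});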
At Liveblocks, our sync layer was already built on WebSockets for
multiplayer editing, so Copilots inherited persistence
and multi-tab support without us adding new queues or background processes.
For teams starting fresh, the tradeoff is clear. You can patch around HTTP’s
limitations with additional infrastructure, but WebSockets remove that class of
problems entirely.
Copilots need to do more than return text. They should call
tools, render UI, and give users control when manual confirmation is required.
WebSockets are especially valuable here because every client stays in sync
whenever a user acts.
With HTTP, a confirmation is scoped to a single tab, meaning that if a user has
multiple tabs open or collaborators are working together, others will not see
that the action was already confirmed or denied. This can cause duplicate or
conflicting actions.
With WebSockets, the confirmation event is broadcast to all connected clients.
As soon as someone clicks “Confirm”, every session updates in real time and the
state stays consistent. An example of this is in our
AI Dashboard demo, where the copilot can
suggest inviting a new member, but the action only runs after the human approves
it.
<AiTool name="invite-member">
  <AiTool.Confirmation
    confirm={async () => {
      await sendInvite(args.email);
      return { data: { invitedEmail: args.email } };
    }}
    cancel={() => {
      respond({ cancel: true });
    }}
  >
    Invite {args.email} to the team?
  </AiTool.Confirmation>
</AiTool>
The AI proposes the action, the human confirms, and the decision is streamed via
WebSockets so all tabs and collaborators immediately see the same outcome.
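The server side of that fan-out can be as simple as the sketch below. This is not Liveblocks' implementation; it's a minimal illustration using the ws package, with a made-up "tool-confirmation" message type.

import WebSocket, { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (raw) => {
    const message = JSON.parse(raw.toString());

    // When one client confirms or cancels a tool call, rebroadcast the
    // decision so every tab and collaborator converges on the same outcome.
    if (message.type === "tool-confirmation") {
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) {
          client.send(JSON.stringify(message));
        }
      }
    }
  });
});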
There isn’t a single "right" way to connect AI to the client. Teams pick
different approaches based on tradeoffs. HTTP streaming is straightforward and
stateless, which makes it a natural fit for simple request–response interactions
like text completions or image generation. WebSockets introduce persistent,
bidirectional channels that are better for real-time feedback, multi-user sync,
and long-running tasks. Many modern products blend the two.
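A blended setup can be as simple as kicking the work off over HTTP and listening for progress over a socket. The endpoints and payloads here are assumptions for illustration, not any particular product's API.

async function startGeneration(prompt: string) {
  // One-off request/response: create the task over plain HTTP.
  const res = await fetch("/api/generations", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const { taskId } = await res.json();

  // Long-lived channel: receive progress for that task over a WebSocket.
  const socket = new WebSocket(`wss://example.com/generations/${taskId}`);
  socket.addEventListener("message", (event) => {
    console.log("progress", JSON.parse(event.data));
  });
}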
Based on our research, here’s how some well-known products have approached the
problem:
WebSocket-first: Figma AI (multiplayer editing with real-time AI suggestions); Notion AI (shared context across editors and copilots)
HTTP-first: ChatGPT API (request/response completions streamed over HTTPS/SSE)
Hybrid: Vercel v0 (HTTP streaming pipeline + WebSockets for updates); some ChatGPT multiplayer wrappers (add sockets for shared sessions)
You can also see where HTTP-only setups start to break down: long-running agent
tasks fail on refresh, multi-tab sessions collide, and developers bolt on
polling or pub-sub to compensate. WebSockets handle these cases natively.
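Resuming after a refresh, for example, only requires remembering which task was active and re-subscribing, as in this rough sketch (the storage key, URL, and message format are all made up):

// On page load, pick up where the previous session left off.
const activeTaskId = localStorage.getItem("activeTaskId");

if (activeTaskId) {
  const socket = new WebSocket("wss://example.com/ai-tasks");
  socket.addEventListener("open", () => {
    socket.send(JSON.stringify({ type: "subscribe", taskId: activeTaskId }));
  });
  socket.addEventListener("message", (event) => {
    // The server sends the current state first, then keeps streaming new
    // updates, so the UI catches up instead of starting over.
    console.log("resumed task update", JSON.parse(event.data));
  });
}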
For us, leaning on the WebSocket stack we had for
multiplayer editing meant Copilots inherited
persistence, real-time delivery, and shared state without extra infrastructure.
WebSockets are not always the right choice. But for copilots that need
persistence, cross-tab consistency, and human-in-the-loop flows, they solved
problems that HTTP would have forced us to patch with extra infrastructure.
If you're building similar experiences, a persistent connection may save you
more time than you expect.
Should I use WebSockets or HTTP for AI applications?
Use WebSockets when you need real-time updates, multi-tab synchronization, or long-running processes. Use HTTP for simple request-response interactions like text completions or one-off image generation.
When is HTTP streaming still a good choice over WebSockets?
HTTP streaming remains ideal for simple, stateless interactions such as text completions or one-off image generations. It is straightforward to implement and fits request–response patterns where persistent, bidirectional connections are unnecessary.
Can I use both HTTP and WebSockets in the same application?
Yes. Many applications use HTTP for simple interactions and WebSockets for real-time features. This hybrid approach gives you the benefits of both protocols.
How do WebSockets improve multi-tab persistence and real-time updates for AI applications?
Unlike HTTP’s request–response model, WebSockets keep a persistent connection so the server can push updates anytime. This allows multiple tabs, browsers, or devices to stay in sync instantly, even if a user joins an in-progress task or refreshes mid-stream, without adding complex coordination logic.