AI Chat UI Design: Message Bubbles, Streaming Text and Tool Calls
Build production-ready AI chat UIs with React — message bubbles, token streaming, tool call states, and the layout patterns that actually hold up under real usage.
Why AI Chat UI Is Harder Than It Looks
You've built a form before. Maybe a comments section. This feels similar — messages in, messages out, list on screen. Then you actually wire up a streaming LLM response and suddenly your perfectly centered bubble layout is flashing, jumping, and re-measuring itself every 30ms. Welcome to AI chat UI.
The fundamental difference between a normal chat UI (Slack, Discord, iMessage) and an AI chat UI is that one side of the conversation is live. The AI doesn't hand you a complete message — it hands you a stream of tokens arriving anywhere from 30 to 150 tokens per second depending on model and load. Your layout has to absorb that gracefully. Honest opinion: most first attempts don't.
In practice, the three hardest problems are streaming text without layout thrash, representing in-progress tool calls without confusing the user, and keeping the scroll position sensible when content is growing from the bottom. The good news is that all three have solid patterns that the React ecosystem figured out by 2024, and you don't need to invent anything from scratch.
Worth noting: the visual language you pick for the chat container matters a lot. A glassmorphism panel with soft blur behind the bubbles feels distinctly premium compared to a flat white box. The glassmorphism generator is a fast way to dial in the backdrop-filter values before you commit to a token in your design system.
Message Bubble Layout: The Component Structure
Start with a mental model before touching any code. A chat thread is a messages array where each item has at least a role ('user' | 'assistant' | 'system' | 'tool'), a content field, and an optional status ('complete' | 'streaming' | 'error'). That's it. Everything visual is derived from those three fields.
The outermost container should use overflow-y: auto with a fixed height (or flex-grow: 1 plus min-height: 0 in a flex column layout), not overflow-y: scroll. The difference: on platforms with classic (non-overlay) scrollbars, scroll always draws the scrollbar track, so even a two-message thread carries an empty 15–17px gutter, while auto shows the scrollbar only once the thread actually overflows. Small detail, noticeable UX.
// MessageThread.tsx
import { useEffect, useRef } from 'react';
// Assumed local modules: MessageBubble renders one message,
// and ToolCall is assumed to be exported from ToolCallBubble.tsx
import { MessageBubble } from './MessageBubble';
import type { ToolCall } from './ToolCallBubble';

interface Message {
  id: string;
  role: 'user' | 'assistant' | 'tool';
  content: string;
  status: 'complete' | 'streaming' | 'error';
  toolCall?: ToolCall;
}

export function MessageThread({ messages }: { messages: Message[] }) {
  const bottomRef = useRef<HTMLDivElement>(null);

  // Scroll only when a message is added, not on every streamed token
  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages.length]);

  return (
    <div className="flex flex-col gap-4 overflow-y-auto px-4 py-6">
      {messages.map((msg) => (
        <MessageBubble key={msg.id} message={msg} />
      ))}
      {/* Invisible anchor used as the scroll target */}
      <div ref={bottomRef} />
    </div>
  );
}
One more thing: don't scroll on every render. Scroll only when messages.length changes (a new message was added), not on every token arriving into an existing message. Scrolling on every token restarts the smooth-scroll animation 30–150 times per second, and it looks terrible. Get the useEffect dependency array right: [messages.length], not [messages].
Alignment is the other decision. User bubbles right-aligned with a colored fill, assistant bubbles left-aligned with a neutral/glass fill. This isn't just convention — it's a spatial grammar the user learns in seconds and then reads without thinking. Flip it and you'll watch your test users re-read every message to figure out who said what.
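As a rough sketch of that spatial grammar, the alignment and surface can be derived from the role alone. Note that bubbleClasses is a hypothetical helper, the Tailwind classes are illustrative picks rather than a recommendation, and the sketch assumes the parent is a flex column like the thread container above.

```typescript
type Role = 'user' | 'assistant' | 'tool';

// Hypothetical helper: maps a message role to alignment and surface classes.
// Assumes the parent is a flex column, so self-end / self-start pick the side.
function bubbleClasses(role: Role): string {
  const base = 'max-w-[80%] rounded-2xl px-4 py-2 text-sm';
  switch (role) {
    case 'user':
      // Right-aligned, solid colored fill
      return `${base} self-end bg-blue-600 text-white`;
    case 'assistant':
      // Left-aligned, neutral/glass fill
      return `${base} self-start bg-white/5 backdrop-blur-sm`;
    case 'tool':
      // Left-aligned, outlined, monospace
      return `${base} self-start border border-white/10 font-mono`;
  }
}
```

Keeping this mapping in one function means a new role (or a theme swap) touches a single place instead of every bubble component.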
Streaming Text Without the Flicker
Whether you stream through the Vercel AI SDK (v3+ as of 2025) or read a raw stream from the OpenAI or Anthropic SDKs, the UI layer sees the same thing: you're appending text to a string and re-rendering. The problem isn't the re-render; React handles that fine. The problem is that every re-render recalculates the bubble's height, and if that bubble contains a <ReactMarkdown> component re-parsing the whole message into an AST on every token, you're burning CPU dozens of times per second on output that is about to change again.
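The "append to a string and re-render" step can be sketched as a pair of pure update functions over the messages array. This is a minimal illustration, not any SDK's API: appendChunk and finishMessage are hypothetical names, and the Message shape is a trimmed copy of the one used earlier in this article.

```typescript
type Status = 'complete' | 'streaming' | 'error';

interface Message {
  id: string;
  role: 'user' | 'assistant' | 'tool';
  content: string;
  status: Status;
}

// Returns a new array with `chunk` appended to the message with the given id.
// Untouched messages keep their object identity, so memoized bubbles can skip work.
function appendChunk(messages: Message[], id: string, chunk: string): Message[] {
  return messages.map((m): Message =>
    m.id === id ? { ...m, content: m.content + chunk, status: 'streaming' } : m,
  );
}

// Marks the message as done once the stream closes.
function finishMessage(messages: Message[], id: string): Message[] {
  return messages.map((m): Message => (m.id === id ? { ...m, status: 'complete' } : m));
}
```

In a component, these would typically run inside a functional setState call, for example setMessages((prev) => appendChunk(prev, id, chunk)).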
// StreamingBubble.tsx: defer markdown parsing until streaming stops
// MarkdownRenderer is assumed to be your own wrapper around a markdown library
import { MarkdownRenderer } from './MarkdownRenderer';

function StreamingBubble({ content, status }: { content: string; status: string }) {
  if (status === 'streaming') {
    // Plain text during the stream: fast, no markdown overhead
    return (
      <div className="whitespace-pre-wrap font-mono text-sm">
        {content}
        {/* Blinking block cursor signals that the response is still arriving */}
        <span className="inline-block w-2 h-4 ml-0.5 bg-current animate-pulse" />
      </div>
    );
  }
  // Full markdown render only when complete
  return <MarkdownRenderer content={content} />;
}
That blinking cursor span is doing real work: it signals to the user that the response isn't done, which prevents the "is it broken?" anxiety that kills trust in AI products. Use animate-pulse or a custom @keyframes blink at 1s intervals. Don't use a | character; it bleeds into the text at small sizes. A block element (w-2 h-4 is 8×16px at Tailwind's default scale) reads instantly as a text cursor.
Honestly, the biggest win is switching from character-by-character state updates to batched updates. If you're consuming a stream directly, batch incoming chunks into a 50ms buffer before calling setState. Most users can't distinguish token-by-token from 50ms-batched updates, and the reduction in render frequency is dramatic on lower-end devices.
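A minimal sketch of that 50ms buffer, assuming you control the loop that reads chunks from the stream. createChunkBatcher is a hypothetical helper, not part of any SDK, and the injectable schedule parameter exists only so the timing can be driven by hand in tests; by default it uses setTimeout.

```typescript
type Schedule = (fn: () => void, ms: number) => unknown;

// Collects incoming chunks and hands them to onFlush at most once per interval.
function createChunkBatcher(
  onFlush: (text: string) => void,
  intervalMs = 50,
  schedule: Schedule = (fn, ms) => setTimeout(fn, ms),
) {
  let buffer = '';
  let scheduled = false;

  const flush = (): void => {
    scheduled = false;
    if (buffer === '') return; // nothing pending, skip the state update
    const text = buffer;
    buffer = '';
    onFlush(text);
  };

  return {
    // Call for every chunk read from the stream.
    push(chunk: string): void {
      buffer += chunk;
      if (!scheduled) {
        scheduled = true;
        schedule(flush, intervalMs);
      }
    },
    // Call once when the stream ends so the tail is not left in the buffer.
    flush,
  };
}
```

With this in place, onFlush is where setState goes, so a burst of 20 tokens inside one 50ms window costs one render instead of 20.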
Quick aside: if you want to add a visual style to the streaming container itself, neumorphism works surprisingly well for AI chat. The inset soft shadow on the assistant bubble creates a physical feeling — like the text is being pressed into the surface as it arrives. It's a subtle distinction from the typical flat or outlined approach most chat UIs default to.
Tool Calls: Showing What the AI Is Doing
Tool calls are the weirdest part of AI chat UI and the part most tutorials skip. When an LLM decides to call a tool — search the web, run code, query a database — you have a gap between the last user message and the eventual assistant response where something needs to be on screen. That something is the tool call state.
There are three phases: invoking (the model has decided to call the tool, parameters are streaming or complete), running (your backend is executing the tool), and complete (the result is back and feeding into the next assistant turn). A fourth state, error, covers a tool that fails. Each needs a distinct visual representation.
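One way to keep those phases honest is a tiny state machine, so the UI can never jump from invoking straight to complete or leave a terminal state. This is a sketch under assumptions: the event names are hypothetical and would need mapping to whatever your streaming protocol actually emits, and the error state mirrors the one in this article's ToolCallStatus type.

```typescript
type ToolCallStatus = 'invoking' | 'running' | 'complete' | 'error';

// Hypothetical event names; map them to your own protocol's events.
type ToolCallEvent = 'args_complete' | 'result_received' | 'failed';

function nextToolCallStatus(current: ToolCallStatus, event: ToolCallEvent): ToolCallStatus {
  // complete and error are terminal: nothing moves a finished call
  if (current === 'complete' || current === 'error') return current;
  if (event === 'failed') return 'error';
  if (current === 'invoking' && event === 'args_complete') return 'running';
  if (current === 'running' && event === 'result_received') return 'complete';
  // Out-of-order events are ignored rather than trusted
  return current;
}
```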
// ToolCallBubble.tsx
type ToolCallStatus = 'invoking' | 'running' | 'complete' | 'error';

export interface ToolCall {
  name: string;
  args: Record<string, unknown>;
  result?: unknown;
  status: ToolCallStatus;
}

// One visual treatment per status; Record<> turns a missing status into a type error
const statusConfig: Record<ToolCallStatus, { icon: string; label: string; bg: string }> = {
  invoking: { icon: '⚙️', label: 'Preparing', bg: 'bg-yellow-500/10 border-yellow-500/20' },
  running: { icon: '⏳', label: 'Running', bg: 'bg-blue-500/10 border-blue-500/20' },
  complete: { icon: '✓', label: 'Done', bg: 'bg-green-500/10 border-green-500/20' },
  error: { icon: '✗', label: 'Failed', bg: 'bg-red-500/10 border-red-500/20' },
};

export function ToolCallBubble({ toolCall }: { toolCall: ToolCall }) {
  const cfg = statusConfig[toolCall.status];
  return (
    <div className={`rounded-xl border px-4 py-3 text-sm font-mono ${cfg.bg}`}>
      <div className="flex items-center gap-2 font-semibold">
        <span>{cfg.icon}</span>
        <span>{toolCall.name}</span>
        <span className="ml-auto text-xs opacity-50">{cfg.label}</span>
      </div>
      {/* Compare against undefined: `unknown && <pre/>` does not type-check
          as a ReactNode and would also hide falsy results such as 0 */}
      {toolCall.status === 'complete' && toolCall.result !== undefined && (
        <pre className="mt-2 text-xs opacity-70 overflow-x-auto">
          {JSON.stringify(toolCall.result, null, 2)}
        </pre>
      )}
    </div>
  );
}
Look, showing the raw JSON result to users is rarely a good idea unless you're building a developer tool. For consumer-facing products, collapse the result by default and let users expand it. The tool call bubble tells them *something happened* without forcing them to read a 400-line API response. The assistant's next message will synthesize the result into natural language anyway.
One more thing — animate the running state. A spinning border, a pulsing opacity, a shifting gradient. Static "Running..." text makes users think the UI froze. Something as cheap as animate-spin on an icon, or a background-size: 200% gradient animation cycling at 2s, is enough to communicate liveness.
Scroll Behavior and the Anchor Problem
Here's the problem that bites you in production: the user scrolls up to re-read something while the assistant is still streaming. You're now in a conflict: the UI wants to auto-scroll to the bottom, but the user is deliberately reading something above. If the UI wins that fight, you've just wrecked the experience.
The standard pattern is to track whether the user is "pinned" to the bottom. On scroll events, calculate if scrollTop + clientHeight >= scrollHeight - threshold (a 100–150px threshold handles the case where the bottom content is still partially visible). If they're pinned, auto-scroll on new tokens. If they scrolled up, stop. Resume pinning once they manually scroll back down.
// useScrollPin.ts
import { useCallback, useState } from 'react';
import type { RefObject } from 'react';

// `| null` keeps this compatible with a ref created via useRef<HTMLDivElement>(null)
export function useScrollPin(containerRef: RefObject<HTMLElement | null>) {
  const [pinned, setPinned] = useState(true);

  // Attach to the scroll container: <div ref={containerRef} onScroll={handleScroll}>
  const handleScroll = useCallback(() => {
    const el = containerRef.current;
    if (!el) return;
    const threshold = 120; // px of slack that still counts as "at the bottom"
    const atBottom = el.scrollTop + el.clientHeight >= el.scrollHeight - threshold;
    setPinned(atBottom);
  }, [containerRef]);

  const scrollToBottom = useCallback(() => {
    containerRef.current?.scrollTo({
      top: containerRef.current.scrollHeight,
      behavior: 'smooth',
    });
  }, [containerRef]);

  return { pinned, handleScroll, scrollToBottom };
}
In practice, behavior: 'smooth' causes the same animation-spam problem if called too frequently. Only call the smooth scrollToBottom at message boundaries (a new user message appears, or an assistant message's status changes from streaming to complete), not on every token. To follow tokens while the user is pinned, use an instant scroll instead: the anchor <div ref={bottomRef} /> pattern from earlier works fine there, because scrollIntoView without behavior: 'smooth' is instant and doesn't queue animations.
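The pinned check and the instant follow can also be pulled out as plain functions, which makes the threshold logic testable without a browser. This is a sketch, not part of the hook above: ScrollBox, isPinnedToBottom and followIfPinned are hypothetical names, and ScrollBox is just the subset of HTMLElement properties the logic needs, so a real element can be passed in directly.

```typescript
// Structural subset of HTMLElement, so the logic can be unit-tested without a DOM.
interface ScrollBox {
  scrollTop: number;
  readonly clientHeight: number;
  readonly scrollHeight: number;
}

// True when the viewport's bottom edge is within `threshold` px of the content's end.
function isPinnedToBottom(box: ScrollBox, threshold = 120): boolean {
  return box.scrollTop + box.clientHeight >= box.scrollHeight - threshold;
}

// Instant jump with no animation, safe to call on every batch of tokens.
// In a browser, the assigned scrollTop is clamped to the maximum scroll offset.
function followIfPinned(box: ScrollBox, pinned: boolean): void {
  if (pinned) box.scrollTop = box.scrollHeight;
}
```

Assigning scrollTop directly sidesteps the smooth-scroll queue entirely, which is why it suits per-token following while scrollToBottom stays reserved for message boundaries.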
That said, one UX detail worth adding: a "scroll to bottom" floating button that appears when the user is unpinned. Make it a small pill or circle in the bottom-right corner of the thread container. Dismiss it automatically when they scroll back down. That single element removes the confusion of "why isn't this updating?" for users who don't realize the stream is still going.
Input Area, Multiline and the Send Pattern
The input area seems trivial. It's not. The interaction surface where users compose and submit messages handles more edge cases than the entire thread display. Enter to submit or Shift+Enter for newline? Auto-grow textarea or fixed height with scroll? Disabled state during streaming? These matter.
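One way to pin down the Enter versus Shift+Enter question is a small predicate that the keydown handler consults. This is a sketch under assumptions: shouldSubmit and KeyInfo are hypothetical names, and the IME check is an addition of this sketch (Enter also confirms a candidate during IME composition, so submitting on it cuts off users typing Japanese, Chinese or Korean). In React the flag lives on e.nativeEvent.isComposing.

```typescript
// The subset of a keyboard event the decision depends on.
interface KeyInfo {
  key: string;
  shiftKey: boolean;
  isComposing?: boolean; // true while an IME composition is in progress
}

function shouldSubmit(e: KeyInfo, value: string, disabled: boolean): boolean {
  if (e.key !== 'Enter') return false;
  if (e.shiftKey) return false; // Shift+Enter inserts a newline
  if (e.isComposing) return false; // Enter is confirming an IME candidate
  if (disabled) return false; // e.g. a response is still streaming
  return value.trim().length > 0; // never send whitespace-only messages
}
```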
Use a <textarea>, not an <input>. Auto-grow it by measuring rather than guessing: set rows={1} and min-height: 44px, then in a useLayoutEffect reset the height to auto and set it to the element's scrollHeight. Cap the height at 200px, after which the textarea scrolls internally. Beyond that, the input takes up too much screen real estate on mobile.
// ChatInput.tsx
import { useLayoutEffect, useRef, useState } from 'react';
import type { KeyboardEvent } from 'react';

// Props shape assumed from how the component uses them
interface ChatInputProps {
  onSubmit: (text: string) => void;
  disabled?: boolean;
}

export function ChatInput({ onSubmit, disabled }: ChatInputProps) {
  const [value, setValue] = useState('');
  const ref = useRef<HTMLTextAreaElement>(null);

  // Auto-grow: reset to 'auto' so the textarea can shrink, then clamp at 200px
  useLayoutEffect(() => {
    const el = ref.current;
    if (!el) return;
    el.style.height = 'auto';
    el.style.height = Math.min(el.scrollHeight, 200) + 'px';
  }, [value]);

  // Shared by the Enter key and the Send button
  const submit = () => {
    const text = value.trim();
    if (!text || disabled) return;
    onSubmit(text);
    setValue('');
  };

  const handleKeyDown = (e: KeyboardEvent<HTMLTextAreaElement>) => {
    // Enter sends; Shift+Enter falls through and inserts a newline
    if (e.key === 'Enter' && !e.shiftKey) {
      e.preventDefault();
      submit();
    }
  };

  return (
    <div className="flex items-end gap-2 rounded-2xl border border-white/10 bg-white/5 backdrop-blur-sm px-4 py-3">
      <textarea
        ref={ref}
        rows={1}
        value={value}
        onChange={(e) => setValue(e.target.value)}
        onKeyDown={handleKeyDown}
        disabled={disabled}
        placeholder="Message..."
        className="flex-1 resize-none bg-transparent text-sm outline-none placeholder:opacity-40"
      />
      <button
        onClick={submit}
        disabled={disabled || !value.trim()}
        className="rounded-lg bg-white/90 px-3 py-1.5 text-xs font-semibold text-black disabled:opacity-30"
      >
        Send
      </button>
    </div>
  );
}
Disable the input and send button during streaming. Don't just ignore submissions; visually disable the controls. Users will retry if the button looks active. A subtle opacity-50 cursor-not-allowed on the container during status === 'streaming' is enough. Some chat UIs also show a "Stop generating" button in place of Send, which is worth adding if your backend supports cancellation via AbortController.
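For the "Stop generating" button, a minimal sketch of the cancellation plumbing could look like the following. AbortController and AbortSignal are standard web APIs; createGenerationController is a hypothetical helper, and the sketch assumes your request goes through fetch (or an SDK call) that accepts a signal option.

```typescript
// Hypothetical helper: owns one AbortController per in-flight response.
function createGenerationController() {
  let controller: AbortController | null = null;

  return {
    // Call before starting a request, then pass the signal along,
    // e.g. fetch(url, { signal }).
    start(): AbortSignal {
      controller?.abort(); // cancel any previous response still streaming
      controller = new AbortController();
      return controller.signal;
    },
    // Wire this to the "Stop generating" button.
    stop(): void {
      controller?.abort();
      controller = null;
    },
    // Use this to decide whether to render Stop or Send.
    isActive(): boolean {
      return controller !== null && !controller.signal.aborted;
    },
  };
}
```

An aborted fetch rejects with an error named AbortError, so the catch block around the stream reader should treat that case as a normal stop and mark the partial message complete rather than showing an error state.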
The backdrop-blur-sm on the input container (paired with bg-white/5) isn't purely decorative. It visually separates the input from the message thread above it without a heavy border or shadow, which keeps the interface feeling light. You can prototype the exact values in the glassmorphism generator before committing them to your component.
Visual Styling: Making the AI Chat Feel Premium
Most AI chat UIs look like GPT clones. White background, gray bubbles, blue send button. That's fine if you're building an internal tool nobody sees. If you're shipping a product, the visual layer is part of the value proposition.
The most effective differentiation move is theming the assistant bubble differently from the user bubble — not just alignment and color, but surface treatment. A glassmorphism assistant bubble (translucent, blurred, slightly luminous) against solid user bubbles creates a spatial metaphor: the assistant's words feel like they're coming from a different layer of reality. That's a feature. Check out Empire UI's glassmorphism components for pre-built surface treatments that drop straight into a chat layout.
For dark-mode AI products (most are dark), the cyberpunk or aurora styles from Empire UI work extremely well. Cyberpunk's neon accent borders on bubbles read as high-tech without being cartoonish. Aurora's gradient background adds movement to the container without any JavaScript — pure CSS animation using @keyframes and background-position shifts over a 10s loop.
Typography is underrated in chat. AI responses are often long. Set line-height: 1.65 and max-width: 680px on the assistant bubble content — not on the whole app, just the message text. Prose-width columns read faster. And use a slightly smaller font size for code blocks inside the response: font-size: 13px vs the body 15px creates visual hierarchy that makes code feel like a distinct artifact, not just a differently-colored paragraph.
One last thing you should actually do: add custom cursors to the chat interface. It sounds like a minor detail, but a custom cursor that changes shape over interactive elements (the input, buttons, expandable tool calls) adds a layer of craft that users notice without being able to name. Empire UI ships 10 cursor variants including a precision crosshair style that pairs well with technical AI tools.
FAQ
How do I stop streaming text from flickering or causing layout jank?
Avoid rendering markdown during streaming: use plain whitespace-pre-wrap text while tokens arrive, then swap to a full markdown renderer when status === 'complete'. Also batch your state updates to a 50ms interval rather than updating on every incoming token. Both changes together eliminate the vast majority of jank.
Should I disable the input while the AI is responding?
Yes. Visually disable both the textarea and send button with disabled attributes and opacity-50 cursor-not-allowed styles. If your backend supports AbortController-based cancellation, replace the send button with a stop button during streaming so users can bail early on long responses.
How should I display tool calls to users?
Show a compact collapsed pill with the tool name and a status indicator (invoking / running / done). Hide the raw JSON result behind an expandable toggle. The assistant's next message will summarize what the tool returned in natural language anyway, so users only need to know something happened, not exactly what.
How do I handle auto-scroll when the user scrolls up mid-stream?
Track a 'pinned' boolean: if scrollTop + clientHeight >= scrollHeight - 120px, the user is at the bottom and you auto-scroll on new content. If they scroll up, stop auto-scrolling and show a 'scroll to bottom' floating button. Resume pinning automatically once they scroll back down.