AI Text Streaming in React: Typewriter Effect From API, Abort, Markdown

Name: Empire UI
Author: Empire UI

Stream AI text responses to React with a typewriter effect, handle abort, render Markdown live — practical patterns that actually hold up in production.


Why Streaming AI Text Feels Hard (And Why It Isn't)

If you've spent more than ten minutes trying to stream an LLM response into a React component, you've probably ended up with either a flickering mess or a wall of useEffect that you're afraid to touch. The core problem isn't React — it's that browser streaming APIs (Server-Sent Events, ReadableStream via fetch) don't map cleanly onto React's update model without a bit of wiring.

Honestly, most of the tutorials out there show you the happy path and completely skip what happens when the user clicks "Stop generating", navigates away mid-stream, or the network hiccups. Those edge cases aren't optional: they're exactly the scenarios that break production AI chat UIs. A component that doesn't abort its stream on unmount keeps the request open and keeps processing chunks after it is gone, which wastes bandwidth and output tokens and calls setState on a dead component. React 17 and earlier logged the dreaded "can't perform a React state update on an unmounted component" warning for that. React 18 removed the warning, but the wasted work is still there.

What you actually need is three things working together: a custom hook that owns the stream lifecycle, a chunk accumulator that builds up text token by token, and a render layer that can handle raw text *or* Markdown depending on your use case. This article walks through all three, with working code you can drop straight into a Next.js 14+ or Vite + React 18 project.

Worth noting: the patterns here work with any streaming HTTP endpoint — OpenAI's /v1/chat/completions, Anthropic's Messages API in stream mode, your own FastAPI backend that yields SSE chunks. The transport is the same. Only the chunk parsing changes slightly.

Reading a Server-Sent Event Stream With fetch

Forget EventSource for most AI APIs. EventSource doesn't let you set request headers (no Authorization), doesn't support POST, and has limited error control. Instead you use fetch with a ReadableStream body and read it chunk by chunk. This is broadly the approach OpenAI's own Node SDK has taken since v4, which is built on fetch.

async function* streamCompletion(
  prompt: string,
  signal: AbortSignal
): AsyncGenerator<string> {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    signal, // aborting the signal cancels the request itself
  });

  if (!res.ok || !res.body) {
    throw new Error(`HTTP ${res.status}`);
  }

  const reader = res.body
    .pipeThrough(new TextDecoderStream())
    .getReader();

  // A network chunk can end in the middle of an SSE line, so keep the
  // unfinished tail here and prepend it to the next chunk.
  let buffer = '';

  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;

      buffer += value;
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // last item is a partial line (or '')

      // SSE lines look like: data: {"choices":[{"delta":{"content":"Hello"}}]}
      for (const line of lines) {
        if (!line.startsWith('data:')) continue; // skip event:, comments, blanks
        const payload = line.slice(5).trim();
        if (!payload || payload === '[DONE]') continue;
        try {
          const json = JSON.parse(payload);
          const delta = json.choices?.[0]?.delta?.content ?? '';
          if (delta) yield delta;
        } catch {
          // malformed line: skip it
        }
      }
    }
  } finally {
    reader.releaseLock();
  }
}

The async generator pattern is the cleanest approach here — it lets you for await over tokens without managing promise chains manually. The signal parameter threads the AbortController all the way into the fetch call, so aborting actually cancels the network request, not just the processing loop. That's a real distinction — without it, the server keeps streaming and you're wasting bandwidth.

One more thing — TextDecoderStream handles multi-byte UTF-8 characters that might split across chunk boundaries. Don't skip it. If a user sends a message with emoji or non-ASCII characters, raw Uint8Array chunks will occasionally split a codepoint in half and you'll get garbled output.
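To see why the streaming decoder matters, here is a minimal sketch that decodes the same bytes two ways. The function names are made up for this illustration. It uses the plain TextDecoder with the stream option, which is the same buffering behavior TextDecoderStream applies inside a pipe.

```typescript
// Decode a sequence of byte chunks without corrupting characters that
// straddle a chunk boundary.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder('utf-8');
  let out = '';
  for (const chunk of chunks) {
    // stream: true keeps an incomplete trailing byte sequence buffered
    out += decoder.decode(chunk, { stream: true });
  }
  // a final call with no input flushes anything still buffered
  return out + decoder.decode();
}

// Naive version for comparison: every chunk is decoded in isolation.
function decodeChunksNaively(chunks: Uint8Array[]): string {
  return chunks.map((c) => new TextDecoder('utf-8').decode(c)).join('');
}

// "hi 👋" is 7 bytes: 3 ASCII bytes plus a 4-byte emoji.
// Split it in the middle of the emoji.
const bytes = new TextEncoder().encode('hi 👋');
const split = [bytes.slice(0, 5), bytes.slice(5)];

const streamed = decodeChunks(split);     // the emoji survives
const naive = decodeChunksNaively(split); // contains U+FFFD replacement characters
```

The naive version is what you effectively get if you call a fresh decoder on each raw chunk.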

In practice, if you control the backend, you can also skip the SSE format entirely and stream raw JSON newline-delimited objects (application/x-ndjson). It's simpler to parse and avoids the data: prefix dance. But if you're talking directly to OpenAI or Anthropic's hosted APIs, you're getting SSE whether you like it or not.
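If you do go the NDJSON route, the parsing side could look something like the sketch below. createNdjsonParser and the { delta } payload shape are assumptions for illustration, not part of any library. The only real requirement is that a partial line at the end of a chunk is held back until the rest arrives.

```typescript
// Hypothetical helper: incremental parser for newline-delimited JSON.
// push() accepts any text chunk and returns the objects it completed.
function createNdjsonParser<T = unknown>() {
  let buffer = '';
  return {
    push(chunk: string): T[] {
      buffer += chunk;
      const lines = buffer.split('\n');
      // the last element is either '' or a partial line; keep it for later
      buffer = lines.pop() ?? '';
      const out: T[] = [];
      for (const line of lines) {
        const trimmed = line.trim();
        if (trimmed) out.push(JSON.parse(trimmed) as T);
      }
      return out;
    },
  };
}

// Assumed payload shape: one { delta: string } object per line.
const parser = createNdjsonParser<{ delta: string }>();
const first = parser.push('{"delta":"Hel"}\n{"del'); // one complete object
const second = parser.push('ta":"lo"}\n');           // completes the second
```

Inside streamCompletion you would call push(value) for each decoded chunk and yield the delta of every object it returns.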

The useStream Hook: Lifecycle, Abort, and State

Here's the hook that wires everything together. It owns the AbortController, manages isStreaming state, and exposes a start function you call from a form submit handler. The whole thing cleans up properly on unmount — which most examples you'll find online skip entirely.

import { useState, useRef, useCallback, useEffect } from 'react';
// streamCompletion is the async generator from the previous section

interface UseStreamOptions {
  onChunk?: (chunk: string) => void;
  onDone?: (full: string) => void;
  onError?: (err: Error) => void;
}

export function useStream(options: UseStreamOptions = {}) {
  const [text, setText] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const abortRef = useRef<AbortController | null>(null);
  const accRef = useRef('');

  // Keep the latest callbacks in a ref so `start` stays stable even when
  // the caller passes a fresh options object on every render.
  const optionsRef = useRef(options);
  optionsRef.current = options;

  const abort = useCallback(() => {
    abortRef.current?.abort();
    setIsStreaming(false);
  }, []);

  // Abort any in-flight stream when the component unmounts.
  useEffect(() => () => abortRef.current?.abort(), []);

  const start = useCallback(async (prompt: string) => {
    // cancel any in-flight stream
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;

    setText('');
    accRef.current = '';
    setError(null);
    setIsStreaming(true);

    try {
      for await (const delta of streamCompletion(prompt, controller.signal)) {
        if (controller.signal.aborted) break;
        accRef.current += delta;
        setText(accRef.current);
        optionsRef.current.onChunk?.(delta);
      }
      // only report completion if the stream was not aborted
      if (!controller.signal.aborted) optionsRef.current.onDone?.(accRef.current);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        const e = err as Error;
        setError(e);
        optionsRef.current.onError?.(e);
      }
    } finally {
      // a newer start() may already own the state; don't clobber it
      if (abortRef.current === controller) setIsStreaming(false);
    }
  }, []);

  return { text, isStreaming, error, start, abort };
}

The accRef ref is the key detail here. You accumulate text in a ref so you always have the full string without stale closure problems, then mirror it to setText on each chunk. This means onDone gets the complete string correctly even if the last few chunks arrive close together.

Why not just do setText(prev => prev + delta) and skip the ref? You could, but accRef lets onDone read the accumulated text synchronously from outside a state update. If you're persisting the conversation to a database on completion, you don't want to rely on React state being flushed yet.

Quick aside: React 18's automatic batching means multiple setText calls inside the same async tick get batched together. In practice this means your UI might update every few chunks rather than every single one — which is actually better for performance. Don't fight it.

Typewriter Effect: Animated vs. Real-Time

There are two entirely different things people mean when they say "typewriter effect" in AI UIs. The first is *real-time streaming* — text appears as tokens arrive from the API, character by character, with no artificial delay. The second is *simulated typewriter* — you get the full response at once and animate it locally with a setInterval. Which one you want depends on whether your backend supports streaming.

If you're using the useStream hook above, you already have real-time streaming with no extra work. The text state updates as tokens arrive. Done. But sometimes you want to slow it down slightly — say the model is very fast and tokens arrive in 50-character bursts — so the user sees smooth letter-by-letter animation instead of jumping blocks of text.

// Smooth character-by-character playback over the streamed text
import { useEffect, useRef, useState } from 'react';

export function useTypewriter(fullText: string, speed = 18) {
  const [displayed, setDisplayed] = useState('');
  const indexRef = useRef(0);

  useEffect(() => {
    // reset when new stream starts
    if (fullText.length === 0) {
      setDisplayed('');
      indexRef.current = 0;
      return;
    }

    if (indexRef.current >= fullText.length) return;

    const id = setTimeout(() => {
      indexRef.current += 1;
      setDisplayed(fullText.slice(0, indexRef.current));
    }, speed);

    return () => clearTimeout(id);
  }, [fullText, displayed, speed]);

  return displayed;
}

Feed useTypewriter the text from useStream and it'll chase it. When the stream outruns the typewriter, the display lags behind and catches up after the last token arrives. When you abort, the typewriter finishes whatever's been received. The speed parameter is milliseconds per character: 18ms gives you at most about 55 chars/sec (timers and renders add a little overhead), which reads fast but doesn't feel like it's rushing. You can drop it to 8ms for a snappier feel or push it to 30ms if you want that classic terminal aesthetic.
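If the model streams faster than the fixed per-character delay, the typewriter falls further and further behind. One possible variation, sketched here with a made-up helper name and an arbitrary default, is to reveal more than one character per tick when the backlog is large.

```typescript
// Hypothetical helper: how many characters to reveal on the next tick.
// A small backlog reveals one character at a time; a large backlog reveals
// proportionally more, so the display catches up faster.
function nextRevealCount(
  shown: number,     // characters already displayed
  total: number,     // characters received so far
  maxLagChars = 40,  // assumed tuning knob, not a recommended value
): number {
  const backlog = Math.max(0, total - shown);
  if (backlog === 0) return 0;
  return Math.max(1, Math.ceil(backlog / maxLagChars));
}
```

In useTypewriter this would replace the fixed increment: instead of adding 1 to indexRef.current, add nextRevealCount(indexRef.current, fullText.length).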

Honestly, for most production AI chat UIs I'd skip the extra typewriter layer and just render the streamed text directly. The latency between token arrivals usually provides enough visual pacing on its own, and the simulated delay makes the app feel slower than it actually is. Use it for demos, landing pages, or anywhere the UX is more important than throughput perception.

Rendering Markdown Live Without Thrashing the DOM

Here's where things get interesting. Most AI responses contain Markdown — bold text, inline code, code blocks, bullet lists. If you render raw text in a <p> tag, your users see asterisks and backticks. Not great. But if you parse Markdown on every token update, you're running a parser potentially 50+ times per second and thrashing the DOM.

The right approach is to parse Markdown *after* streaming is done, and show raw (but nicely monospaced) text during streaming. Or, if you want live Markdown rendering, use a lightweight parser like marked or micromark, which is fast enough that parsing a chat-length response is rarely the bottleneck, and throttle renders to at most one per animation frame using requestAnimationFrame.
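The frame-based throttling can be sketched as a small coalescing helper. createFrameThrottle is a hypothetical name. The scheduler is passed in rather than hard-coded so the logic can be exercised outside a browser.

```typescript
type Schedule = (cb: () => void) => void;

// Hypothetical helper: coalesce rapid updates into at most one callback per
// scheduled frame. Only the newest value is delivered.
function createFrameThrottle<T>(
  onFlush: (latest: T) => void,
  schedule: Schedule, // in a browser: (cb) => requestAnimationFrame(cb)
) {
  let pending = false;
  let latest: T | undefined;
  return (value: T): void => {
    latest = value;      // always remember the newest value
    if (pending) return; // a flush is already queued for this frame
    pending = true;
    schedule(() => {
      pending = false;
      onFlush(latest as T);
    });
  };
}
```

In a component you would create the throttle once (for example in a ref), call it with the accumulated text on every chunk, and have onFlush set the state that feeds the Markdown parser. That caps parsing at one run per frame no matter how quickly tokens arrive.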

import { useMemo } from 'react';
import { marked } from 'marked'; // small, dependency-free Markdown parser

interface StreamOutputProps {
  text: string;
  isStreaming: boolean;
  renderMarkdown?: boolean;
}

export function StreamOutput({
  text,
  isStreaming,
  renderMarkdown = true,
}: StreamOutputProps) {
  const html = useMemo(() => {
    if (!renderMarkdown || (!text && !isStreaming)) return '';
    return marked.parse(text, { breaks: true, gfm: true }) as string;
  }, [text, isStreaming, renderMarkdown]);

  if (!renderMarkdown) {
    return (
      <pre className="whitespace-pre-wrap font-mono text-sm">{text}</pre>
    );
  }

  return (
    <div
      className="prose prose-invert max-w-none"
      // eslint-disable-next-line react/no-danger
      dangerouslySetInnerHTML={{ __html: html }}
    />
  );
}

The useMemo here is doing real work — it prevents re-parsing on unrelated renders. The prose prose-invert Tailwind Typography classes handle code block styling, list spacing, and heading hierarchy for you. If you're not already using @tailwindcss/typography, add it. It's one of those plugins that immediately makes AI output look professional.

One concern with dangerouslySetInnerHTML: marked does not sanitize its output, so any raw HTML in the Markdown goes straight into the DOM. It's tempting to treat AI output as a source you control, but a model will happily echo HTML that arrived in the prompt, in a pasted document, or in retrieved web content, so that's a real XSS vector. The risk is highest in a collaborative tool where one user's AI response can appear in another user's browser, but even in your own chat UI, where the only person seeing the output is the person who triggered the request, the safe default is to run the HTML through DOMPurify before rendering it.

If you want a richer rendering experience (syntax highlighted code blocks, copy buttons, custom component overrides), check out react-markdown with rehype-highlight. It's heavier than raw marked but gives you a JSX component tree instead of raw HTML, so you can override how each element renders. Worth the extra bundle weight if your app renders a lot of code.

Putting It Together: A Full Chat Component

Here's a complete minimal chat component that wires the hook, typewriter, and Markdown renderer together. It's around 80 lines, has a working abort button, and handles errors gracefully. Drop this into a project that has your /api/chat streaming route set up and it'll work.

import { useState } from 'react';
import { useStream } from './useStream';
import { useTypewriter } from './useTypewriter';
import { StreamOutput } from './StreamOutput';

export function AiChat() {
  const [prompt, setPrompt] = useState('');
  const { text, isStreaming, error, start, abort } = useStream();
  const displayed = useTypewriter(text, 12);
  // The typewriter can still be catching up after the stream has ended.
  const isTyping = isStreaming || displayed.length < text.length;

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (!prompt.trim() || isStreaming) return;
    start(prompt);
    setPrompt('');
  };

  return (
    <div className="flex flex-col gap-4 max-w-2xl mx-auto p-6">
      <div
        className="min-h-[200px] rounded-xl border border-white/10 bg-black/30
          backdrop-blur-sm p-4"
      >
        {error ? (
          <p className="text-red-400 text-sm">{error.message}</p>
        ) : (
          <StreamOutput
            text={displayed}
            isStreaming={isTyping}
            renderMarkdown={!isTyping}
          />
        )}
        {isTyping && (
          <span className="inline-block w-2 h-4 bg-white/70 animate-pulse ml-0.5" />
        )}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={prompt}
          onChange={e => setPrompt(e.target.value)}
          placeholder="Ask something..."
          className="flex-1 rounded-lg border border-white/10 bg-white/5
            px-4 py-2 text-sm text-white placeholder:text-white/30
            focus:outline-none focus:ring-2 focus:ring-violet-500"
          disabled={isStreaming}
        />
        {isStreaming ? (
          <button
            type="button"
            onClick={abort}
            className="px-4 py-2 rounded-lg bg-red-500/80 text-white text-sm
              hover:bg-red-500 transition-colors"
          >
            Stop
          </button>
        ) : (
          <button
            type="submit"
            disabled={!prompt.trim()}
            className="px-4 py-2 rounded-lg bg-violet-600 text-white text-sm
              hover:bg-violet-500 disabled:opacity-40 transition-colors"
          >
            Send
          </button>
        )}
      </form>
    </div>
  );
}

Notice that the renderMarkdown prop is switched off while output is still arriving. This is a deliberate choice. During streaming, you show plain typewritten text. Once the output is complete, React re-renders with Markdown parsing on. The user sees a brief visual "snap" as the output gets formatted. That's actually fine: it feels like the response settling into place, and it avoids re-running the Markdown parser on every character.

The blinking cursor (animate-pulse span) is purely CSS, no JavaScript timer needed. Tailwind's animate-pulse is a keyframe animation that fades opacity between 1 and 0.5. It visually signals that content is still arriving, and the conditional render removes it once output stops.

For styling the output container itself, that's a glassmorphism-style panel — bg-black/30 backdrop-blur-sm border border-white/10. You could go further with Empire UI's glassmorphism components to get pre-styled surfaces with matching shadows, corner radii, and hover states without writing the CSS from scratch. Also worth checking out the gradient generator if you want a livelier background behind the glass panel.

Production Checklist: What You'll Forget

You've got the happy path working. Now the list of things that will bite you before launch. First: rate limiting. If a user can mash the Submit button, they'll spam your API endpoint and run up your bill. Disable the input and the Send button while streaming is active, which the component above already does, and consider debouncing the submit handler by a few hundred milliseconds on top of that. Client-side guards are easy to bypass, though, so make sure your API route has its own server-side rate limiting too.
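As a rough illustration of the server-side half, here is a minimal in-memory sliding-window limiter. createRateLimiter is a hypothetical helper, and the sketch assumes a single long-lived server process. On serverless or multi-instance deployments you would need a shared store instead.

```typescript
// Hypothetical in-memory limiter: at most `limit` requests per `windowMs`
// for each key (user id, IP address, session). State lives in one process.
function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = Date.now, // injectable clock, handy for tests
) {
  const hits = new Map<string, number[]>();
  return function allow(key: string): boolean {
    const t = now();
    // keep only the timestamps that are still inside the window
    const recent = (hits.get(key) ?? []).filter((ts) => t - ts < windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // caller should respond with HTTP 429
    }
    recent.push(t);
    hits.set(key, recent);
    return true;
  };
}
```

In the /api/chat route you would call allow(userId) before contacting the model and return a 429 response when it comes back false.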

Second: error states. Network errors, 429 rate limits, model context window exceeded (usually a 400 with a specific error code) — all of these need distinct handling. A generic "Something went wrong" toast is not enough. Parse the HTTP status and the error JSON from the API and surface a real message. Your users will thank you.
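One way to structure that is a single function that turns the status and the parsed error body into a user-facing message. This is a sketch: the ApiErrorBody shape and the context_length_exceeded code are assumptions modeled on OpenAI-style error responses, so check them against whatever your provider or your own route actually returns.

```typescript
// Assumed error body shape; adjust to your API.
interface ApiErrorBody {
  error?: { message?: string; code?: string };
}

// Hypothetical mapping from HTTP status plus error body to a message.
function describeApiError(status: number, body: ApiErrorBody | null): string {
  const code = body?.error?.code;
  if (status === 429) {
    return 'Too many requests. Wait a moment and try again.';
  }
  if (status === 400 && code === 'context_length_exceeded') {
    return 'This conversation is too long for the model. Start a new chat or remove older messages.';
  }
  if (status === 401 || status === 403) {
    return 'You are not authorized to use this endpoint.';
  }
  if (status >= 500) {
    return 'The AI service is having trouble. Try again shortly.';
  }
  // fall back to the server's own message when there is one
  return body?.error?.message ?? `Request failed (HTTP ${status}).`;
}
```

To use it with streamCompletion, read the response body as JSON when res.ok is false and throw an Error carrying this message instead of the bare status code.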

Third: streaming in React Server Components in Next.js App Router (Next.js 14+). If you're trying to stream from a Server Component, the rules are completely different — you'd use ReadableStream returned from a Route Handler and consume it client-side, or use Vercel's ai SDK which abstracts over all of this. The hook-based approach in this article is for Client Components only. Don't mix the two up or you'll spend hours confused.

Fourth: token counting. LLMs have context limits. If you're building a multi-turn chat and just concatenating every message into the context, you'll eventually hit the limit (128K tokens for most GPT-4o variants as of 2026) and get a hard error mid-stream. Build in a token counter or truncate old messages from the context window before they blow the limit.
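A minimal sketch of the truncation idea follows. The four-characters-per-token estimate is only a rule of thumb for English text, and both function names are hypothetical. For real limits you would swap in the tokenizer that matches your model.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Rough estimate only (assumption: about 4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep every system message plus the newest messages that fit in `budget`.
function trimToBudget(messages: ChatMessage[], budget: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  let used = system.reduce((n, m) => n + estimateTokens(m.content), 0);
  const kept: ChatMessage[] = [];
  // walk from newest to oldest and stop at the first message that won't fit
  for (const m of [...rest].reverse()) {
    const cost = estimateTokens(m.content);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(m);
  }
  return [...system, ...kept];
}
```

Leave headroom in the budget for the response itself, since output tokens count against the same context window.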

One more thing — test the abort behavior explicitly. Open DevTools Network panel, start a stream, click Stop, and verify the request is actually cancelled (it shows "canceled" in Chrome's network tab). If it shows "pending" and then eventually completes, your AbortController isn't threaded through correctly. This matters both for UX and for API cost — you're paying per output token whether or not the user sees it.

FAQ

Does this work with OpenAI's streaming API without any changes?

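Mostly yes. The streamCompletion function in this article already parses choices[0].delta.content from each SSE chunk, which is the shape OpenAI's chat completions stream uses, so your /api/chat route only needs to forward that stream. If you're using Anthropic's API instead, the text arrives in content_block_delta events as delta.text (with delta.type set to text_delta), while message_delta events carry the stop reason and usage rather than text.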
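If you want one parser that tolerates both providers, the chunk handling could be pulled into a helper along these lines. extractDelta is a hypothetical name, and the two payload shapes are the ones assumed here: OpenAI-style chunks with choices[0].delta.content, and Anthropic-style content_block_delta events whose delta has type text_delta. Verify both against the current API references.

```typescript
// Pull the text fragment out of one parsed SSE payload.
// Returns '' for events that carry no text.
function extractDelta(payload: any): string {
  // OpenAI-style chat completions chunk
  const openai = payload?.choices?.[0]?.delta?.content;
  if (typeof openai === 'string') return openai;

  // Anthropic-style Messages stream event
  if (
    payload?.type === 'content_block_delta' &&
    payload.delta?.type === 'text_delta' &&
    typeof payload.delta.text === 'string'
  ) {
    return payload.delta.text;
  }

  // message_start, message_delta, ping and similar events carry no text
  return '';
}
```

Inside streamCompletion this would replace the line that reads json.choices?.[0]?.delta?.content directly.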

Why use fetch instead of EventSource for SSE?

EventSource doesn't support custom headers (so no Bearer token auth) and only does GET requests. Fetch with ReadableStream gives you full control over method, headers, body, and the AbortController integration that actually cancels the network request.

Can I stream into a React Server Component instead of a Client Component?

Not with this hook pattern — RSCs don't have useState or useEffect. In Next.js App Router you'd return a streaming Response from a Route Handler and consume it in a Client Component, or use Vercel's ai SDK which handles the RSC streaming model for you.

How do I handle code blocks with syntax highlighting in the streamed Markdown?

Use react-markdown with rehype-highlight or rehype-prism instead of raw marked. It renders code blocks as proper JSX nodes you can customize: add copy buttons, language labels, line numbers. It's noticeably heavier than marked (check the current bundle sizes for the versions you install), but that's a fair trade if your AI output is code-heavy.
