Vercel AI SDK useChat in Production: Lessons From 30 Days of Real Traffic

DEV Community

Atlas Whoff

Apr 16, 2026, 05:20 PM

The Vercel AI SDK useChat hook looks simple in demos. In production, it's a different story. After running it under real traffic for 30 days — streaming Claude responses, handling errors, managing session state — here's what I learned. useChat holds messages in local state. On every re-render, new message objects are created. If you're passing messages to child components without memoization, you'll trigger expensive re-renders on every token. Fix: const { messages } = useChat({ api: '/api/chat' }); const stableMessages = useMemo(() => messages, [messages.length]); This alone cut our rendering overhead by 60%. Mobile networks drop connections. useChat doesn't retry by default. You need: const { messages, reload, isLoading, error } = useChat({ api: '/api/chat' }); useEffect(() => { if (error) { const timer = setTimeout(() => reload(), 2000); return () => clearTimeout(timer); } }, [error]); Streaming costs money. Without token limits, a single misbehaving user can run up your Anthropic bill. // app/api/chat/route.ts import { anthropic } from '@ai-sdk/anthropic'; import { streamText } from 'ai'; export async function POST(req: Request) { const { messages } = await req.json(); const result = streamText({ model: anthropic('claude-sonnet-4-6'), messages, maxTokens: 1024, temperature: 0.7, }); return result.toDataStreamResponse(); } useChat is stateless by default. For multi-turn sessions that survive page refresh: const { messages, setMessages } = useChat({ api: '/api/chat' }); useEffect(() => { const saved = localStorage.getItem('chat-session'); if (saved) setMessages(JSON.parse(saved)); }, []); useEffect(() => { localStorage.setItem('chat-session', JSON.stringify(messages)); }, [messages]); [ ] Memoize message arrays passed to children [ ] Add retry logic for network errors [ ] Set maxTokens on every route [ ] Implement session persistence [ ] Add rate limiting at the API route level [ ] Monitor streaming latency (p99 matters) useChat is production-ready if you add the guard rails it doesn't ship with. The defaults work for demos. Production needs explicit token limits, retry logic, and state management. GitHub: https://github.com/willweigeshoff/whoff-automation