AI News Hub

# AI Memory Systems: Everything You Need to Know

DEV Community
Victory Lucky

If you have built anything with ChatGPT, Claude, or any large language model in the past year, you have probably hit this wall: the AI forgets what you told it three messages ago. You explain your preferences, share context about your project, and then have to repeat it all over again in the next conversation.

This is not a bug. It is how these systems work by default. But it does not have to stay that way.

Memory systems for AI are changing how we build applications. Instead of stateless chatbots that treat every message as if it is the first one, we can now build AI that remembers, learns, and gets better over time. This post will walk you through everything you need to know about building memory systems for AI applications in 2026.

## Why AI Needs Memory

Think about how you interact with a human assistant versus a typical AI chatbot. A good human assistant remembers that you prefer morning meetings, that you are allergic to peanuts, and that your current project is focused on healthcare. They do not ask you to repeat this information every single time you talk.

Traditional AI systems lack this capability. Every conversation starts from scratch. The model has no memory of what you discussed yesterday, last week, or even five minutes ago. This creates several problems:

- **For users:** Constant repetition is frustrating. You waste time re-explaining context that should already be known.
- **For developers:** You end up building elaborate workarounds, stuffing massive amounts of context into every prompt, which makes your application slow and expensive.
- **For applications:** Without memory, AI cannot personalize, cannot learn from mistakes, and cannot build long-term relationships with users.

Memory systems solve these problems by giving AI the ability to store, retrieve, and use information across interactions.

## The Three Types of AI Memory

AI memory systems are modeled after human memory, which psychologists divide into three main types.
Understanding these types is critical because each serves a different purpose in your application.

### Episodic Memory

Episodic memory stores specific events and experiences. In human memory, this is like remembering your first day at a new job or what you had for breakfast this morning. For AI, episodic memory tracks individual interactions and events.

Here is what an episodic memory entry looks like:

```js
{
  id: "ep_12345",
  user_id: "user_789",
  session_id: "sess_abc",
  content: "User asked about refund policy for product XYZ",
  timestamp: "2026-02-20T14:30:00Z",
  context: {
    conversation_turn: 5,
    user_sentiment: "frustrated",
    resolution: "explained_policy"
  }
}
```

Episodic memories are time-bound and specific. They answer questions like "What did we discuss last Tuesday?" or "How did the user react when I suggested that solution?"

**When to use episodic memory:**

- Customer support systems that need to reference past tickets
- Personal AI assistants tracking daily interactions
- Educational AI that remembers what lessons were covered
- Debugging and audit trails for AI decisions

**Important characteristics:**

- Decay over time (older memories become less relevant)
- High volume (you generate many episodic memories)
- Rich in context (includes metadata about the situation)

### Semantic Memory

Semantic memory stores facts and general knowledge. In humans, this is knowing that Paris is the capital of France or that water boils at 100 degrees Celsius. For AI, semantic memory captures learned facts about users, domains, and concepts.

Here is a semantic memory entry:

```js
{
  id: "sem_67890",
  user_id: "user_789",
  content: "User is a software engineer specializing in backend systems",
  confidence: 0.95,
  sources: ["ep_12345", "ep_12347", "ep_12392"], // Episodic memories that support this fact
  first_learned: "2026-01-15T08:00:00Z",
  last_reinforced: "2026-02-19T16:45:00Z",
  reinforcement_count: 8
}
```

Semantic memories are extracted from patterns in episodic memories.
If a user mentions they are a software engineer in three different conversations, that pattern gets consolidated into a semantic fact.

**When to use semantic memory:**

- User profile systems storing preferences and attributes
- Domain knowledge bases for specialized AI assistants
- Long-term learned facts that do not change often
- General rules and patterns discovered from experience

**Important characteristics:**

- More stable than episodic memory
- Lower volume (consolidation reduces quantity)
- Should strengthen with repeated evidence
- Can have varying confidence levels

### Working Memory

Working memory is the information currently being used. In humans, this is like holding a phone number in your head while you dial it. For AI, working memory is the active context for the current task or conversation.

Here is a working memory entry:

```js
{
  id: "work_11111",
  user_id: "user_789",
  session_id: "sess_abc",
  content: "Currently helping user debug a Python authentication error",
  active: true,
  ttl: 3600, // Expires in 1 hour
  context: {
    current_task: "debugging",
    current_file: "auth.py",
    error_type: "401_unauthorized",
    steps_completed: ["checked_credentials", "verified_endpoint"],
    next_step: "test_token_expiration"
  }
}
```

Working memory is short-lived and task-specific. Once the task completes or the session ends, working memory is either discarded or archived.

**When to use working memory:**

- Multi-step workflows that need to track progress
- Active troubleshooting sessions
- Ongoing tasks that span multiple interactions
- Temporary context that should not persist long-term

**Important characteristics:**

- Very short lifespan (minutes to hours)
- High churn rate (constantly created and destroyed)
- Does not always need semantic search capabilities
- Should automatically expire or archive

## How Memories Are Stored

Now that we understand the types of memory, let us look at how they are actually stored and retrieved. The foundation of any memory system is the database schema.
Here is a production-ready schema using TiDB (but this pattern works with any SQL database that supports vector search):

```sql
CREATE TABLE ai_memories (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  user_id BIGINT NOT NULL,
  session_id VARCHAR(255),

  -- Content
  content TEXT NOT NULL,
  embedding VECTOR(1536),    -- Vector representation for semantic search

  -- Classification
  memory_type ENUM('episodic', 'semantic', 'working') NOT NULL,
  importance FLOAT DEFAULT 0.5,
  confidence FLOAT DEFAULT 1.0,

  -- Temporal data
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  last_accessed TIMESTAMP,
  access_count INT DEFAULT 0,
  expires_at TIMESTAMP NULL, -- For working memory TTL

  -- Metadata
  metadata JSON,

  -- Relationships
  source_memory_ids JSON,    -- References to supporting memories

  -- Indexes
  INDEX idx_user_type (user_id, memory_type),
  INDEX idx_session (session_id),
  INDEX idx_created (created_at),
  INDEX idx_expires (expires_at),
  VECTOR INDEX idx_embedding (embedding)
);
```

This schema gives you:

- **Flexible content storage:** Store any type of memory as text
- **Semantic search:** Use vector embeddings to find similar memories
- **Temporal tracking:** Know when memories were created and last used
- **Automatic expiration:** Working memory can auto-delete
- **Rich metadata:** Store any additional context as JSON
- **Relationship tracking:** Link memories that support each other

## Vector Embeddings: How Semantic Search Works

You might be wondering what that `VECTOR(1536)` field is. This is the core technology that makes semantic search possible.

**The problem:** Computers cannot understand meaning directly. If you store the text "user likes coffee" and later search for "caffeine preferences", a traditional database will not find it because the words do not match.

**The solution:** Vector embeddings convert text into arrays of numbers that capture meaning. Similar concepts have similar vectors.
Here is how it works:

```js
// Using OpenAI's embedding model
import OpenAI from 'openai';

const openai = new OpenAI();

async function createEmbedding(text) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return response.data[0].embedding; // Returns array of 1536 numbers
}

// Example
const embedding1 = await createEmbedding("user likes coffee");
const embedding2 = await createEmbedding("caffeine preferences");
// These embeddings will be similar because the concepts are related
```

The embedding is a list of 1536 numbers (called dimensions) that represents the semantic meaning of the text. When you want to find related memories, you compare these number arrays using a mathematical function called cosine similarity.

**Cosine similarity explained simply:** Imagine two arrows in space. If the arrows point in the same direction, they are similar. If they point in opposite directions, they are different. If they are at right angles, they are unrelated. Cosine similarity measures the angle between vector arrows.

```js
function cosineSimilarity(vectorA, vectorB) {
  // Calculate dot product
  const dotProduct = vectorA.reduce((sum, a, i) => sum + a * vectorB[i], 0);

  // Calculate magnitudes
  const magnitudeA = Math.sqrt(vectorA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vectorB.reduce((sum, b) => sum + b * b, 0));

  // Return cosine similarity (between -1 and 1)
  return dotProduct / (magnitudeA * magnitudeB);
}

// Similarity of 1 means identical meaning
// Similarity of 0 means unrelated
// Similarity of -1 means opposite meaning
```

In practice, your database handles this calculation.
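Before handing the math off to the database, it is worth sanity-checking it on vectors you can reason about by hand. This is the same `cosineSimilarity` function, exercised on axis-aligned 2D vectors where the expected angles are obvious:

```javascript
function cosineSimilarity(vectorA, vectorB) {
  const dotProduct = vectorA.reduce((sum, a, i) => sum + a * vectorB[i], 0);
  const magnitudeA = Math.sqrt(vectorA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vectorB.reduce((sum, b) => sum + b * b, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

console.log(cosineSimilarity([1, 0], [2, 0]));  // 1  (same direction; magnitude is ignored)
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0  (right angle: unrelated)
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 (opposite direction)
```

Note that cosine similarity ignores magnitude: `[1, 0]` and `[2, 0]` score a perfect 1.0 because only direction matters. Also keep in mind that TiDB's `VEC_COSINE_DISTANCE` returns a *distance* (1 minus the similarity), which is why the SQL queries in this post treat smaller values as more similar.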
You just store the embeddings and query for similar ones:

```sql
-- Find memories similar to current query
SELECT
  id,
  content,
  VEC_COSINE_DISTANCE(embedding, :query_embedding) AS distance
FROM ai_memories
WHERE user_id = :user_id
  AND memory_type = 'semantic'
ORDER BY distance ASC
LIMIT 10;
```

The database returns the 10 most semantically similar memories (those with the smallest cosine distance), even if they do not share any exact words with your query.

## Deciding What to Remember

When your AI application runs, it needs to decide what to remember and what to ignore. Not every message should become a memory. Here is a production-quality implementation:

```js
import OpenAI from 'openai';
import { db } from './database';

const openai = new OpenAI();

async function processMessage(userId, message) {
  // Step 1: Use the LLM to extract memorable facts
  const extraction = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Extract important facts from the user message that should be remembered long-term.

Ignore:
- Pleasantries ("thanks", "hello", "goodbye")
- Confirmations ("ok", "yes", "got it")
- Questions that don't reveal user information

Extract:
- Personal preferences or attributes
- Important project or work details
- Specific requests or requirements
- Feedback about previous interactions

Return a JSON object of facts with importance scores (0-1).
Format: {"facts": [{"fact": "...", "importance": 0.8, "type": "semantic"}]}`
      },
      { role: 'user', content: message }
    ],
    response_format: { type: 'json_object' }
  });

  const facts = JSON.parse(extraction.choices[0].message.content).facts || [];

  // Step 2: Filter facts by importance threshold
  const importantFacts = facts.filter(f => f.importance > 0.6);

  // Step 3: Store each fact as a memory
  for (const fact of importantFacts) {
    // Generate embedding for semantic search
    const embeddingResponse = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: fact.fact
    });
    const embedding = embeddingResponse.data[0].embedding;

    // Check if similar memory already exists
    const existing = await db.query(`
      SELECT id, content
      FROM ai_memories
      WHERE user_id = ?
        AND memory_type = ?
        AND VEC_COSINE_DISTANCE(embedding, ?) < 0.15
      LIMIT 1
    `, [userId, fact.type, JSON.stringify(embedding)]);

    if (existing.length > 0) {
      // Similar memory exists, update it instead of creating a duplicate
      await db.query(`
        UPDATE ai_memories
        SET last_accessed = NOW(),
            access_count = access_count + 1,
            confidence = LEAST(confidence + 0.1, 1.0)
        WHERE id = ?
      `, [existing[0].id]);
    } else {
      // Create new memory
      await db.query(`
        INSERT INTO ai_memories (
          user_id, content, embedding, memory_type,
          importance, created_at, last_accessed
        ) VALUES (?, ?, ?, ?, ?, NOW(), NOW())
      `, [userId, fact.fact, JSON.stringify(embedding), fact.type, fact.importance]);
    }
  }

  return importantFacts.length;
}

// Example usage
const userId = 123;
const message = "I really prefer dark mode for all my interfaces, and I hate small fonts. Also, I am working on a healthcare project focused on cardiology research.";
const memoriesCreated = await processMessage(userId, message);
console.log(`Created ${memoriesCreated} new memories`);
```

This approach:

- Uses the LLM itself to decide what is worth remembering
- Filters out low-importance information
- Checks for duplicates before creating new memories
- Reinforces existing memories when similar information appears
- Stores embeddings for semantic search

**Critical insight:** The quality of your memory extraction directly impacts your system's usefulness. If you store too much, you create noise. If you store too little, you lose important context. The importance threshold (0.6 in this example) is something you will need to tune for your specific application.

## Retrieving the Right Memories

Storing memories is only half the problem. The real challenge is retrieving the right memories at the right time. You cannot just dump all memories into the prompt. You need a smart ranking system.

A good retrieval system balances three factors:

- **Semantic relevance:** How related is this memory to the current query?
- **Recency:** How recent is this memory?
- **Importance:** How important was this memory when it was created?

Here is a production-quality retrieval implementation:

```js
async function retrieveRelevantMemories(userId, queryText, options = {}) {
  const {
    memoryTypes = ['semantic', 'episodic'],
    limit = 10,
    recencyWeight = 0.25,
    importanceWeight = 0.15,
    relevanceWeight = 0.60
  } = options;

  // Generate embedding for the query
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: queryText
  });
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // Retrieve candidate memories with scoring
  const memories = await db.query(`
    SELECT
      id, content, memory_type, importance, created_at, last_accessed,
      VEC_COSINE_DISTANCE(embedding, ?) AS semantic_distance,
      -- Calculate recency score (exponential decay, 168-hour half-life)
      POW(2, -TIMESTAMPDIFF(HOUR, created_at, NOW()) / 168.0) AS recency_score,
      -- Final combined score
      (
        (1 - VEC_COSINE_DISTANCE(embedding, ?)) * ? +
        POW(2, -TIMESTAMPDIFF(HOUR, created_at, NOW()) / 168.0) * ? +
        importance * ?
      ) AS final_score
    FROM ai_memories
    WHERE user_id = ?
      AND memory_type IN (?)
      AND (expires_at IS NULL OR expires_at > NOW())
    ORDER BY final_score DESC
    LIMIT ?
  `, [
    JSON.stringify(queryEmbedding),
    JSON.stringify(queryEmbedding),
    relevanceWeight,
    recencyWeight,
    importanceWeight,
    userId,
    memoryTypes,
    limit * 2 // Get extra for filtering
  ]);

  // Update access tracking for retrieved memories
  const memoryIds = memories.map(m => m.id);
  if (memoryIds.length > 0) {
    await db.query(`
      UPDATE ai_memories
      SET last_accessed = NOW(),
          access_count = access_count + 1
      WHERE id IN (?)
    `, [memoryIds]);
  }

  return memories.slice(0, limit);
}

// Example usage
const userId = 123;
const query = "What are my preferences for user interfaces?";
const memories = await retrieveRelevantMemories(userId, query, {
  memoryTypes: ['semantic'],
  limit: 5
});

console.log("Retrieved memories:");
memories.forEach(m => {
  console.log(`- ${m.content} (score: ${m.final_score.toFixed(3)})`);
});
```

**Understanding the scoring formula:** The recency score uses exponential decay with a half-life of 168 hours (one week) — the `POW(2, -hours / 168.0)` term. This means:

- Brand new memories get a score of 1.0
- One-week-old memories get a score of 0.5
- Two-week-old memories get a score of 0.25
- Four-week-old memories get a score of 0.0625

You can adjust the half-life (168 hours) based on your needs. A chatbot might use 24 hours. A long-term assistant might use 720 hours (30 days).

The weights (60% relevance, 25% recency, 15% importance) are defaults that work well for most applications, but you should experiment with your specific use case.

Over time, you will accumulate thousands of episodic memories. Many will be redundant or contain the same core information.
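Before moving on: the decay schedule above is easy to verify numerically. A half-life curve of the form `2^(-ageHours / halfLife)` reproduces exactly the numbers listed (the `recencyScore` helper below is illustrative, not part of the original system):

```javascript
// Exponential decay with a configurable half-life (in hours)
function recencyScore(ageHours, halfLifeHours = 168) {
  return Math.pow(2, -ageHours / halfLifeHours);
}

console.log(recencyScore(0));       // 1       (brand new)
console.log(recencyScore(168));     // 0.5     (one week old)
console.log(recencyScore(336));     // 0.25    (two weeks old)
console.log(recencyScore(672));     // 0.0625  (four weeks old)

// A chatbot-style 24-hour half-life makes week-old memories nearly invisible
console.log(recencyScore(168, 24)); // 0.0078125
```

Tuning `halfLifeHours` is usually the first knob to turn when retrieval feels either too stale or too forgetful.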
## Memory Consolidation

Memory consolidation is the process of combining related episodic memories into semantic knowledge.

Think of it like this: If a user mentions they like coffee in three different conversations, you do not need three separate memories. You need one semantic fact: "User likes coffee."

Here is a practical consolidation system that runs periodically:

```js
async function consolidateMemories(userId) {
  // Step 1: Find pairs of similar recent episodic memories
  const pairs = await db.query(`
    SELECT
      m1.id AS id1,
      m2.id AS id2,
      m1.content AS content1,
      m2.content AS content2,
      VEC_COSINE_DISTANCE(m1.embedding, m2.embedding) AS distance
    FROM ai_memories m1
    JOIN ai_memories m2
      ON m1.id < m2.id
     AND m1.user_id = m2.user_id
    WHERE m1.user_id = ?
      AND m1.memory_type = 'episodic'
      AND m2.memory_type = 'episodic'
      AND m1.created_at > NOW() - INTERVAL 30 DAY
      AND VEC_COSINE_DISTANCE(m1.embedding, m2.embedding) < 0.2
  `, [userId]);

  // Step 2: Merge overlapping pairs into consolidation groups
  const consolidationGroups = [];
  for (const pair of pairs) {
    const group = consolidationGroups.find(
      g => g.members.includes(pair.id1) || g.members.includes(pair.id2)
    );
    if (group) {
      if (!group.members.includes(pair.id1)) {
        group.members.push(pair.id1);
        group.contents.push(pair.content1);
      }
      if (!group.members.includes(pair.id2)) {
        group.members.push(pair.id2);
        group.contents.push(pair.content2);
      }
    } else {
      consolidationGroups.push({
        members: [pair.id1, pair.id2],
        contents: [pair.content1, pair.content2]
      });
    }
  }

  for (const group of consolidationGroups) {
    // Step 3: Use an LLM to synthesize a single fact from the cluster
    const synthesis = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        {
          role: 'system',
          content: `You are consolidating related memories into a single fact.

Related memories:
${group.contents.map((c, i) => `${i + 1}. ${c}`).join('\n')}

Synthesize a single, concise fact that captures the core information without losing important details.
Return JSON: {"fact": "...", "confidence": 0.9}`
        }
      ],
      response_format: { type: 'json_object' }
    });

    const result = JSON.parse(synthesis.choices[0].message.content);

    // Step 4: Create semantic memory
    const embeddingResponse = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: result.fact
    });
    const embedding = embeddingResponse.data[0].embedding;

    await db.query(`
      INSERT INTO ai_memories (
        user_id, content, embedding, memory_type,
        importance, confidence, source_memory_ids, created_at
      ) VALUES (?, ?, ?, 'semantic', 0.8, ?, ?, NOW())
    `, [
      userId,
      result.fact,
      JSON.stringify(embedding),
      result.confidence,
      JSON.stringify(group.members)
    ]);

    // Step 5: Mark episodic memories as consolidated
    await db.query(`
      UPDATE ai_memories
      SET metadata = JSON_SET(COALESCE(metadata, '{}'), '$.consolidated', TRUE)
      WHERE id IN (?)
    `, [group.members]);
  }

  console.log(`Consolidated ${consolidationGroups.length} memory groups`);
}

// Run consolidation daily
setInterval(async () => {
  const users = await db.query('SELECT DISTINCT user_id FROM ai_memories');
  for (const user of users) {
    await consolidateMemories(user.user_id);
  }
}, 24 * 60 * 60 * 1000); // Once per day
```

This consolidation process:

- Finds episodic memories that are semantically similar
- Groups them into clusters
- Uses an LLM to synthesize a single semantic fact
- Creates a new semantic memory with references to source episodes
- Marks the original episodes as consolidated (but keeps them for an audit trail)

**Important note:** Do not delete the original episodic memories immediately. Keep them for a period (30-90 days) in case you need to verify or debug the consolidation. You can archive them to cheaper storage or mark them as consolidated so they are not retrieved during normal queries.

## Security and Practical Constraints

Memory systems introduce serious security risks. Here are the critical issues and how to handle them.

### Cross-User Memory Leakage

This is catastrophic if it happens. If User A can retrieve User B's memories, you have a massive data breach.

Bad code (DO NOT DO THIS):

```js
// SECURITY VULNERABILITY
async function getMemories(queryText) {
  const embedding = await createEmbedding(queryText);
  // No user filtering!
  return db.query(`
    SELECT content FROM ai_memories
    WHERE VEC_COSINE_DISTANCE(embedding, ?) < 0.3
    LIMIT 10
  `, [JSON.stringify(embedding)]);
}
```

Every memory query must filter by the authenticated user's ID, as the retrieval examples earlier in this post do with `WHERE user_id = ?`.

### Context Window Budgeting

Retrieval also has to respect the model's context window: you cannot pass every memory into the prompt. Rank memories first, then select as many as fit within a token budget:

```js
function selectMemoriesForContext(rankedMemories, contextBudget) {
  const selected = [];
  let totalTokens = 0;
  for (const memory of rankedMemories) {
    const tokens = estimateTokens(memory.content);
    if (totalTokens + tokens > contextBudget) break;
    selected.push(memory);
    totalTokens += tokens;
  }
  return selected;
}

function estimateTokens(text) {
  // Rough estimate: 1 token ≈ 4 characters
  return Math.ceil(text.length / 4);
}
```

## Putting It All Together

Let us put it all together with a real-world example: a customer support AI that remembers past interactions.

```js
import OpenAI from 'openai';
import { db } from './database';

const openai = new OpenAI();

class CustomerSupportAI {
  constructor(userId) {
    this.userId = userId;
  }

  async chat(userMessage) {
    // 1. Retrieve relevant memories
    const memories = await retrieveRelevantMemories(this.userId, userMessage, {
      memoryTypes: ['semantic', 'episodic'],
      limit: 10
    });

    // 2. Build context from memories
    const memoryContext = memories.length > 0
      ? `Here is what I remember about this user:\n${memories.map(m => `- ${m.content}`).join('\n')}\n\n`
      : '';

    // 3. Generate response with memory context
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        {
          role: 'system',
          content: `You are a helpful customer support agent. Use the provided memory context to personalize your responses.

${memoryContext}Be conversational and reference past interactions when relevant.`
        },
        { role: 'user', content: userMessage }
      ]
    });

    const aiResponse = response.choices[0].message.content;

    // 4. Extract and store new memories (background)
    this.processMessage(userMessage).catch(console.error);

    // 5. Store this interaction as episodic memory
    this.storeInteraction(userMessage, aiResponse).catch(console.error);

    return aiResponse;
  }

  async processMessage(message) {
    // Extract memorable facts
    const extraction = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        {
          role: 'system',
          content: `Extract important facts to remember about this user.
Return JSON: {"facts": [{"fact": "...", "importance": 0.8, "type": "semantic"}]}`
        },
        { role: 'user', content: message }
      ],
      response_format: { type: 'json_object' }
    });

    const facts = JSON.parse(extraction.choices[0].message.content).facts || [];

    for (const fact of facts.filter(f => f.importance > 0.6)) {
      const embedding = await this.createEmbedding(fact.fact);

      // Check for duplicates
      const existing = await db.query(`
        SELECT id FROM ai_memories
        WHERE user_id = ?
          AND memory_type = ?
          AND VEC_COSINE_DISTANCE(embedding, ?) < 0.15
        LIMIT 1
      `, [this.userId, fact.type, JSON.stringify(embedding)]);

      if (existing.length === 0) {
        // Create new memory
        await db.query(`
          INSERT INTO ai_memories (
            user_id, content, embedding, memory_type, importance
          ) VALUES (?, ?, ?, ?, ?)
        `, [this.userId, fact.fact, JSON.stringify(embedding), fact.type, fact.importance]);
      }
    }
  }

  async storeInteraction(userMessage, aiResponse) {
    const interaction = `User: ${userMessage}\nAgent: ${aiResponse}`;
    const embedding = await this.createEmbedding(interaction);

    await db.query(`
      INSERT INTO ai_memories (
        user_id, content, embedding, memory_type, importance
      ) VALUES (?, ?, ?, 'episodic', 0.5)
    `, [this.userId, interaction, JSON.stringify(embedding)]);
  }

  async createEmbedding(text) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: text
    });
    return response.data[0].embedding;
  }
}

// Usage
const support = new CustomerSupportAI(123);

const response1 = await support.chat("I need help with my subscription. I am on the Pro plan.");
console.log(response1);

// Later conversation (AI remembers the Pro plan)
const response2 = await support.chat("Can I upgrade to a higher tier?");
console.log(response2); // Will reference that they are already on the Pro plan
```

This complete example shows:

- Memory retrieval before responding
- Context building from memories
- Background memory extraction
- Episodic memory of interactions
- Duplicate detection
- Proper user scoping

## Common Mistakes to Avoid

After building memory systems for a year, here are the mistakes I see most often:

1. **Storing everything:** Not every message needs to be remembered. Filter aggressively.
2. **Not handling contradictions:** User says they are vegetarian in January, orders steak in February. Your system needs to handle this.
3. **Ignoring the cold start problem:** New users have no memories. Your system should still work well with zero memories.
4. **Over-complicating retrieval:** Start simple (pure semantic search), add complexity (recency, importance) only when needed.
5. **Not monitoring memory quality:** Track metrics like retrieval accuracy, memory usage rate, and consolidation effectiveness.
6. **Forgetting to forget:** Old, irrelevant memories should be archived or deleted. Otherwise your database grows forever and queries get slower.
7. **Not testing cross-user isolation:** This is a security disaster waiting to happen. Test it thoroughly.

## What Comes Next

Memory systems for AI are evolving rapidly. Here are the trends to watch in 2026:

- **Multimodal memory:** Storing memories from images, audio, and video, not just text. An AI that remembers your face, your voice, your preferred workspace layout.
- **Collaborative memory:** Multiple AI agents sharing a memory pool. Your coding assistant and your writing assistant remembering the same project context.
- **Memory as a service:** Just like you use OpenAI for LLM calls, you will use specialized services for memory management. Early players include Mem0, Zep, and LangMem.
- **Procedural memory:** Beyond facts and events, AI will remember how to do things. "When user asks X, follow this workflow." This is closer to traditional programming but managed by the AI itself.
- **Memory reasoning:** AI that can explain why it remembers something, evaluate memory trustworthiness, and actively request missing information.

## Final Thoughts

Building memory systems is hard. It requires careful database design, smart retrieval algorithms, security awareness, and constant tuning. But the payoff is massive. An AI with memory is fundamentally more useful than one without.

The key is to start simple:

1. Pick one memory type (semantic is easiest)
2. Build basic storage and retrieval
3. Test thoroughly with real users
4. Add complexity only when needed

Your first version will have bugs. Your retrieval ranking will need tuning. Your consolidation logic will miss edge cases. That is fine. Every production memory system started rough and got better through iteration.

The future of AI applications is not stateless chatbots. It is systems that remember, learn, and adapt over time.
Memory is how we get there. Now go build something that remembers.

If you found this article helpful, share it with others and give me a follow on X: https://x.com/codewithveek