Context Engineering Part 4: Agent Memory — What to Forget, What to Remember, and Why It Matters

Introduction: Forgetting Is a Feature, Not a Flaw

In AI agents, forgetting is a design decision. Get it right and you have an agent that feels intelligently attentive. Get it wrong and you have one that is either amnesiac or paralysed under the weight of irrelevant history.

The Two Types of Memory That Shape Agent Intelligence

Ephemeral memory is short-term, session-scoped context. It holds the immediate conversation: the last several exchanges, the current task state, any decisions made in the ongoing interaction. By design, ephemeral memory is lightweight and transient. It does not persist beyond the current session.

Persistent memory is long-term, cross-session knowledge. It captures information that should shape the agent's behaviour over time: user preferences, past decisions, patterns of interaction. Persistent memory is what makes an agent feel like it knows you.
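The split between the two memory types can be sketched in a few lines. This is a minimal illustration rather than a reference design: the class names (EphemeralMemory, PersistentMemory), the run_session helper, and the three-turn window are all assumptions made for the example, and a plain dictionary stands in for a real persistent store.

```python
from collections import deque


class EphemeralMemory:
    """Session-scoped context: recent turns plus current task state (hypothetical class)."""

    def __init__(self, max_turns=6):
        # A bounded deque drops the oldest turn automatically once it is full.
        self.turns = deque(maxlen=max_turns)
        self.task_state = {}

    def add_turn(self, role, text):
        self.turns.append((role, text))


class PersistentMemory:
    """Cross-session knowledge keyed by user (a dict stands in for a real store)."""

    def __init__(self):
        self._store = {}

    def remember(self, user_id, key, value):
        self._store.setdefault(user_id, {})[key] = value

    def recall(self, user_id):
        # Return a copy so callers cannot mutate stored preferences by accident.
        return dict(self._store.get(user_id, {}))


def run_session(persistent, user_id, messages, max_turns=3):
    """Simulate one session: ephemeral memory is created fresh and then thrown away."""
    session = EphemeralMemory(max_turns=max_turns)
    for message in messages:
        session.add_turn("user", message)
    return list(session.turns), persistent.recall(user_id)
```

Calling run_session twice with the same PersistentMemory instance shows the difference: each call starts with an empty turn window, while anything saved through remember is still available to the next session.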

The Design Challenge: Relevance Over Volume

The temptation in memory design is to store everything. In practice, storing less usually serves the agent better. Overloaded context degrades agent performance: it makes retrieval harder and introduces noise. An agent with access to every past interaction may retrieve outdated information, producing responses that feel stale or confused.
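One way to favour relevance over volume is to weight each stored memory by both topical match and age, then drop anything below a threshold. The sketch below is an assumption-laden illustration: word overlap stands in for the vector similarity a real system would use, and the 30-day half-life, the 0.05 cutoff, and the function names are hypothetical values chosen for the example.

```python
def score(query, memory, now, half_life_days=30.0):
    """Combine topical overlap with an exponential recency decay."""
    query_words = set(query.lower().split())
    memory_words = set(memory["text"].lower().split())
    union = query_words | memory_words
    # Jaccard overlap: a crude stand-in for embedding similarity.
    overlap = len(query_words & memory_words) / len(union) if union else 0.0
    age_days = (now - memory["timestamp"]) / 86400
    # The weight halves every half_life_days, so old memories fade out.
    recency = 0.5 ** (age_days / half_life_days)
    return overlap * recency


def retrieve(query, memories, now, k=2, min_score=0.05):
    """Return up to k memory texts, skipping anything too stale or off-topic."""
    scored = [(score(query, m, now), m) for m in memories]
    kept = [(s, m) for s, m in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m["text"] for _, m in kept[:k]]
```

With this scoring, a preference recorded yesterday outranks a contradictory one recorded ten months ago, and the older entry falls below the cutoff instead of being passed to the model as noise.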

The discipline of memory design is about defining clear criteria for what gets retained, what gets summarised, what gets archived, and what gets discarded entirely.
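Those four outcomes can be written down as an explicit policy function. The following is a sketch under stated assumptions: the item fields (kind, superseded, last_used_day, token_count), the 90-day and 500-token thresholds, and the ordering of the rules are all hypothetical, and a real agent would tune them to its own workload.

```python
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    SUMMARISE = "summarise"
    ARCHIVE = "archive"
    DISCARD = "discard"


def decide(item, today, archive_after_days=90, summarise_over_tokens=500):
    """Pick a retention action for one memory item (illustrative rules only)."""
    # Anything explicitly replaced by newer information is dropped first.
    if item.get("superseded"):
        return Action.DISCARD
    # Stated user preferences are kept regardless of age.
    if item["kind"] == "preference":
        return Action.RETAIN
    # Items untouched for a long time move out of the active store.
    if today - item["last_used_day"] > archive_after_days:
        return Action.ARCHIVE
    # Recent but bulky items are compressed rather than kept verbatim.
    if item["token_count"] > summarise_over_tokens:
        return Action.SUMMARISE
    return Action.RETAIN
```

Keeping the criteria in one function makes the policy easy to inspect and to change, which matters more than the particular thresholds used here.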

Tools That Power Agent Memory

Pinecone is a managed vector database that provides scalable similarity search, which suits long-term persistent memory. ChromaDB offers a lightweight, open-source alternative. Redis, an in-memory data store, handles high-speed ephemeral memory for real-time conversation. Frameworks like LangChain and LlamaIndex provide higher-level abstractions over these storage backends.
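The idea of an abstraction layer over interchangeable backends can be shown without committing to any one product. The sketch below deliberately uses none of the libraries named above: MemoryBackend, TTLBackend, DictBackend, and AgentMemory are hypothetical names, the TTL store only imitates the key-expiry behaviour an in-memory store would provide, and the dictionary stands in for a long-term store.

```python
import time
from typing import Optional, Protocol


class MemoryBackend(Protocol):
    """Minimal interface a storage backend would need to satisfy (assumed)."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class TTLBackend:
    """In-process stand-in for an expiring key-value store."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}

    def put(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


class DictBackend:
    """In-process stand-in for a long-term store with no expiry."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class AgentMemory:
    """Checks session memory first, then falls back to long-term memory."""

    def __init__(self, ephemeral: MemoryBackend, persistent: MemoryBackend):
        self.ephemeral = ephemeral
        self.persistent = persistent

    def lookup(self, key):
        value = self.ephemeral.get(key)
        return value if value is not None else self.persistent.get(key)
```

Because AgentMemory depends only on the put and get interface, either stand-in could be replaced by an adapter around a real store without changing the lookup logic.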

Conclusion