Memory Context Block
A unified Memory Service Interface for AI agent projects that stores and retrieves both short-term conversational context (STM) and long-term historical memory (LTM) through a clean, stateless REST service with pluggable storage backends.
The Problem
AI agents quickly become unreliable and repetitive when they cannot maintain context across a session or learn from past interactions. Teams often solve this in fragmented ways: ad-hoc caches for “recent context,” separate databases for “long-term memory,” and inconsistent retrieval approaches across products.
- Multi-turn conversations: The agent loses track of what was said earlier and starts repeating questions or contradicting itself.
- Long-lived user journeys: Users return later and the agent behaves as if it has never met them.
- Scaling agent features: Teams add new memory use cases (preferences, summaries, knowledge base) and rebuild storage logic each time.
Intended Outcome
Key outcomes this block enables:
- Stable conversational continuity within a session using STM.
- Longer-term personalization and recall using LTM, without mixing responsibilities.
- A consistent memory API that multiple agents and services can reuse, regardless of backend.
Scope & Boundaries
In scope:
- A single REST interface for creating, storing, retrieving, and managing memory.
- A dual memory model: STM for recent conversational context, LTM for persistent historical memory.
- Pluggable backends (configured per environment): STM (Redis/in-memory), LTM (Qdrant/PGVector).
- Stateless, horizontally scalable service behavior with production surfaces (health checks, logging, Docker).
Out of scope:
- Product-specific memory policies (what to store, what to forget, how to summarize).
- Trust decisions such as consent, retention rules, and sensitive-data classification (these must be defined by the product and governance layer).
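The pluggable-backend boundary above can be sketched as a small structural interface. Every name here (`STMBackend`, `LTMBackend`, `append`, `recent`, and so on) is an illustrative assumption, not the block's published API:

```python
# Sketch of the pluggable-backend boundary; all names are illustrative.
from typing import Protocol


class STMBackend(Protocol):
    """Session-scoped, fast, expiring store (e.g. Redis or in-memory)."""
    def append(self, session_id: str, entry: dict) -> None: ...
    def recent(self, session_id: str, limit: int) -> list[dict]: ...


class LTMBackend(Protocol):
    """Durable, queryable store (e.g. Qdrant or PGVector)."""
    def upsert(self, user_id: str, entry: dict) -> None: ...
    def query(self, user_id: str, text: str, top_k: int) -> list[dict]: ...


class InMemorySTM:
    """Dev/test STM backend; a Redis-backed class satisfying the same
    protocol could replace it per environment without touching callers."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[dict]] = {}

    def append(self, session_id: str, entry: dict) -> None:
        self._sessions.setdefault(session_id, []).append(entry)

    def recent(self, session_id: str, limit: int) -> list[dict]:
        return self._sessions.get(session_id, [])[-limit:]
```

Because the protocols are structural, swapping a backend is a configuration change rather than an agent-side code change, which is the point of the boundary.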
How it Works
At a high level, the block operates as follows:
1. The agent or product sends a memory write request (STM or LTM) with structured metadata (e.g., session, user, category).
2. The service routes the write to the configured backend (Redis/in-memory for STM; Qdrant/PGVector for LTM).
3. When composing a response, the agent sends a memory read/query request.
4. The service returns relevant memory entries based on scope (STM, LTM, or both) and retrieval parameters.
5. The agent uses the retrieved memory as context to improve response continuity, personalization, and task completion.
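The flow above can be sketched in-process. This is a hedged illustration: `MemoryService`, the scope strings, and the dict backends are hypothetical stand-ins for the REST service and its Redis / Qdrant / PGVector backends.

```python
# In-process sketch of the write/read flow; names and shapes are illustrative.

class MemoryService:
    def __init__(self) -> None:
        self.stm: dict[str, list[dict]] = {}  # session_id -> recent entries
        self.ltm: dict[str, list[dict]] = {}  # user_id -> durable entries

    def write(self, scope: str, key: str, entry: dict) -> None:
        # Steps 1-2: route the write by scope to the configured backend.
        store = self.stm if scope == "stm" else self.ltm
        store.setdefault(key, []).append(entry)

    def read(self, scope: str, key: str, limit: int = 5) -> list[dict]:
        # Steps 3-4: return relevant entries by scope and retrieval params.
        # (For brevity this sketch reuses one key for "both"; a real call
        # would pass separate session and user identifiers.)
        if scope == "both":
            return self.read("stm", key, limit) + self.read("ltm", key, limit)
        store = self.stm if scope == "stm" else self.ltm
        return store.get(key, [])[-limit:]
```

An agent would call `write` after each turn and `read` while composing the next response (step 5 above).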
System Interaction
Placement & Triggers
- Acts as a shared memory service that multiple agents, microservices, or orchestration layers call.
- Sits between the agent runtime and storage backends, standardizing memory operations.
- Supports incremental evolution: teams can change backends or storage strategies without changing agent integrations.
Supported Architectures
Experience Surface
This block is mostly invisible to end users, but it materially shapes the experience through:
- Reduced repetition (“I already told you this”).
- Better continuity (“it remembers the plan we agreed on”).
- More consistent personalization (“it recalls preferences appropriately”).
- Smoother multi-step tasks (“it resumes where we left off”).
If misconfigured or misused, users experience the opposite: inconsistency, amnesia, or inappropriate recall.
Design Assets
Available Resources
Memory Indicator (Optional UI Pattern)
- Purpose: a subtle signal that the agent is using saved context.
- Key elements: “Using saved context” label + “Manage memory” link.
Memory Controls (Optional UI Pattern)
- Purpose: give users basic control over persistence.
- Key elements: “Forget this,” “Clear history,” “View saved preferences.”
When to use
Risks & Guidelines
Risks:
- Treating memory as a dumping ground (storing everything without purpose or policy).
- Mixing STM and LTM responsibilities (turning long-term memory into noisy chat logs).
- Storing sensitive data without clear retention, consent, and deletion rules.
- Over-retrieving (injecting too much memory into the context and degrading model performance).
Guidelines:
- Define clear memory categories (preferences, tasks, summaries) and store intentionally.
- Keep STM scoped to sessions; keep LTM scoped to durable, high-signal information.
- Apply governance: retention rules, deletion paths, and sensitive-data handling before enabling persistence at scale.
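One way to make the “store intentionally” guideline concrete is a small write-admission policy. The category names, TTL value, and `WriteRequest` shape below are assumptions for illustration, not shipped defaults:

```python
# Illustrative write-admission policy; categories, TTL, and the
# WriteRequest shape are assumptions, not the block's defaults.
from dataclasses import dataclass

STM_TTL_SECONDS = 30 * 60                                     # session-scoped expiry
LTM_CATEGORIES = {"preference", "task_summary", "knowledge"}  # durable, high-signal


@dataclass
class WriteRequest:
    scope: str      # "stm" | "ltm"
    category: str
    text: str


def admit(req: WriteRequest) -> bool:
    """Reject LTM writes outside the defined categories so long-term
    memory does not become a noisy chat log."""
    if req.scope == "ltm":
        return req.category in LTM_CATEGORIES
    return True  # STM accepts session context; the TTL handles forgetting
```

The point of the split: STM forgets by expiry, LTM forgets by policy, and nothing persists long-term unless a category argues for it.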
How to Measure Success
How will we know this pattern is strengthening trust?
North Star
Memory Retrieval Utilization Rate — percentage of agent turns that successfully retrieve relevant memory (STM, LTM, or both) when memory is expected to be used.
STM Hit Rate
Share of requests that return STM context within the active session. Indicates whether session continuity is working.
LTM Recall Rate
Share of LTM queries that return results above a relevance threshold. Indicates whether long-term memory is finding usable information rather than noise.
Memory Latency (p50 / p95)
End-to-end response time for memory reads/writes. Ensures memory does not become a bottleneck in agent responsiveness.
Write-to-Retrieve Ratio (by category)
Compares how much is stored vs how much is actually retrieved and used. Helps detect “memory hoarding” and low-signal storage.
Backend Health (by store)
Availability and error rate per backend (Redis/in-memory, Qdrant/PGVector).
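A hedged sketch of computing several of the metrics above from request logs. The log-record fields (`op`, `scope`, `hit`, `latency_ms`) are assumptions about what the service might emit, not a defined log schema:

```python
# Illustrative metric computation over assumed log records.
logs = [
    {"op": "read",  "scope": "stm", "hit": True,  "latency_ms": 12},
    {"op": "read",  "scope": "stm", "hit": False, "latency_ms": 9},
    {"op": "read",  "scope": "ltm", "hit": True,  "latency_ms": 35},
    {"op": "write", "scope": "ltm",               "latency_ms": 20},
]

reads = [r for r in logs if r["op"] == "read"]

# STM Hit Rate: share of STM reads that returned session context.
stm_reads = [r for r in reads if r["scope"] == "stm"]
stm_hit_rate = sum(r["hit"] for r in stm_reads) / len(stm_reads)

# Memory Latency: p50 / p95 over all operations (nearest-rank style).
lat = sorted(r["latency_ms"] for r in logs)
p50 = lat[len(lat) // 2]
p95 = lat[min(len(lat) - 1, int(0.95 * len(lat)))]

# Write-to-Retrieve Ratio: entries stored vs. entries actually retrieved.
writes = sum(1 for r in logs if r["op"] == "write")
write_to_retrieve = writes / max(1, sum(r["hit"] for r in reads))
```

In production these would come from the service's metrics pipeline rather than an in-memory list, but the definitions carry over directly.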