Foundational · Complexity: 3/5 · Backend + AI

Memory Context Block

A unified Memory Service Interface for AI agent projects that stores and retrieves both short-term conversational context (STM) and long-term historical memory (LTM) through a clean, stateless REST service with pluggable storage backends.

01.

The Problem

AI agents quickly become unreliable and repetitive when they cannot maintain context across a session or learn from past interactions. Teams often solve this in fragmented ways: ad-hoc caches for “recent context,” separate databases for “long-term memory,” and inconsistent retrieval approaches across products.

  • Multi-turn conversations: The agent loses track of what was said earlier and starts repeating questions or contradicting itself.
  • Long-lived user journeys: Users return later and the agent behaves as if it has never met them.
  • Scaling agent features: Teams add new memory use cases (preferences, summaries, knowledge base) and rebuild storage logic each time.
02.

Intended Outcome

Key outcomes this block enables:

  • Stable conversational continuity within a session using STM.
  • Longer-term personalization and recall using LTM, without mixing responsibilities.
  • A consistent memory API that multiple agents and services can reuse, regardless of backend.
03.

Scope & Boundaries

Handles
  • A single REST interface for creating, storing, retrieving, and managing memory.
  • A dual memory model: STM for recent conversational context, LTM for persistent historical memory.
  • Pluggable backends (configured per environment): STM (Redis/in-memory), LTM (Qdrant/PGVector).
  • Stateless, horizontally scalable service behavior with production surfaces (health, logging, docker).
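Per-environment backend selection could be wired up as in the minimal sketch below. The environment variable names (STM_BACKEND, LTM_BACKEND) and the MemoryConfig class are hypothetical illustrations, not part of the service's actual surface; they show how a deployment might choose among the listed backends without touching agent code.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical per-environment backend selection for the memory service."""
    stm_backend: str  # "redis" or "memory"
    ltm_backend: str  # "qdrant" or "pgvector"

    @classmethod
    def from_env(cls) -> "MemoryConfig":
        # Env var names are illustrative assumptions, not documented settings.
        stm = os.getenv("STM_BACKEND", "memory")
        ltm = os.getenv("LTM_BACKEND", "qdrant")
        if stm not in {"redis", "memory"}:
            raise ValueError(f"unsupported STM backend: {stm}")
        if ltm not in {"qdrant", "pgvector"}:
            raise ValueError(f"unsupported LTM backend: {ltm}")
        return cls(stm_backend=stm, ltm_backend=ltm)

config = MemoryConfig.from_env()
print(config.stm_backend, config.ltm_backend)
```

Because the selection happens at startup, swapping Redis for in-memory (or Qdrant for PGVector) is a deployment change, not a code change, which is the property the "pluggable backends" bullet describes.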
Does Not Handle
  • Product-specific memory policies (what to store, what to forget, how to summarize).
  • Trust decisions like consent, retention rules, or sensitive data classification (those must be defined by the product and governance layer).
04.

How it Works

At a high level, the block operates through the following core behavior flow:

1. The agent or product sends a memory write request (STM or LTM) with structured metadata (e.g., session, user, category).
2. The service routes the write to the configured backend (Redis/in-memory for STM; Qdrant/PGVector for LTM).
3. The agent sends a memory read/query request when composing a response.
4. The service returns relevant memory entries based on scope (STM, LTM, or both) and retrieval parameters.
5. The agent uses retrieved memory as context to improve response continuity, personalization, and task completion.

STM: Redis or in-memory
LTM: Qdrant or PGVector
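The write → route → read flow above can be sketched in-process, ignoring the REST transport. All names here (MemoryService, write, query) are illustrative assumptions; the real service exposes equivalent operations over HTTP against the configured backends.

```python
from collections import defaultdict

class MemoryService:
    """In-process sketch of the dual STM/LTM routing described above (hypothetical API)."""

    def __init__(self):
        self._stm = defaultdict(list)  # session_id -> recent conversational entries
        self._ltm = defaultdict(list)  # user_id -> durable historical entries

    def write(self, scope, key, entry):
        # Steps 1-2: route the write to the store selected by scope.
        store = self._stm if scope == "stm" else self._ltm
        store[key].append(entry)

    def query(self, session_id=None, user_id=None):
        # Steps 3-4: return entries for the requested scope(s) - STM, LTM, or both.
        results = []
        if session_id is not None:
            results += self._stm[session_id]
        if user_id is not None:
            results += self._ltm[user_id]
        return results

svc = MemoryService()
svc.write("stm", "session-1", {"role": "user", "text": "Book a flight to Oslo"})
svc.write("ltm", "user-42", {"category": "preference", "text": "prefers aisle seats"})
print(svc.query(session_id="session-1", user_id="user-42"))
```

The key design point the sketch preserves is that STM and LTM are keyed and routed separately (session vs. user), so neither store leaks into the other's responsibility.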
05.

System Interaction

Placement & Triggers

  • Acts as a shared memory service that multiple agents, microservices, or orchestration layers call.
  • Sits between the agent runtime and storage backends, standardizing memory operations.
  • Supports incremental evolution: teams can change backends or storage strategies without changing agent integrations.

Supported Architectures

  • Conversation history and session state
  • User preferences and personalization
  • Agent summaries and “working notes”
  • Knowledge base references (when paired with embeddings/search in LTM)
06.

Experience Surface

This block is mostly invisible to end users, but it materially shapes the experience through:

  • Reduced repetition (“I already told you this”).
  • Better continuity (“it remembers the plan we agreed on”).
  • More consistent personalization (“it recalls preferences appropriately”).
  • Smoother multi-step tasks (“it resumes where we left off”).

If misconfigured or misused, users experience the opposite: inconsistency, amnesia, or inappropriate recall.

07.

Design Assets

Available Resources

  • Memory Indicator (Optional UI Pattern). Purpose: subtle signal when the agent is using saved context. Key elements: “Using saved context” label + “Manage memory” link.
  • Memory Controls (Optional UI Pattern). Purpose: give users basic control over persistence. Key elements: “Forget this,” “Clear history,” “View saved preferences.”

08.

When to use

  • You are building agents that require multi-turn continuity and session state.
  • You need both “recent context” and “long-term recall” without mixing them in one store.
  • You want a shared memory layer across multiple agents or microservices.
  • You anticipate growth in memory use cases (preferences, summaries, knowledge, sessions) and want a stable interface now.
09.

Risks & Guidelines

Common Risks
  • Treating memory as a dumping ground (storing everything without purpose or policy).
  • Mixing STM and LTM responsibilities (turning long-term memory into noisy chat logs).
  • Storing sensitive data without clear retention, consent, and deletion rules.
  • Over-retrieving (injecting too much memory into context and degrading model performance).
Safety Guidelines
  • Define clear memory categories (preferences, tasks, summaries) and store intentionally.
  • Keep STM scoped to sessions; keep LTM scoped to durable, high-signal information.
  • Apply governance: retention rules, deletion paths, and sensitive data handling before enabling persistence at scale.
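One way to apply these guidelines is a pre-persistence check that gates every LTM write. The sketch below is a hypothetical example: the category names and sensitive-term markers are placeholders for real policy, which the block explicitly leaves to the product and governance layer.

```python
# Hypothetical guardrail applied before a long-term (LTM) write.
# ALLOWED_CATEGORIES and SENSITIVE_MARKERS are illustrative placeholders
# for product- and governance-defined policy.
ALLOWED_CATEGORIES = {"preference", "task", "summary"}
SENSITIVE_MARKERS = {"password", "ssn", "credit card"}

def should_persist(entry: dict) -> bool:
    """Gate LTM writes: intentional categories only, no obviously sensitive text."""
    if entry.get("category") not in ALLOWED_CATEGORIES:
        return False  # avoids "memory as a dumping ground"
    text = entry.get("text", "").lower()
    if any(marker in text for marker in SENSITIVE_MARKERS):
        return False  # sensitive data needs explicit rules before persistence
    return True

print(should_persist({"category": "preference", "text": "prefers aisle seats"}))
```

Keeping the check on the write path (rather than filtering at read time) means low-signal or sensitive entries never enter LTM in the first place.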
10.

How to Measure Success

How will we know this pattern is strengthening trust?

North Star Metric

Memory Retrieval Utilization Rate — percentage of agent turns that successfully retrieve relevant memory (STM, LTM, or both) when memory is expected to be used.

STM Hit Rate

Share of requests that return STM context within the active session. Indicates whether session continuity is working.

LTM Recall Rate

Share of LTM queries that return results above a relevance threshold. Indicates whether long-term memory is finding usable information rather than noise.

Memory Latency (p50 / p95)

End-to-end response time for memory reads/writes. Ensures memory does not become a bottleneck in agent responsiveness.

Write-to-Retrieve Ratio (by category)

Compares how much is stored vs how much is actually retrieved and used. Helps detect “memory hoarding” and low-signal storage.

Backend Health (by store)

Availability and error rate per backend (Redis/in-memory, Qdrant/PGVector).
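Two of the metrics above, STM Hit Rate and Memory Latency (p50/p95), can be computed directly from an instrumentation log of memory reads. The record shape in this sketch is an assumption; the percentile uses the nearest-rank method, one of several reasonable definitions.

```python
import math

def stm_hit_rate(reads):
    """Share of STM-scoped reads that returned at least one entry (hypothetical log shape)."""
    stm_reads = [r for r in reads if r["scope"] in ("stm", "both")]
    if not stm_reads:
        return 0.0
    hits = sum(1 for r in stm_reads if r["stm_results"] > 0)
    return hits / len(stm_reads)

def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile (e.g. pct=50 for p50, pct=95 for p95)."""
    ordered = sorted(latencies_ms)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]

reads = [
    {"scope": "stm", "stm_results": 2},
    {"scope": "both", "stm_results": 1},
    {"scope": "stm", "stm_results": 0},
    {"scope": "ltm", "stm_results": 0},  # not counted toward STM hit rate
]
print(stm_hit_rate(reads))
```

Computing these from raw read logs, rather than from aggregated dashboards, makes it possible to slice by category or backend when diagnosing low recall or latency spikes.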