
Memory Tiers

Deep dive into Axon's three-tier memory architecture: ephemeral, session, and persistent tiers.


Overview

Axon organizes memories into three tiers, each with different characteristics for storage duration, access speed, and cost:

graph LR
    A[Ephemeral] -->|promoted| B[Session]
    B -->|promoted| C[Persistent]
    C -.->|demoted| B
    B -.->|demoted| A

    style A fill:#FF5252,color:#fff
    style B fill:#FFA726,color:#fff
    style C fill:#66BB6A,color:#fff
| Tier | Duration | Storage | Use Case | TTL |
|---|---|---|---|---|
| Ephemeral | Seconds to minutes | Redis/Memory | Cache, rate limits | 5 s to 1 hr |
| Session | Minutes to hours | Redis/Vector DBs | Conversations, workspace | ≥60 s |
| Persistent | Days to forever | Vector DBs | Knowledge base | None or long |
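The promotion and demotion arrows in the diagram describe a simple ordering from shortest-lived to longest-lived tier. A minimal standalone sketch of that ordering follows. The helper names `promote` and `demote` are hypothetical and are not Axon APIs, and the sketch does not model *when* Axon decides to move a memory.

```python
# Tiers ordered from shortest-lived to longest-lived, matching the diagram
TIERS = ["ephemeral", "session", "persistent"]


def promote(tier: str) -> str:
    """Return the next longer-lived tier; persistent stays persistent."""
    i = TIERS.index(tier)
    return TIERS[min(i + 1, len(TIERS) - 1)]


def demote(tier: str) -> str:
    """Return the next shorter-lived tier; ephemeral stays ephemeral."""
    i = TIERS.index(tier)
    return TIERS[max(i - 1, 0)]


print(promote("ephemeral"))   # session
print(promote("persistent"))  # persistent
print(demote("persistent"))   # session
```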

Ephemeral Tier

The ephemeral tier is for very short-lived, high-volume data that should expire quickly.

Characteristics

  • Duration: 5 seconds to 1 hour
  • Storage: Redis or InMemory only
  • Eviction: TTL-based (automatic expiration)
  • Vector Search: Disabled (not needed)
  • Access Pattern: Very frequent, short-lived
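TTL-based eviction means an entry disappears once its time-to-live has elapsed, without any explicit delete. Redis handles this server-side (keys with an expiry are removed both lazily on access and by periodic sampling). The standalone sketch below shows only the lazy variant. `TTLStore` is a hypothetical illustration, not Axon's in-memory adapter, and it takes an injectable clock so the example runs instantly.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Toy key-value store that expires entries lazily when they are read."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]  # expired: evict on access
            return None
        return value


# Fake clock so the example does not actually sleep
now = [0.0]
store = TTLStore(ttl_seconds=60, clock=lambda: now[0])
store.set("otp:user@example.com", "123456")
print(store.get("otp:user@example.com"))  # 123456
now[0] = 61.0
print(store.get("otp:user@example.com"))  # None
```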

When to Use

Good For:

  • Rate limiting tokens
  • One-time verification codes (OTP)
  • Temporary feature flags
  • Recent activity tracking (last 5 minutes)
  • Short-term cache warming
  • API request de-duplication
  • Temporary session markers

Not Good For:

  • Conversation history (use session)
  • User preferences (use persistent)
  • Anything needing vector search
  • Data that must survive restarts

Configuration

from axon.core.policies import EphemeralPolicy

ephemeral = EphemeralPolicy(
    adapter_type="redis",  # or "memory"
    ttl_seconds=60         # 5-3600 seconds
)

Constraints:

  • adapter_type: Only "redis" or "memory"
  • ttl_seconds: Between 5 and 3600 (1 hour max)
  • eviction_strategy: Always "ttl" (cannot be changed)
  • enable_vector_search: Always False (cannot be enabled)
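These constraints amount to validation at construction time. The standalone sketch below expresses them with a hypothetical `EphemeralPolicySketch` dataclass. It is not Axon's `EphemeralPolicy` and makes no claim about how Axon implements validation or which exception type it raises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EphemeralPolicySketch:
    """Hypothetical stand-in that enforces the documented ephemeral constraints."""

    adapter_type: str = "redis"
    ttl_seconds: int = 60
    # Fixed by the tier; any other value is rejected
    eviction_strategy: str = "ttl"
    enable_vector_search: bool = False

    def __post_init__(self) -> None:
        if self.adapter_type not in ("redis", "memory"):
            raise ValueError(f"adapter_type must be 'redis' or 'memory', got {self.adapter_type!r}")
        if not 5 <= self.ttl_seconds <= 3600:
            raise ValueError(f"ttl_seconds must be between 5 and 3600, got {self.ttl_seconds}")
        if self.eviction_strategy != "ttl":
            raise ValueError("ephemeral eviction_strategy is always 'ttl'")
        if self.enable_vector_search:
            raise ValueError("vector search cannot be enabled on the ephemeral tier")


ok = EphemeralPolicySketch(adapter_type="memory", ttl_seconds=300)  # valid
try:
    EphemeralPolicySketch(ttl_seconds=7200)  # above the 1-hour maximum
except ValueError as exc:
    print(exc)
```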

Example Usage

import asyncio

from axon import MemorySystem
from axon.core.config import MemoryConfig
from axon.core.policies import EphemeralPolicy

# Configure ephemeral-only system
config = MemoryConfig(
    ephemeral=EphemeralPolicy(
        adapter_type="redis",
        ttl_seconds=300  # 5 minutes
    ),
    default_tier="ephemeral"
)

memory = MemorySystem(config)

# Store temporary data
await memory.store(
    "Rate limit: user_123 made 5 requests",
    importance=0.1,
    tier="ephemeral",
    tags=["rate-limit", "user_123"]
)

# Automatically expires after 5 minutes
await asyncio.sleep(301)
result = await memory.recall("rate limit user_123", tier="ephemeral")
# Returns [] - data expired

Real-World Use Cases

1. Rate Limiting

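# Track API requests: one ephemeral entry per request.
# The counting window equals the ephemeral TTL (e.g., ttl_seconds=60 for a per-minute limit).
class RateLimitExceeded(Exception):
    pass

user_id = "user_123"
key = f"rate_limit:{user_id}"

# Record this request
await memory.store(
    f"Request from {user_id}",
    importance=0.1,
    tier="ephemeral",
    tags=[key]
)

# Check rate limit: request up to 101 results so that exceeding 100 is detectable
results = await memory.recall(key, tier="ephemeral", k=101)
if len(results) > 100:  # Max 100 requests per window
    raise RateLimitExceeded()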
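The example above approximates a rate limit by counting stored entries, so the window is tied to the tier's TTL. A dedicated sliding-window limiter makes the window explicit. The standalone sketch below is a general technique, not part of Axon; `SlidingWindowLimiter` is a hypothetical name. In a Redis-backed deployment the same idea is commonly built on sorted sets, or on `INCR` plus `EXPIRE` for a fixed window.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_seconds` for each key."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._events = {}  # key -> deque of accepted request timestamps

    def allow(self, key: str) -> bool:
        now = self.clock()
        events = self._events.setdefault(key, deque())
        # Drop timestamps that have fallen out of the window
        while events and events[0] <= now - self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # rejected requests are not recorded
        events.append(now)
        return True


now = [0.0]
limiter = SlidingWindowLimiter(limit=100, window_seconds=60, clock=lambda: now[0])
results = [limiter.allow("user_123") for _ in range(101)]
print(results.count(True), results[-1])  # 100 False
now[0] = 60.0
print(limiter.allow("user_123"))  # True: the first 100 requests left the window
```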

2. OTP Codes

# Generate and store OTP
otp_code = "123456"
email = "user@example.com"

await memory.store(
    f"OTP for {email}: {otp_code}",
    importance=0.1,
    tier="ephemeral",
    tags=["otp", email]
)

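# Verify the code the user submitted (before the ephemeral TTL expires)
submitted_code = "123456"  # code entered by the user
results = await memory.recall(f"OTP {email}", tier="ephemeral")
if results and results[0].text == f"OTP for {email}: {submitted_code}":
    print("OTP valid!")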
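The snippet above stores a fixed code as plain text and matches it on recall. Production OTP handling usually also generates codes with a cryptographic random source, compares them in constant time, and invalidates a code after one check. The standalone sketch below shows those three points with the standard library only. `OTPStore` is a hypothetical class, not an Axon API, and invalidating the code on the first attempt (even a wrong one) is one possible design choice.

```python
import hmac
import secrets
import time
from typing import Callable


class OTPStore:
    """Single-use, time-limited one-time codes (standalone sketch)."""

    def __init__(self, ttl_seconds: float = 300, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._codes = {}  # email -> (code, expires_at)

    def issue(self, email: str) -> str:
        code = f"{secrets.randbelow(10**6):06d}"  # cryptographically random 6-digit code
        self._codes[email] = (code, self.clock() + self.ttl)
        return code

    def verify(self, email: str, submitted: str) -> bool:
        entry = self._codes.pop(email, None)  # pop: each code can be checked only once
        if entry is None:
            return False
        code, expires_at = entry
        if self.clock() >= expires_at:
            return False
        return hmac.compare_digest(code.encode(), submitted.encode())  # constant-time


now = [0.0]
otps = OTPStore(ttl_seconds=300, clock=lambda: now[0])
code = otps.issue("user@example.com")
print(otps.verify("user@example.com", code))  # True
print(otps.verify("user@example.com", code))  # False: already used
```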

3. Recent Activity Tracking

# Track last 5 minutes of activity
await memory.store(
    f"User {user_id} clicked 'Add to Cart' button",
    importance=0.2,
    tier="ephemeral",
    tags=["activity", user_id]
)

# Get recent activity
recent = await memory.recall(
    f"user {user_id} activity",
    tier="ephemeral",
    k=10
)

Session Tier

The session tier stores memories for the duration of a user session, typically minutes to hours.

Characteristics

  • Duration: Minutes to hours (≥60 seconds)
  • Storage: Redis, InMemory, or Vector DBs
  • Eviction: TTL-based or capacity-based
  • Vector Search: Optional (adapter-dependent)
  • Access Pattern: Moderate frequency, session-scoped

When to Use

Good For:

  • Conversation history (chatbots)
  • Active workspace state
  • Recent user interactions
  • Shopping cart data
  • Session-specific preferences
  • Temporary project data
  • Active document context

Not Good For:

  • Very short-lived data (use ephemeral)
  • Long-term knowledge (use persistent)
  • Data that must survive server restarts
  • Cross-session data

Configuration

from axon.core.policies import SessionPolicy

session = SessionPolicy(
    adapter_type="redis",              # redis, memory, chroma, qdrant, pinecone
    ttl_seconds=1800,                  # 30 minutes (≥60s)
    max_entries=1000,                  # Capacity limit
    overflow_to_persistent=True,       # Auto-promote when full
    enable_vector_search=False         # If adapter supports it
)

Constraints:

  • adapter_type: Any adapter ("redis", "memory", "chroma", "qdrant", "pinecone")
  • ttl_seconds: ≥60 seconds or None
  • max_entries: ≥10 or None
  • overflow_to_persistent: True or False
  • enable_vector_search: True or False (adapter-dependent)

Example Usage

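from axon import MemorySystem
from axon.core.templates import STANDARD_CONFIG

memory = MemorySystem(STANDARD_CONFIG)
session_id = "session_abc123"  # identifier for the current user session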

# Store conversation turns
await memory.store(
    "User: What's the weather today?",
    importance=0.5,
    tier="session",
    tags=["conversation", session_id]
)

await memory.store(
    "Assistant: It's sunny and 72°F",
    importance=0.5,
    tier="session",
    tags=["conversation", session_id]
)

# Recall conversation history
history = await memory.recall(
    "conversation",
    tier="session",
    filter=Filter(tags=[session_id]),
    k=10
)

# History expires after session TTL (e.g., 30 minutes)

Overflow Behavior

When overflow_to_persistent=True:

session = SessionPolicy(
    max_entries=100,
    overflow_to_persistent=True
)

# Store 101st entry
await memory.store("Important fact", importance=0.6, tier="session")

# Automatically promoted to persistent tier
# Session tier remains at 100 entries
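The overflow description does not say which entry leaves the session tier when it is full. The standalone sketch below assumes the oldest entry moves first (FIFO), which keeps the newest conversation turns in the session tier. `OverflowingSessionStore` is hypothetical, and a plain list stands in for the persistent tier; Axon's actual selection rule may differ.

```python
from collections import OrderedDict


class OverflowingSessionStore:
    """Bounded session store; assumes the oldest entry overflows first."""

    def __init__(self, max_entries: int, overflow_to_persistent: bool = True):
        self.max_entries = max_entries
        self.overflow = overflow_to_persistent
        self.session = OrderedDict()  # entry_id -> text, oldest first
        self.persistent = []          # stand-in for the persistent tier

    def store(self, entry_id: str, text: str) -> None:
        if len(self.session) >= self.max_entries:
            _, oldest_text = self.session.popitem(last=False)  # remove the oldest entry
            if self.overflow:
                self.persistent.append(oldest_text)  # otherwise the entry is dropped
        self.session[entry_id] = text


session_store = OverflowingSessionStore(max_entries=3)
for i in range(5):
    session_store.store(f"e{i}", f"turn {i}")
print(list(session_store.session))  # ['e2', 'e3', 'e4']
print(session_store.persistent)     # ['turn 0', 'turn 1']
```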

Real-World Use Cases

1. Chatbot Conversation

import time

# Store conversation context
timestamp = int(time.time())
session_id = f"session_{user_id}_{timestamp}"

async def store_turn(role: str, content: str):
    await memory.store(
        f"{role}: {content}",
        importance=0.5,
        tier="session",
        tags=["conversation", session_id, role]
    )

# Store turns
await store_turn("user", "What's the capital of France?")
await store_turn("assistant", "The capital of France is Paris.")

# Recall recent conversation
history = await memory.recall(
    "conversation",
    tier="session",
    filter=Filter(tags=[session_id]),
    k=10
)

# Build prompt context
context = "\n".join([entry.text for entry in history])

2. Active Workspace

# Track currently open documents
await memory.store(
    "User opened document: project_proposal.docx",
    importance=0.4,
    tier="session",
    tags=["workspace", user_id]
)

# Track recent edits
await memory.store(
    "User edited section 'Budget' in project_proposal.docx",
    importance=0.5,
    tier="session",
    tags=["workspace", user_id, "edit"]
)

# Get workspace state
workspace = await memory.recall(
    "current workspace",
    tier="session",
    filter=Filter(tags=["workspace", user_id]),
    k=20
)

3. Shopping Cart with Overflow

config = MemoryConfig(
    session=SessionPolicy(
        adapter_type="redis",
        ttl_seconds=1800,  # 30 minutes
        max_entries=50,
        overflow_to_persistent=True  # Save to wishlist
    ),
    persistent=PersistentPolicy(adapter_type="chroma"),
    default_tier="session"
)

memory = MemorySystem(config)

# Add items to cart
await memory.store(
    "Product: Wireless Mouse, Price: $25",
    importance=0.6,
    tier="session",
    tags=["cart", user_id]
)

# When session tier reaches 50 items,
# automatically overflow to persistent (wishlist)

Persistent Tier

The persistent tier stores long-term memories indefinitely with semantic search capabilities.

Characteristics

  • Duration: Days to forever
  • Storage: Vector DBs (ChromaDB, Qdrant, Pinecone)
  • Eviction: Manual compaction or archival
  • Vector Search: Always enabled
  • Access Pattern: Less frequent, long-term storage
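Semantic search ranks stored memories by how close their embedding vectors are to the query's embedding. Vector databases such as ChromaDB, Qdrant, and Pinecone typically use approximate nearest-neighbor indexes for this; the brute-force sketch below only illustrates the ranking idea with cosine similarity. The three-dimensional vectors are made up for illustration; a real system would produce them with an embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Rank every stored vector by similarity to the query and keep the best k."""
    scored = [(text, cosine(query, vec)) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" for illustration only
index = {
    "User prefers dark mode UI": [0.9, 0.1, 0.0],
    "User's birthday is March 15": [0.0, 0.2, 0.9],
    "Theme preference: Dark mode": [0.8, 0.3, 0.1],
}
for text, score in top_k([1.0, 0.2, 0.0], index, k=2):
    print(f"{score:.2f}  {text}")
```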

When to Use

Good For:

  • Long-term knowledge base
  • User history and preferences
  • Learned facts and insights
  • Important conversations
  • Permanent records
  • Training data
  • Compliance records

Not Good For:

  • Temporary data (use ephemeral/session)
  • High-volume cache (use ephemeral)
  • Data that changes frequently
  • Very short-lived information

Configuration

from axon.core.policies import PersistentPolicy

persistent = PersistentPolicy(
    adapter_type="chroma",                    # chroma, qdrant, pinecone, memory
    ttl_seconds=None,                         # No expiration
    compaction_threshold=10000,               # Compact at 10K entries
    compaction_strategy="importance",         # count, semantic, importance, time
    enable_vector_search=True,                # Always True
    archive_adapter=None                      # Optional: s3, gcs
)

Constraints:

  • adapter_type: "chroma", "qdrant", "pinecone", or "memory" (testing only)
  • ttl_seconds: Usually None (no expiration) or very long
  • compaction_threshold: ≥100 or None
  • compaction_strategy: "count", "semantic", "importance", or "time"
  • enable_vector_search: Always True (cannot be disabled)
  • archive_adapter: Optional cold storage adapter

Example Usage

from axon.core.templates import PRODUCTION_CONFIG

memory = MemorySystem(PRODUCTION_CONFIG)

# Store long-term knowledge
await memory.store(
    "User prefers dark mode UI",
    importance=0.8,
    tier="persistent",
    tags=["preference", "ui", user_id]
)

await memory.store(
    "User's birthday is March 15, 1990",
    importance=0.9,
    tier="persistent",
    tags=["profile", user_id]
)

# Semantic recall (works forever)
prefs = await memory.recall(
    "user UI preferences",
    tier="persistent",
    k=5
)

# Data persists across restarts

Compaction

Compaction is tied to compaction_threshold. You can also run it manually and monitor the tier's size through its statistics:

# Manual compaction
await memory.compact(
    tier="persistent",
    strategy="importance"  # Keep high-importance, summarize low
)

# Check compaction stats
stats = await memory.get_tier_stats("persistent")
print(f"Entries: {stats['entry_count']}")
print(f"Threshold: {stats['compaction_threshold']}")
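The "importance" strategy is described as keeping high-importance entries and summarizing low-importance ones. The standalone sketch below illustrates that idea. `compact_by_importance` and `Entry` are hypothetical, the `target_reduction` semantics are an assumption, and the "summary" is a simple concatenation standing in for a real summarizer (such as an LLM call). It is not Axon's implementation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str
    importance: float


def compact_by_importance(entries: list[Entry], target_reduction: float) -> list[Entry]:
    """Keep the most important entries and fold the rest into one summary entry."""
    if not 0.0 < target_reduction < 1.0:
        raise ValueError("target_reduction must be between 0 and 1")
    ranked = sorted(entries, key=lambda e: e.importance, reverse=True)
    # Reserve one slot for the summary so the total shrinks by about target_reduction
    keep_count = max(0, round(len(entries) * (1 - target_reduction)) - 1)
    kept, dropped = ranked[:keep_count], ranked[keep_count:]
    if dropped:
        summary = "Summary: " + "; ".join(e.text for e in dropped)  # placeholder summarizer
        kept.append(Entry(summary, max(e.importance for e in dropped)))
    return kept


entries = [Entry(f"fact {i}", importance=i / 10) for i in range(10)]
compacted = compact_by_importance(entries, target_reduction=0.3)
print(len(compacted))      # 7
print(compacted[0].text)   # fact 9
print(compacted[-1].text)  # Summary: fact 3; fact 2; fact 1; fact 0
```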

Real-World Use Cases

1. Knowledge Base

# Build knowledge base from documents
documents = [
    "Python is a high-level programming language.",
    "FastAPI is a modern web framework for Python.",
    "Pydantic provides data validation using Python type hints.",
]

for doc in documents:
    await memory.store(
        doc,
        importance=0.8,
        tier="persistent",
        tags=["knowledge", "python"]
    )

# Semantic search
results = await memory.recall(
    "What is Python?",
    tier="persistent",
    k=3
)

for result in results:
    print(f"- {result.text} (relevance: {result.similarity:.2f})")

2. User Profile and Preferences

# Store user preferences permanently
preferences = [
    ("Language preference: English", 0.9),
    ("Theme preference: Dark mode", 0.8),
    ("Notification preference: Email only", 0.7),
]

for pref, importance in preferences:
    await memory.store(
        pref,
        importance=importance,
        tier="persistent",
        tags=["profile", user_id]
    )

# Retrieve all preferences
user_prefs = await memory.recall(
    "user preferences",
    tier="persistent",
    filter=Filter(tags=["profile", user_id]),
    k=10
)

3. Conversation Archival

# Archive important conversations directly in the persistent tier
important_conversation = [
    "User: I need help with billing",
    "Agent: I can help with that. What's your account number?",
    "User: My account is ACC-12345",
    "Agent: Thank you. I've issued a refund of $50.",
]

for turn in important_conversation:
    await memory.store(
        turn,
        importance=0.85,  # High importance → persistent
        tier="persistent",
        tags=["support", "billing", user_id]
    )

# Later: Search support history
history = await memory.recall(
    "billing issues",
    tier="persistent",
    filter=Filter(tags=["support", user_id]),
    k=20
)

4. Compaction Management

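# Check if compaction is needed
stats = await memory.get_tier_stats("persistent")
threshold = stats["compaction_threshold"]

if threshold is not None and stats["entry_count"] > threshold:
    print("Compaction threshold reached")
    before = stats["entry_count"]

    # Compact using importance strategy
    await memory.compact(
        tier="persistent",
        strategy="importance",
        target_reduction=0.3  # Reduce by 30%
    )

    # Re-read stats to report the actual result rather than an estimate
    after = (await memory.get_tier_stats("persistent"))["entry_count"]
    print(f"Compacted from {before} to {after} entries")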

Tier Comparison

Performance

| Tier | Read Latency | Write Latency | Throughput | Cost |
|---|---|---|---|---|
| Ephemeral | <1 ms | <1 ms | Very High | Low |
| Session | 1-10 ms | 1-10 ms | High | Medium |
| Persistent | 10-50 ms | 10-100 ms | Medium | High |

Storage Characteristics

| Tier | Durability | Searchability | Capacity | Backup |
|---|---|---|---|---|
| Ephemeral | None | Key-only | Limited | No |
| Session | Redis/RAM | Basic/Vector | Medium | Optional |
| Persistent | Disk/Cloud | Vector | Unlimited | Yes |

Cost Analysis

# Example cost per 1M operations
ephemeral_cost = 0.10   # Redis cache
session_cost = 1.00     # Redis + vector DB
persistent_cost = 5.00  # Full vector search

# Use tiers appropriately to optimize cost
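To see why routing matters, the worked arithmetic below applies the illustrative per-1M-operation figures above to a hypothetical workload of 10M operations: 70% ephemeral, 25% session, 5% persistent. The workload split is an assumption for illustration and the figures are not real pricing.

```python
# Illustrative per-1M-operation costs from the example above (not real pricing)
COST_PER_MILLION = {"ephemeral": 0.10, "session": 1.00, "persistent": 5.00}


def total_cost(ops_per_tier: dict[str, int]) -> float:
    """Total cost for a given number of operations in each tier."""
    return sum(COST_PER_MILLION[tier] * ops / 1_000_000 for tier, ops in ops_per_tier.items())


# Hypothetical workload: 10M operations routed by tier vs. all sent to persistent
routed = total_cost({"ephemeral": 7_000_000, "session": 2_500_000, "persistent": 500_000})
all_persistent = total_cost({"persistent": 10_000_000})
print(f"routed: ${routed:.2f}, persistent-only: ${all_persistent:.2f}")
# routed: $5.70, persistent-only: $50.00
```

Under these assumed numbers, routing cuts cost by roughly 9x, almost entirely because the high-volume, low-importance operations never touch the vector database.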

Tier Selection Guide

Decision Tree

graph TD
    A[New Memory] --> B{How long to keep?}
    B -->|<1 hour| C[Ephemeral]
    B -->|Hours| D[Session]
    B -->|Days/Forever| E[Persistent]

    C --> F{Need search?}
    F -->|No| G[✓ Ephemeral Tier]
    F -->|Yes| H[Use Session instead]

    D --> I{Active session?}
    I -->|Yes| J[✓ Session Tier]
    I -->|No| K[Use Persistent]

    E --> L{Need vector search?}
    L -->|Yes| M[✓ Persistent Tier]
    L -->|No| N[Use Session]
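The decision tree can be written as a plain function, which is handy for unit-testing routing rules in your own code. This is a standalone sketch: `choose_tier` is a hypothetical helper, and the one-day boundary between "Hours" and "Days/Forever" is an assumption, since the diagram gives no exact cutoff. The final branch mirrors the diagram as drawn.

```python
HOUR = 3600
DAY = 24 * HOUR  # assumed boundary between "Hours" and "Days/Forever"


def choose_tier(keep_seconds: float, needs_search: bool, active_session: bool = True) -> str:
    """Follow the tier decision tree (sketch; thresholds are assumptions)."""
    if keep_seconds < HOUR:
        # Ephemeral has no search, so fall back to session when search is needed
        return "session" if needs_search else "ephemeral"
    if keep_seconds < DAY:
        return "session" if active_session else "persistent"
    # Days/forever: persistent when vector search is needed, session otherwise (as drawn)
    return "persistent" if needs_search else "session"


print(choose_tier(60, needs_search=False))         # ephemeral
print(choose_tier(2 * HOUR, needs_search=False))   # session
print(choose_tier(30 * DAY, needs_search=True))    # persistent
```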

Quick Reference

Use Ephemeral if:

  • ✓ Data expires in <1 hour
  • ✓ No semantic search needed
  • ✓ High volume, low importance
  • ✓ Cache, rate limits, temp flags

Use Session if:

  • ✓ Data tied to user session
  • ✓ Expires in hours
  • ✓ Moderate importance
  • ✓ Conversations, workspace state

Use Persistent if:

  • ✓ Data needed long-term
  • ✓ Semantic search required
  • ✓ High importance
  • ✓ Knowledge base, user profiles


Multi-Tier Strategies

Pattern 1: Graduated Storage

# In this pattern, memories start in the session tier
entry_id = await memory.store(
    "User clicked product page",
    importance=0.4,
    tier="session"
)

# Frequently accessed → auto-promote to persistent
for _ in range(10):
    await memory.recall("product page", tier="session")

# Now in persistent tier automatically
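Access-based promotion can be pictured as a per-entry recall counter with a threshold. The standalone sketch below assumes promotion happens after 10 recalls, matching the loop above; that threshold and the counting rule are assumptions, since the actual promotion criteria are not stated here. `AccessTracker` is a hypothetical class, not an Axon API.

```python
from collections import Counter

PROMOTION_THRESHOLD = 10  # hypothetical: recalls needed before promotion


class AccessTracker:
    """Promote a session entry to persistent after enough recalls (sketch)."""

    def __init__(self, threshold: int = PROMOTION_THRESHOLD):
        self.threshold = threshold
        self.hits = Counter()  # entry_id -> number of recalls
        self.tier = {}         # entry_id -> current tier

    def add(self, entry_id: str, tier: str = "session") -> None:
        self.tier[entry_id] = tier

    def record_recall(self, entry_id: str) -> str:
        self.hits[entry_id] += 1
        if self.tier[entry_id] == "session" and self.hits[entry_id] >= self.threshold:
            self.tier[entry_id] = "persistent"
        return self.tier[entry_id]


tracker = AccessTracker()
tracker.add("click-1")
for _ in range(10):
    tier = tracker.record_recall("click-1")
print(tier)  # persistent
```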

Pattern 2: Explicit Tier Management

# Cache in ephemeral, facts in persistent
await memory.store("API response cache", importance=0.1, tier="ephemeral")
await memory.store("User email verified", importance=0.9, tier="persistent")

Pattern 3: Overflow Chain

config = MemoryConfig(
    session=SessionPolicy(
        max_entries=100,
        overflow_to_persistent=True  # Auto-promote
    ),
    persistent=PersistentPolicy(
        compaction_threshold=1000  # Summarize old data
    )
)

# Session → Persistent → Compacted

Best Practices

1. Match Tier to Use Case

# ✓ Good
await memory.store("Rate limit token", tier="ephemeral")
await memory.store("Conversation turn", tier="session")
await memory.store("User preference", tier="persistent")

# ✗ Bad
await memory.store("Rate limit token", tier="persistent")  # Wastes resources
await memory.store("User preference", tier="ephemeral")    # Will be lost

2. Set Appropriate TTLs

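from axon.core.policies import EphemeralPolicy, PersistentPolicy, SessionPolicy

# ✓ Good TTLs
EphemeralPolicy(ttl_seconds=60)      # 30-300 seconds
SessionPolicy(ttl_seconds=1800)      # 600-3600 seconds (10-60 minutes)
PersistentPolicy(ttl_seconds=None)   # No expiration

# ✗ Bad TTLs
EphemeralPolicy(ttl_seconds=7200)    # Exceeds the 3600 s maximum (use session)
SessionPolicy(ttl_seconds=30)        # Below the 60 s minimum (use ephemeral)
PersistentPolicy(ttl_seconds=600)    # Defeats the purpose of persistence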

3. Use Importance Scores

# Let Axon route automatically
await memory.store("Click event", importance=0.1)  # → Ephemeral
await memory.store("Page view", importance=0.5)    # → Session
await memory.store("Purchase", importance=0.9)     # → Persistent
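Routing by importance boils down to comparing the score against cutoffs. The standalone sketch below uses hypothetical cutoffs of 0.3 and 0.7, chosen only so that the three examples above land in the tiers their comments show; Axon's actual thresholds are not documented here and may differ. `route_by_importance` is not an Axon API.

```python
def route_by_importance(importance: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map an importance score in [0, 1] to a tier (hypothetical cutoffs)."""
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be in [0, 1]")
    if importance < low:
        return "ephemeral"
    if importance < high:
        return "session"
    return "persistent"


for label, score in [("Click event", 0.1), ("Page view", 0.5), ("Purchase", 0.9)]:
    print(f"{label}: {route_by_importance(score)}")
# Click event: ephemeral
# Page view: session
# Purchase: persistent
```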

4. Enable Overflow

# Prevent data loss
session = SessionPolicy(
    max_entries=1000,
    overflow_to_persistent=True  # ✓ Safe
)

Next Steps

  • Policies: Learn how to configure tier policies and constraints. (Policy Guide)
  • Routing: Understand how memories move between tiers. (Routing Details)
  • Storage Adapters: Choose the right storage backend for each tier. (Adapter Guide)
  • Examples: See tiers in action with working examples. (Tier Examples)