Semantic recall

Trust: ★★★☆☆ (0.90) · 0 validations · developer_reference

Published: 2026-05-10 · Source: crawler_authoritative

Situation

Mastra developer documentation for configuring semantic recall — a RAG-based search feature that helps AI agents maintain context across longer interactions using vector embeddings.

Insight

Semantic recall is a RAG-based search mechanism in Mastra that enables agents to retrieve contextually relevant messages from past interactions once those messages fall outside recent message history. It works by converting messages into vector embeddings with an embedder, storing them in a vector database, and performing similarity search to retrieve semantically related content. New incoming messages are used as queries against the vector store; after the LLM responds, all new messages (user, assistant, and tool calls/results) are persisted back to the vector DB for future recall.

Semantic recall is disabled by default and must be explicitly enabled via the semanticRecall: true option in the Memory configuration. Configuration options include topK (number of similar messages to retrieve), messageRange (surrounding messages to include with each match), scope (search the current thread or all threads for a resource; scope: 'resource' is supported by LibSQL, PostgreSQL, and Upstash), and filter (metadata criteria using operators such as $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or).

Supported embedders include OpenAI (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002), Google (gemini-embedding-001), OpenRouter, and local FastEmbed via @mastra/fastembed.

The recall() method on Memory adds semantic search via the vectorSearchString parameter, supplementing listMessages, which retrieves by thread ID with basic pagination. When using PostgreSQL, HNSW indexes provide better performance than the default IVFFlat, with configuration options for type, metric (dotproduct is recommended for OpenAI embeddings), m (bi-directional links), and efConstruction (candidate list size during construction).

Action

To enable semantic recall, configure the Memory instance with semanticRecall: true in options, provide a vector store (e.g., LibSQLVector, PgVector), and specify an embedder (recommended: ModelRouterEmbeddingModel); see the sketch after this paragraph. To perform semantic search, call memory.recall() with a vectorSearchString and semanticRecall enabled in threadConfig. For metadata filtering, pass a filter object such as { projectId: { $eq: 'project-a' } }, or use compound filters with $and/$or.

To use FastEmbed locally, install @mastra/fastembed@latest, then set embedder: fastembed in the Memory config. For PostgreSQL optimization, configure indexConfig: { type: 'hnsw', metric: 'dotproduct', m: 16, efConstruction: 64 }.

To disable semantic recall when message history is sufficient or latency is a concern, omit the option or set semanticRecall: false. When tracing is enabled, recalled messages appear in agent trace output alongside recent message history.
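A minimal sketch consolidating the configuration and recall calls described above (the thread ID and query string are placeholders; see the original documentation below for full context):

import { Memory } from '@mastra/memory'
import { LibSQLStore, LibSQLVector } from '@mastra/libsql'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
// Enable semantic recall with LibSQL-backed storage, a vector DB, and an embedder.
const memory = new Memory({
  storage: new LibSQLStore({ id: 'agent-storage', url: 'file:./local.db' }),
  vector: new LibSQLVector({ id: 'agent-vector', url: 'file:./local.db' }),
  embedder: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  options: { semanticRecall: true },
})
 
// Search past messages by meaning rather than recency.
const { messages } = await memory.recall({
  threadId: 'thread-123', // placeholder thread ID
  vectorSearchString: 'What did we discuss about the project deadline?',
  threadConfig: { semanticRecall: true },
})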

Result

Semantic recall retrieves contextually similar messages from vector storage, enabling agents to maintain conversational context across extended interactions. New messages are embedded and stored after each LLM response, making them available for future semantic queries. Retrieved messages appear in agent trace output when tracing is enabled.

Applicability

Requires a vector store adapter (LibSQL, PostgreSQL, Upstash, or other supported backends) and an embedder model. scope: 'resource' is only supported by the LibSQL, PostgreSQL, and Upstash storage adapters. PostgreSQL HNSW index optimization is recommended for large-scale deployments with thousands of messages.


Original content

Semantic recall

If you ask your friend what they did last weekend, they will search in their memory for events associated with “last weekend” and then tell you what they did. That’s sort of like how semantic recall works in Mastra.

Watch 📹: What semantic recall is, how it works, and how to configure it in Mastra → YouTube (5 minutes)

How semantic recall works

Semantic recall is RAG-based search that helps agents maintain context across longer interactions when messages are no longer within recent message history.

It uses vector embeddings of messages for similarity search, integrates with various vector stores, and has configurable context windows around retrieved messages.

[Diagram: Mastra Memory semantic recall]

When it’s enabled, new messages are used to query a vector DB for semantically similar messages.

After getting a response from the LLM, all new messages (user, assistant, and tool calls/results) are inserted into the vector DB to be recalled in later interactions.
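Conceptually, the lifecycle looks something like the sketch below. This is hypothetical pseudocode, not Mastra's internal API; embed, vectorStore, and llm are stub names introduced purely for illustration:

// Hypothetical sketch of the semantic recall lifecycle described above.
// None of these identifiers are real Mastra APIs.
type Message = { role: 'user' | 'assistant' | 'tool'; content: string }
 
// Toy stand-ins for the embedder, vector store, and LLM.
const embed = async (text: string): Promise<number[]> => [text.length]
const vectorStore = {
  query: async (_args: { vector: number[]; topK: number }): Promise<Message[]> => [],
  upsert: async (_messages: Message[]): Promise<void> => {},
}
const llm = {
  generate: async (_context: Message[]): Promise<Message> => ({
    role: 'assistant',
    content: '...',
  }),
}
 
async function handleTurn(incoming: Message): Promise<Message> {
  // 1. Embed the new message and use it as a similarity query against past messages.
  const queryVector = await embed(incoming.content)
  const recalled = await vectorStore.query({ vector: queryVector, topK: 3 })
 
  // 2. Recalled messages supplement recent history in the context sent to the LLM.
  const response = await llm.generate([...recalled, incoming])
 
  // 3. After the response, the new messages (user, assistant, tool calls/results)
  //    are embedded and inserted into the vector DB for future recall.
  await vectorStore.upsert([incoming, response])
  return response
}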

Quickstart

Semantic recall is disabled by default. To enable it, set semanticRecall: true in options and provide a vector store and embedder:

import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { LibSQLStore, LibSQLVector } from '@mastra/libsql'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
const agent = new Agent({
  id: 'support-agent',
  name: 'SupportAgent',
  instructions: 'You are a helpful support agent.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    storage: new LibSQLStore({
      id: 'agent-storage',
      url: 'file:./local.db',
    }),
    vector: new LibSQLVector({
      id: 'agent-vector',
      url: 'file:./local.db',
    }),
    embedder: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
    options: {
      semanticRecall: true,
    },
  }),
})

Using the recall() method

While listMessages retrieves messages by thread ID with basic pagination, recall() adds support for semantic search. When you need to find messages by meaning rather than recency, use recall() with a vectorSearchString:

const memory = await agent.getMemory()
 
// Basic recall - similar to listMessages
const { messages } = await memory!.recall({
  threadId: 'thread-123',
  perPage: 50,
})
 
// Semantic recall - find messages by meaning
const { messages: relevantMessages } = await memory!.recall({
  threadId: 'thread-123',
  vectorSearchString: 'What did we discuss about the project deadline?',
  threadConfig: {
    semanticRecall: true,
  },
})

Storage configuration

Semantic recall relies on a storage adapter and a vector DB to store messages and their embeddings.

import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'
import { LibSQLStore, LibSQLVector } from '@mastra/libsql'
 
const agent = new Agent({
  memory: new Memory({
    // this is the default storage db if omitted
    storage: new LibSQLStore({
      id: 'agent-storage',
      url: 'file:./local.db',
    }),
    // this is the default vector db if omitted
    vector: new LibSQLVector({
      id: 'agent-vector',
      url: 'file:./local.db',
    }),
    options: {
      semanticRecall: true,
    },
  }),
})

Each supported vector store's documentation page includes installation instructions, configuration parameters, and usage examples.

Recall configuration

The following options control semantic recall behavior:

  1. topK: The number of similar messages to retrieve
  2. messageRange: The surrounding messages to include with each match
  3. scope: Whether to search the current thread or all threads for a resource
  4. filter: Metadata criteria that restrict search results

import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
 
const agent = new Agent({
  memory: new Memory({
    options: {
      semanticRecall: {
        topK: 3, // Retrieve 3 similar messages
        messageRange: 2, // Include 2 messages before and after each match
        scope: 'resource', // Search all threads for this resource
        filter: { projectId: { $eq: 'project-a' } },
      },
    },
  }),
})

Note: scope: 'resource' is supported by the LibSQL, PostgreSQL, and Upstash storage adapters.

Metadata filtering

The filter option restricts semantic recall results to messages with matching thread metadata.

import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
 
const agent = new Agent({
  memory: new Memory({
    options: {
      semanticRecall: {
        scope: 'resource',
        filter: {
          projectId: { $eq: 'project-a' },
          category: { $in: ['work', 'personal'] },
        },
      },
    },
  }),
})

Filters match metadata stored on message embeddings when messages are saved. If thread metadata changes later, existing embeddings keep their previous metadata until those messages are saved or indexed again.

Supported filter operators:

  • $and: Logical AND
  • $eq: Equal to
  • $gt: Greater than
  • $gte: Greater than or equal
  • $in: In array
  • $lt: Less than
  • $lte: Less than or equal
  • $ne: Not equal to
  • $nin: Not in array
  • $or: Logical OR

The following example demonstrates metadata filters for common use cases:

// Filter by project
const projectOptions = {
  semanticRecall: { filter: { projectId: { $eq: 'my-project' } } },
}
 
// Filter by multiple categories
const categoryOptions = {
  semanticRecall: { filter: { category: { $in: ['work', 'research'] } } },
}
 
// Filter by project and priority
const combinedOptions = {
  semanticRecall: {
    filter: {
      $and: [{ projectId: { $eq: 'project-a' } }, { priority: { $gte: 3 } }],
    },
  },
}

Embedder configuration

Semantic recall relies on an embedding model to convert messages into embeddings. Mastra supports embedding models through the model router using provider/model strings, or you can use any embedding model compatible with the AI SDK.

The simplest way is to use a provider/model string with autocomplete support:

import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
const agent = new Agent({
  memory: new Memory({
    embedder: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
    options: {
      semanticRecall: true,
    },
  }),
})

Supported embedding models:

  • OpenAI: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
  • Google: gemini-embedding-001
  • OpenRouter: Access embedding models from various providers

For example, routing embeddings through OpenRouter:

import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
const agent = new Agent({
  memory: new Memory({
    embedder: new ModelRouterEmbeddingModel({
      providerId: 'openrouter',
      modelId: 'openai/text-embedding-3-small',
    }),
  }),
})

The model router automatically handles API key detection from environment variables (OPENAI_API_KEY, GOOGLE_GENERATIVE_AI_API_KEY, OPENROUTER_API_KEY).

Using AI SDK Packages

You can also use AI SDK embedding models directly:

import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'
import { openai } from '@ai-sdk/openai'
 
const agent = new Agent({
  memory: new Memory({
    // Any AI SDK-compatible embedding model can be passed as the embedder.
    embedder: openai.embedding('text-embedding-3-small'),
  }),
})

Using FastEmbed (local)

To use FastEmbed (a local embedding model), install @mastra/fastembed:

npm:

npm install @mastra/fastembed@latest

pnpm:

pnpm add @mastra/fastembed@latest

Yarn:

yarn add @mastra/fastembed@latest

Bun:

bun add @mastra/fastembed@latest

Then configure it in your memory:

import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'
import { fastembed } from '@mastra/fastembed'
 
const agent = new Agent({
  memory: new Memory({
    embedder: fastembed,
  }),
})

PostgreSQL index optimization

When using PostgreSQL as your vector store, you can optimize semantic recall performance by configuring the vector index. This is particularly important for large-scale deployments with thousands of messages.

PostgreSQL supports both IVFFlat and HNSW indexes. By default, Mastra creates an IVFFlat index, but HNSW indexes typically provide better performance, especially with OpenAI embeddings, which use inner product distance.

import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'
import { PgStore, PgVector } from '@mastra/pg'
 
const agent = new Agent({
  memory: new Memory({
    storage: new PgStore({
      id: 'agent-storage',
      connectionString: process.env.DATABASE_URL,
    }),
    vector: new PgVector({
      id: 'agent-vector',
      connectionString: process.env.DATABASE_URL,
    }),
    options: {
      semanticRecall: {
        topK: 5,
        messageRange: 2,
        indexConfig: {
          type: 'hnsw', // Use HNSW for better performance
          metric: 'dotproduct', // Best for OpenAI embeddings
          m: 16, // Number of bi-directional links (default: 16)
          efConstruction: 64, // Size of candidate list during construction (default: 64)
        },
      },
    },
  }),
})

For detailed information about index configuration options and performance tuning, see the PgVector configuration guide.

Disable semantic recall

Semantic recall is disabled by default (semanticRecall: false). When enabled, each call adds latency because new messages are converted into embeddings and used to query a vector database before the LLM receives them. You can also disable it explicitly, as shown after the list below.

Keep semantic recall disabled when:

  • Message history provides sufficient context for the current conversation.
  • You’re building performance-sensitive applications, like realtime two-way audio, where embedding and vector query latency is noticeable.
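A minimal sketch of opting out explicitly rather than by omission:

import { Memory } from '@mastra/memory'
 
const memory = new Memory({
  options: {
    semanticRecall: false, // no embeddings are created or queried for recall
  },
})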

Viewing recalled messages

When tracing is enabled, any messages retrieved via semantic recall will appear in the agent’s trace output, alongside recent message history (if configured).

Links

See also: