Retrieval in RAG systems

Trust: ★★★☆☆ (0.90) · 0 validations · developer_reference

Published: 2026-05-10 · Source: crawler_authoritative

Situation

Mastra SDK documentation explaining how to retrieve relevant document chunks from vector stores using semantic search, filtering, and re-ranking to power RAG applications.

Insight

Mastra provides flexible retrieval options for RAG systems. Retrieval follows three steps: (1) the user query is converted to an embedding using the same model as the document embeddings; (2) that embedding is compared to stored embeddings using vector similarity; (3) the most similar chunks are retrieved, optionally with metadata filtering, re-ranking, or knowledge-graph processing.

Basic retrieval uses pgVector.query() with indexName, queryVector, and topK parameters. Results include text content, a similarity score (0.0-1.0), and metadata. Metadata filtering supports a MongoDB-style query syntax with operators including $gt (greater than), $lt (less than), $in (array membership), $or, $and, and simple equality. The Vector Query Tool (createVectorQueryTool from @mastra/rag) lets agents query vector databases dynamically, with optional filtering and re-ranking based on context.

Database-specific configurations include: Pinecone namespaces for multi-tenant isolation; pgVector with minScore (threshold filtering), ef (HNSW search parameter), and probes (IVFFlat probe parameter); Chroma with where and whereDocument filters; and LanceDB with tableName and includeAllColumns options. Vector store prompts (e.g., PGVECTOR_PROMPT, PINECONE_PROMPT, QDRANT_PROMPT, CHROMA_PROMPT, ASTRA_PROMPT, LIBSQL_PROMPT, UPSTASH_PROMPT, VECTORIZE_PROMPT, MONGODB_PROMPT, OPENSEARCH_PROMPT, S3VECTORS_PROMPT) define the query patterns and filtering syntax to include in agent instructions.

Re-ranking uses rerankWithScorer from @mastra/rag with configurable weights: semantic (semantic understanding), vector (original similarity scores), and position (original ordering). Supported relevance scorers include MastraAgentRelevanceScorer, CohereRelevanceScorer, and ZeroEntropyRelevanceScorer. Runtime configuration overrides use requestContext.set('databaseConfig', {...}).

Action

To perform basic retrieval: (1) import embed from 'ai', PgVector from '@mastra/pg', and ModelRouterEmbeddingModel from '@mastra/core/llm'; (2) convert the query to an embedding with embed({ value: queryText, model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small') }); (3) create a PgVector instance with a connection string; (4) call pgVector.query({ indexName: 'embeddings', queryVector: embedding, topK: 10 }).

For metadata filtering, add a filter object to the query using MongoDB-style operators such as { price: { $gt: 100 } }, { $or: [...] }, or { tags: { $in: ['sale', 'new'] } }. To create a Vector Query Tool for agents, use createVectorQueryTool({ vectorStoreName: 'pgVector', indexName: 'embeddings', model: new ModelRouterEmbeddingModel('...') }), including database-specific options in a databaseConfig object (e.g., { pgvector: { minScore: 0.7, ef: 200 } }). To use vector store prompts in agents, import the appropriate prompt constant (e.g., import { PGVECTOR_PROMPT } from '@mastra/pg') and include it in the agent's instructions. For re-ranking, call rerankWithScorer with the initial results, a scorer instance, and a weights configuration. Override runtime configuration with requestContext.set('databaseConfig', {...}) before calling execute().

Result

Returns an array of retrieved chunks, each containing text content, a similarity score (0.0-1.0), and a metadata object with source information. Metadata filtering narrows results to matching documents. Re-ranking returns an improved ordering based on a weighted combination of semantic, vector, and position factors.

Applicability

Mastra SDK for Node.js; requires an embedding model configuration (openai/text-embedding-3-small or similar)


Original content

Retrieval in RAG systems

After storing embeddings, you need to retrieve relevant chunks to answer user queries.

Mastra provides flexible retrieval options with support for semantic search, filtering, and re-ranking.

How retrieval works

  1. The user’s query is converted to an embedding using the same model used for document embeddings
  2. This embedding is compared to stored embeddings using vector similarity
  3. The most similar chunks are retrieved and can be optionally:
  • Filtered by metadata
  • Re-ranked for better relevance
  • Processed through a knowledge graph

Basic retrieval

The simplest approach is direct semantic search. This method uses vector similarity to find chunks that are semantically similar to the query:

import { embed } from 'ai'
import { PgVector } from '@mastra/pg'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
// Convert query to embedding
const { embedding } = await embed({
  value: 'What are the main points in the article?',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
})
 
// Query vector store
const pgVector = new PgVector({
  id: 'pg-vector',
  connectionString: process.env.POSTGRES_CONNECTION_STRING,
})
const results = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
})
 
// Display results
console.log(results)

The topK parameter specifies the maximum number of most similar results to return from the vector search.

Results include both the text content and a similarity score:

[
  {
    text: 'Climate change poses significant challenges...',
    score: 0.89,
    metadata: { source: 'article1.txt' },
  },
  {
    text: 'Rising temperatures affect crop yields...',
    score: 0.82,
    metadata: { source: 'article1.txt' },
  },
]

Advanced retrieval options

Metadata Filtering

Filter results based on metadata fields to narrow down the search space. This approach, which combines vector similarity search with metadata filters, is sometimes called hybrid vector search, as it merges semantic search with structured filtering criteria.

This is useful when you have documents from different sources, time periods, or with specific attributes. Mastra provides a unified MongoDB-style query syntax that works across all supported vector stores.

For detailed information about available operators and syntax, see the Metadata Filters Reference.

Basic filtering examples:

// Simple equality filter
const results = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
  filter: {
    source: 'article1.txt',
  },
})
 
// Numeric comparison
const results = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
  filter: {
    price: { $gt: 100 },
  },
})
 
// Multiple conditions
const results = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
  filter: {
    category: 'electronics',
    price: { $lt: 1000 },
    inStock: true,
  },
})
 
// Array operations
const results = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
  filter: {
    tags: { $in: ['sale', 'new'] },
  },
})
 
// Logical operators
const results = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
  filter: {
    $or: [{ category: 'electronics' }, { category: 'accessories' }],
    $and: [{ price: { $gt: 50 } }, { price: { $lt: 200 } }],
  },
})

Common use cases for metadata filtering:

  • Filter by document source or type
  • Filter by date ranges
  • Filter by specific categories or tags
  • Filter by numerical ranges (e.g., price, rating)
  • Combine multiple conditions for precise querying
  • Filter by document attributes (e.g., language, author)
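For the date-range use case above, here is a minimal sketch using the documented $gt and $lt operators. The publishedAt field is a hypothetical metadata attribute, assumed to have been stored as a numeric Unix-millisecond timestamp with each chunk:

// Hypothetical metadata field: publishedAt stored as a Unix-ms timestamp
const recentResults = await pgVector.query({
  indexName: 'embeddings',
  queryVector: embedding,
  topK: 10,
  filter: {
    publishedAt: {
      $gt: new Date('2025-01-01').getTime(), // after Jan 1, 2025
      $lt: new Date('2026-01-01').getTime(), // before Jan 1, 2026
    },
  },
})

Storing dates as numbers keeps them compatible with the numeric comparison operators shown earlier.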

Vector Query Tool

Sometimes you want to give your agent the ability to query a vector database directly. The Vector Query Tool allows your agent to be in charge of retrieval decisions, combining semantic search with optional filtering and reranking based on the agent’s understanding of the user’s needs.

import { createVectorQueryTool } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: 'pgVector',
  indexName: 'embeddings',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
})

When creating the tool, pay special attention to the tool’s name and description - these help the agent understand when and how to use the retrieval capabilities. For example, you might name it “SearchKnowledgeBase” and describe it as “Search through our documentation to find relevant information about X topic.”
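A minimal sketch of that advice follows. Whether createVectorQueryTool accepts a description override is an assumption here, so treat that option as illustrative rather than a confirmed part of the API:

const searchKnowledgeBase = createVectorQueryTool({
  vectorStoreName: 'pgVector',
  indexName: 'embeddings',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  // Assumed option: a description telling the agent when to reach for this tool
  description: 'Search through our documentation to find relevant information about X topic.',
})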

This is particularly useful when:

  • Your agent needs to dynamically decide what information to retrieve
  • The retrieval process requires complex decision-making
  • You want the agent to combine multiple retrieval strategies based on context

Database-Specific Configurations

The Vector Query Tool supports database-specific configurations that enable you to leverage unique features and optimizations of different vector stores.

Note: These configurations are for query-time options like namespaces, performance tuning, and filtering—not for database connection setup.

Connection credentials (URLs, auth tokens) are configured when you instantiate the vector store class (e.g., new LibSQLVector({ url: '...' })).

import { createVectorQueryTool } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'
 
// Pinecone with namespace
const pineconeQueryTool = createVectorQueryTool({
  vectorStoreName: 'pinecone',
  indexName: 'docs',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  databaseConfig: {
    pinecone: {
      namespace: 'production', // Isolate data by environment
    },
  },
})
 
// pgVector with performance tuning
const pgVectorQueryTool = createVectorQueryTool({
  vectorStoreName: 'postgres',
  indexName: 'embeddings',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  databaseConfig: {
    pgvector: {
      minScore: 0.7, // Filter low-quality results
      ef: 200, // HNSW search parameter
      probes: 10, // IVFFlat probe parameter
    },
  },
})
 
// Chroma with advanced filtering
const chromaQueryTool = createVectorQueryTool({
  vectorStoreName: 'chroma',
  indexName: 'documents',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  databaseConfig: {
    chroma: {
      where: { category: 'technical' },
      whereDocument: { $contains: 'API' },
    },
  },
})
 
// LanceDB with table specificity
const lanceQueryTool = createVectorQueryTool({
  vectorStoreName: 'lance',
  indexName: 'documents',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
  databaseConfig: {
    lance: {
      tableName: 'myVectors', // Specify which table to query
      includeAllColumns: true, // Include all metadata columns in results
    },
  },
})

Key Benefits:

  • Pinecone namespaces: Organize vectors by tenant, environment, or data type
  • pgVector optimization: Control search accuracy and speed with ef/probes parameters
  • Quality filtering: Set minimum similarity thresholds to improve result relevance
  • LanceDB tables: Separate data into tables for better organization and performance
  • Runtime flexibility: Override configurations dynamically based on context

Common Use Cases:

  • Multi-tenant applications using Pinecone namespaces
  • Performance optimization in high-load scenarios
  • Environment-specific configurations (dev/staging/prod)
  • Quality-gated search results
  • Embedded, file-based vector storage with LanceDB for edge deployment scenarios

You can also override these configurations at runtime using the request context:

import { RequestContext } from '@mastra/core/request-context'
 
const requestContext = new RequestContext()
requestContext.set('databaseConfig', {
  pinecone: {
    namespace: 'runtime-namespace',
  },
})
 
await pineconeQueryTool.execute({ queryText: 'search query' }, { mastra, requestContext })

For detailed configuration options and advanced usage, see the Vector Query Tool Reference.

Vector Store Prompts

Vector store prompts define query patterns and filtering capabilities for each vector database implementation. When implementing filtering, these prompts are required in the agent’s instructions to specify valid operators and syntax for each vector store implementation.

pgVector:

import { PGVECTOR_PROMPT } from '@mastra/pg'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${PGVECTOR_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Pinecone:

import { PINECONE_PROMPT } from '@mastra/pinecone'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${PINECONE_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Qdrant:

import { QDRANT_PROMPT } from '@mastra/qdrant'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${QDRANT_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Chroma:

import { CHROMA_PROMPT } from '@mastra/chroma'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${CHROMA_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Astra:

import { ASTRA_PROMPT } from '@mastra/astra'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${ASTRA_PROMPT}
  `,
  tools: { vectorQueryTool },
})

libSQL:

import { LIBSQL_PROMPT } from '@mastra/libsql'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${LIBSQL_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Upstash:

import { UPSTASH_PROMPT } from '@mastra/upstash'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${UPSTASH_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Vectorize:

import { VECTORIZE_PROMPT } from '@mastra/vectorize'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${VECTORIZE_PROMPT}
  `,
  tools: { vectorQueryTool },
})

MongoDB:

import { MONGODB_PROMPT } from '@mastra/mongodb'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${MONGODB_PROMPT}
  `,
  tools: { vectorQueryTool },
})

OpenSearch:

import { OPENSEARCH_PROMPT } from '@mastra/opensearch'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${OPENSEARCH_PROMPT}
  `,
  tools: { vectorQueryTool },
})

S3Vectors:

import { S3VECTORS_PROMPT } from '@mastra/s3vectors'
 
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  model: 'openai/gpt-5.4',
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${S3VECTORS_PROMPT}
  `,
  tools: { vectorQueryTool },
})

Re-ranking

Initial vector similarity search can sometimes miss nuanced relevance. Re-ranking is a more computationally expensive but more accurate process that improves results by:

  • Considering word order and exact matches
  • Applying more sophisticated relevance scoring
  • Using a method called cross-attention between query and documents

Here’s how to use re-ranking:

import { rerankWithScorer as rerank, MastraAgentRelevanceScorer } from '@mastra/rag'
 
// Get initial results from vector search
const initialResults = await pgVector.query({
  indexName: 'embeddings',
  queryVector: queryEmbedding,
  topK: 10,
})
 
// Create a relevance scorer
const relevanceProvider = new MastraAgentRelevanceScorer('relevance-scorer', 'openai/gpt-5.4')
 
// Re-rank the results
const rerankedResults = await rerank({
  results: initialResults,
  query,
  scorer: relevanceProvider,
  options: {
    weights: {
      semantic: 0.5, // How well the content matches the query semantically
      vector: 0.3, // Original vector similarity score
      position: 0.2, // Preserves original result ordering
    },
    topK: 10,
  },
})

The weights control how different factors influence the final ranking:

  • semantic: Higher values prioritize semantic understanding and relevance to the query
  • vector: Higher values favor the original vector similarity scores
  • position: Higher values help maintain the original ordering of results
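As an illustration only (the exact combination formula is internal to rerankWithScorer and not specified in this document), a weighted sum with the weights above would score a single result like this:

// Illustrative arithmetic, not Mastra's documented formula
const semanticScore = 0.9 // scorer's semantic relevance for this result
const vectorScore = 0.82 // original vector similarity score
const positionScore = 0.5 // hypothetical normalized original rank
const combined = 0.5 * semanticScore + 0.3 * vectorScore + 0.2 * positionScore // = 0.796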

Note: For semantic scoring to work properly during re-ranking, each result must include the text content in its metadata.text field.
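If your results carry text at the top level (as in the basic-retrieval output above) but not under metadata.text, a small mapping step satisfies this requirement. This is a sketch, not a required Mastra API:

// Copy each result's text into metadata.text so the semantic scorer can read it
const resultsForRerank = initialResults.map((r) => ({
  ...r,
  metadata: { ...r.metadata, text: r.text },
}))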

You can also use other relevance score providers like Cohere or ZeroEntropy:

// Alternative scorers; import path assumed to match MastraAgentRelevanceScorer
import { CohereRelevanceScorer, ZeroEntropyRelevanceScorer } from '@mastra/rag'

const relevanceProvider = new CohereRelevanceScorer('rerank-v3.5')
// or:
// const relevanceProvider = new ZeroEntropyRelevanceScorer('zerank-1')

The re-ranked results combine vector similarity with semantic understanding to improve retrieval quality.

For more details about re-ranking, see the rerank() method.

For graph-based retrieval that follows connections between chunks, see the GraphRAG documentation.
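For orientation, here is a hedged sketch of graph-based retrieval, assuming a createGraphRAGTool helper exported from @mastra/rag with options mirroring createVectorQueryTool; the helper name and option shape are assumptions, and the GraphRAG documentation is authoritative:

import { createGraphRAGTool } from '@mastra/rag'
import { ModelRouterEmbeddingModel } from '@mastra/core/llm'

// Assumption: a graph-RAG counterpart to createVectorQueryTool that follows
// connections between chunks instead of returning isolated nearest neighbors
const graphQueryTool = createGraphRAGTool({
  vectorStoreName: 'pgVector',
  indexName: 'embeddings',
  model: new ModelRouterEmbeddingModel('openai/text-embedding-3-small'),
})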

Links

See also: