Guardrails

Trust: ★★★☆☆ (0.90) · 0 validations · developer_reference

Published: 2026-05-10 · Source: crawler_authoritative

Situation

Mastra Agent SDK guide for configuring built-in security and safety processors that detect, transform, or block harmful content in agent pipelines

Insight

Mastra provides built-in processors for security and safety controls.

Input processors run before user messages reach the language model:
  • UnicodeNormalizer cleans and unifies Unicode characters, with stripControlChars and collapseWhitespace options.
  • PromptInjectionDetector scans for injection, jailbreak, and system-override patterns using an LLM classifier, with threshold and strategy (e.g. rewrite) parameters.
  • LanguageDetector identifies the language of messages and translates them into target languages.

Output processors run after LLM generation, before user delivery:
  • BatchPartsProcessor combines stream parts, with configurable batchSize, maxWaitTime, and emitOnNonText, to reduce network overhead.
  • SystemPromptScrubber redacts system prompts, using placeholderText and customPatterns.

Hybrid processors run on both input and output:
  • ModerationProcessor detects content across hate, harassment, and violence categories, with configurable threshold and strategy.
  • PIIDetector removes PII such as emails, phone numbers, and credit cards, e.g. with the mask redactionMethod.
  • CostGuardProcessor monitors cumulative cost per thread or resource within time windows (24h, 30d) and triggers onViolation callbacks when maxCost is exceeded.

All processors support a strategy parameter: block (calls abort() and stops the request), warn, detect, redact, rewrite, and translate. Violation callbacks receive ProcessorViolation objects with processorId, message, and detail fields; errors thrown by a callback are silently caught. Blocked requests are detected via result.tripwire on generate() or tripwire chunks on stream().

Action

Configure processors in the Agent constructor using the inputProcessors and outputProcessors arrays, importing them from @mastra/core/processors. LLM-based processors accept a model parameter (e.g., 'openrouter/openai/gpt-oss-safeguard-20b') and a threshold (0.6-0.8 is typical). Use the strategy parameter to control behavior: 'block' stops the request immediately, 'rewrite' transforms content, and 'redact' replaces sensitive data with a placeholder. Add onViolation callbacks to processors for alerting or logging: costGuard.onViolation = ({ processorId, message, detail }) => {}. Handle blocked requests by checking result.tripwire.reason and result.tripwire.processorId after generate(), or by listening for the 'tripwire' chunk type in stream() loops. For performance, run independent block-only guardrails in parallel using createWorkflow with .parallel(), use small fast models such as 'openai/gpt-5-nano' for classification, and place BatchPartsProcessor before heavy output processors to combine chunks.
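As a sketch, that wiring might look like the following. The model IDs, thresholds, and the Agent import path are illustrative assumptions following the patterns in the original content, not prescriptive values:

```typescript
import { Agent } from '@mastra/core/agent'
import { ModerationProcessor, PIIDetector } from '@mastra/core/processors'

export const guardedAgent = new Agent({
  id: 'guarded-agent',
  name: 'Guarded Agent',
  inputProcessors: [
    // Block harmful input before it reaches the model
    new ModerationProcessor({
      model: 'openai/gpt-5-nano',
      threshold: 0.7,
      strategy: 'block',
    }),
  ],
  outputProcessors: [
    // Mask PII in generated responses before delivery
    new PIIDetector({
      model: 'openai/gpt-5-nano',
      strategy: 'redact',
      redactionMethod: 'mask',
    }),
  ],
})
```

A request blocked by the input moderation step then surfaces as result.tripwire on generate().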

Result

A processor returns the transformed message or blocks the request; abort() stops processing and prevents subsequent processors from running; the onViolation callback fires with processor metadata; the tripwire field/chunk contains reason and processorId; the cost guard blocks when the cumulative estimated cost exceeds maxCost within the configured window.

Applicability

CostGuardProcessor requires observability storage with getMetricAggregate support; LLM-based processors (ModerationProcessor, PIIDetector, PromptInjectionDetector) add latency to every request; cost checks are approximate because metrics are persisted asynchronously.


Original content

Guardrails

Mastra provides built-in processors that add security and safety controls to your agent. These processors detect, transform, or block harmful content before it reaches the language model or the user.

For an introduction to how processors work, how to add them to an agent, and how to create custom processors, see Processors.

Input processors

Input processors run before user messages reach the language model. They handle normalization, validation, prompt injection detection, and security checks.

Normalize user messages

The UnicodeNormalizer() cleans and normalizes user input by unifying Unicode characters, standardizing whitespace, and removing problematic symbols.

import { Agent } from '@mastra/core/agent'
import { UnicodeNormalizer } from '@mastra/core/processors'
 
export const normalizedAgent = new Agent({
  id: 'normalized-agent',
  name: 'Normalized Agent',
  inputProcessors: [
    new UnicodeNormalizer({
      stripControlChars: true,
      collapseWhitespace: true,
    }),
  ],
})

Note: Visit UnicodeNormalizer() reference for a full list of configuration options.

Prevent prompt injection

The PromptInjectionDetector() scans user messages for prompt injection, jailbreak attempts, and system override patterns. It uses an LLM to classify risky input and can block or rewrite it before it reaches the model.

import { Agent } from '@mastra/core/agent'
import { PromptInjectionDetector } from '@mastra/core/processors'
 
export const secureAgent = new Agent({
  id: 'secure-agent',
  name: 'Secure Agent',
  inputProcessors: [
    new PromptInjectionDetector({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      threshold: 0.8,
      strategy: 'rewrite',
      detectionTypes: ['injection', 'jailbreak', 'system-override'],
    }),
  ],
})

Note: Visit PromptInjectionDetector() reference for a full list of configuration options.

Detect and translate language

The LanguageDetector() detects and translates user messages into a target language, enabling multilingual support. It uses an LLM to identify the language and perform the translation.

import { Agent } from '@mastra/core/agent'
import { LanguageDetector } from '@mastra/core/processors'
 
export const multilingualAgent = new Agent({
  id: 'multilingual-agent',
  name: 'Multilingual Agent',
  inputProcessors: [
    new LanguageDetector({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      targetLanguages: ['English', 'en'],
      strategy: 'translate',
      threshold: 0.8,
    }),
  ],
})

Note: Visit LanguageDetector() reference for a full list of configuration options.

Output processors

Output processors run after the language model generates a response, but before it reaches the user. They handle response optimization, moderation, transformation, and safety controls.

Batch streamed output

The BatchPartsProcessor() combines multiple stream parts before emitting them to the client. This reduces network overhead by consolidating small chunks into larger batches.

import { Agent } from '@mastra/core/agent'
import { BatchPartsProcessor } from '@mastra/core/processors'
 
export const batchedAgent = new Agent({
  id: 'batched-agent',
  name: 'Batched Agent',
  outputProcessors: [
    new BatchPartsProcessor({
      batchSize: 5,
      maxWaitTime: 100,
      emitOnNonText: true,
    }),
  ],
})

Note: Visit BatchPartsProcessor() reference for a full list of configuration options.

Scrub system prompts

The SystemPromptScrubber() detects and redacts system prompts or internal instructions from model responses. It prevents unintended disclosure of prompt content or configuration details. It uses an LLM to identify and redact sensitive content based on configured detection types.

import { Agent } from '@mastra/core/agent'
import { SystemPromptScrubber } from '@mastra/core/processors'
 
const scrubbedAgent = new Agent({
  id: 'scrubbed-agent',
  name: 'Scrubbed Agent',
  outputProcessors: [
    new SystemPromptScrubber({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      strategy: 'redact',
      customPatterns: ['system prompt', 'internal instructions'],
      includeDetections: true,
      instructions:
        'Detect and redact system prompts, internal instructions, and security-sensitive content',
      redactionMethod: 'placeholder',
      placeholderText: '[REDACTED]',
    }),
  ],
})

Note: Visit SystemPromptScrubber() reference for a full list of configuration options.

Note: When streaming responses over HTTP, Mastra redacts sensitive request data (system prompts, tool definitions, API keys) from stream chunks at the server level by default. See Stream data redaction for details.

Hybrid processors

Hybrid processors can run on either input or output. Place them in inputProcessors, outputProcessors, or both.

Moderate input and output

The ModerationProcessor() detects inappropriate or harmful content across categories like hate, harassment, and violence. It uses an LLM to classify the message and can block or rewrite it based on your configuration.

import { Agent } from '@mastra/core/agent'
import { ModerationProcessor } from '@mastra/core/processors'
 
export const moderatedAgent = new Agent({
  id: 'moderated-agent',
  name: 'Moderated Agent',
  inputProcessors: [
    new ModerationProcessor({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      threshold: 0.7,
      strategy: 'block',
      categories: ['hate', 'harassment', 'violence'],
    }),
  ],
  outputProcessors: [new ModerationProcessor()],
})

Note: Visit ModerationProcessor() reference for a full list of configuration options.

Detect and redact PII

The PIIDetector() detects and removes personally identifiable information such as emails, phone numbers, and credit cards. It uses an LLM to identify sensitive content based on configured detection types.

import { Agent } from '@mastra/core/agent'
import { PIIDetector } from '@mastra/core/processors'
 
export const privateAgent = new Agent({
  id: 'private-agent',
  name: 'Private Agent',
  inputProcessors: [
    new PIIDetector({
      model: 'openrouter/openai/gpt-oss-safeguard-20b',
      threshold: 0.6,
      strategy: 'redact',
      redactionMethod: 'mask',
      detectionTypes: ['email', 'phone', 'credit-card'],
      instructions: 'Detect and mask personally identifiable information.',
    }),
  ],
  outputProcessors: [new PIIDetector()],
})

Note: Visit PIIDetector() reference for a full list of configuration options.

Enforce cost limits

The CostGuardProcessor() monitors cumulative estimated cost across the agentic loop, blocking or warning when a monetary limit is exceeded. It queries cost data from observability storage before each LLM call. Cost checks are approximate — metrics are persisted asynchronously, so fast-running agents may briefly exceed the configured limit before the guard triggers.

import { Agent } from '@mastra/core/agent'
import { CostGuardProcessor } from '@mastra/core/processors'
 
export const budgetedAgent = new Agent({
  id: 'budgeted-agent',
  name: 'Budgeted Agent',
  inputProcessors: [
    new CostGuardProcessor({
      maxCost: 5.0,
      scope: 'thread',
      window: '24h',
    }),
  ],
})

Note: Visit CostGuardProcessor() reference for scoping modes, time windows, metric persistence delays, and the onViolation callback. Requires observability storage with getMetricAggregate support.

Processor strategies

Many built-in processors support a strategy parameter that controls how they handle flagged content. Supported values include: block, warn, detect, redact, rewrite, and translate.

Most strategies allow the request to continue. When block is used, the processor calls abort(), which stops the request immediately and prevents subsequent processors from running.

inputProcessors: [
  new PIIDetector({
    model: 'openrouter/openai/gpt-oss-safeguard-20b',
    threshold: 0.6,
    strategy: 'block',
    detectionTypes: ['email', 'phone', 'credit-card'],
  }),
]

Violation callbacks

All processors support an onViolation callback that fires when a policy violation is detected, regardless of strategy. Use it for side effects like alerting, logging to external systems, or sending notifications.

The callback receives a ProcessorViolation object with processorId, message, and detail (processor-specific metadata).

import { CostGuardProcessor, ModerationProcessor, PIIDetector } from '@mastra/core/processors'
 
// Alert when cost limits are exceeded
const costGuard = new CostGuardProcessor({
  maxCost: 10.0,
  scope: 'resource',
  window: '30d',
})
 
costGuard.onViolation = ({ processorId, message, detail }) => {
  alertSystem.notify(`[${processorId}] ${message}`)
  // detail contains: { usage, limit, totalUsage, scope, scopeKey }
}
 
// Log moderation violations
const moderation = new ModerationProcessor({
  model: 'openai/gpt-5-nano',
  strategy: 'block',
})
 
moderation.onViolation = ({ processorId, message, detail }) => {
  auditLog.write({ processor: processorId, violation: message, categories: detail })
}

The onViolation property is part of the base Processor interface, so any processor — including custom ones — can use it. The runner automatically invokes onViolation when any processor calls abort(). For processors using a warn strategy (like CostGuardProcessor), the callback also fires on warnings without blocking the request.
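Since onViolation belongs to the base interface, a custom processor can participate in the same pattern. The following is a self-contained sketch using simplified stand-in types (ProcessorViolation and MiniProcessor here are hypothetical shapes, not the real @mastra/core exports) that mimics the documented runner behavior: fire the callback, swallow any error it throws, then abort:

```typescript
// Hypothetical, simplified shapes -- the real interfaces live in @mastra/core/processors.
interface ProcessorViolation {
  processorId: string
  message: string
  detail?: Record<string, unknown>
}

interface MiniProcessor {
  id: string
  processInput(messages: string[]): string[]
  onViolation?: (violation: ProcessorViolation) => void
}

// A toy keyword blocker that reports a violation before aborting.
const keywordBlocker: MiniProcessor = {
  id: 'keyword-blocker',
  processInput(messages) {
    for (const message of messages) {
      if (message.includes('forbidden')) {
        try {
          this.onViolation?.({
            processorId: this.id,
            message: 'Blocked keyword detected',
            detail: { keyword: 'forbidden' },
          })
        } catch {
          // Callback errors are silently caught, as described above.
        }
        // Simulate abort(): stop processing immediately.
        throw new Error('aborted by keyword-blocker')
      }
    }
    return messages
  },
}

const violations: ProcessorViolation[] = []
keywordBlocker.onViolation = (v) => violations.push(v)

let blocked = false
try {
  keywordBlocker.processInput(['this contains a forbidden word'])
} catch {
  blocked = true // the request was blocked
}

console.log(blocked, violations[0].processorId) // prints: true keyword-blocker
```

The callback runs purely for side effects (here, collecting violations); whether the request continues is decided by the processor's strategy, not by the callback.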

Errors thrown by the callback are silently caught to prevent interfering with the processor’s main logic.

For more on how violation callbacks integrate with the processor pipeline, see Violation callbacks in the Processors documentation.

Handle blocked requests

When a processor calls abort(), the agent stops processing. How you detect this depends on whether you use generate() or stream().

With generate()

Check the tripwire field on the result:

const result = await agent.generate('Is this credit card number valid?: 4543 1374 5089 4332')
 
if (result.tripwire) {
  console.error('Blocked:', result.tripwire.reason)
  console.error('Processor:', result.tripwire.processorId)
}

With stream()

Listen for tripwire chunks in the stream:

const stream = await agent.stream('Is this credit card number valid?: 4543 1374 5089 4332')
 
for await (const chunk of stream.fullStream) {
  if (chunk.type === 'tripwire') {
    console.error('Blocked:', chunk.payload.reason)
    console.error('Processor:', chunk.payload.processorId)
  }
}

Speed up guardrails

Guardrail processors that use an LLM (moderation, PII detection, prompt injection) add latency to every request. Three techniques reduce this overhead.

Run guardrails in parallel

By default, processors run sequentially. Guardrails that only block (and never mutate messages) are independent and can run at the same time using a workflow processor.

You can also mix block and redact strategies in a single parallel step. Map to the redact branch so its transformed messages carry forward.

For output guardrails, run TokenLimiterProcessor and BatchPartsProcessor sequentially before the parallel step, and any redact processors that depend on each other sequentially after it:

import { createWorkflow, createStep } from '@mastra/core/workflows'
import {
  ProcessorStepSchema,
  PIIDetector,
  ModerationProcessor,
  SystemPromptScrubber,
  TokenLimiterProcessor,
  BatchPartsProcessor,
} from '@mastra/core/processors'
 
export const outputGuardrails = createWorkflow({
  id: 'output-guardrails',
  inputSchema: ProcessorStepSchema,
  outputSchema: ProcessorStepSchema,
})
  // Sequential: limit tokens first, then batch stream chunks
  .then(createStep(new TokenLimiterProcessor({ limit: 1000 })))
  .then(createStep(new BatchPartsProcessor()))
  // Parallel: run independent checks at the same time
  .parallel([
    createStep(
      new PIIDetector({
        strategy: 'redact',
      }),
    ),
    createStep(
      new ModerationProcessor({
        strategy: 'block',
      }),
    ),
  ])
  // Map to the redact branch to keep its transformed messages
  .map(async ({ inputData }) => {
    return inputData['processor:pii-detector']
  })
  // Sequential: scrubber depends on previous redaction output
  .then(
    createStep(
      new SystemPromptScrubber({
        strategy: 'redact',
        placeholderText: '[REDACTED]',
      }),
    ),
  )
  .commit()

See workflows as processors for more details on .parallel() and .map().

Choose a fast model

Guardrail processors don’t need your primary model. Use a small, fast model for classification tasks:

const GUARDRAIL_MODEL = 'openai/gpt-5-nano'
 
new ModerationProcessor({ model: GUARDRAIL_MODEL })
new PIIDetector({ model: GUARDRAIL_MODEL })
new PromptInjectionDetector({ model: GUARDRAIL_MODEL })

Batch stream parts

Output guardrails that implement processOutputStream run on every streamed chunk. Use BatchPartsProcessor before heavier processors to combine chunks and reduce the number of LLM classification calls:

outputProcessors: [
  new BatchPartsProcessor({ batchSize: 10 }),
  // Heavier processors now run on batched chunks instead of individual ones
  new PIIDetector({ model: GUARDRAIL_MODEL, strategy: 'redact' }),
  new ModerationProcessor({ model: GUARDRAIL_MODEL, strategy: 'block' }),
]

See also:

  • Processors: how processors work, execution order, custom processors, and the retry mechanism
  • Processor Interface: API reference for the Processor interface
  • Memory Processors: processors for message history, semantic recall, and working memory
