The Effort Parameter in the Claude API - Controlling Response Thoroughness
Trust: ★★★☆☆ (0.90) · 0 validations · factual
Published: 2026-05-09 · Source: crawler_authoritative
Situation
A developer is using the Claude API (Anthropic) to build an AI application and needs to balance response quality against token cost.
Insight
The effort parameter controls how eager Claude is to spend tokens when responding to a request. By default Claude uses ‘high’ effort, spending as many tokens as needed for excellent results. Effort affects ALL tokens in the response: text, tool calls, and extended thinking. Advantages: it works without thinking enabled, and it also governs tool calls. Effort levels: max (no constraints on token spending; available only on Claude Mythos Preview, Claude Opus 4.7, Opus 4.6, and Sonnet 4.6) for the deepest reasoning; xhigh (Opus 4.7 only) for long-horizon work with token budgets in the millions; high (the default, equivalent to not setting effort) for complex reasoning; medium for a balance of speed, cost, and performance; low for simple tasks and the fastest responses. Effort is a behavioral signal, not a hard token budget: at low effort Claude still thinks when a problem is hard enough. Claude Opus 4.7 respects effort more strictly than Opus 4.6, scoping its work to what was asked rather than going beyond it. Compatible with Zero Data Retention (ZDR).
Action
Set effort in output_config when calling the API. With Claude Opus 4.7: start with ‘xhigh’ for coding and agentic work, use ‘high’ as the minimum for intelligence-sensitive workloads, step down to ‘medium’ for cost-sensitive workloads, and move up to ‘max’ only when evals show headroom. With Sonnet 4.6: ‘medium’ is the recommended default for agentic coding and tool-heavy workflows; ‘low’ suits chat and non-coding use cases. When using xhigh or max, set a large max_tokens (starting at 64k tokens is recommended) so the model has room to think. No thinking configuration is required; effort works independently. On Opus 4.6 and Sonnet 4.6, effort replaces budget_tokens (deprecated).
Applicability
Applies only to the Claude API (Anthropic). Models named for adaptive thinking with effort: Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6. Claude Opus 4.5 and other Claude 4 models use manual thinking with budget_tokens; per the original content, effort works alongside that thinking budget on those models.
Original content
Effort
Control how many tokens Claude uses when responding with the effort parameter, trading off between response thoroughness and token efficiency.
The effort parameter allows you to control how eager Claude is about spending tokens when responding to requests. This gives you the ability to trade off between response thoroughness and token efficiency, all with a single model. The effort parameter is generally available on all supported models with no beta header required.
How effort works
By default, Claude uses high effort, spending as many tokens as needed for excellent results. You can raise the effort level to max for the absolute highest capability, or lower it to be more conservative with token usage, optimizing for speed and cost while accepting some reduction in capability.
The effort parameter affects all tokens in the response, including:
- Text responses and explanations
- Tool calls and function arguments
- Extended thinking (when enabled)
This approach has two major advantages:
- It doesn’t require thinking to be enabled in order to use it.
- It can affect all token spend including tool calls. For example, lower effort would mean Claude makes fewer tool calls. This gives a much greater degree of control over efficiency.
Effort levels
| Level | Description | Typical use case |
|---|---|---|
| max | Absolute maximum capability with no constraints on token spending. Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6. | Tasks requiring the deepest possible reasoning and most thorough analysis |
| xhigh | Extended capability for long-horizon work. Available on Claude Opus 4.7. | Long-running agentic and coding tasks (over 30 minutes) with token budgets in the millions |
| high | High capability. Equivalent to not setting the parameter. | Complex reasoning, difficult coding problems, agentic tasks |
| medium | Balanced approach with moderate token savings. | Agentic tasks that require a balance of speed, cost, and performance |
| low | Most efficient. Significant token savings with some capability reduction. | Simpler tasks that need the best speed and lowest costs, such as subagents |
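The availability notes in the table can be turned into a small pre-flight check. The sketch below is a minimal illustration, not part of any SDK: the model labels (`"opus-4.7"` and so on) are made-up shorthand rather than API model IDs, and it assumes that low, medium, and high are accepted wherever effort is supported, since the table lists restrictions only for max and xhigh.

```python
# Hypothetical shorthand labels for model families; these are NOT API model IDs.
BASE_LEVELS = frozenset({"low", "medium", "high"})
MAX_CAPABLE = frozenset({"mythos-preview", "opus-4.7", "opus-4.6", "sonnet-4.6"})
XHIGH_CAPABLE = frozenset({"opus-4.7"})


def allowed_levels(model_label: str) -> set:
    """Return the effort levels the table lists as available for a model label."""
    levels = set(BASE_LEVELS)
    if model_label in MAX_CAPABLE:
        levels.add("max")
    if model_label in XHIGH_CAPABLE:
        levels.add("xhigh")
    return levels


def check_effort(model_label: str, effort: str) -> str:
    """Raise ValueError before sending a request with an unavailable level."""
    if effort not in allowed_levels(model_label):
        raise ValueError(f"effort {effort!r} is not listed for {model_label!r}")
    return effort
```

Calling `check_effort("opus-4.7", "xhigh")` returns the level unchanged, while `check_effort("sonnet-4.6", "xhigh")` raises, which keeps an unsupported value from reaching the API.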
Recommended effort levels for Sonnet 4.6
Sonnet 4.6 defaults to high effort. Explicitly set effort when using Sonnet 4.6 to avoid unexpected latency:
- Medium effort (recommended default): Best balance of speed, cost, and performance for most applications. Suitable for agentic coding, tool-heavy workflows, and code generation.
- Low effort: For high-volume or latency-sensitive workloads. Suitable for chat and non-coding use cases where faster turnaround is prioritized.
- High effort: For tasks requiring maximum intelligence from Sonnet 4.6.
- Max effort: For tasks requiring the absolute highest capability with no constraints on token spending.
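One way to apply the Sonnet 4.6 recommendations is a lookup from workload type to effort level. The workload names below are hypothetical labels chosen for this sketch; only the effort values and their pairing with each kind of work come from the list above.

```python
# Hypothetical workload labels mapped to the effort levels recommended above.
SONNET_46_EFFORT = {
    "agentic_coding": "medium",
    "tool_heavy": "medium",
    "code_generation": "medium",
    "chat": "low",
    "high_volume": "low",
    "max_intelligence": "high",
    "no_token_constraints": "max",
}


def sonnet_output_config(workload: str) -> dict:
    """Build an output_config dict, falling back to the recommended default."""
    effort = SONNET_46_EFFORT.get(workload, "medium")
    return {"effort": effort}
```

The returned dict has the same shape as the `output_config` argument used in the basic usage samples, so it could be passed straight through, for example `output_config=sonnet_output_config("chat")`.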
Recommended effort levels for Claude Opus 4.7
Start with xhigh for coding and agentic use cases, and use high as the minimum for most intelligence-sensitive workloads. Step down to medium for cost-sensitive workloads, or up to max only when your evals show measurable headroom at xhigh.
The API default is high. To use xhigh, set effort explicitly; the value you pass overrides the default.
| Effort | Guidance for Claude Opus 4.7 |
|---|---|
| low | Efficient, but best for short, scoped tasks. Pair low with explicit checklists if your task has multiple sections. |
| medium | The drop-in for the average workflow where you want good results while reducing costs. |
| high | Advanced use cases that still need a balance of intelligence and token consumption. This is often the sweet spot balancing quality and token efficiency. |
| xhigh | The recommended starting point for coding and agentic work, and for exploratory tasks such as repeated tool calling, detailed web search, and knowledge-base search. Expect meaningfully higher token usage than high. |
| max | Reserve for genuinely frontier problems. On most workloads max adds significant cost for relatively small quality gains, and on some structured-output or less intelligence-sensitive tasks it can lead to overthinking. |
Claude Opus 4.7 also respects effort levels more strictly than Claude Opus 4.6, especially at low and medium. At lower effort levels, the model scopes its work to what was asked rather than going above and beyond. If you observe shallow reasoning on complex problems with Claude Opus 4.7, raise effort rather than prompting around it. If you must keep effort low for latency, add targeted guidance like “This task involves multi-step reasoning. Think carefully before responding.”
When running Claude Opus 4.7 at xhigh or max effort, set a large max_tokens so the model has room to think and act across subagents and tool calls. Starting at 64k tokens and tuning from there is a reasonable default.
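The max_tokens guidance for xhigh and max can be enforced when request parameters are assembled. This is a minimal sketch: the helper name `opus_47_request` is hypothetical, and it reads "64k tokens" as 64,000, which is an assumption. The keys of the returned dict mirror the arguments used in the basic usage samples.

```python
HIGH_SPEND_EFFORTS = {"xhigh", "max"}
SUGGESTED_MAX_TOKENS = 64_000  # assumption: "64k" interpreted as 64,000


def opus_47_request(prompt: str, effort: str = "xhigh", max_tokens: int = 4096) -> dict:
    """Build Messages API keyword arguments, raising max_tokens at xhigh/max."""
    if effort in HIGH_SPEND_EFFORTS:
        # Give the model room to think and act; tune this value from here.
        max_tokens = max(max_tokens, SUGGESTED_MAX_TOKENS)
    return {
        "model": "claude-opus-4-7",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }
```

With the Python SDK, the result could be unpacked into the call as `client.messages.create(**opus_47_request("Refactor this module"))`. At medium or lower effort the caller's max_tokens is left untouched.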
Basic usage
```bash
ant messages create --transform 'content.0.text' --format yaml <<'YAML'
model: claude-opus-4-7
max_tokens: 4096
messages:
  - role: user
    content: Analyze the trade-offs between microservices and monolithic architectures
output_config:
  effort: medium
YAML
```

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Analyze the trade-offs between microservices and monolithic architectures",
        }
    ],
    output_config={"effort": "medium"},
)

print(response.content[0].text)
```

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Analyze the trade-offs between microservices and monolithic architectures"
    }
  ],
  output_config: {
    effort: "medium"
  }
});

const textBlock = response.content.find(
  (block): block is Anthropic.TextBlock => block.type === "text"
);
console.log(textBlock?.text);
```

```csharp
using System;
using System.Threading.Tasks;
using Anthropic;
using Anthropic.Models.Messages;

class Program
{
    static async Task Main(string[] args)
    {
        AnthropicClient client = new();

        var parameters = new MessageCreateParams
        {
            Model = Model.ClaudeOpus4_7,
            MaxTokens = 4096,
            Messages = [new() { Role = Role.User, Content = "Analyze the trade-offs between microservices and monolithic architectures" }],
            OutputConfig = new OutputConfig
            {
                Effort = Effort.Medium
            }
        };

        var message = await client.Messages.Create(parameters);
        Console.WriteLine(message);
    }
}
```

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 4096,
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("Analyze the trade-offs between microservices and monolithic architectures")),
		},
		OutputConfig: anthropic.OutputConfigParam{
			Effort: anthropic.OutputConfigEffortMedium,
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Content[0].Text)
}
```

```java
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.OutputConfig;

public class Main {
    public static void main(String[] args) {
        AnthropicClient client = AnthropicOkHttpClient.fromEnv();

        MessageCreateParams params = MessageCreateParams.builder()
            .model(Model.CLAUDE_OPUS_4_7)
            .maxTokens(4096L)
            .addUserMessage("Analyze the trade-offs between microservices and monolithic architectures")
            .outputConfig(OutputConfig.builder()
                .effort(OutputConfig.Effort.MEDIUM)
                .build())
            .build();

        Message response = client.messages().create(params);
        response.content().stream()
            .flatMap(block -> block.text().stream())
            .forEach(textBlock -> System.out.println(textBlock.text()));
    }
}
```

```php
<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 4096,
    messages: [
        ['role' => 'user', 'content' => 'Analyze the trade-offs between microservices and monolithic architectures']
    ],
    model: 'claude-opus-4-7',
    outputConfig: ['effort' => 'medium'],
);

echo $message->content[0]->text;
```

```ruby
require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 4096,
  messages: [
    { role: "user", content: "Analyze the trade-offs between microservices and monolithic architectures" }
  ],
  output_config: {
    effort: "medium"
  }
)

puts message.content.first.text
```

When to adjust the effort parameter
- Use max effort when you need the absolute highest capability with no constraints: the most thorough reasoning and deepest analysis. Available on Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6.
- Use xhigh effort for advanced coding and complex agentic work requiring extended exploration, such as repeated tool calling and detailed search. Available on Claude Opus 4.7.
- Use high effort (the default) when you need Claude’s best work: complex reasoning, nuanced analysis, difficult coding problems, or any task where quality is the top priority.
- Use medium effort as a balanced option when you want solid performance without the full token expenditure of high effort.
- Use low effort when you’re optimizing for speed (because Claude answers with fewer tokens) or cost. For example, simple classification tasks, quick lookups, or high-volume use cases where marginal quality improvements don’t justify additional latency or spend.
Effort with tool use
When using tools, the effort parameter affects both the explanations around tool calls and the tool calls themselves. Lower effort levels tend to:
- Combine multiple operations into fewer tool calls
- Make fewer tool calls
- Proceed directly to action without preamble
- Use terse confirmation messages after completion
Higher effort levels may:
- Make more tool calls
- Explain the plan before taking action
- Provide detailed summaries of changes
- Include more comprehensive code comments
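To see how effort changes tool behavior on your own workload, it can help to count tool calls and explanatory text per response. The sketch below works on plain dicts shaped like Messages API content blocks (`type` of `"text"` or `"tool_use"`). The helper name is hypothetical, and the sample blocks are hand-written to show the input shape; they are not measured output from any model or effort level.

```python
def summarize_tool_use(content_blocks: list) -> dict:
    """Count tool_use blocks and total text characters in one response."""
    tool_calls = [b for b in content_blocks if b.get("type") == "tool_use"]
    text_chars = sum(
        len(b.get("text", "")) for b in content_blocks if b.get("type") == "text"
    )
    return {"tool_calls": len(tool_calls), "text_chars": text_chars}


# Hand-written sample in the shape of response content; not real model output.
sample_blocks = [
    {"type": "text", "text": "Reading the file first."},
    {"type": "tool_use", "id": "toolu_example_1", "name": "read_file", "input": {"path": "a.py"}},
    {"type": "tool_use", "id": "toolu_example_2", "name": "read_file", "input": {"path": "b.py"}},
]

print(summarize_tool_use(sample_blocks))
```

Running the same prompts at two effort levels and comparing these summaries gives a rough, workload-specific view of the tendencies listed above.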
Effort with extended thinking
The effort parameter works alongside extended thinking. Its behavior depends on the model:
- Claude Mythos Preview uses adaptive thinking by default (no thinking configuration required). thinking: {type: "disabled"} is rejected. Effort controls thinking depth the same way as on Opus 4.7 and Opus 4.6.
- Claude Opus 4.7 uses adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is no longer supported on Opus 4.7; use adaptive thinking with effort instead. At high, xhigh, and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.
- Claude Opus 4.6 uses adaptive thinking (thinking: {type: "adaptive"}), where effort is the recommended control for thinking depth. While budget_tokens is still accepted on Opus 4.6, it is deprecated and will be removed in a future release. At high and max effort, Claude almost always thinks deeply. At lower levels, it may skip thinking for simpler problems.
- Claude Sonnet 4.6 uses adaptive thinking (where effort controls thinking depth). Manual thinking with interleaved mode (thinking: {type: "enabled", budget_tokens: N}) is still functional but deprecated.
- Claude Opus 4.5 and other Claude 4 models use manual thinking (thinking: {type: "enabled", budget_tokens: N}), where effort works alongside the thinking token budget. Set the effort level for your task, then set the thinking token budget based on task complexity.
The effort parameter can be used with or without extended thinking enabled. When used without thinking, it still controls overall token spend for text responses and tool calls.
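The per-model rules above could be captured in a helper that picks the thinking configuration to send. This is a sketch under assumptions: the model labels are hypothetical shorthand rather than API model IDs, and it assumes Sonnet 4.6 accepts the same `{"type": "adaptive"}` value shown for Opus 4.7 and 4.6, which the text implies but does not spell out.

```python
ADAPTIVE_MODELS = {"opus-4.7", "opus-4.6", "sonnet-4.6"}  # hypothetical labels


def thinking_config(model_label: str, budget_tokens=None):
    """Return the thinking config to send, or None to omit the field."""
    if model_label == "mythos-preview":
        # Adaptive thinking is on by default; no configuration required.
        return None
    if model_label in ADAPTIVE_MODELS:
        # Effort, set separately in output_config, controls thinking depth.
        return {"type": "adaptive"}
    # Opus 4.5 and other Claude 4 models: manual thinking with a token budget.
    if budget_tokens is None:
        return None  # thinking stays off; effort still shapes text and tool calls
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

A caller would add the result under the `thinking` key only when it is not `None`, and set `output_config` independently in every case.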
Best practices
- Set effort explicitly: The API defaults to high, but the right starting point depends on your model and workload.
- Use low for speed-sensitive or simple tasks: When latency matters or tasks are straightforward, low effort can significantly reduce response times and costs.
- Test your use case: The impact of effort levels varies by task type. Evaluate performance on your specific use cases before deploying.
- Consider dynamic effort: Adjust effort based on task complexity. Simple queries may warrant low effort while agentic coding and complex reasoning benefit from high effort.
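Dynamic effort can be as simple as a routing table consulted before each request. In the sketch below the task categories are hypothetical labels an application would assign itself. The low and high assignments follow the examples given in this document, and the fallback to high matches the API default.

```python
# Hypothetical task categories; the application decides how to classify requests.
EFFORT_BY_TASK = {
    "classification": "low",
    "quick_lookup": "low",
    "subagent": "low",
    "agentic_coding": "high",
    "complex_reasoning": "high",
}


def pick_effort(task_kind: str) -> str:
    """Choose an effort level per request; unknown kinds use the API default."""
    return EFFORT_BY_TASK.get(task_kind, "high")


def with_effort(request_kwargs: dict, task_kind: str) -> dict:
    """Return a copy of the request kwargs with output_config.effort set."""
    updated = dict(request_kwargs)
    updated["output_config"] = {"effort": pick_effort(task_kind)}
    return updated
```

Keeping the table in one place makes it straightforward to re-tune levels after evaluating each task type, as the testing advice above recommends.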
Links
- Platform: Claude
- Source: https://platform.claude.com/docs/en/build-with-claude/effort.md
See also: