Worth turning on
Multi-step reasoning. Large refactors. Hard math. Ambiguous specs you want Claude to resolve carefully. Anything where a careless answer is worse than a slow one.
On hard problems — multi-step reasoning, tricky math, gnarly code — extended thinking gives Claude an explicit budget of thinking tokens before it produces the user-visible answer. The thinking trace is returned alongside the answer so you can inspect or hide it. Trades latency and tokens for quality.
Multi-step reasoning. Large refactors. Hard math. Ambiguous specs you want Claude to resolve carefully. Anything where a careless answer is worse than a slow one.
Classification. Simple lookups. UI generation. Anything where Claude already nails the answer in a single pass.
msg = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
thinking={"type":"enabled","budget_tokens":2000},
messages=[{"role":"user","content": HARD_QUESTION}],
)
# msg.content has both `thinking` blocks and `text` blocks.
# `thinking` is for inspection; show only `text` to end users.
The model decides how much of the budget to actually use. A 2,000-token budget is not a guarantee of 2,000 thinking tokens — just an upper bound.