# Thinking

Thinking models emit a `thinking` field that separates their reasoning trace from the final answer.
Use this capability to audit model steps, animate the model's thinking in a UI, or hide the trace entirely when you only need the final response.
## Supported models

- Qwen 3
- GPT-OSS (use `think` levels `low`, `medium`, or `high`; the trace cannot be fully disabled)
- DeepSeek-v3.1
- DeepSeek R1
- Browse the latest additions under thinking models
## Enable thinking in API calls

Set the `think` field on chat or generate requests. Most models accept booleans (`true`/`false`); GPT-OSS instead expects one of `low`, `medium`, or `high` to tune the trace length.

The `message.thinking` field (chat endpoint) or `thinking` field (generate endpoint) contains the reasoning trace, while `message.content` / `response` holds the final answer.
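As a sketch of the request and response shapes, the snippet below posts to the default local `/api/chat` endpoint using only the standard library. The helper names (`build_request`, `split_reply`) and the example prompt are illustrative, not part of the Ollama API:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_request(model: str, prompt: str, think=True) -> dict:
    """Build a chat request body; GPT-OSS takes "low"/"medium"/"high" for think."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }

def split_reply(reply: dict):
    """Return (trace, answer): the trace lives in message.thinking, the answer in message.content."""
    msg = reply.get("message", {})
    return msg.get("thinking", ""), msg.get("content", "")

if __name__ == "__main__":
    body = json.dumps(build_request("deepseek-r1", "Is 9.9 bigger than 9.11?")).encode()
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        thinking, answer = split_reply(json.load(resp))
    print("Trace:", thinking)
    print("Answer:", answer)
```

For GPT-OSS, pass a level instead of a boolean, e.g. `build_request("gpt-oss", prompt, think="high")`.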
GPT-OSS requires `think` to be set to `"low"`, `"medium"`, or `"high"`. Passing `true`/`false` is ignored for that model.

## Stream the reasoning trace
Thinking streams interleave reasoning tokens before answer tokens. Detect the first `thinking` chunk to render a "thinking" section, then switch to the final reply once `message.content` arrives.
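A minimal sketch of that switch, assuming the chat endpoint is called with `"stream": true` and emits newline-delimited JSON chunks; `render_stream` and its event tuples are illustrative names, not Ollama API:

```python
import json

def render_stream(lines):
    """Turn raw NDJSON chat chunks into ("status"/"thinking"/"answer", text) events."""
    in_thinking = False
    for raw in lines:
        msg = json.loads(raw).get("message", {})
        if msg.get("thinking"):
            if not in_thinking:
                in_thinking = True
                yield ("status", "thinking started")  # open the "thinking" UI section
            yield ("thinking", msg["thinking"])
        if msg.get("content"):
            if in_thinking:
                in_thinking = False
                yield ("status", "answering")  # first answer token: switch views
            yield ("answer", msg["content"])
```

A UI would append `thinking` events to a collapsible trace panel and `answer` events to the main reply, using the `status` events to toggle between the two.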
## CLI quick reference

- Enable thinking for a single run: `ollama run deepseek-r1 --think "Where should I visit in Lisbon?"`
- Disable thinking: `ollama run deepseek-r1 --think=false "Summarize this article"`
- Hide the trace while still using a thinking model: `ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"`
- Inside interactive sessions, toggle with `/set think` or `/set nothink`.
- GPT-OSS only accepts levels: `ollama run gpt-oss --think=low "Draft a headline"` (replace `low` with `medium` or `high` as needed).
Thinking is enabled by default in the CLI and API for supported models.