Thinking-capable models emit a thinking field that separates their reasoning trace from the final answer. Use this capability to audit model steps, animate the model thinking in a UI, or hide the trace entirely when you only need the final response.

Supported models

Enable thinking in API calls

Set the think field on chat or generate requests. Most models accept booleans (true/false). GPT-OSS instead expects one of low, medium, or high to tune the trace length. The message.thinking (chat endpoint) or thinking (generate endpoint) field contains the reasoning trace while message.content / response holds the final answer.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{
    "role": "user",
    "content": "How many letter r are in strawberry?"
  }],
  "think": true,
  "stream": false
}'
GPT-OSS requires think to be set to "low", "medium", or "high". A boolean true or false value is ignored for that model.
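The request and response shapes described above can be sketched in Python using only the standard library. This is a minimal illustration rather than the official client: the helper names build_chat_payload and split_reply are hypothetical, the check that GPT-OSS model names start with "gpt-oss" is an assumption, and the sample response bodies are made up to show the field layout.

```python
import json

GPT_OSS_LEVELS = {"low", "medium", "high"}


def build_chat_payload(model, prompt, think=True, stream=False):
    """Build the JSON body for a chat request (hypothetical helper)."""
    # Assumption: GPT-OSS model names start with "gpt-oss".
    # Booleans are ignored by that model, so this sketch rejects them early.
    if model.startswith("gpt-oss") and think not in GPT_OSS_LEVELS:
        raise ValueError('gpt-oss expects think to be "low", "medium", or "high"')
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": stream,
    }


def split_reply(body):
    """Return (thinking, answer) from a non-streaming response body."""
    if "message" in body:
        # Chat endpoint: message.thinking and message.content
        message = body["message"]
        return message.get("thinking", ""), message.get("content", "")
    # Generate endpoint: thinking and response
    return body.get("thinking", ""), body.get("response", "")


# The payload can be sent to http://localhost:11434/api/chat with any HTTP client.
payload = build_chat_payload("qwen3", "How many letter r are in strawberry?")
print(json.dumps(payload, indent=2))

# Made-up response bodies, used only to show where each field lives.
chat_body = {"message": {"role": "assistant", "thinking": "Count the r's.", "content": "Three."}}
generate_body = {"thinking": "Count the r's.", "response": "Three."}
print(split_reply(chat_body))
print(split_reply(generate_body))
```

Rejecting a boolean for GPT-OSS on the client side is a design choice of this sketch. The server itself ignores the value rather than returning an error.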

Stream the reasoning trace

A thinking stream emits reasoning tokens first, followed by answer tokens. Detect the first thinking chunk to render a “thinking” section, then switch to the final reply once message.content arrives.
from ollama import chat

stream = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
  think=True,
  stream=True,
)

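in_thinking = False  # True while the reasoning trace is being printed

for chunk in stream:
  # First thinking chunk: open the "Thinking" section.
  if chunk.message.thinking and not in_thinking:
    in_thinking = True
    print('Thinking:\n', end='')

  if chunk.message.thinking:
    print(chunk.message.thinking, end='')
  elif chunk.message.content:
    # First answer chunk after the trace: switch to the "Answer" section.
    if in_thinking:
      print('\n\nAnswer:\n', end='')
      in_thinking = False
    print(chunk.message.content, end='')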
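When a UI needs the trace and the answer as separate strings rather than printed output, the same detection logic can be wrapped in a generator. The sketch below is one possible arrangement, not part of the library: split_stream, collect, and fake_chunk are hypothetical names, and the fake chunks only mimic the chunk.message.thinking / chunk.message.content shape used above so that the example runs without a server.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Yield ('thinking', text) or ('answer', text) for each non-empty chunk."""
    for chunk in chunks:
        message = chunk.message
        if getattr(message, "thinking", None):
            yield "thinking", message.thinking
        elif getattr(message, "content", None):
            yield "answer", message.content


def collect(chunks, show_thinking=True):
    """Accumulate a stream into (trace, answer); drop the trace if hidden."""
    trace, answer = [], []
    for kind, text in split_stream(chunks):
        (trace if kind == "thinking" else answer).append(text)
    return ("".join(trace) if show_thinking else "", "".join(answer))


def fake_chunk(thinking="", content=""):
    """Stand-in for one streamed chunk (illustrative only)."""
    return SimpleNamespace(message=SimpleNamespace(thinking=thinking, content=content))


# Made-up chunks in the order described above: reasoning first, then the answer.
fake_stream = [
    fake_chunk(thinking="17 x 20 = 340, "),
    fake_chunk(thinking="17 x 3 = 51, "),
    fake_chunk(thinking="340 + 51 = 391."),
    fake_chunk(content="17 x 23 = "),
    fake_chunk(content="391"),
]

trace, answer = collect(fake_stream)
print("Thinking:", trace)
print("Answer:", answer)
```

Passing the real stream returned by chat(..., think=True, stream=True) in place of fake_stream should work the same way, provided its chunks expose the same two attributes. Setting show_thinking=False discards the trace and keeps only the final reply.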

CLI quick reference

  • Enable thinking for a single run: ollama run deepseek-r1 --think "Where should I visit in Lisbon?"
  • Disable thinking: ollama run deepseek-r1 --think=false "Summarize this article"
  • Hide the trace while still using a thinking model: ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"
  • Inside interactive sessions, toggle with /set think or /set nothink.
  • GPT-OSS only accepts levels: ollama run gpt-oss --think=low "Draft a headline" (replace low with medium or high as needed).
Thinking is enabled by default in the CLI and API for supported models.
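For scripts that shell out to the CLI, the flags listed above can be assembled programmatically. The following is a small sketch under stated assumptions: ollama_run_args is a hypothetical helper, and it emits only the flags shown in this section (--think, --think=false, --think=<level>, and --hidethinking), placed before the prompt as in the examples.

```python
import shlex


def ollama_run_args(model, prompt, think=None, hide_thinking=False):
    """Build an argument list for `ollama run` (hypothetical helper).

    think: None leaves the default, True/False toggles thinking,
    and a string such as "low" is passed through as a level.
    """
    args = ["ollama", "run", model]
    if think is True:
        args.append("--think")
    elif think is False:
        args.append("--think=false")
    elif think is not None:
        args.append(f"--think={think}")
    if hide_thinking:
        args.append("--hidethinking")
    args.append(prompt)
    return args


# Print shell-quoted command lines for a few of the cases listed above.
print(shlex.join(ollama_run_args("deepseek-r1", "Where should I visit in Lisbon?", think=True)))
print(shlex.join(ollama_run_args("deepseek-r1", "Summarize this article", think=False)))
print(shlex.join(ollama_run_args("gpt-oss", "Draft a headline", think="low")))
```

The returned list can be handed directly to subprocess.run, which avoids shell quoting problems in prompts that contain spaces or punctuation.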