Ollama provides compatibility with the Anthropic Messages API, so existing applications and tools like Claude Code can connect to Ollama. For coding use cases, models like glm-4.7:cloud, minimax-m2.1:cloud, and qwen3-coder are recommended. Pull a model before use:
ollama pull qwen3-coder
ollama pull glm-4.7:cloud

Usage

Environment variables

To use Ollama with tools that expect the Anthropic API (like Claude Code), set these environment variables:
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_API_KEY=ollama  # required but ignored

Simple /v1/messages example

basic.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',  # required but ignored
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'Hello, how are you?'}
    ]
)
print(message.content[0].text)

Streaming example

streaming.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

with client.messages.stream(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)

Tool calling example

tools.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=[
        {
            'name': 'get_weather',
            'description': 'Get the current weather in a location',
            'input_schema': {
                'type': 'object',
                'properties': {
                    'location': {
                        'type': 'string',
                        'description': 'The city and state, e.g. San Francisco, CA'
                    }
                },
                'required': ['location']
            }
        }
    ],
    messages=[{'role': 'user', 'content': "What's the weather in San Francisco?"}]
)

for block in message.content:
    if block.type == 'tool_use':
        print(f'Tool: {block.name}')
        print(f'Input: {block.input}')
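
Tool results can be sent back to the model in a follow-up request. The sketch below shows that round trip with the same tool definition; the hard-coded weather string is a stand-in for a real lookup.

tool_results.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

tools = [
    {
        'name': 'get_weather',
        'description': 'Get the current weather in a location',
        'input_schema': {
            'type': 'object',
            'properties': {
                'location': {'type': 'string'}
            },
            'required': ['location']
        }
    }
]

messages = [{'role': 'user', 'content': "What's the weather in San Francisco?"}]

response = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# Answer each tool_use block with a tool_result block keyed by its id.
results = []
for block in response.content:
    if block.type == 'tool_use':
        # A real application would call its own weather service here.
        results.append({
            'type': 'tool_result',
            'tool_use_id': block.id,
            'content': '68°F and sunny',
        })

messages.append({'role': 'assistant', 'content': response.content})
messages.append({'role': 'user', 'content': results})

final = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    tools=tools,
    messages=messages,
)
print(final.content[0].text)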

Using with Claude Code

Claude Code can be configured to use Ollama as its backend:
ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_API_KEY=ollama claude --model qwen3-coder
Or set the environment variables in your shell profile:
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_API_KEY=ollama
Then run Claude Code with any Ollama model:
# Local models
claude --model qwen3-coder
claude --model gpt-oss:20b

# Cloud models
claude --model glm-4.7:cloud
claude --model minimax-m2.1:cloud

Endpoints

/v1/messages

Supported features

  • Messages
  • Streaming
  • System prompts
  • Multi-turn conversations
  • Vision (images) (see the example after this list)
  • Tools (function calling)
  • Tool results
  • Thinking/extended thinking
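
As an illustration of the vision and system prompt support listed above, the following sketch sends a base64-encoded image. The model name gemma3 is an assumption (any vision-capable Ollama model that has been pulled will do), as is the local file photo.jpg.

vision.py
import base64

import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

# Images are sent as base64-encoded content blocks; URL images are not supported.
with open('photo.jpg', 'rb') as f:
    image_data = base64.standard_b64encode(f.read()).decode('utf-8')

message = client.messages.create(
    model='gemma3',  # assumption: any vision-capable Ollama model
    max_tokens=1024,
    system='You are a concise assistant.',
    messages=[
        {
            'role': 'user',
            'content': [
                {
                    'type': 'image',
                    'source': {
                        'type': 'base64',
                        'media_type': 'image/jpeg',
                        'data': image_data,
                    },
                },
                {'type': 'text', 'text': 'Describe this image in one sentence.'},
            ],
        }
    ],
)
print(message.content[0].text)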

Supported request fields

  • model
  • max_tokens
  • messages
    • Text content
    • Image content (base64)
    • Array of content blocks
    • tool_use blocks
    • tool_result blocks
    • thinking blocks
  • system (string or array)
  • stream
  • temperature
  • top_p
  • top_k
  • stop_sequences
  • tools
  • thinking

Supported response fields

  • id
  • type
  • role
  • model
  • content (text, tool_use, thinking blocks)
  • stop_reason (end_turn, max_tokens, tool_use)
  • usage (input_tokens, output_tokens)
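
A quick sketch of reading these fields off a response:

response_fields.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Hello!'}],
)

# The response mirrors the Anthropic shape.
print(message.id, message.type, message.role, message.model)
print(message.stop_reason)                                      # e.g. 'end_turn'
print(message.usage.input_tokens, message.usage.output_tokens)  # approximate counts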

Streaming events

  • message_start
  • content_block_start
  • content_block_delta (text_delta, input_json_delta, thinking_delta)
  • content_block_stop
  • message_delta
  • message_stop
  • ping
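
These events can be observed by iterating the SDK stream directly instead of using text_stream; a minimal sketch:

events.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

with client.messages.stream(
    model='qwen3-coder',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Write a haiku about the sea'}],
) as stream:
    # Iterating yields the raw events listed above, plus SDK conveniences
    # such as synthesized 'text' events.
    for event in stream:
        if event.type == 'content_block_delta' and event.delta.type == 'text_delta':
            print(event.delta.text, end='', flush=True)
        elif event.type == 'message_stop':
            print()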

Models

Ollama supports both local and cloud models.

Local models

Pull a local model before use:
ollama pull qwen3-coder
Recommended local models:
  • qwen3-coder - Excellent for coding tasks
  • gpt-oss:20b - Strong general-purpose model

Cloud models

Cloud models are available immediately without pulling:
  • glm-4.7:cloud - High-performance cloud model
  • minimax-m2.1:cloud - Fast cloud model

Default model names

For tooling that relies on default Anthropic model names such as claude-3-5-sonnet, use ollama cp to create a copy of an existing model under the expected name:
ollama cp qwen3-coder claude-3-5-sonnet
Afterwards, this new model name can be specified in the model field:
curl http://localhost:11434/v1/messages \
    -H "Content-Type: application/json" \
    -d '{
        "model": "claude-3-5-sonnet",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

Differences from the Anthropic API

Behavior differences

  • API key is accepted but not validated
  • anthropic-version header is accepted but not used
  • Token counts are approximations based on the underlying model’s tokenizer

Not supported

The following Anthropic API features are not currently supported:
  • /v1/messages/count_tokens - Token counting endpoint
  • tool_choice - Forcing specific tool use or disabling tools
  • metadata - Request metadata (user_id)
  • Prompt caching - cache_control blocks for caching prefixes
  • Batches API - /v1/messages/batches for async batch processing
  • Citations - citations content blocks
  • PDF support - document content blocks with PDF files
  • Server-sent errors - error events during streaming (errors are returned via HTTP status instead)

Partial support

  • Image content - Base64 images supported; URL images not supported
  • Extended thinking - Basic support; budget_tokens accepted but not enforced
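
The thinking request field follows the Anthropic shape. A sketch, assuming a model that emits thinking blocks (gpt-oss:20b here, which is an assumption); as noted above, budget_tokens is accepted but not enforced:

thinking.py
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama',
)

message = client.messages.create(
    model='gpt-oss:20b',  # assumption: a model that produces thinking blocks
    max_tokens=2048,
    thinking={'type': 'enabled', 'budget_tokens': 1024},  # accepted but not enforced
    messages=[{'role': 'user', 'content': 'What is 17 * 24?'}],
)

for block in message.content:
    if block.type == 'thinking':
        print('[thinking]', block.thinking)
    elif block.type == 'text':
        print(block.text)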