Usage
OpenAI Python library
from openai import OpenAI
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)

# Chat completion
chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)

# Chat completion with an image (vision)
response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "",
                },
            ],
        }
    ],
    max_tokens=300,
)

# Text completion
completion = client.completions.create(
    model="llama3.2",
    prompt="Say this is a test",
)

# List the available models
list_completion = client.models.list()

# Retrieve a single model
model = client.models.retrieve("llama3.2")

# Generate embeddings
embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
Structured outputs
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# Define the schema for the response
class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool
class FriendList(BaseModel):
    friends: list[FriendInfo]
try:
    completion = client.beta.chat.completions.parse(
        temperature=0,
        model="llama3.1:8b",
        messages=[
            {"role": "user", "content": "I have two friends. The first is Ollama 22 years old busy saving the world, and the second is Alonso 23 years old and wants to hang out. Return a list of friends in JSON format"}
        ],
        response_format=FriendList,
    )
    friends_response = completion.choices[0].message
    if friends_response.parsed:
        print(friends_response.parsed)
    elif friends_response.refusal:
        print(friends_response.refusal)
except Exception as e:
    print(f"Error: {e}")
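For clients that do not go through the OpenAI SDK's parse helper, the same request can be written as a plain JSON payload. The sketch below assumes the endpoint accepts an OpenAI-style response_format of type json_schema (which is what the parse helper sends on the wire). The schema is hand-written to mirror FriendList, and build_structured_request and parse_friends are hypothetical helper names, not part of any library.

```python
import json

# JSON Schema mirroring the FriendList model above (hand-written for illustration).
FRIEND_LIST_SCHEMA = {
    "type": "object",
    "properties": {
        "friends": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "is_available": {"type": "boolean"},
                },
                "required": ["name", "age", "is_available"],
            },
        }
    },
    "required": ["friends"],
}


def build_structured_request(prompt, model="llama3.1:8b"):
    """Return a chat completions payload asking for schema-constrained JSON."""
    return {
        "model": model,
        "temperature": 0,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "FriendList", "schema": FRIEND_LIST_SCHEMA},
        },
    }


def parse_friends(content):
    """Decode the message content and check the fields the schema requires."""
    data = json.loads(content)
    for friend in data["friends"]:
        missing = {"name", "age", "is_available"} - friend.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
    return data["friends"]
```

The payload can be posted to /v1/chat/completions with any HTTP client; parse_friends then takes the content string of the first choice's message.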
OpenAI JavaScript library
import OpenAI from "openai";
const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  // required but ignored
  apiKey: "ollama",
});

// Chat completion
const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});

// Chat completion with an image (vision)
const response = await openai.chat.completions.create({
  model: "llava",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url:
            "",
        },
      ],
    },
  ],
});

// Text completion
const completion = await openai.completions.create({
  model: "llama3.2",
  prompt: "Say this is a test.",
});

// List the available models
const listCompletion = await openai.models.list();

// Retrieve a single model
const model = await openai.models.retrieve("llama3.2");

// Generate embeddings
const embedding = await openai.embeddings.create({
  model: "all-minilm",
  input: ["why is the sky blue?", "why is the grass green?"],
});
curl
# Chat completion
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

# Chat completion with an image (vision)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
               "url": ""
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

# Text completion
curl http://localhost:11434/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "prompt": "Say this is a test"
    }'

# List the available models
curl http://localhost:11434/v1/models

# Retrieve a single model
curl http://localhost:11434/v1/models/llama3.2

# Generate embeddings
curl http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "all-minilm",
        "input": ["why is the sky blue?", "why is the grass green?"]
    }'
Endpoints
/v1/chat/completions
Supported features
- Chat completions
- Streaming
- JSON mode
- Reproducible outputs
- Vision
- Tools
- Logprobs
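Streaming is the feature that most changes client code. The sketch below assumes the server uses OpenAI-style server-sent events framing: each event is a line starting with "data:", and the stream ends with "data: [DONE]". iter_stream_text is a hypothetical helper, and the sample events are made up for illustration.

```python
import json


def iter_stream_text(lines):
    """Yield text deltas from server-sent event lines of a streamed chat completion."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not an event
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        # A usage-only chunk (stream_options.include_usage) carries no choices.
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Made-up sample events, shaped like OpenAI-style stream chunks.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "This is"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " a test"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 4}}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With a live server, the lines would come from the HTTP response body of a request sent with stream set to true.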
 
Supported request fields
- model
- messages
  - Text content
  - Image content
    - Base64 encoded image
    - Image URL
  - Array of content parts
- frequency_penalty
- presence_penalty
- response_format
- seed
- stop
- stream
- stream_options
  - include_usage
- temperature
- top_p
- max_tokens
- tools
- tool_choice
- logit_bias
- user
- n
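Several of these fields can be combined in one request. The following is a minimal sketch using only the Python standard library; build_chat_request is a hypothetical helper, and the option values (seed 42, max_tokens 300, the stop sequence) are arbitrary choices for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # local address used throughout this page


def build_chat_request(model, messages, **options):
    """Build a POST request for /v1/chat/completions without sending it."""
    payload = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "llama3.2",
    [{"role": "user", "content": "Reply with a JSON object containing a greeting."}],
    seed=42,  # fixed seed for reproducible outputs
    temperature=0,
    max_tokens=300,
    response_format={"type": "json_object"},  # JSON mode
    stop=["\n\n"],
)

# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Keeping the request construction separate from the network call makes the payload easy to inspect or log before it is sent.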
/v1/completions
Supported features
- Completions
- Streaming
- JSON mode
- Reproducible outputs
- Logprobs
 
Supported request fields
- model
- prompt
- frequency_penalty
- presence_penalty
- seed
- stop
- stream
- stream_options
  - include_usage
- temperature
- top_p
- max_tokens
- suffix
- best_of
- echo
- logit_bias
- user
- n
Notes
prompt currently only accepts a string
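Because prompt must be a single string, code written against APIs that accept a list of prompts needs to send one request per prompt. A minimal sketch, with build_completion_payloads as a hypothetical helper name:

```python
def build_completion_payloads(model, prompts, **options):
    """Return one /v1/completions payload per prompt.

    A list of prompts cannot be sent in a single request, so each
    prompt gets its own payload carrying the same options.
    """
    if isinstance(prompts, str):
        prompts = [prompts]
    payloads = []
    for prompt in prompts:
        if not isinstance(prompt, str):
            raise TypeError("each prompt must be a string")
        payloads.append({"model": model, "prompt": prompt, **options})
    return payloads


payloads = build_completion_payloads(
    "llama3.2",
    ["Say this is a test", "Say this is another test"],
    max_tokens=50,
)
print(len(payloads))
```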
/v1/models
Notes
- created corresponds to when the model was last modified
- owned_by corresponds to the Ollama username, defaulting to "library"
/v1/models/{model}
Notes
- created corresponds to when the model was last modified
- owned_by corresponds to the Ollama username, defaulting to "library"
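The two fields can be read as follows. This sketch assumes created is a Unix timestamp in seconds, as in the OpenAI model object; the sample response is hypothetical, and describe_models is an illustrative helper name.

```python
from datetime import datetime, timezone

# Hypothetical response body shaped like an OpenAI-style model list.
sample_models = {
    "object": "list",
    "data": [
        {"id": "llama3.2", "object": "model", "created": 1700000000, "owned_by": "library"},
        {"id": "alice/custom", "object": "model", "created": 1710000000, "owned_by": "alice"},
    ],
}


def describe_models(listing):
    """Return (id, owner, last-modified date in UTC) for each model in the list."""
    rows = []
    for entry in listing["data"]:
        modified = datetime.fromtimestamp(entry["created"], tz=timezone.utc)
        rows.append((entry["id"], entry["owned_by"], modified.date().isoformat()))
    return rows


for row in describe_models(sample_models):
    print(row)
```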
/v1/embeddings
Supported request fields
- model
- input
  - string
  - array of strings
  - array of tokens
  - array of token arrays
- encoding_format
- dimensions
- user
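A common next step after requesting embeddings is comparing the returned vectors. The sketch below assumes the response follows the OpenAI shape, a data array whose items carry index and embedding. The two-dimensional vectors are made up for illustration, and the helper names are hypothetical.

```python
import math


def vectors_from_response(response):
    """Return embedding vectors ordered by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up response with tiny vectors; real embeddings have hundreds of dimensions.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [2.0, 4.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 2.0]},
    ],
}

first, second = vectors_from_response(sample_response)
print(cosine_similarity(first, second))
```

Sorting by index keeps each vector aligned with the position of its text in the input array.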
Models
Before using a model, pull it locally with ollama pull:
ollama pull llama3.2
Default model names
For tooling that relies on default OpenAI model names such as gpt-3.5-turbo, use ollama cp to copy an existing model name to a temporary name:
ollama cp llama3.2 gpt-3.5-turbo
Afterwards, specify the new model name in the model field:
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
Setting the context size
The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a Modelfile that looks like:
FROM <some model>
PARAMETER num_ctx <context size>
Use the ollama create mymodel command to create a new model with the updated context size. Call the API with the updated model name:
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mymodel",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
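When several context sizes are needed, the Modelfile can be generated rather than written by hand. A minimal sketch, where render_modelfile is a hypothetical helper and the values llama3.2 and 8192 are example choices:

```python
def render_modelfile(base_model, num_ctx):
    """Return Modelfile text that sets the context size for a base model."""
    if not isinstance(num_ctx, int) or num_ctx <= 0:
        raise ValueError("num_ctx must be a positive integer")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


modelfile_text = render_modelfile("llama3.2", 8192)
print(modelfile_text, end="")

# To use it, write the text to a file named Modelfile and run:
#   ollama create mymodel -f Modelfile
```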

