## Usage
### Simple `v1/chat/completions` example (`basic.py`)
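A minimal sketch using the official OpenAI Python library pointed at a local Ollama server; the model name is illustrative and assumes the model has already been pulled:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama",  # required by the client library, but ignored by Ollama
)

chat_completion = client.chat.completions.create(
    model="llama3.2",  # illustrative; use any model you have pulled
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(chat_completion.choices[0].message.content)
```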
### Simple `v1/responses` example (`responses.py`)
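A sketch of a non-stateful Responses API call with the same client setup; it assumes a recent version of the `openai` package with Responses support, and the model name is illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.responses.create(
    model="llama3.2",  # illustrative; use any model you have pulled
    input="Why is the sky blue?",
)
print(response.output_text)  # convenience accessor for the generated text
```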
### `v1/chat/completions` with vision example (`vision.py`)
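A sketch passing an image as base64-encoded `content` (see the supported request fields below); the model and file names are illustrative:

```python
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

# Read and base64-encode a local image file (illustrative path)
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

chat_completion = client.chat.completions.create(
    model="llava",  # illustrative; use any vision-capable model you have pulled
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(chat_completion.choices[0].message.content)
```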
## Endpoints

### `/v1/chat/completions`
#### Supported features
- Chat completions
- Streaming
- JSON mode
- Reproducible outputs
- Vision
- Tools
- Logprobs
#### Supported request fields

- `model`
- `messages`
  - Text `content`
  - Image `content`
    - Base64 encoded image
    - Image URL
  - Array of `content` parts
    - Text
- `frequency_penalty`
- `presence_penalty`
- `response_format`
- `seed`
- `stop`
- `stream`
- `stream_options`
  - `include_usage`
- `temperature`
- `top_p`
- `max_tokens`
- `tools`
- `tool_choice`
- `logit_bias`
- `user`
- `n`
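As a sketch combining several of these fields, the following streams a completion with usage reporting enabled and a fixed seed; the model name is an assumption:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

stream = client.chat.completions.create(
    model="llama3.2",  # illustrative; use any model you have pulled
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    temperature=0.7,
    seed=42,  # reproducible outputs
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    # the final chunk carries usage stats and may have no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```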
### `/v1/completions`

#### Supported features
- Completions
- Streaming
- JSON mode
- Reproducible outputs
- Logprobs
#### Supported request fields

- `model`
- `prompt`
- `frequency_penalty`
- `presence_penalty`
- `seed`
- `stop`
- `stream`
- `stream_options`
  - `include_usage`
- `temperature`
- `top_p`
- `max_tokens`
- `suffix`
- `best_of`
- `echo`
- `logit_bias`
- `user`
- `n`
#### Notes

- `prompt` currently only accepts a string
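A sketch of a legacy completions call; note the string-only `prompt`, and the model name is illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

completion = client.completions.create(
    model="llama3.2",  # illustrative; use any model you have pulled
    prompt="Once upon a time",  # only a string prompt is accepted
    max_tokens=50,
)
print(completion.choices[0].text)
```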
### `/v1/models`

#### Notes

- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the ollama username, defaulting to `"library"`
### `/v1/models/{model}`

#### Notes

- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the ollama username, defaulting to `"library"`
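A sketch listing and retrieving models through the client; the retrieved model name is illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

# List all locally available models
for model in client.models.list():
    print(model.id, model.owned_by)

# Retrieve a single model by name (illustrative name)
model = client.models.retrieve("llama3.2")
print(model.id, model.created)
```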
### `/v1/embeddings`

#### Supported request fields

- `model`
- `input`
  - string
  - array of strings
  - array of tokens
  - array of token arrays
- `encoding_format`
- `dimensions`
- `user`
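A sketch requesting embeddings for a batch of strings; the model name is illustrative and assumes an embedding model has been pulled:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

embeddings = client.embeddings.create(
    model="all-minilm",  # illustrative; use any embedding model you have pulled
    input=["why is the sky blue?", "why is the grass green?"],
)
# number of embeddings returned and the dimensionality of the first one
print(len(embeddings.data), len(embeddings.data[0].embedding))
```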
### `/v1/responses`

Ollama supports the OpenAI Responses API. Only the non-stateful flavor is supported (i.e., there is no `previous_response_id` or `conversation` support).
#### Supported features

- Streaming
- Tools (function calling)
- Reasoning summaries (for thinking models)
- Stateful requests (not supported)
#### Supported request fields

- `model`
- `input`
- `instructions`
- `tools`
- `stream`
- `temperature`
- `top_p`
- `max_output_tokens`
- `previous_response_id` (stateful v1/responses not supported)
- `conversation` (stateful v1/responses not supported)
- `truncation`
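A sketch of a streaming Responses call; the event type shown follows the OpenAI SDK's streaming events, and the model name is an assumption:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

stream = client.responses.create(
    model="llama3.2",  # illustrative; use any model you have pulled
    input="Tell me a short story about a robot.",
    stream=True,
)
for event in stream:
    # text arrives as incremental output_text delta events
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```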
## Models

Before using a model, pull it locally with `ollama pull`; for example:
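```shell
# example model name; substitute any model from the Ollama library
ollama pull llama3.2
```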
### Default model names

For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:
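```shell
# the source model is illustrative; copy any pulled model to the OpenAI-style name
ollama cp llama3.2 gpt-3.5-turbo
```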
Afterwards, this new model name can be specified in the `model` field:
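```shell
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```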
### Setting the context size

The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a `Modelfile` which looks like:
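```modelfile
FROM <some model>
PARAMETER num_ctx <context size>
```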
Use the `ollama create mymodel` command to create a new model with the updated context size. Call the API with the updated model name:
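```shell
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mymodel",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```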

