POST /api/generate
curl http://localhost:11434/api/generate -d '{
"model": "gemma3",
"prompt": "Why is the sky blue?"
}'
{
  "model": "gemma3",
  "created_at": "2025-10-17T23:14:07.414671Z",
  "response": "Hello! How can I help you today?",
  "done": true,
  "done_reason": "stop",
  "total_duration": 174560334,
  "load_duration": 101397084,
  "prompt_eval_count": 11,
  "prompt_eval_duration": 13074791,
  "eval_count": 18,
  "eval_duration": 52479709
}
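The duration fields in the response above are all reported in nanoseconds, so throughput can be derived directly from them. A minimal sketch using only the standard library, parsing the example response shown above:

```python
import json

# Parse the example /api/generate response and derive throughput.
# All *_duration fields are in nanoseconds.
resp = json.loads("""{
  "model": "gemma3",
  "created_at": "2025-10-17T23:14:07.414671Z",
  "response": "Hello! How can I help you today?",
  "done": true,
  "done_reason": "stop",
  "total_duration": 174560334,
  "load_duration": 101397084,
  "prompt_eval_count": 11,
  "prompt_eval_duration": 13074791,
  "eval_count": 18,
  "eval_duration": 52479709
}""")

NS_PER_S = 1e9
prompt_tps = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / NS_PER_S)
gen_tps = resp["eval_count"] / (resp["eval_duration"] / NS_PER_S)
print(f"prompt eval: {prompt_tps:.0f} tok/s, generation: {gen_tps:.0f} tok/s")
```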

Body

application/json
model
string
required

Model name

prompt
string

Text for the model to generate a response from

suffix
string

Text that appears after the model's response; used with fill-in-the-middle models, where the model generates the text between prompt and suffix
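A fill-in-the-middle request can be sketched as follows; the model name is illustrative and assumes a model trained for infilling:

```python
import json

# Hypothetical fill-in-the-middle request: the model generates the code
# that belongs between `prompt` (the prefix) and `suffix`.
payload = {
    "model": "codellama:code",   # illustrative; requires infill support
    "prompt": "def fib(n):\n    ",
    "suffix": "\n    return result",
    "stream": False,
}
body = json.dumps(payload)
```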

images
string[]

Base64-encoded images for models that support image input
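Images are sent as base64 strings rather than raw bytes. A minimal encoding helper (the file name in the comment is a placeholder):

```python
import base64

# Encode raw image bytes for the `images` array. The request body would
# then look like (path and prompt are illustrative):
#   {"model": "gemma3", "prompt": "What is in this picture?",
#    "images": [encode_image(open("photo.png", "rb").read())]}
def encode_image(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")
```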

format

Format to constrain the model's output: either the string "json" or a JSON Schema object.
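A request constraining the output with a JSON Schema might look like this; the schema itself is an illustrative example, not part of the API:

```python
import json

# Constrain the response shape by passing a JSON Schema as `format`.
schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer"],
}
payload = {
    "model": "gemma3",
    "prompt": "Why is the sky blue? Respond in JSON.",
    "format": schema,   # or simply "json" for unconstrained JSON mode
    "stream": False,
}
body = json.dumps(payload)
```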

system
string

System prompt to use when generating the response

stream
boolean
default:true

When true, returns a stream of partial responses
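Streamed responses arrive as newline-delimited JSON: one object per chunk, with "done": false until the final object. A sketch of client-side assembly, using sample lines that mimic that shape:

```python
import json

# Sample stream chunks (illustrative, mimicking the streaming format:
# one JSON object per line, final object has "done": true).
stream_lines = [
    '{"model":"gemma3","response":"The","done":false}',
    '{"model":"gemma3","response":" sky","done":false}',
    '{"model":"gemma3","response":" is blue.","done":true,"done_reason":"stop"}',
]

text = ""
for line in stream_lines:
    chunk = json.loads(line)
    text += chunk.get("response", "")
    if chunk["done"]:
        break
print(text)
```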

think
boolean

When true, returns separate thinking output in addition to content

raw
boolean

When true, returns the raw response from the model without any prompt templating

keep_alive

Model keep-alive duration (for example 5m or 0 to unload immediately)

options
object

Runtime options that control text generation
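A sketch of a request with common runtime options; the option names follow Ollama's Modelfile parameters (temperature, top_p, seed, num_predict, stop), and the values are illustrative:

```python
import json

# Request with runtime generation options set per call.
payload = {
    "model": "gemma3",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {
        "temperature": 0.2,   # lower = more deterministic
        "top_p": 0.9,
        "seed": 42,           # reproducible sampling
        "num_predict": 128,   # cap on generated tokens
        "stop": ["\n\n"],     # stop sequences
    },
}
body = json.dumps(payload)
```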

Response

Fields of the generation response. When streaming, the final object carries the timing and token-count fields.

model
string

Model name

created_at
string

ISO 8601 timestamp of response creation

response
string

The model's generated text response

thinking
string

The model's generated thinking output

done
boolean

Indicates whether generation has finished

done_reason
string

Reason the generation stopped

total_duration
integer

Time spent generating the response in nanoseconds

load_duration
integer

Time spent loading the model in nanoseconds

prompt_eval_count
integer

Number of input tokens in the prompt

prompt_eval_duration
integer

Time spent evaluating the prompt in nanoseconds

eval_count
integer

Number of output tokens generated in the response

eval_duration
integer

Time spent generating tokens in nanoseconds