Ollama’s API responses include metrics that can be used to measure performance and model usage:
  • total_duration: How long the response took to generate
  • load_duration: How long the model took to load
  • prompt_eval_count: How many input tokens were processed
  • prompt_eval_duration: How long it took to evaluate the prompt
  • eval_count: How many output tokens were generated
  • eval_duration: How long it took to generate the output tokens
All timing values are measured in nanoseconds.
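
Because the durations are in nanoseconds, a common derived metric is token throughput: divide a token count by its matching duration and scale by 1e9. The sketch below uses the values from the example response further down; the helper function is illustrative, not part of the API:

```python
NS_PER_SECOND = 1_000_000_000

def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration to tokens/second."""
    return token_count / duration_ns * NS_PER_SECOND

# Values taken from the example response below.
prompt_tps = tokens_per_second(11, 13_074_791)  # prompt evaluation speed
output_tps = tokens_per_second(18, 52_479_709)  # generation speed
print(f"prompt: {prompt_tps:.1f} tok/s, output: {output_tps:.1f} tok/s")
```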

Example response

For endpoints that return usage metrics, the response body will include the usage fields. For example, a non-streaming call to /api/generate may return the following response:
{
  "model": "gemma3",
  "created_at": "2025-10-17T23:14:07.414671Z",
  "response": "Hello! How can I help you today?",
  "done": true,
  "done_reason": "stop",
  "total_duration": 174560334,
  "load_duration": 101397084,
  "prompt_eval_count": 11,
  "prompt_eval_duration": 13074791,
  "eval_count": 18,
  "eval_duration": 52479709
}
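A request that produces a response of this shape might look like the following sketch. It assumes the Python requests library and a local server on the default port 11434; stream is set to false because /api/generate streams by default:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma3", "prompt": "Hello", "stream": False},
)
resp.raise_for_status()
body = resp.json()

# Usage metrics are top-level fields in the non-streaming response.
print(body["total_duration"], body["eval_count"], body["eval_duration"])
```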
For endpoints that return streaming responses, usage fields are included as part of the final chunk, where done is true.
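As a sketch of reading those metrics from a stream, the following assumes a local server on the default port and uses the requests library. Streaming endpoints return newline-delimited JSON chunks, so the loop parses each line and reads the usage fields from the chunk where done is true:

```python
import json
import requests

# Stream a generation and pull the usage metrics from the final chunk.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma3", "prompt": "Hello"},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("done"):
            # Usage fields are only present on this final chunk.
            eval_count = chunk["eval_count"]
            eval_duration = chunk["eval_duration"]  # nanoseconds
            print(f"{eval_count / eval_duration * 1e9:.1f} tokens/s")
```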