The default context length in Ollama is 4096 tokens.
Setting context length
Setting a larger context length will increase the amount of memory required to run a model. Ensure you have enough VRAM available before increasing the context length. Cloud models are set to their maximum context length by default.
App
In the Ollama app, open Settings and adjust the context length slider to the desired value.
CLI
If changing the context length through the app is not possible, it can also be set when serving Ollama.
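For example, assuming a recent Ollama release that supports the OLLAMA_CONTEXT_LENGTH environment variable, the following starts the server with an 8192-token context window for all models (8192 is only an example value):

OLLAMA_CONTEXT_LENGTH=8192 ollama serve

Alternatively, inside an interactive ollama run session, /set parameter num_ctx 8192 changes the context length for that session only.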
Check allocated context length and model offloading
For best performance, use the maximum context length for a model and avoid offloading the model to CPU. Verify the split under the PROCESSOR column using ollama ps.
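As a minimal check, assuming a hypothetical model name llama3.2 (substitute your own), load the model and then list what is running:

# Load the model with a one-off prompt, then inspect how it was scheduled
ollama run llama3.2 "hello"
ollama ps

A PROCESSOR value of 100% GPU means the model fits entirely in VRAM; a mixed value such as 25%/75% CPU/GPU indicates part of the model was offloaded to CPU.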

