The default context length in Ollama is 4096 tokens.
## Setting context length
Setting a larger context length will increase the amount of memory required to run a model. Ensure you have enough VRAM available before increasing the context length. For best performance, use the maximum context length for a model, and avoid offloading the model to the CPU. Verify the split under the `PROCESSOR` column using `ollama ps`.
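
For example, a model that fits entirely in VRAM reports `100% GPU` under `PROCESSOR`. The output below is illustrative; exact columns and values vary by model and Ollama version:

```shell
ollama ps
# NAME           ID              SIZE      PROCESSOR    UNTIL
# llama3.2:3b    a80c4f17acd5    4.0 GB    100% GPU     4 minutes from now
```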
### App
In the Ollama app, open Settings and move the context length slider to your desired value.
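
If you run the server from a terminal rather than the app, the same setting can be applied with the `OLLAMA_CONTEXT_LENGTH` environment variable, or per session with `/set parameter num_ctx` inside `ollama run` (both assume a recent Ollama release; the model name below is only an example):

```shell
# Start the server with an 8192-token default context length
OLLAMA_CONTEXT_LENGTH=8192 ollama serve

# Or set it for a single interactive session
ollama run llama3.2
# >>> /set parameter num_ctx 8192
```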