

Ollama is a self-hosted AI solution for running open-source large language models, such as Llama 2, Mistral, and other LLMs, locally or on your own infrastructure. Ollama exposes a REST API and provides Python and JavaScript libraries so you can easily integrate it with your apps.
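Because the REST API speaks plain JSON over HTTP, any client can drive it. As a minimal sketch (standard library only; the app URL is a placeholder to replace with your own, and `build_request` is a helper name of our own), here is how the requests shown in the curl examples below could be assembled in Python:

```python
import json
from urllib import request

def build_request(base_url: str, endpoint: str, payload: dict) -> request.Request:
    """Build a POST request for an Ollama API endpoint such as pull or generate."""
    return request.Request(
        url=f"{base_url}/api/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The same call as `curl .../api/generate -d '{...}'`:
req = build_request(
    "https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app",
    "generate",
    {"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
# Sending it would be `request.urlopen(req)`, which requires the deployed server.
```

The JavaScript library follows the same request shape, so the payloads below carry over unchanged.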

Try it out

Once the Ollama server is deployed, you can interact with the Ollama API via your Koyeb App URL, which looks like: https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app.

Let's pull one of the available Ollama models and make a request to the Ollama API:

The following example shows how to pull the llama2 model via the Ollama API.

curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/api/pull -d '{
  "name": "llama2"
}'

Once the model is pulled, we can generate a response for a given prompt with that model. The following example shows how to generate a response from the llama2 model for the prompt "Why is the sky blue?".

curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
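With `"stream": false`, the API returns a single JSON object. If you omit that flag, the default is streaming, and the server sends newline-delimited JSON chunks, each carrying a `response` fragment and a `done` flag. A small Python helper (a sketch; `join_stream` is a name of our own) to reassemble a streamed answer:

```python
import json

def join_stream(lines):
    """Concatenate the 'response' fragments from Ollama's NDJSON streaming output."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # final chunk signals the end of the stream
            break
    return "".join(parts)
```

In practice you would feed this the lines of the HTTP response body as they arrive, which lets you display partial output while the model is still generating.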

Related One-Click Apps in this category

  • DeepSparse Server

    DeepSparse is an inference runtime that exploits neural network sparsity to deliver GPU-class performance on CPUs.

  • Fooocus

    Deploy Fooocus, a powerful AI image generation tool, on Koyeb

  • LangServe

    LangServe makes it easy to deploy LangChain applications as RESTful APIs.
