Ollama
Ollama is a self-hosted AI solution for running open-source large language models on your own infrastructure.
Overview
Ollama is a self-hosted AI solution for running open-source large language models, such as Llama 2 and Mistral, locally or on your own infrastructure. Ollama exposes a REST API and provides Python and JavaScript libraries so you can integrate it with your apps easily.
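To get a quick feel for the REST API, the sketch below lists the models available on an Ollama instance by calling the /api/tags endpoint; it assumes a local instance listening on Ollama's default port, 11434:
curl http://localhost:11434/api/tags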
Try it out
Once the Ollama server is deployed, you can start interacting with the Ollama API via your Koyeb App URL, which looks similar to https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app.
Let's pull one of the available Ollama models and make a request to the Ollama API:
The following example shows how to pull the qwen2.5 model via the Ollama API.
curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/api/pull -d '{
"name": "qwen2.5"
}'
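The pull endpoint streams download progress as a series of JSON objects. Once it reports success, you can verify that the model is available on your deployment by listing the pulled models, for example:
curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/api/tags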
Once the model is pulled, you can start generating responses for a given prompt with it. The following example shows how to generate a response from the qwen2.5 model for the prompt "Why is the sky blue?".
curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/api/generate -d '{
"model": "qwen2.5",
"prompt":"Why is the sky blue?",
"stream": false
}'
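For multi-turn conversations, Ollama also exposes a chat endpoint that accepts a list of role-tagged messages. The following sketch sends a single user message to the qwen2.5 model; the request shape follows the Ollama API documentation at the time of writing:
curl https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app/api/chat -d '{
"model": "qwen2.5",
"messages": [{"role": "user", "content": "Why is the sky blue?"}],
"stream": false
}'
Setting "stream" to true instead returns the response incrementally as a stream of JSON objects, which is useful for rendering tokens as they are generated.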