Deploy the vLLM Inference Engine to Run Large Language Models (LLM) on Koyeb
By Justin Ellingwood
Learn how to set up a vLLM Instance to run inference workloads and host your own OpenAI-compatible API on Koyeb.
In this tutorial, we showcase how to deploy the vLLM inference engine on Koyeb to run large language models (LLMs) and host your own OpenAI-compatible API. vLLM is an open-source inference engine that serves models behind the same HTTP interface as OpenAI's API, so existing OpenAI client code can point at your deployment with minimal changes.
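As a preview of what hosting an OpenAI-compatible API means in practice, here is a minimal sketch of the request body a client would send to a vLLM deployment's `/v1/chat/completions` endpoint. The URL and model name below are placeholders, not values from this tutorial:

```python
import json

# Hypothetical URL of a vLLM deployment on Koyeb (placeholder).
VLLM_URL = "https://example-app.koyeb.app/v1/chat/completions"

# The body follows the OpenAI chat completions schema that vLLM implements.
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",  # assumed model; use whichever model you deploy
    "messages": [
        {"role": "user", "content": "What is vLLM?"},
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

# Serialize the payload; this is what you would POST to VLLM_URL.
body = json.dumps(payload)
print(body)
```

You could send this with `curl`, `requests`, or the official `openai` Python client by setting its `base_url` to your deployment's `/v1` path.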