Text Generation Inference (TGI)
Deploy TGI for high-performance text generation using the most popular open-source LLMs
Text Generation Inference (TGI) is a powerful toolkit designed for deploying and serving Large Language Models (LLMs). It supports high-performance text generation across popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.
This Starter deploys Text Generation Inference (TGI) to Koyeb in one click. By default, it runs on an Nvidia RTX 4000 SFF Ada GPU Instance and serves Qwen/Qwen2.5-1.5B. You can change the model during deployment by modifying the MODEL_ID environment variable.
You must run Text Generation Inference (TGI) on a GPU Instance type. During initialization, Text Generation Inference (TGI) will download the specified model from Hugging Face.
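Once the Service is healthy, you can send generation requests to its public URL. Below is a minimal sketch using Python's requests library against TGI's /generate endpoint; the https://<YOUR_APP>.koyeb.app URL is a placeholder for your own deployment's domain.

```python
import requests

# Placeholder: replace with your Koyeb Service's public URL.
TGI_URL = "https://<YOUR_APP>.koyeb.app"

# TGI exposes a /generate endpoint that accepts a prompt plus
# generation parameters and returns the generated text.
response = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```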
To change the deployed model, update the MODEL_ID value in the Environment variables section to the Hugging Face model ID you want to deploy.
When deploying Text Generation Inference (TGI) on Koyeb, environment variables such as MODEL_ID can be configured to customize the deployment.
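For illustration, here is a sketch of how the same environment variables could drive an equivalent local TGI container using the Docker SDK for Python. The image tag, token value, and HF_TOKEN variable are assumptions not taken from this page: HF_TOKEN is the token variable TGI reads for gated models, and the port and shared-memory settings follow TGI's standard Docker instructions.

```python
import docker

client = docker.from_env()

# Run the TGI container locally with the same environment variables
# the Koyeb deployment uses. Image tag and token are placeholders.
container = client.containers.run(
    "ghcr.io/huggingface/text-generation-inference:latest",
    environment={
        "MODEL_ID": "Qwen/Qwen2.5-1.5B",  # model to download and serve
        "HF_TOKEN": "<YOUR_HF_TOKEN>",    # only needed for gated models
    },
    ports={"80/tcp": 8080},  # TGI listens on port 80 inside the container
    device_requests=[  # expose all available GPUs to the container
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    shm_size="1g",  # TGI's Docker instructions recommend 1 GB shared memory
    detach=True,
)
print(container.logs(tail=10).decode())
```

Once the container reports that the model has loaded, it accepts the same /generate requests shown earlier at http://localhost:8080.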