Text Generation Inference (TGI)

Deploy TGI for high-performance text generation using the most popular open-source LLMs


Overview

Text Generation Inference (TGI) is a powerful toolkit designed for deploying and serving Large Language Models (LLMs). It supports high-performance text generation across popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.

This Starter deploys Text Generation Inference (TGI) to Koyeb in one click. By default, it runs on an Nvidia RTX 4000 SFF Ada GPU Instance and serves Qwen/Qwen2.5-1.5B. You can change the model during deployment by modifying the MODEL_ID environment variable.
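Once the Service is running, you can query it over HTTP. As a minimal sketch, assuming a placeholder public URL of the form https://your-app-yourorg.koyeb.app (substitute the actual URL of your Koyeb Service), a request to TGI's /generate endpoint looks like this:

```python
import requests

# Placeholder for the public URL of your Koyeb Service.
TGI_URL = "https://your-app-yourorg.koyeb.app"

# TGI exposes a POST /generate endpoint that accepts a prompt
# ("inputs") plus optional generation parameters.
response = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "What is Text Generation Inference?",
        "parameters": {"max_new_tokens": 100},
    },
    timeout=60,
)
response.raise_for_status()

# The response body carries the model output in "generated_text".
print(response.json()["generated_text"])
```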

Configuration

Text Generation Inference (TGI) must run on a GPU Instance type. During initialization, TGI downloads the specified model from Hugging Face.

To change the deployed model, set the MODEL_ID value in the Environment variables section to the Hugging Face ID of the model you want to serve.

When deploying Text Generation Inference (TGI) on Koyeb, environment variables such as MODEL_ID can be configured to customize the deployment.
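As an alternative to raw HTTP requests, the huggingface_hub Python client can talk to the same deployment. A minimal sketch, again using a placeholder Koyeb URL; the served model is whatever MODEL_ID was set to at deploy time (Qwen/Qwen2.5-1.5B by default):

```python
from huggingface_hub import InferenceClient

# Point the client at the deployed TGI Service; the URL below is a
# placeholder for your actual Koyeb Service URL.
client = InferenceClient(model="https://your-app-yourorg.koyeb.app")

# The endpoint serves the model configured via MODEL_ID.
output = client.text_generation(
    "Explain what a GPU Instance is in one sentence.",
    max_new_tokens=80,
)
print(output)
```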

Related One-Click Apps in this category

  • DeepSparse Server

    DeepSparse is an inference runtime that takes advantage of neural network sparsity to deliver GPU-class performance on CPUs.

  • Fooocus

    Deploy Fooocus, a powerful AI image generation tool, on Koyeb.

  • LangServe

    LangServe makes it easy to deploy LangChain applications as RESTful APIs.
