PaddlePaddle FastDeploy

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle.

Overview

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.

🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink and RDMA selection.

🤝 OpenAI API Server and vLLM Compatible: One-command deployment with vLLM interface compatibility (a client-side sketch follows this list).

🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.

⏩ Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.

🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi, and more.
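
As a rough illustration of the OpenAI-compatible interface, the sketch below sends a chat completion request to a FastDeploy server using the official openai Python client. The base URL, port, API key, and model name are assumptions for a locally running deployment, not values defined by FastDeploy itself; adjust them to match your setup.

```python
# Minimal sketch: query a FastDeploy OpenAI-compatible endpoint with the
# official openai Python client. The base_url, api_key, and model name are
# placeholders for an assumed local deployment; change them to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed FastDeploy server address
    api_key="EMPTY",                      # self-hosted servers often ignore the key
)

response = client.chat.completions.create(
    model="your-deployed-model",          # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize what FastDeploy does."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```

Because the server speaks the same HTTP API as vLLM and OpenAI, existing client code and tooling built against those interfaces should work without changes beyond the base URL.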

For the complete project and full documentation, please refer to the FastDeploy GitHub repository.
