Jan 13, 2026
6 min read

Serverless GPUs: RTX Pro 6000, H200, and B200 Now Available on Koyeb

Today, we’re announcing that NVIDIA RTX Pro 6000, H200, and B200 GPUs are now available serverlessly on Koyeb.

These new GPU instances enable high-performance inference for compute-intensive workloads that are memory-bound, latency-sensitive, or throughput-constrained, including long-context and large-model serving.

With prices ranging from $2.20/hr to $5.50/hr, all GPU instances are billed by the second, so you can build, experiment, and autoscale with no infrastructure management and no idle costs or unexpected bills.

| | RTX Pro 6000 | H200 | B200 |
| --- | --- | --- | --- |
| Architecture | NVIDIA Blackwell | NVIDIA Hopper | NVIDIA Blackwell |
| GPU Memory (VRAM) | 96 GB | 141 GB | 180 GB |
| VRAM Bandwidth | 1.6 TB/s | 4.8 TB/s | 8 TB/s |
| RAM | 220 GB | 240 GB | 240 GB |
| Dedicated vCPU | 15 | 15 | 15 |
| Disk | 240 GB | 320 GB | 320 GB |
| Price | $2.20/hr | $3.00/hr | $5.50/hr |

H200 and RTX Pro 6000 are accessible instantly. Start running your workload right away!

Get started with GPUs

The fastest way to get started is to deploy one of the ready-to-run models available in the one-click catalog. Just pick a model, choose a GPU, and Koyeb handles the infrastructure for you, no setup required.

Deploy AI Models in One Click

You can also deploy your own models and workloads using the Koyeb CLI or the Koyeb Dashboard. As always, we support multiple deployment paths, as shown in the sketch below.
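One common pattern is to wrap your model in a small HTTP service, package it in a Docker image, and deploy it on a GPU instance. Here is a minimal sketch of such a service; the model name and route are illustrative examples, not part of any Koyeb API:

```python
# app.py: a minimal text-generation service you could containerize and deploy
# on a Koyeb GPU instance. Model name and route are illustrative examples.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup; device_map="auto" places weights on the GPU.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",  # swap in your own model
    device_map="auto",
)

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    result = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": result[0]["generated_text"]}
```

Build this into a Docker image, push it to the registry of your choice, and select a GPU instance type when you create the service from the CLI or the Dashboard.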

RTX Pro 6000

With 96 GB of VRAM and native FP4 support, the RTX Pro 6000 Server Edition is ideal for agentic AI systems, generative applications, AI-driven rendering, and data-heavy analytics workloads where latency, cost efficiency, and flexibility matter.

| | RTX Pro 6000 |
| --- | --- |
| GPU Memory (VRAM) | 96 GB |
| VRAM Bandwidth | 1.6 TB/s |
| FP4 AI Compute (Peak) | 4 PFLOPS |
| FP32 Compute | 120 TFLOPS |
| CUDA Cores | 24,064 |
| Tensor Cores | 752 |
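As a rough illustration of what 96 GB of VRAM enables, here is a minimal sketch of loading a roughly 70B-parameter model in 4-bit precision with transformers and bitsandbytes. The model name is an example, and bitsandbytes NF4 is a software 4-bit scheme rather than Blackwell's native FP4 Tensor Core format, but the memory math is comparable: 4-bit weights for a 70B model take roughly 35-40 GB, leaving plenty of headroom for KV cache.

```python
# Sketch: 4-bit loading of a ~70B model so weights (~35-40 GB) fit comfortably
# within the RTX Pro 6000's 96 GB of VRAM. NF4 is a software 4-bit format used
# here only to illustrate the memory footprint, not Blackwell's native FP4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # example model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Explain FP4 inference in one paragraph:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```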

H200

The H200 is an evolution of the well-known H100, delivering approximately 75% more GPU memory and 40% higher memory bandwidth than the original 80 GB H100 to support larger-scale AI models.

H200 excels at running very large models (100B+ parameters), sustaining high batch throughput for latency-sensitive inference, and efficiently processing long input sequences with tens of thousands of tokens.

| | H100 | H200 |
| --- | --- | --- |
| GPU Memory (VRAM) | 94 GB HBM3 | 141 GB HBM3e |
| VRAM Bandwidth | 3.9 TB/s | 4.8 TB/s |
| Interconnect (NVIDIA NVLink and NVSwitch) | 600 GB/s | 900 GB/s |
| FP16 Compute (Dense) | 835.5 TFLOPS | 835.5 TFLOPS |
| BFLOAT16 Compute (Dense) | 835.5 TFLOPS | 835.5 TFLOPS |
| FP8 Compute (Dense) | 1,670.5 TFLOPS | 1,670.5 TFLOPS |
| INT8 Compute (Dense) | 1,670.5 TOPS | 1,670.5 TOPS |
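As a concrete sketch of what this looks like in practice, assuming a vLLM-based stack and an illustrative model choice, a single H200 can handle long-context batch inference along these lines:

```python
# Sketch: offline batch inference with vLLM on a single H200. FP8 weights for a
# ~70B model take ~70 GB, leaving much of the 141 GB of HBM3e for the KV cache
# of long prompts. Model name and settings are illustrative; tune for your workload.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model
    quantization="fp8",                         # online FP8 quantization
    max_model_len=32768,                        # long-context serving
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=512)

prompts = [
    "Summarize the following report: ...",
    "Answer the question using only the document below: ...",
]

# vLLM applies continuous batching across these requests for high throughput.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```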

B200

NVIDIA B200, powered by the Blackwell architecture, is built for ultra-large model inference at scale, combining 180 GB of HBM3e memory with 8 TB/s of bandwidth and exceptional FP8 and FP4 Tensor Core performance. This makes it a strong fit for high-throughput, low-latency inference on the largest LLMs and multimodal models, especially in multi-GPU setups where performance per node directly reduces cost and time-to-response.

| | B200 |
| --- | --- |
| GPU Memory (VRAM) | 180 GB HBM3e |
| VRAM Bandwidth | 8 TB/s |
| FP8 Tensor Compute (Dense, per GPU) | 4.5 PFLOPS |
| FP4 Tensor Compute (Dense, per GPU) | 9 PFLOPS |
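To see why the combination of 180 GB of memory and low-precision formats matters, here is a back-of-the-envelope weight-memory calculation. It counts weights only; KV cache, activations, and runtime overhead come on top.

```python
# Rough weight-memory estimate for a large model at different precisions.
# Weights only: KV cache, activations, and runtime overhead add to this.
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9  # decimal GB

for bits, label in [(16, "FP16/BF16"), (8, "FP8"), (4, "FP4")]:
    print(f"120B params @ {label}: ~{weight_memory_gb(120, bits):.0f} GB")

# Expected output (annotations added):
#   120B params @ FP16/BF16: ~240 GB  -> needs multiple GPUs at half precision
#   120B params @ FP8: ~120 GB        -> fits on a single 180 GB B200
#   120B params @ FP4: ~60 GB         -> leaves ample headroom for KV cache
```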

Multi-GPU configurations

The H200 instance is available in multi-GPU configurations (2×, 4×, and 8×). These configurations make it easy to scale vertically for distributed inference, large-batch processing, or multi-model deployment, while keeping Koyeb’s serverless experience with no idle cost and predictable, usage-based pricing.

| | VRAM | vCPU | RAM | Disk | Price |
| --- | --- | --- | --- | --- | --- |
| H200 | 141 GB | 15 | 240 GB | 320 GB | $3.00/hr |
| 2× H200 | 282 GB | 30 | 480 GB | 640 GB | $6.00/hr |
| 4× H200 | 564 GB | 60 | 960 GB | 1,280 GB | $12.00/hr |
| 8× H200 | 1,128 GB | 120 | 1,920 GB | 2,560 GB | $24.00/hr |
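As a sketch of how a multi-GPU instance is typically used, assuming a vLLM-based stack and an illustrative model, tensor parallelism shards one large model across all the GPUs on the instance:

```python
# Sketch: tensor-parallel inference on a 4x H200 instance (564 GB total VRAM).
# tensor_parallel_size=4 shards the model across the four GPUs on the instance.
# The model name is only an example of a weight set that needs this much memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct-FP8",  # ~405 GB of FP8 weights
    tensor_parallel_size=4,
)

outputs = llm.generate(
    ["Explain tensor parallelism in two sentences."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```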

Why Koyeb for serverless GPUs?

These new GPUs build on what makes Koyeb different. All GPU instances are fully serverless and come with:

  • Docker container deployment from any registry using the Koyeb API, CLI, or control panel.
  • Scale-to-Zero with reactive autoscaling based on requests per second, concurrent connections, or P95 response time.
  • Pay only for what you use, per second.
  • Built-in observability with metrics, logs, and more.
  • Dedicated GPU performance without managing underlying infrastructure.

With support for RTX 4000 SFF Ada, RTX A6000, L40S, A100, H100, H200, RTX Pro 6000, B200, and next-gen AI accelerators like Tenstorrent, Koyeb provides a broad range of compute options for AI inference, fine-tuning, and other AI-driven or data-analytics workloads, all available through a serverless platform.

Deploy AI workloads on Koyeb

Run your AI workloads on high-performance serverless GPUs. Enjoy native autoscaling and Scale-to-Zero.

Deploy Now

Launch your workloads today

Over the past few months, we’ve continued to expand what teams can build on Koyeb.

Two months ago, we slashed GPU prices by up to 24% to make high-performance AI infrastructure more accessible. A week ago, we restocked RTX A6000 instances to support growing demand. Today, we're expanding our offering again with RTX Pro 6000, H200, and B200, bringing the latest GPUs to Koyeb's serverless platform.

The new GPU instances are available today. Deploy in minutes, scale automatically, and build AI without infrastructure friction.

