Jan 13, 2026
6 min read

Serverless GPUs: RTX Pro 6000, H200, and B200 Now Available on Koyeb

Today, we’re announcing that NVIDIA RTX Pro 6000, H200, and B200 GPUs are now available serverlessly on Koyeb.

These new GPU instances enable high-performance inference for compute-intensive workloads that are memory-bound, latency-sensitive, or throughput-constrained, including long-context and large-model serving.

With prices ranging from $2.20/hr to $5.50/hr, all GPU instances are billed by the second, so you can build, experiment, and autoscale with no infrastructure management and no idle costs or unexpected bills.

| | RTX Pro 6000 | H200 | B200 |
| --- | --- | --- | --- |
| Architecture | NVIDIA Blackwell | NVIDIA Hopper | NVIDIA Blackwell |
| GPU Memory (VRAM) | 96 GB | 141 GB | 180 GB |
| VRAM Bandwidth | 1.6 TB/s | 4.8 TB/s | 8 TB/s |
| RAM | 220 GB | 240 GB | 240 GB |
| Dedicated vCPU | 15 | 15 | 15 |
| Disk | 240 GB | 320 GB | 320 GB |
| Price | $2.20/hr | $3.00/hr | $5.50/hr |

H200 and RTX Pro 6000 are accessible instantly. Start running your workload right away!

Get started with GPUs

The fastest way to get started is to deploy one of the ready-to-run models available in the one-click catalog. Just pick a model, choose a GPU, and Koyeb handles the infrastructure for you, no setup required.

Deploy AI Models in One Click

You can also deploy your own models and workloads using the Koyeb CLI or the Koyeb Dashboard. As always, we support multiple deployment paths, as shown in the sketch below.
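One common pattern is to wrap your model in a small HTTP service, package it in a Docker image, and deploy it on a GPU instance. Here is a minimal sketch of such a service; the model name and route are illustrative examples, not part of any Koyeb API:

```python
# app.py: a minimal text-generation service you could containerize and deploy
# on a Koyeb GPU instance. Model name and route are illustrative examples.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup; device_map="auto" places weights on the GPU.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",  # swap in your own model
    device_map="auto",
)

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    result = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": result[0]["generated_text"]}
```

Build this into a Docker image, push it to the registry of your choice, and select a GPU instance type when you create the service from the CLI or the Dashboard.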

RTX Pro 6000

With 96 GB of VRAM and native FP4 support, the RTX Pro 6000 Server Edition is ideal for agentic AI systems, generative applications, AI-driven rendering, and data-heavy analytics workloads where latency, cost efficiency, and flexibility matter.

| | RTX Pro 6000 |
| --- | --- |
| GPU Memory (VRAM) | 96 GB |
| VRAM Bandwidth | 1.6 TB/s |
| FP4 AI Compute (Peak) | 4 PFLOPS |
| FP32 Compute | 120 TFLOPS |
| CUDA Cores | 24,064 |
| Tensor Cores | 752 |
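As a rough illustration of what 96 GB of VRAM enables, here is a minimal sketch of loading a roughly 70B-parameter model in 4-bit precision with transformers and bitsandbytes. The model name is an example, and bitsandbytes NF4 is a software 4-bit scheme rather than Blackwell's native FP4 Tensor Core format, but the memory math is comparable: 4-bit weights for a 70B model take roughly 35-40 GB, leaving plenty of headroom for KV cache.

```python
# Sketch: 4-bit loading of a ~70B model so weights (~35-40 GB) fit comfortably
# within the RTX Pro 6000's 96 GB of VRAM. NF4 is a software 4-bit format used
# here only to illustrate the memory footprint, not Blackwell's native FP4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # example model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Explain FP4 inference in one paragraph:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```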

H200

The H200 is an evolution of the well-known H100, delivering approximately 75% more GPU memory and 40% higher memory bandwidth than the original 80 GB H100 to support larger-scale AI models.

H200 excels at running very large models (100B+ parameters), sustaining high batch throughput for latency-sensitive inference, and efficiently processing long input sequences with tens of thousands of tokens.

| | H100 | H200 |
| --- | --- | --- |
| GPU Memory (VRAM) | 94 GB HBM3 | 141 GB HBM3e |
| VRAM Bandwidth | 3.9 TB/s | 4.8 TB/s |
| Interconnect (NVIDIA NVLink and NVSwitch) | 600 GB/s | 900 GB/s |
| FP16 Compute (Dense) | 835.5 TFLOPS | 835.5 TFLOPS |
| BFLOAT16 Compute (Dense) | 835.5 TFLOPS | 835.5 TFLOPS |
| FP8 Compute (Dense) | 1,670.5 TFLOPS | 1,670.5 TFLOPS |
| INT8 Compute (Dense) | 1,670.5 TOPS | 1,670.5 TOPS |
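As a concrete sketch of what this looks like in practice, assuming a vLLM-based stack and an illustrative model choice, a single H200 can handle long-context batch inference along these lines:

```python
# Sketch: offline batch inference with vLLM on a single H200. FP8 weights for a
# ~70B model take ~70 GB, leaving much of the 141 GB of HBM3e for the KV cache
# of long prompts. Model name and settings are illustrative; tune for your workload.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model
    quantization="fp8",                         # online FP8 quantization
    max_model_len=32768,                        # long-context serving
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=512)

prompts = [
    "Summarize the following report: ...",
    "Answer the question using only the document below: ...",
]

# vLLM applies continuous batching across these requests for high throughput.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```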

B200

NVIDIA B200, powered by the Blackwell architecture, is built for ultra-large model inference at scale, combining 180 GB of HBM3e memory with 8 TB/s of bandwidth and exceptional FP8 and FP4 Tensor Core performance. This makes it a strong fit for high-throughput, low-latency inference on the largest LLMs and multimodal models, especially in multi-GPU setups where performance per node directly reduces cost and time-to-response.

| | B200 |
| --- | --- |
| GPU Memory (VRAM) | 180 GB HBM3e |
| VRAM Bandwidth | 8 TB/s |
| FP8 Tensor Compute (Dense, per GPU) | 4.5 PFLOPS |
| FP4 Tensor Compute (Dense, per GPU) | 9 PFLOPS |
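To see why the combination of 180 GB of memory and low-precision formats matters, here is a back-of-the-envelope weight-memory calculation. It counts weights only; KV cache, activations, and runtime overhead come on top.

```python
# Rough weight-memory estimate for a large model at different precisions.
# Weights only: KV cache, activations, and runtime overhead add to this.
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9  # decimal GB

for bits, label in [(16, "FP16/BF16"), (8, "FP8"), (4, "FP4")]:
    print(f"120B params @ {label}: ~{weight_memory_gb(120, bits):.0f} GB")

# Expected output (annotations added):
#   120B params @ FP16/BF16: ~240 GB  -> needs multiple GPUs at half precision
#   120B params @ FP8: ~120 GB        -> fits on a single 180 GB B200
#   120B params @ FP4: ~60 GB         -> leaves ample headroom for KV cache
```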

Multi-GPU configurations

The H200 instance is available in multi-GPU configurations (2×, 4×, and 8×). These configurations make it easy to scale vertically for distributed inference, large-batch processing, or multi-model deployment, while keeping Koyeb’s serverless experience with no idle cost and predictable, usage-based pricing.

| | VRAM | vCPU | RAM | Disk | Price |
| --- | --- | --- | --- | --- | --- |
| H200 | 141 GB | 15 | 240 GB | 320 GB | $3.00/hr |
| 2× H200 | 282 GB | 30 | 480 GB | 640 GB | $6.00/hr |
| 4× H200 | 564 GB | 60 | 960 GB | 1,280 GB | $12.00/hr |
| 8× H200 | 1,128 GB | 120 | 1,920 GB | 2,560 GB | $24.00/hr |
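As a sketch of how a multi-GPU instance is typically used, assuming a vLLM-based stack and an illustrative model, tensor parallelism shards one large model across all the GPUs on the instance:

```python
# Sketch: tensor-parallel inference on a 4x H200 instance (564 GB total VRAM).
# tensor_parallel_size=4 shards the model across the four GPUs on the instance.
# The model name is only an example of a weight set that needs this much memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct-FP8",  # ~405 GB of FP8 weights
    tensor_parallel_size=4,
)

outputs = llm.generate(
    ["Explain tensor parallelism in two sentences."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```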

Why Koyeb for serverless GPUs?

These new GPUs build on what makes Koyeb different. All GPU instances are fully serverless and come with:

  • Docker container deployment from any registry using the Koyeb API, CLI, or control panel.
  • Scale-to-Zero with reactive autoscaling based on requests per second, concurrent connections, or P95 response time.
  • Pay only for what you use, per second.
  • Built-in observability with metrics, logs, and more.
  • Dedicated GPU performance without managing underlying infrastructure.

With support for RTX 4000 SFF Ada, RTX A6000, L40S, A100, H100, H200, RTX Pro 6000, B200, and next-gen AI accelerators like Tenstorrent, Koyeb provides a broad range of compute options for AI inference, fine-tuning, and other AI-driven or data-analytics workloads, all available through a serverless platform.

Deploy AI workloads on Koyeb

Run your AI workloads on high-performance serverless GPUs. Enjoy native autoscaling and Scale-to-Zero.

Deploy Now

Launch your workloads today

Over the past few months, we’ve continued to expand what teams can build on Koyeb.

Two months ago, we slashed GPU prices by up to 24% to make high-performance AI infrastructure more accessible. A week ago, we restocked RTX A6000 instances to support growing demand. Today, we're expanding our offering again with RTX Pro 6000, H200, and B200, bringing the latest GPUs to Koyeb's serverless platform.

The new GPU instances are available today. Deploy in minutes, scale automatically, and build AI without infrastructure friction.

