Jan 05, 2026
10 min read

Serverless AI Infrastructure Going into 2026: Sandboxes, GPUs, and More

At Koyeb, we are building high-performance serverless infrastructure for AI. Deploy your workloads on serverless GPUs, next-generation AI accelerators, and CPUs. Our platform runs fully isolated, secure microVMs on bare-metal servers around the world, with autoscaling, scale-to-zero, and cold starts as low as 250ms.

2026 is shaping up to be even bigger than 2025 for the AI stack: AI agents, AI-generated code, and multi-modal models are changing how all applications are built and deployed. AI is becoming part of every stack, the same way the cloud did before it.

Like everyone building in AI, we had a busy 2025. As we saw more and more teams move from PoCs and demos to production, we focused on building the high-performance serverless infrastructure that makes deploying and scaling AI reliable in the real world. We shipped a lot of features and improvements designed to make your AI deployment experience faster, smoother, and more cost-effective.

What’s more, we partnered with leading companies and teams shaping the future of cloud and AI infrastructure, organized and participated in over 50 community events, and published a range of resources to help you develop and deploy AI apps.

In case you missed it, here’s a comprehensive rundown of everything that went live in 2025 to help you build, deploy, and scale your AI workloads on serverless infrastructure.

Koyeb Sandboxes Go Live

We launched Koyeb Sandboxes after working closely with teams running AI-generated code at massive scale on Koyeb. Those teams needed to spin up thousands of workloads every day securely, cost-effectively, and without building custom infrastructure abstractions themselves.

Koyeb Sandboxes package those requirements into a single, easy-to-use experience: ephemeral, fully isolated environments for AI agents, code generation, and more.

You can get started with Koyeb Sandboxes using our Python and JavaScript SDKs. Spin up a sandboxed environment whenever you (or your AI agent) need one, run commands, manage files, expose ports, and shut it down, all without worrying about the underlying infrastructure.
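To give a feel for the workflow, here is a minimal sketch of a sandbox lifecycle in Python. The package and method names below (koyeb_sandbox, Sandbox.create, exec, write_file, expose_port, terminate) are illustrative assumptions, not the SDK's actual API; check the SDK documentation for the real names.

```python
# Hypothetical sketch of a sandbox lifecycle -- names are illustrative,
# not the actual Koyeb SDK API.
from koyeb_sandbox import Sandbox  # assumed package name

# Spin up an ephemeral, fully isolated environment
sandbox = Sandbox.create(image="python:3.12")

# Run AI-generated code without touching your own infrastructure
result = sandbox.exec("python -c 'print(2 + 2)'")
print(result.stdout)  # "4"

# Manage files and expose a port if the generated code serves something
sandbox.write_file("/app/agent_script.py", "print('hello from the sandbox')")
url = sandbox.expose_port(8000)

# Tear everything down when the agent is done
sandbox.terminate()
```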

Here are helpful links to get started with Koyeb Sandboxes:

Light Sleep Scale-to-Zero Enabled

On the journey to providing a truly serverless experience, light sleep scale-to-zero was a major milestone.

Demo app comparing calling an API from Light Sleep and Deep Sleep

Bringing it to life required reworking core parts of the platform, from VM snapshotting to kernel-level networking. Thanks to eBPF isolation, snapshots, and other behind-the-scenes engineering, VMs can now wake from an idle state in as little as 250ms.
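You can get a feel for this yourself by timing the first request to a service that has scaled to zero against a later, warm one. Here is a rough sketch in Python, assuming a hypothetical service URL; the measured number includes network round-trip, so it is an upper bound on the actual wake time:

```python
import time
import urllib.request

# Hypothetical URL of a Koyeb service configured with scale-to-zero
URL = "https://my-service-myorg.koyeb.app/health"

# The first request wakes the sleeping VM; the second hits a warm one.
for label in ("cold (waking from light sleep)", "warm"):
    start = time.perf_counter()
    with urllib.request.urlopen(URL) as resp:
        resp.read()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.0f} ms")
```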

Engineering Deep Dives

Read the full story of how we built light sleep to enable an even faster version of scale-to-zero.

Learn More

Released Koyeb MCP Server & Deploying Your Own Remote MCP Server

2025 was a breakthrough year for MCP! We launched the Koyeb MCP Server, letting you interact with your Koyeb resources using natural language.

We also released resources to help you deploy your own remote MCP servers for low-latency, reliable setups.
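As a starting point, here is a minimal MCP server using the official Python MCP SDK's FastMCP helper. The tool itself is a toy, and the transport name may vary across SDK versions, so treat this as a sketch rather than a definitive template:

```python
# pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # For remote deployments, serve over HTTP instead of stdio.
    # The exact transport name ("streamable-http" here) depends on
    # the SDK version you are running.
    mcp.run(transport="streamable-http")
```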

Deploy Remote MCP Server on Koyeb

Learn how to deploy your own MCP server on Koyeb's serverless infrastructure for low-latency and reliable deployments.

Deploy Now

Introduced Tenstorrent Cloud Instances

As you may already know, we're committed to bringing alternative accelerators to market to foster innovation in the AI infrastructure space. Our platform is hardware agnostic, which means we can provide the seamless Koyeb deployment experience anywhere.

Last year, we added Tenstorrent Cloud Instances to provide on-demand access to Tenstorrent's Tensix Processor and open-source TT-Metalium SDK.

  • TT-N300S: With one n300s, this instance has 24GB of GDDR6 and 192MB of SRAM, delivers up to 466 FP8 TFLOPS, and comes with 4 vCPUs and 64GB of RAM.
  • TT-Loudbox: With 4x n300s meshed together, this instance has 96GB of GDDR6 and 768MB of SRAM, delivers up to 1,864 FP8 TFLOPS, and comes with 16 vCPUs and 256GB of RAM.

These new Tenstorrent instances come with all the native features of the Koyeb platform to bring developers the fastest way to build AI models and run inference workloads in on-demand environments, with zero infrastructure management.

Tenstorrent Wormhole N300S

New Models and AI Starter Apps in Koyeb's One-Click Deploy Catalog

Over a hundred ready-to-use AI models, starter apps, and templates are now a click away, perfect for inference, fine-tuning, or adding AI to your stack for the first time.

Discover SOTA AI models from Mistral AI, DeepSeek, Black Forest Labs, Google DeepMind, and more in our One-Click Deploy Catalog.

One-Click Catalog

All deployments run on production-grade serverless GPUs, with built-in autoscaling and scale-to-zero, so you can move from experiments to production. Whether you’re serving models for inference or fine-tuning, Koyeb handles capacity, scaling, performance, and reliability, so you can focus on shipping AI, not managing infrastructure.
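Many model-serving templates expose an OpenAI-compatible API once deployed; this is typical of vLLM-based templates, though it depends on the specific one you pick. Assuming a hypothetical deployment URL and model name, querying one looks like this:

```python
# pip install openai
from openai import OpenAI

# Hypothetical endpoint of a model deployed from the one-click catalog
client = OpenAI(
    base_url="https://my-model-myorg.koyeb.app/v1",
    api_key="changeme",  # depends on how the deployed template is secured
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # example model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```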

Serverless GPU Price Drop

Running your AI workloads on high-performance serverless GPUs is easier than ever thanks to our price drops on A100, H100, and L40 GPUs.

AI infrastructure is expensive, and optimizing costs can change what experiments you run. Lower prices mean you can iterate faster, run bigger models, or just try ideas that previously felt “too expensive.”

Serverless GPU Instance | Previous Price | New Price | Price Reduction | VRAM | vCPUs | RAM
L40S | $1.55/hour | $1.20/hour | 23% | 48 GB | 15 | 64 GB
A100 | $2.00/hour | $1.60/hour | 20% | 80 GB | 15 | 180 GB
H100 | $3.30/hour | $2.50/hour | 24% | 80 GB | 15 | 180 GB
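To make the drop concrete, here is the arithmetic for a hypothetical workload consuming 100 H100 GPU-hours per month:

```python
# Monthly cost for a hypothetical workload using 100 H100 GPU-hours/month
hours = 100
old_price, new_price = 3.30, 2.50  # H100 $/hour, before and after the drop
print(f"before: ${old_price * hours:.2f}")                # before: $330.00
print(f"after:  ${new_price * hours:.2f}")                # after:  $250.00
print(f"saved:  ${(old_price - new_price) * hours:.2f}")  # saved:  $80.00
```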

New Authentication Features

To support more advanced features and offer a smoother sign-in experience, we’ve introduced a fully redesigned authentication system with robust security features, including 2FA, MFA (Multi-Factor Authentication), and passkeys.

Learn more about these new features and how to get started by reading the full announcement.

Koyeb Authentication Features

Announced TCP Proxy

Up until last year, services running on Koyeb could only be publicly exposed via HTTP, HTTP/2, WebSocket, and gRPC protocols.

In 2025, we made it easy to expose TCP ports publicly. This is useful for debugging, testing multi-service systems, or connecting local tools to a cloud environment without all the networking headaches.

Whether you’re running:

  • Databases like MySQL, MongoDB, and Redis
  • Remote access protocols like SSH, RDP, VNC
  • IoT protocols like MQTT, AMQP

You can now reach them over the public Internet easily, with no need for complex proxy setups or custom routing rules.
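For example, once a Redis service is exposed through the TCP proxy, you can reach it with nothing but a raw socket. A sketch, assuming a hypothetical public hostname and port assigned by the proxy:

```python
import socket

# Hypothetical host/port assigned by the TCP proxy
HOST, PORT = "my-redis-myorg.koyeb.app", 30000

with socket.create_connection((HOST, PORT), timeout=5) as sock:
    # Redis accepts inline commands: send PING, expect +PONG back
    # (assumes the server does not require AUTH)
    sock.sendall(b"PING\r\n")
    print(sock.recv(64))  # b'+PONG\r\n'
```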

Deploy TCP-based Apps on Koyeb

Run and scale TCP-based applications on high-performance serverless infrastructure.

Deploy Now

Faster Deployments, Faster Time to Production

In 2025, we significantly upgraded Koyeb's serverless engine to speed up deployments and scaling.

These improvements shorten the path to production while increasing reliability for real-world AI workloads.

Launched the Koyeb Partner Hub

Since day 1, we've teamed up with leading companies to drive the next wave of global AI infrastructure. Our Partner Hub proudly showcases these incredible collaborations.

We have a full ecosystem of partners; here are a few of them:

If you're interested in becoming a Koyeb partner, fill out our short form and our partnerships team will follow up with next steps.

Koyeb Partner Hub

Community & Events

In 2025, we brought the global AI engineering community together through 20+ events across SF, Paris, and NYC, including meetups, hackathons, dinners, happy hours, and our first conference!

A major milestone was hosting the first AI Engineer Conference in Paris, where we gathered top AI builders, founders, and leaders to share how AI is being built and deployed in production today.

Docs, Tutorials, Videos, & Other Resources

Throughout the year, we published a range of resources to help you develop and deploy AI apps:

Check out the Koyeb Community for our weekly changelog updates, to ask questions, and share your projects.

Coming Next in 2026

2026 is shaping up to be another big year for AI. AI-generated code, coding agents, and multi-modal models are quickly moving from experiments to production. The infrastructure powering them needs to keep up.

At Koyeb, we're building high-performance serverless infrastructure so teams can focus on building and deploying AI without worrying about the underlying platform. From Sandboxes designed to safely execute AI-generated code, to serverless GPUs for inference and fine-tuning, to scaling primitives built for real-world workloads, everything we shipped in 2025 was about one thing: making AI in production fast, reliable, and effortless. That focus carries into 2026:

  • Best-in-Class Performance: Leverage cutting-edge GPUs, next-gen AI accelerators, and CPUs that automatically scale to handle your workloads.
  • Lower Costs: Pay only for what you use, with reduced GPU costs and scale-to-zero efficiency.
  • Fast, Reliable and Effortless Deployments: From new one-click deployments to sandboxed environments, we’re removing friction so you can focus on building and deploying your ideas.

As always, your feedback drives us. Let us know what features you want to see on the platform with our feature request platform.

There are big updates ahead, so stay tuned! Until then, we’re excited to see how you’ll use Koyeb to build and scale your applications.

Build the Future of Serverless AI Infrastructure

If this work sounds fun to you, check out our available engineering roles!

Join Koyeb

Deploy AI apps to production in minutes

Get started