Oct 01, 2025
14 min read

Inside AI Engineer Paris 2025 Part 1 – 5 Highlights That Shaped the Stage

At Koyeb, we run a serverless platform for deploying production-grade applications on high-performance infrastructure—GPUs, CPUs, and accelerators. You push code or containers; we handle everything from build to global deployment, running workloads in secure, lightweight virtual machines on bare-metal servers around the world.

Paris was buzzing last week — not just with tourists, but with 700 AI builders, hackers, researchers, and dreamers who came together for the first-ever AI Engineer Paris. Over two packed days, we turned the city into the European capital of AI engineering.

With five talk tracks spanning everything from practical ops to creative co-pilots, 47 sessions from speakers at Mistral, Hugging Face, Microsoft, NVIDIA, and dozens more, plus the backing of 25 incredible sponsors (Neo4j, Docker, Google DeepMind, Black Forest Labs…), this was more than a conference — it was a showcase of how fast the AI stack is evolving and who’s driving it forward.

The Welcome Keynote set the tone for the event. Our MC, Rauf Chebri, guided the audience throughout the day, while AI Engineer co-founders Ben and Swyx laid out the vision for the community. Yann Leger, CEO of Koyeb, welcomed everyone to the very first AI Engineer Paris, and Marwan Elfitesse, Head of Startup Programs at Station F, introduced attendees to the iconic venue that hosted us. The keynote closed with Lélio Renard Lavaud, VP of Engineering at Mistral AI, who delivered a reality check on the hard data challenges of RAG and why context engineering will define the next wave of AI systems.

This post kicks off a four-part series where we’ll recap the highlights and share what’s next:

  • 🌟 5 Highlights That Shaped the Stage — our look at the AI trends we saw at AI Engineer Paris.
  • 🚀 How We Built a Photobooth With Flux Kontext + Qwen 2.5 VLM — a behind-the-scenes look at the AI-powered demo we ran at the Koyeb booth.
  • 🛠 Organizing a Large Conference in 3 Months — how we pulled off a global AI gathering on a tight timeline.
  • 🔥 Showcasing Model Fine-Tuning on Koyeb in Real Time — what it looks like to train and deploy live at a conference.

Couldn’t make it in person? Don’t worry — every talk will be uploaded soon to the Koyeb YouTube channel. In the meantime, you can catch the full Main Track livestreams of Day 1 and Day 2. Bookmark this post too: we’ll update it with direct links as sessions go live.

More than anything, AI Engineer Paris showed us the future of the field: not just bigger models, but better engineering. Across the five tracks, five themes kept coming up again and again — themes that will shape the way developers build in the year ahead:

  • Open Models & Ecosystem — choice, ops, and evaluation
  • Reliability, Control & Context Engineering — making systems predictable
  • Agents & Automation — from wrappers to orchestrators
  • From PoC to Product (Infra & Ops) — the gritty work of shipping
  • Human + AI Co-Creation & Evaluation — subjective value and new UX patterns

Swyx addressing the crowd at event opening

Let’s dive into what these trends mean, and how the sessions at AI Engineer Paris brought them to life.

1) Open Models & Ecosystem — choice, ops, and evaluation (what to run and why)

Why it mattered at the event: The open-weight and open-model conversation has shifted from “can we?” to “how do we best use them?” Talks in this theme dug into trade-offs between self-hosting and hosted APIs, fine-tuning vs. retrieval, and practical evaluation pipelines for teams that want the flexibility of open models without sacrificing reliability.

What engineers are wrestling with:

  • Model fragmentation and specialization — teams are assembling “best-of-breed” stacks (retrieval + base model + fine-tune) rather than a single monolith.
  • Evaluation becomes a first-class engineering concern — test suites, scenario-based evaluation, and domain-specific metrics are replacing single-number benchmarks.
  • Practical hosting concerns — memory, cold starts, and cost efficiency are driving engineering innovation (smaller models + smarter retrieval or sharding). Learn how Koyeb reduced cold starts with Light Sleep.
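
The evaluation point above can be sketched in a few lines. This is our illustration, not any speaker's actual harness: each scenario carries its own domain-specific pass/fail check, and the report is per-scenario rather than a single benchmark number. `run_model` is a placeholder for whatever hosted API or self-hosted model a team actually uses.

```python
# Hypothetical scenario-based evaluation harness (illustrative names).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    prompt: str
    check: Callable[[str], bool]  # domain-specific pass/fail criterion

def run_model(prompt: str) -> str:
    # Placeholder: a real harness would call a hosted API or local model.
    return "Paris is the capital of France."

def evaluate(scenarios: list[Scenario]) -> dict[str, bool]:
    # Report per-scenario results instead of one aggregate score.
    return {s.name: s.check(run_model(s.prompt)) for s in scenarios}

scenarios = [
    Scenario("capital-fact", "What is the capital of France?",
             lambda out: "Paris" in out),
    Scenario("no-hallucinated-digits", "What is the capital of France?",
             lambda out: not any(ch.isdigit() for ch in out)),
]
results = evaluate(scenarios)
```

The payoff of this shape is that a regression shows up as a named scenario flipping to `False`, which is far more actionable than a benchmark average drifting by a fraction of a point.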

Representative topics covered by sessions in this theme:

  • Assembling the Future: Open Source Bricks for the Next Generation of AI — Laurent Sifre, Founder and CTO, H Company - Main Track (Master Stage)
  • From Artisanal Training to Foundation Model Factory — Robin Nabel, Member of Technical Staff, Agent Model Lead, Poolside - Discovery Track 2 (Junior Stage)
  • Function calling that doesn’t suck — Rémi Louf, Co-founder and CEO, .txt - Discovery Track 2 (Junior Stage)
  • How open source drives successful enterprise adoption — Lélio Renard Lavaud, VP of Engineering, Mistral - Main Track (Master Stage)
  • How to make your AI models faster, smaller, cheaper, greener? — Bertrand Charpentier, Co-founder, President, Chief Scientist, Pruna AI - Discovery Track 1 (Founder's Cafe)
  • LLM-Based GUI Agents: Bridging Human Interfaces and Autonomous AI — Daniel Homola, AI Project Lead, BMW Group - Discovery Track 1 (Founder's Cafe)
  • Inside FLUX: From Open-Weights to Advanced Models — Stephen Batifol, Developer Advocate, Black Forest Labs - Workshop Track (Central Room)
  • Speaker diarization: the foundational layer of conversational AI — Hervé Bredin, Co-founder and Chief Science Officer, pyannoteAI - Discovery Track 1 (Founder's Cafe)
  • State of Open LLMs in 2025 — Vaibhav Srivastav, Head of Developer Experience and Community, Hugging Face - Main Track (Master Stage)
  • Stop Wasting GPU Flops on Cold Starts: High Performance Inference with Model Streamer — Peter Schuurman, Software Engineer, Google, and Ekin Karabulut, AI/ML Developer Advocate, NVIDIA - Discovery Track 1 (Founder's Cafe)
  • Taking your AI home lab on the road: a look at (small-ish) AI in 2025 — Lars Trieloff, Principal, Adobe - Discovery Track 2 (Junior Stage)

VB Srivastav from Hugging Face presenting on open models

A recurring theme was that open weights are no longer just about freedom but about control. From practical optimizations like high-performance model streaming to composable architectures built on “bricks,” the open model talks made clear that the ecosystem is maturing into something production-ready. At the same time, talks like State of Open LLMs in 2025 offered a sobering assessment of fragmentation and the need for robust evaluation.

2) Reliability, Control & Context Engineering — making systems predictable

Why it mattered at the event: As models get more capable, teams are confronting the “control” problem: how to keep systems predictable, interpretable, and aligned with human goals. This theme overlapped with safety, red-teaming, observability, and context engineering.

Key engineering takeaways:

  • Context engineering is not “just prompt craft” — it’s a software layer: retrieval connectors, memory stores, session management, and versioned context pipelines.
  • Observability for LLMs is rising: logs, query traces, drift monitoring, and hallucination detection are becoming standard telemetry.
  • Threat modeling and red-teaming are maturing into practical processes to discover and close failure modes early.
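
To make the "software layer" framing concrete, here is a minimal sketch of a versioned context pipeline. It is our own illustration under assumed names (`CONTEXT_PIPELINE_VERSION`, a toy substring retriever): retrieval results and session memory are assembled into a prompt, and the pipeline version is attached to the telemetry record so drift can be traced back to a specific context recipe.

```python
# Illustrative versioned context pipeline, not a specific framework.
CONTEXT_PIPELINE_VERSION = "ctx-v2"

def retrieve(query: str, store: dict[str, str]) -> list[str]:
    # Toy retrieval connector: substring match over an in-memory store.
    return [doc for key, doc in store.items() if key in query.lower()]

def build_context(query: str, store: dict[str, str],
                  memory: list[str]) -> dict:
    docs = retrieve(query, store)
    prompt = "\n".join(memory + docs + [query])
    # Telemetry record: what went into the prompt, under which version.
    return {"version": CONTEXT_PIPELINE_VERSION,
            "n_docs": len(docs),
            "prompt": prompt}

record = build_context(
    "What does the refund policy say?",
    store={"refund": "Refunds are issued within 14 days."},
    memory=["User previously asked about shipping."],
)
```

Versioning the recipe, not just the prompt text, is what turns context engineering into something you can observe and roll back like any other software component.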

Representative topics covered by sessions in this theme:

  • Context Engineering with Graphs for More Intelligent Agents — Zach Blumenfeld, Senior Product Specialist, Neo4j - Expo Track (Expo Hall)
  • How We Built an AI Agent for Highly Regulated Environments — Jesús Espino, Principal Engineer, Gitpod - Discovery Track 1 (Founder's Cafe)
  • MCP isn’t good yet we got to 30M requests/month — Miguel Betegón, Building nets to catch AI bugs, Sentry - Discovery Track 1 (Founder's Cafe)
  • Open Source Champions: Gamify GitHub Contributions with an AI Agent — Ogi Bostjancic, Senior Software Engineer, Sentry - Workshop Track (Central Room)
  • Towards unlimited contexts: faster-than-GPU sparse logarithmic attention on CPU — Steeve Morin, Founder, ZML - Main Track (Master Stage)

Attendees gather in the expo hall to watch the talks

Control was not just about safety but also about predictability in the face of complexity. Talks like Context Engineering with Graphs showed how richer data structures can reduce drift, while MCP isn’t good yet… reminded us that even imperfect systems can power production workloads if the engineering scaffolding is strong enough. And in domains like regulated industries, reliability becomes existential — as Jesús Espino’s case study made clear.

3) Agents & Automation — from wrappers to orchestrators

Why it mattered at the event: Agentic approaches — systems that plan, call tools, maintain memory, and take multi-step actions — were among the most discussed patterns. The conference moved the conversation from toy agents to concrete architectures and failure modes in production.

Key engineering takeaways:

  • Successful agents combine planning, tool-use, and robust grounding (retrieval/memory) rather than relying purely on reactive prompts.
  • Evaluation of agents requires session-level and multi-turn metrics, not just turn-by-turn accuracy.
  • Orchestration and safety are essential: tool access needs permissions, auditing, and fallback behaviors.
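
The orchestration-and-safety point can be sketched as a tool registry that gates calls behind a permission set, records an audit log, and falls back gracefully instead of crashing when a tool is denied. All names here are hypothetical, chosen only to illustrate the pattern.

```python
# Hedged sketch of permissioned tool access with auditing and fallback.
audit_log: list[str] = []

TOOLS = {
    "search": lambda q: f"results for {q}",
    "delete_file": lambda path: f"deleted {path}",
}

ALLOWED = {"search"}  # per-agent permission set

def call_tool(name: str, arg: str) -> str:
    if name not in ALLOWED:
        audit_log.append(f"DENIED {name}({arg})")
        return "tool unavailable"  # fallback behavior, not an exception
    audit_log.append(f"OK {name}({arg})")
    return TOOLS[name](arg)

out_search = call_tool("search", "koyeb")
out_delete = call_tool("delete_file", "/etc/passwd")
```

The audit log doubles as the session-level trace that multi-turn evaluation needs: you can score an agent on the sequence of tool decisions it made, not just its final answer.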

Representative topics covered by sessions in this theme:

  • An intro to Algolia Agent Studio — Paul-Louis Nech, Staff Machine Learning Engineer, Algolia - Expo Track (Expo Hall)
  • Automating massive refactors with parallel agents — Robert Brennan, CEO of All Hands AI, co-creator, OpenHands - Discovery Track 1 (Founder's Cafe)
  • Building Intelligent Multi-Agent Systems with docker cagent: From Solo AI to Collaborative Teams — Djordje Lukic, Principal Software Engineer, Docker, Jean-Laurent de Morlhon, Sr Vice President, GenAI Acceleration, Docker, and Mat Wilson, Director, Product Management, Docker - Workshop Track (Central Room)
  • Building MCP Servers for VS Code — Marlene Mhangami, Senior Developer Advocate, Microsoft - Discovery Track 2 (Junior Stage)
  • Building an open-source NotebookLM alternative — Tuana Çelik, DevRel & AI Engineer, LlamaIndex - Main Track (Master Stage)
  • Building for the Agentic Era: The Future of AI Infrastructure — Yann Leger, Co-Founder & CEO, Koyeb - Main Track (Master Stage)
  • Building reasoning agents with reinforcement learning — Julien Launay, Co-founder and CEO, Adaptive ML - Discovery Track 2 (Junior Stage)
  • Democratizing AI Agents: Building, Sharing, and Securing Made Simple — Tushar Jain, President, Product & Engineering, Docker - Main Track (Master Stage)
  • How a Docker Engineer Automated Their Way to an Agent Framework — Djordje Lukic, Principal Software Engineer, Docker - Expo Track (Expo Hall)
  • Inside Chat: how we taught AI to review code like a senior engineer — Merrill Lutsky, Co-founder, CEO, Graphite - Discovery Track 2 (Junior Stage)
  • Live Debugging AI Agents — Miguel Betegón, Building nets to catch AI bugs, Sentry - Expo Track (Expo Hall)
  • Rewriting all of Spotify's code base, all the time. — Aleksandar Mitic, Senior Software Engineer, Spotify, and Jo Kelly-Fenton, Software Engineer, Spotify - Discovery Track 1 (Founder's Cafe)
  • Scaling real-time voice AI — Neil Zeghidour, Co-founder, Kyutai - Main Track (Master Stage)
  • System Prompt Learning for Agents — Aparna Dhinakaran, Chief Product Officer and Co-Founder, Arize AI - Main Track (Master Stage)
  • The rise of local CI tooling. Thanks AI coding agents! — Yves Brissaud, Senior Software Engineer, Dagger - Discovery Track 2 (Junior Stage)

Jo Kelly-Fenton and Aleksandar Mitic present about refactoring Spotify's codebase

Agents emerged as the most vibrant theme, with sessions ranging from bold experiments — like Spotify’s ongoing codebase rewrites — to nuts-and-bolts tools such as Docker’s internal agent frameworks. A key trend was the shift from agent “toys” to agent “teams”: collaborative, multi-agent setups that distribute tasks, require debugging, and demand orchestration at scale. Talks like System Prompt Learning for Agents emphasized that better prompts aren’t enough; agents need structured learning and evaluation.

4) From PoC to Product (Infra & Ops) — the gritty work of shipping

Why it mattered at the event: Many talks focused on the gap between prototype and production. This theme collected talks on infra choices, CI/CD for models, cost management, and organizational practices required to ship AI features that customers actually rely on.

Key engineering takeaways:

  • Observability + cost control are product requirements. Production-ready systems must measure both user-facing quality and infrastructure cost.
  • Reproducible pipelines (data + model + infra) are non-negotiable for iterative product development.
  • Developer ergonomics—APIs, SDKs, and internal tools—determine how quickly teams iterate on AI features.
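
As a rough sketch of treating cost as a product requirement (this is our illustration, not Koyeb's actual telemetry), each model call can record latency and an estimated token cost alongside its output, so dashboards track both quality and spend. The price and the whitespace token proxy are assumed example values.

```python
# Illustrative per-request quality + cost instrumentation.
import time

PRICE_PER_1K_TOKENS = 0.002  # assumed example rate, not a real price

def instrumented_call(prompt: str, model_fn) -> dict:
    start = time.perf_counter()
    output = model_fn(prompt)
    latency = time.perf_counter() - start
    # Whitespace split is a crude token proxy; real systems use the
    # model's tokenizer for billing-accurate counts.
    tokens = len(prompt.split()) + len(output.split())
    return {"output": output,
            "latency_s": latency,
            "est_cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS}

metrics = instrumented_call("hello world", lambda p: p.upper())
```

Wrapping every call this way is what makes "is this feature worth its inference bill?" an answerable question rather than a guess.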

Representative topics covered by sessions in this theme:

  • AX is the only Experience that Matters — Toma Puljak, Software Developer & Advocate, Daytona - Discovery Track 2 (Junior Stage)
  • Build with Google AI Studio: The fastest path from prompt to production with Gemini — Paige Bailey, AI DevX Lead, Google DeepMind, Patrick Loeber, AI Developer Experience, Google DeepMind, and Guillaume Vernade, Gemini DevRel, Google DeepMind - Workshop Track (Central Room)
  • Building AI workflows: from local experiments to serving users — Oleg Šelajev, Testcontainers, AI, & Developer relations, Docker - Discovery Track 1 (Founder's Cafe)
  • Context Engineering: The Art of Feeding LLMs — Alberto Castelo, Staff Machine Learning Engineer, Shopify - Discovery Track 1 (Founder's Cafe)
  • From Research to Reality with Google DeepMind — Paige Bailey, AI DevX Lead, Google DeepMind - Expo Track (Expo Hall)
  • Hands-on GraphRAG — Adam Cowley, Developer Advocate, Neo4j - Workshop Track (Central Room)
  • The State of^H^Hin AI Engineering — Emil Eifrem, Co-Founder and CEO, Neo4j - Main Track (Master Stage)
  • VC Panel — From Hype to Hard Tech: The Future of the AI Stack — Floriane de Maupeou, Principal at Serena Data Ventures, Assaf Araki, Investment Director, Intel Capital, and Thomas Turelier, Managing Director, Eurazeo - Expo Track (Expo Hall)
  • Vibe > Benchmarks: Rethinking AI Evaluation for the Real World — Pierre Burgy, CEO, Strapi - Discovery Track 2 (Junior Stage)
  • Vibing With Data — Andreas Kollegger, GenAI Lead for Developer Relations, Neo4j - Main Track (Master Stage)

Pierre Burgy presents about benchmarking

The “shipping” track often brought audiences down to earth. Talks like GraphRAG workshops put retrieval into developers’ hands, while Paige Bailey’s dual roles — technical deep dives in Google AI Studio and broad perspective in the DeepMind session — tied infra choices directly to product velocity. Meanwhile, Emil Eifrem’s state-of-the-field keynote made clear that infra choices will define competitive advantage for years to come.

5) Human + AI Co-Creation & Evaluation — subjective value and new UX patterns

Why it mattered at the event: Several sessions explored human-centered workflows: design prototyping, creative co-authoring, and evaluation beyond numerical benchmarks—“vibe-testing” and subjective measures of quality.

Key engineering takeaways:

  • Evaluation must capture subjective qualities (tone, creativity, coherence) that matter to users; this often requires human-in-the-loop experiments and A/B testing designed for creative outcomes.
  • Interfaces matter: even powerful models fail if the UX for controlling them is poor.
  • Co-creation workflows (human + model) are becoming an important product category — focusing on augmentation, not replacement.
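
The human-in-the-loop evaluation idea above can be sketched as pairwise preference testing: raters pick the output they prefer between variants A and B, and the result is a preference rate rather than a benchmark score. The vote data below is made up for illustration.

```python
# Minimal sketch of pairwise human preference aggregation.
from collections import Counter

votes = ["A", "A", "B", "A", "B", "A"]  # hypothetical rater choices

def preference_rate(votes: list[str], variant: str) -> float:
    # Fraction of raters who preferred the given variant.
    counts = Counter(votes)
    return counts[variant] / len(votes)

rate_a = preference_rate(votes, "A")
```

Subjective qualities like tone and coherence resist automated scoring, but preference rates over enough raters give teams a number they can A/B test against, which is exactly the bridge between "vibes" and engineering rigor that these talks called for.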

Representative topics covered by sessions in this theme:

  • Beyond Single Turns: Evaluating AI Agents at the Session Level — Srilakshmi Chavali, AI Engineer, Arize AI - Expo Track (Expo Hall)
  • Building MCP's at GitHub Scale — Martin Woodward, Vice President of DevRel, GitHub - Main Track (Master Stage)
  • Everything That Can Go Wrong Building Analytics Agents (And How We Survived It) — Thomas Schmidt, AI Engineer, Metabase - Discovery Track 2 (Junior Stage)
  • Inside FLUX, How It Really Works — Andreas Blattmann, Co-founder, Black Forest Labs - Main Track (Master Stage)
  • Systematic Agent Evaluation with Arize — SallyAnn DeLucia, Staff AI Product Manager, Arize AI - Workshop Track (Central Room)
  • What's new and what's next for generative AI — Paige Bailey, AI DevX Lead, Google DeepMind - Main Track (Master Stage)

Paige Bailey presenting about what's new and next for gen AI

Evaluation was treated as both art and science. On one end, Systematic Agent Evaluation with Arize showed how to design structured metrics; on the other, Vibe > Benchmarks challenged the community to think about UX-first metrics that reflect human experience. Talks like Beyond Single Turns and Everything That Can Go Wrong drove home the point: AI that can’t be co-created or trusted with real people’s work is still just a lab experiment.

Event wrap-up

AI Engineer Paris 2025 underscored what the AI Engineer community has been proving all along: AI is no longer primarily an academic or marketing exercise — it’s engineering. Across tracks we saw teams moving from experimentation to systems thinking, from one-off demos to operational patterns, and from isolated models to multi-component stacks (models + retrieval + memory + tooling). The next 12–24 months will be about reproducibility, cost-efficiency, and making agents and assistants genuinely useful in people's workflows.

Thanks to all speakers, attendees, volunteers, and sponsors — and especially to the engineers who walked us through their mistakes and hard-earned patterns. See you at the next event!

Be sure to subscribe to the Koyeb YouTube channel so you can check out all the talks as they are posted.

