Mistral Magistral Small 2506

Deploy Magistral with vLLM on Koyeb GPUs for high-performance, low-latency, and efficient inference.

Deploy the Magistral Small 2506 large language model on Koyeb's high-performance cloud infrastructure.

With one click, get a dedicated GPU-powered inference endpoint ready to handle requests with built-in autoscaling and scale-to-zero.

Overview of Mistral Magistral Small 2506

Mistral Magistral Small 2506 is a 24-billion-parameter model built on Mistral Small 3 that adds reasoning capabilities and supports dozens of languages. It is well suited to reasoning tasks, fast-response conversational agents, and other applications that require reasoning and strong language understanding.

Magistral Small 2506 is served with the vLLM inference engine, which is optimized for high-throughput, low-latency model serving.

By default, this model runs on the NVIDIA A100 GPU instance type. You can change the GPU instance type to fit your workload requirements.
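As a rough sizing aid when considering a different GPU, the sketch below estimates the memory needed for the model weights alone. It assumes 16-bit weights (2 bytes per parameter), which this page does not state, and it ignores the KV cache and activation memory that the inference engine also needs. The function name is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# 24 billion parameters at an assumed 2 bytes each (16-bit weights).
print(f"{weight_memory_gb(24e9):.0f} GB")
```

Under these assumptions the weights take about 48 GB before any serving overhead, so treat the figure as a lower bound when comparing instance types.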

Quickstart

The Magistral Small 2506 one-click model is served using vLLM, an inference engine designed for high-throughput, low-latency serving of large language models. vLLM is compatible with the OpenAI API, so existing OpenAI client libraries can talk to it.

After you deploy the Magistral Small 2506 model, copy the Koyeb App public URL, which looks like https://<YOUR_DOMAIN_PREFIX>.koyeb.app, and create a Python file named main.py with the following content to start interacting with the model.

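import os

from openai import OpenAI

# The API key is only checked if the endpoint is secured with VLLM_API_KEY;
# on an unsecured endpoint any non-empty placeholder value works.
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "fake"),
    base_url="https://<YOUR_DOMAIN_PREFIX>.koyeb.app/v1",
)

# Send a single-turn chat request to the deployed model.
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "If animals could talk, which species would be the most sarcastic, and why?",
        }
    ],
    model="mistralai/Magistral-Small-2506",
    max_tokens=30,
)

# Print the full response as indented JSON.
print(chat_completion.to_json(indent=4))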

The snippet above uses the OpenAI SDK to interact with the Magistral Small 2506 model, which works because vLLM exposes an OpenAI-compatible API.

Replace the base_url value in the snippet with your Koyeb App public URL.
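To avoid sending requests to the unreplaced placeholder, the base URL can be built by a small helper. This is only a sketch: the helper name and the example prefix "my-app-1234" are made up for illustration, and the URL shape is the one shown in the snippet above.

```python
PLACEHOLDER = "<YOUR_DOMAIN_PREFIX>"


def koyeb_base_url(domain_prefix: str) -> str:
    """Return the OpenAI-compatible base URL for a Koyeb App domain prefix."""
    prefix = domain_prefix.strip()
    if not prefix or prefix == PLACEHOLDER:
        raise ValueError("Set your real Koyeb domain prefix before sending requests")
    return f"https://{prefix}.koyeb.app/v1"


# "my-app-1234" is a hypothetical prefix used only for illustration.
print(koyeb_base_url("my-app-1234"))
```

The returned string can be passed as base_url when creating the OpenAI client.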

Running the script prints the model's response to the input message as JSON:

python main.py

{
    "id": "chatcmpl-d5d754c2-3ef3-95c2-a707-660bc5c3de56",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "Cats would probably be the most sarcastic species if they could talk. They have a reputation for being aloof and indifferent, and their body language often suggests they think they're superior to humans. Imagine a cat rolling its eyes and saying something like, 'Oh, you brought me a toy? How original.' Their dry wit and disdain for human antics would make them the ultimate masters of sarcasm."
            },
            "stop_reason": null
        }
    ],
    "created": 1732135919,
    "model": "mistralai/Magistral-Small-2506",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 30,
        "prompt_tokens": 20,
        "total_tokens": 50,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null
}
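In most applications only the reply text and the token usage matter, not the whole JSON document. The sketch below pulls those two fields out of a response with the same shape as the output above; the sample payload is trimmed for brevity and the function name is illustrative.

```python
import json

# Trimmed-down sample in the same shape as the response printed above.
raw = """
{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Cats would probably be the most sarcastic."}
        }
    ],
    "usage": {"completion_tokens": 30, "prompt_tokens": 20, "total_tokens": 50}
}
"""


def summarize(response: dict) -> tuple[str, int]:
    """Return the first choice's text and the total token count."""
    text = response["choices"][0]["message"]["content"]
    total = response["usage"]["total_tokens"]
    return text, total


text, total = summarize(json.loads(raw))
print(text)
print(f"tokens used: {total}")
```

When working with the SDK object directly, the same fields are available as attributes, for example chat_completion.choices[0].message.content.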

Securing the Inference Endpoint

To ensure that only authenticated requests are processed, we recommend setting up an API key to secure your inference endpoint. Follow these steps to configure the API key:

  1. Generate a strong, unique API key to use for authentication
  2. Navigate to your Koyeb Service settings
  3. Add a new environment variable named VLLM_API_KEY and set its value to your secret API key
  4. Save the changes and redeploy to update the service

Once the service is updated, all requests to the inference endpoint will require the API key.
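One way to carry out the first step is with Python's standard secrets module, which is designed for generating security tokens. This is a minimal sketch; the 32-byte length and the function name are choices made here for illustration, not requirements of Koyeb or vLLM.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)


# Print a fresh key to paste into the VLLM_API_KEY environment variable.
print(generate_api_key())
```

Store the generated value somewhere safe, since the same key must be supplied by every client that calls the endpoint.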

When making requests, include the API key in the Authorization header as a Bearer token. If you are using the OpenAI SDK, pass the key through the api_key parameter when instantiating the OpenAI client, and the SDK sets the header for you. Alternatively, set the OPENAI_API_KEY environment variable, which the script above reads. For example:

OPENAI_API_KEY=<YOUR_API_KEY> python main.py
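For clients that do not use the OpenAI SDK, the key can be attached by hand. The sketch below builds, without sending, a chat completion request using only the standard library, assuming the OpenAI-style Authorization: Bearer header and the /v1/chat/completions path used by the SDK. The function name is illustrative.

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated chat completion request."""
    payload = {
        "model": "mistralai/Magistral-Small-2506",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "https://<YOUR_DOMAIN_PREFIX>.koyeb.app/v1",
    os.environ.get("OPENAI_API_KEY", "fake"),
    "Hello!",
)
print(request.get_method(), request.full_url)
# Once the URL points at your deployment, send it with:
# urllib.request.urlopen(request)
```

A request without a valid key should be rejected by the secured endpoint, which is a quick way to confirm that the environment variable took effect after redeploying.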
