Pruna AI Flux.1 [dev] Juiced
Deploy Flux.1 [dev], optimized with Pruna AI to achieve 5x to 9x speedups over the base model with no loss in quality, on a dedicated API endpoint powered by Koyeb GPUs for high-performance, low-latency, and efficient inference.
Deploy the Flux.1 [dev] image generation model, optimized with Pruna AI, on Koyeb's high-performance infrastructure.
With one click, you get a dedicated GPU-powered inference endpoint ready to handle requests, with built-in autoscaling and scale-to-zero.
Overview of Pruna AI Flux.1 [dev] Juiced
The Pruna AI Flux.1 [dev] Juiced one-click model is a highly optimized version of Black Forest Labs' FLUX.1 [dev], enhanced using Pruna AI to achieve 5x to 9x faster inference speeds without sacrificing quality.
Pruna AI's advanced optimization techniques, rooted in cutting-edge AI efficiency research, ensure that this model maintains high fidelity while significantly boosting performance. The default configuration (Juiced 🔥) provides a safe balance between speed and quality. Two additional settings are available, letting you either prioritize consistent output quality (Lightly Juiced 🍊) or push for even faster inference times (Extra Juiced 🔥).
This one-click model demonstrates how Pruna accelerates image generation, delivering near-instantaneous results for an outstanding user experience while cutting inference costs by minimizing computation time.
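For reference, these configurations map to the speed_mode strings shown in the payload examples throughout this guide. The small Python mapping below collects them in one place; the dictionary keys are just our own shorthand, not part of the API:

# The three speed_mode values accepted in the endpoint's JSON payload,
# taken from the payload examples in this guide.
SPEED_MODES = {
    "default": "Juiced 🔥 (default)",                     # balanced speed and quality
    "consistent": "Lightly Juiced 🍊 (more consistent)",  # prioritize output consistency
    "fast": "Extra Juiced 🔥 (more speed)",               # prioritize inference speed
}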
The default GPU for running this model is the NVIDIA L40S instance type. You are free to adjust the GPU instance type to fit your workload requirements.
Quickstart
After you deploy the Pruna AI Flux.1 [dev] Juiced model, copy the Koyeb App public URL, which looks like https://<YOUR_DOMAIN_PREFIX>.koyeb.app, and create a simple Python file with the following content to start interacting with the model.
import base64
from io import BytesIO

import httpx
from PIL import Image

KOYEB_URL = "https://<YOUR_DOMAIN_PREFIX>.koyeb.app"


def b64_to_pil(base64_string):
    """
    Convert a Base64 string to a PIL Image.

    :param base64_string: Base64-encoded image string
    :return: PIL Image object
    """
    # Remove the data URL header if present
    if base64_string.startswith("data:image"):
        base64_string = base64_string.split(",")[1]

    # Decode the Base64 string
    image_data = base64.b64decode(base64_string)

    # Create a PIL Image from the decoded binary data
    return Image.open(BytesIO(image_data))


payload = {
    "prompt": 'a purple cheetah holding a sign that says "pip install pruna"',
    "speed_mode": "Juiced 🔥 (default)",
    "num_inference_steps": 28,
    "guidance_scale": 3.5,
    "seed": 2,
    "width": 1024,
    "height": 1024,
    "num_image_per_prompt": 1,
}

# Call the model prediction endpoint
res = httpx.post(
    f"{KOYEB_URL}/predict",
    json=payload,
    timeout=60.0,
)
res.raise_for_status()

# Get the output image from the JSON response
output = res.json().get("images")[0]

# Convert the base64 model output to an image and save it to disk
img = b64_to_pil(output)
img.save("output_image.jpg")
The snippet above shows how to interact with the Pruna AI Flux.1 [dev] model: it generates an image from a text prompt and saves it to disk. Make sure to replace the KOYEB_URL value in the snippet with your Koyeb App public URL.
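Because the endpoint supports scale-to-zero, the first request after a period of inactivity may hit a cold start and take longer than usual. Below is a minimal, illustrative retry wrapper around the same /predict call; the predict_with_retry name, attempt count, and backoff schedule are our own choices, not part of the Koyeb or Pruna API:

import time

import httpx


def predict_with_retry(url, payload, attempts=5, timeout=120.0):
    """Retry the /predict call to ride out cold starts (illustrative helper)."""
    for attempt in range(attempts):
        try:
            res = httpx.post(f"{url}/predict", json=payload, timeout=timeout)
            res.raise_for_status()
            return res.json()
        except (httpx.TimeoutException, httpx.HTTPStatusError):
            if attempt == attempts - 1:
                raise
            # Back off before retrying while the instance wakes up
            time.sleep(2 ** attempt)

With this helper in place, the download step from the quickstart becomes img = b64_to_pil(predict_with_retry(KOYEB_URL, payload)["images"][0]).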
Speed and Quality Trade-Offs
For Lightly Juiced 🍊 (more consistent) results, which trade some speed for more consistent outputs, update the JSON payload by setting speed_mode to Lightly Juiced 🍊 (more consistent):
payload = {
    "prompt": 'a purple cheetah holding a sign that says "pip install pruna"',
-    "speed_mode": "Juiced 🔥 (default)",
+    "speed_mode": "Lightly Juiced 🍊 (more consistent)",
    "num_inference_steps": 28,
    "guidance_scale": 3.5,
    "seed": 2,
    "width": 1024,
    "height": 1024,
    "num_image_per_prompt": 1,
}
For Extra Juiced 🔥 (more speed) results, with faster inference times, update the JSON payload by setting speed_mode to Extra Juiced 🔥 (more speed):
payload = {
    "prompt": 'a purple cheetah holding a sign that says "pip install pruna"',
-    "speed_mode": "Juiced 🔥 (default)",
+    "speed_mode": "Extra Juiced 🔥 (more speed)",
    "num_inference_steps": 28,
    "guidance_scale": 3.5,
    "seed": 2,
    "width": 1024,
    "height": 1024,
    "num_image_per_prompt": 1,
}
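To see the speed and quality trade-off on your own workload, you can time the same prompt across the three modes. The rough sketch below reuses the illustrative predict_with_retry helper from earlier; send a warm-up request first so the measurements do not include a cold start, and expect timings to vary with the GPU instance type and prompt:

import time

# Compare wall-clock latency of the three speed modes on the same prompt.
# Illustrative only; run several iterations per mode for stable numbers.
for mode in (
    "Juiced 🔥 (default)",
    "Lightly Juiced 🍊 (more consistent)",
    "Extra Juiced 🔥 (more speed)",
):
    payload["speed_mode"] = mode
    start = time.perf_counter()
    predict_with_retry(KOYEB_URL, payload)
    print(f"{mode}: {time.perf_counter() - start:.1f}s")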