ImageBench

An AI image generation benchmark, with the generated images included

10 models evaluated on 192 prompts across 6 categories. Know which model is best — for your use case, your budget, your quality bar.

Example prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
(Text Rendering · Typography Style · Easy · openai/gpt-image-2)

V1 Leaderboard

192 prompts, 6 categories, graded pass/fail by VLM judges.

Full benchmark explorer
#    Model                          Pass Rate   Pass / Fail   Avg Latency
1    openai/gpt-image-2             95.8%       184 / 7       45.3s
2    fal/fal-ai/nano-banana-2       93.8%       180 / 12      28.1s
3    bfl/flux-2-max                 78.6%       151 / 15      26.7s
4    fal/fal-ai/nano-banana-pro     78.6%       151 / 14      23.4s
5    bfl/flux-2-klein-9b            75.5%       145 / 47      4.1s
6    z-image-local/z-image-turbo    75.5%       145 / 47      18.1s
7    bfl/flux-2-pro                 73.4%       141 / 29      11.8s
8    nucleus-local/nucleus-image    67.2%       129 / 62      39.1s
9    bfl/flux-2-klein-4b            63.5%       122 / 46      3.8s
10   sana-local/sana-1.5-1.6b       52.6%       101 / 90      11.1s
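The ranking above reduces to simple arithmetic: each model's pass rate is its count of passed prompts divided by the 192-prompt total, and models are sorted by that rate. A minimal sketch of that aggregation, using counts taken from the leaderboard (the `results` dict and function names here are illustrative, not ImageBench's actual schema):

```python
# Per-model judge verdicts: "pass" = prompts the VLM judge marked as passing.
# Counts are taken from the leaderboard above; the data layout is hypothetical.
TOTAL_PROMPTS = 192

results = {
    "openai/gpt-image-2": {"pass": 184, "fail": 7},
    "fal/fal-ai/nano-banana-2": {"pass": 180, "fail": 12},
    "sana-local/sana-1.5-1.6b": {"pass": 101, "fail": 90},
}

def pass_rate(counts: dict) -> float:
    """Pass rate over the full prompt set (non-passes count against the model)."""
    return counts["pass"] / TOTAL_PROMPTS

# Rank models by descending pass rate.
leaderboard = sorted(results.items(), key=lambda kv: pass_rate(kv[1]), reverse=True)

for rank, (model, counts) in enumerate(leaderboard, start=1):
    print(f"{rank}. {model}: {pass_rate(counts):.1%}")
```

Note that pass and fail counts need not sum to 192: prompts whose generation errors out are neither, but still count against the pass rate.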

What we evaluate

Each model is tested across 6 categories with 192 prompts spanning easy to extreme difficulty.

Text Rendering: typography accuracy and writing correctness across difficulty levels
Spatial Reasoning: compositionality, counting, relative position, scale and proportions
Human Realism: faces, expressions, hands, full body, multi-subject coherence
Truthfulness: physics, reflections, photorealism, world knowledge
Professional Studio: camera and lighting, color precision, photorealistic quality
Graphical Design: layout, data visualisation, style diversity

Start learning

Comprehensive guides on image generation evaluation — from metrics to methodology.

Browse guides

See how every model performs

Compare models side-by-side with our interactive benchmark explorer.

Explore ImageBench V1