AI image generation benchmark with the images included
10 models evaluated on 192 prompts across 6 categories. Find out which model is best for your use case, your budget, and your quality bar.

Text Rendering › Typography Style › Easy
Model: openai/gpt-image-2
Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
V1 Leaderboard
192 prompts, 6 categories, graded pass/fail by VLM judges.
| # | Model | Pass Rate | Pass / Fail | Avg Latency |
|---|---|---|---|---|
| 1 | openai/gpt-image-2 | 95.8% | 184/7 | 45.3s |
| 2 | fal/fal-ai/nano-banana-2 | 93.8% | 180/12 | 28.1s |
| 3 | bfl/flux-2-max | 78.6% | 151/15 | 26.7s |
| 4 | fal/fal-ai/nano-banana-pro | 78.6% | 151/14 | 23.4s |
| 5 | bfl/flux-2-klein-9b | 75.5% | 145/47 | 4.1s |
| 6 | z-image-local/z-image-turbo | 75.5% | 145/47 | 18.1s |
| 7 | bfl/flux-2-pro | 73.4% | 141/29 | 11.8s |
| 8 | nucleus-local/nucleus-image | 67.2% | 129/62 | 39.1s |
| 9 | bfl/flux-2-klein-4b | 63.5% | 122/46 | 3.8s |
| 10 | sana-local/sana-1.5-1.6b | 52.6% | 101/90 | 11.1s |
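The published pass rates appear to be computed as passes divided by all 192 prompts, not by passes plus fails: every row matches that formula to one decimal place. In several rows, passes and fails add up to fewer than 192. The page does not say what happened to the remaining prompts, so the sketch below only counts them as "unaccounted" without guessing a cause. The function names are hypothetical and the rows are copied from the table above. It also shows one way to apply a quality bar and a latency budget together.

```python
# Rows copied from the V1 leaderboard:
# (model, passes, fails, average latency in seconds).
TOTAL_PROMPTS = 192

ROWS = [
    ("openai/gpt-image-2", 184, 7, 45.3),
    ("fal/fal-ai/nano-banana-2", 180, 12, 28.1),
    ("bfl/flux-2-max", 151, 15, 26.7),
    ("fal/fal-ai/nano-banana-pro", 151, 14, 23.4),
    ("bfl/flux-2-klein-9b", 145, 47, 4.1),
    ("z-image-local/z-image-turbo", 145, 47, 18.1),
    ("bfl/flux-2-pro", 141, 29, 11.8),
    ("nucleus-local/nucleus-image", 129, 62, 39.1),
    ("bfl/flux-2-klein-4b", 122, 46, 3.8),
    ("sana-local/sana-1.5-1.6b", 101, 90, 11.1),
]


def pass_rate(passes, total=TOTAL_PROMPTS):
    """Pass rate in percent, assuming the denominator is all prompts."""
    return round(100 * passes / total, 1)


def unaccounted(passes, fails, total=TOTAL_PROMPTS):
    """Prompts that are neither a pass nor a fail in the table."""
    return total - passes - fails


def fastest_meeting_bar(rows, min_pass_rate):
    """Name of the lowest-latency model at or above a pass-rate bar, or None."""
    eligible = [row for row in rows if pass_rate(row[1]) >= min_pass_rate]
    if not eligible:
        return None
    return min(eligible, key=lambda row: row[3])[0]


if __name__ == "__main__":
    for model, passes, fails, latency in ROWS:
        print(f"{model}: {pass_rate(passes)}% pass, "
              f"{unaccounted(passes, fails)} unaccounted, {latency}s")
    print("Fastest at >= 90%:", fastest_meeting_bar(ROWS, 90.0))
```

With a 90% bar, only the top two rows qualify and the faster of them is selected. Lowering the bar to 75% brings the much faster bfl/flux-2-klein-9b into range.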
What we evaluate
Each model is tested across 6 categories with 192 prompts spanning easy to extreme difficulty.
Start learning
Comprehensive guides on image generation evaluation, from metrics to methodology.
Browse guides
Frequently asked questions
See how every model performs
Compare models side-by-side with our interactive benchmark explorer.
Explore ImageBench V1