Benchmark V1
Capability benchmark: 18 image-generation models graded pass/fail by vision-language model (VLM) judges on 192 prompts across six categories. Every generated image is published so you can judge the results for yourself.
V1 Leaderboard
| # | Model | Pass Rate | Pass / Fail | Avg Latency |
|---|---|---|---|---|
| 1 | fal/google/nano-banana-2 | 95.3% | 183/9 | 28.1s |
| 2 | openai/gpt-image-2 | 95.3% | 183/9 | 45.3s |
| 3 | fal/google/nano-banana-pro | 91.1% | 175/17 | 23.4s |
| 4 | bfl/flux-2-max | 90.6% | 174/18 | 26.7s |
| 5 | fal/bytedance/seedream-v4 | 84.4% | 162/30 | 14.1s |
| 6 | bfl/flux-2-pro | 82.8% | 159/33 | 11.8s |
| 7 | fal/ideogram/v4 | 82.3% | 158/34 | 16.6s |
| 8 | bfl/flux-2-klein-9b | 78.6% | 151/41 | 4.1s |
| 9 | local/z-image-6b | 75.5% | 145/47 | 130.7s |
| 10 | local/z-image-turbo-6b | 74.5% | 143/49 | 18.1s |
| 11 | bfl/flux-2-klein-4b | 72.4% | 139/53 | 3.8s |
| 12 | local/qwen-image-2512-20b | 69.3% | 133/59 | 80.2s |
| 13 | local/bonsai-image-ternary-4b | 68.2% | 131/61 | 4.1s |
| 14 | fal/ideogram/v3 | 68.2% | 131/61 | 12.9s |
| 15 | local/prxpixel-t2i-7b | 65.1% | 125/67 | 64.7s |
| 16 | local/nucleus-image-17b-a2b | 64.1% | 123/69 | 39.1s |
| 17 | local/hidream-i1-full-17b | 56.8% | 109/83 | 91.3s |
| 18 | local/sana-1.5-1.6b | 51.0% | 98/94 | 11.1s |
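
The Pass Rate column can be reproduced from the Pass / Fail counts: every model was graded on the same 192 prompts, so the rate is passes divided by passes plus fails. Below is a minimal Python sketch of that arithmetic on a few rows copied from the table. The names `pass_rate` and `rank` are hypothetical, and the tie-break on lower average latency is an assumption inferred from how the tied rows are ordered; the benchmark does not state its tie-break rule.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    total = passed + failed
    return round(100 * passed / total, 1)


def rank(rows):
    """Sort by passes (descending), then by average latency (ascending).

    The latency tie-break is an assumption, not a documented rule.
    """
    return sorted(rows, key=lambda r: (-r[1], r[3]))


# (model, passed, failed, average latency in seconds), copied from the table
rows = [
    ("local/sana-1.5-1.6b", 98, 94, 11.1),
    ("openai/gpt-image-2", 183, 9, 45.3),
    ("bfl/flux-2-max", 174, 18, 26.7),
    ("fal/google/nano-banana-2", 183, 9, 28.1),
]

for position, (model, passed, failed, latency) in enumerate(rank(rows), start=1):
    print(f"{position} {model} {pass_rate(passed, failed)}% {passed}/{failed} {latency}s")
```

Sorting on the raw pass count rather than the rounded percentage avoids treating two models as tied merely because their rates round to the same value.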