AI image generation benchmark with the images included

Name: ImageBench V1 AI Image Generation Benchmark
Creator: ImageBench

10 models evaluated on 192 prompts across 6 categories. Find out which model is best for your use case, your budget, and your quality bar.

Text Rendering › Typography Style › Easy (openai/gpt-image-2)

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

V1 Leaderboard

192 prompts, 6 categories, graded pass/fail by VLM judges.

#	Model	Pass Rate	Pass / Fail	Avg Latency
1	openai/gpt-image-2	95.8%	184/7	45.3s
2	fal/fal-ai/nano-banana-2	93.8%	180/12	28.1s
3	bfl/flux-2-max	78.6%	151/15	26.7s
4	fal/fal-ai/nano-banana-pro	78.6%	151/14	23.4s
5	bfl/flux-2-klein-9b	75.5%	145/47	4.1s
6	z-image-local/z-image-turbo	75.5%	145/47	18.1s
7	bfl/flux-2-pro	73.4%	141/29	11.8s
8	nucleus-local/nucleus-image	67.2%	129/62	39.1s
9	bfl/flux-2-klein-4b	63.5%	122/46	3.8s
10	sana-local/sana-1.5-1.6b	52.6%	101/90	11.1s
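
Two things in the table are worth checking by hand. First, in several rows Pass and Fail do not add up to 192 (for example 184 + 7 = 191), yet every published Pass Rate matches passes divided by all 192 prompts. The page does not say how the remaining prompts were counted, so the denominator below is an inference from the numbers, not a documented rule. Second, the table is sorted by pass rate only; the `pareto_front` helper is a hypothetical addition (not part of the benchmark) that lists the models no other model beats on both passes and average latency.

```python
# Leaderboard rows copied from the table above:
# (model, passes, fails, average latency in seconds).
ROWS = [
    ("openai/gpt-image-2", 184, 7, 45.3),
    ("fal/fal-ai/nano-banana-2", 180, 12, 28.1),
    ("bfl/flux-2-max", 151, 15, 26.7),
    ("fal/fal-ai/nano-banana-pro", 151, 14, 23.4),
    ("bfl/flux-2-klein-9b", 145, 47, 4.1),
    ("z-image-local/z-image-turbo", 145, 47, 18.1),
    ("bfl/flux-2-pro", 141, 29, 11.8),
    ("nucleus-local/nucleus-image", 129, 62, 39.1),
    ("bfl/flux-2-klein-4b", 122, 46, 3.8),
    ("sana-local/sana-1.5-1.6b", 101, 90, 11.1),
]
TOTAL_PROMPTS = 192


def pass_rate(passes: int, total: int = TOTAL_PROMPTS) -> float:
    """Pass rate in percent.

    Assumption: all 192 prompts count in the denominator, which is what
    the published percentages are consistent with.
    """
    return round(100 * passes / total, 1)


def pareto_front(rows):
    """Hypothetical helper: keep models that no other model matches or
    beats on both passes and latency (with at least one strict win)."""
    front = []
    for name, passes, _fails, latency in rows:
        dominated = any(
            p >= passes and l <= latency and (p > passes or l < latency)
            for _n, p, _f, l in rows
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    for name, passes, fails, latency in ROWS:
        uncounted = TOTAL_PROMPTS - passes - fails
        print(f"{name}: {pass_rate(passes)}% ({uncounted} neither pass nor fail)")
    print("Pareto front:", pareto_front(ROWS))
```

With these figures the front is openai/gpt-image-2, fal/fal-ai/nano-banana-2, fal/fal-ai/nano-banana-pro, bfl/flux-2-klein-9b, and bfl/flux-2-klein-4b. This only reflects average latency and overall pass count; per-category results or pricing could change the picture.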

Each model is tested across 6 categories with 192 prompts spanning easy to extreme difficulty.

Text Rendering

Typography accuracy, writing correctness across difficulty levels

Spatial Reasoning

Compositionality, counting, relative position, scale & proportions

Human Realism

Faces, expressions, hands, full body, multi-subject coherence

Truthfulness

Physics, reflections, photorealism, world knowledge

Professional Studio

Camera & lighting, color precision, photorealistic quality

Graphical Design

Layout, data visualisation, style diversity
