ImageBench

vs

192 evaluations across 6 categories

Pass Rate: 96% vs 53%
Avg Latency: 45.3s vs 11.1s
Text Rendering: 100% vs 20%
Spatial Reasoning: 97% vs 49%
Human realism: 98% vs 67%
Truthfulness: 89% vs 48%
Professional Studio: 93% vs 74%
Graphical design: 100% vs 38%
Latency: 0% vs 29%
Text Rendering | Typography Style | Easy | sana-local/sana-1.5-1.6b

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Text Rendering: 100% vs 20%

Typography Style: 100% vs 67%
Easy: 100% vs 67%
openai/gpt-image-2: PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator

Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with uneven, slightly smudged ink and dot-matrix appearance. Background is aged paper with authentic texture and discoloration. Matches prompt’s medium and style convincingly. VERDICT: PASS
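Each evaluator note on this page ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker. As a minimal sketch of how such a marker could be turned into a boolean, the helper below uses a regular expression. The name `parse_verdict` and the strict "marker must be the last thing in the text" rule are assumptions for illustration; the benchmark's actual parser is not shown here.

```python
import re

# Hypothetical helper: the benchmark's own verdict parser is not shown on this page.
# Assumption: the marker is always the final token of the evaluator text.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str) -> bool:
    """Return True for PASS and False for FAIL; raise if no trailing marker exists."""
    match = VERDICT_RE.search(evaluator_text.strip())
    if match is None:
        raise ValueError("no trailing 'VERDICT: PASS|FAIL' marker found")
    return match.group(1) == "PASS"
```

Raising on a missing marker, rather than defaulting to FAIL, keeps malformed evaluator output from silently lowering a pass rate.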

openai/gpt-image-2: PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator

The text “OPEN” is correctly spelled and rendered in a convincing pixel-style LED format with glowing red dots, matching the prompt’s description. The medium’s visual characteristics — dot matrix, brightness, and casing — are accurately portrayed. VERDICT: PASS

openai/gpt-image-2: PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator

Text is rendered in a convincing chalk-on-chalkboard style with visible texture and dust. Spelling is correct: “TODAY’S SPECIAL”. The medium matches the prompt precisely. VERDICT: PASS

Writing accuracy: 100% vs 8%
Easy: 100% vs 33%
openai/gpt-image-2: PASS
A single large word 'HELLO' centered on a white background

Evaluator

The word "HELLO" is fully spelled correctly and is clearly readable. All letters are intact and legible against the white background. No misspellings, missing letters, or illegible parts are present. VERDICT: PASS

openai/gpt-image-2: PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. No misspellings, missing words, or illegible parts are present. All elements match the intended description. VERDICT: PASS

openai/gpt-image-2: PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator

The word “SERENITY” is spelled correctly and is fully readable in large, centered white letters against a solid light blue background. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

Medium: 100% vs 0%
openai/gpt-image-2: PASS
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator

All words are spelled correctly and fully readable: "The", "quick", "brown", "fox", "jumps", "high". No misspellings, missing words, or illegible parts. Text matches the intended phrase exactly. VERDICT: PASS

openai/gpt-image-2: PASS
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator

All words are spelled correctly and fully readable. No misspellings, missing words, or illegible parts. Text matches the intended phrase exactly. VERDICT: PASS

openai/gpt-image-2: PASS
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator

All words are spelled correctly and fully readable: "Stars", "shine", "brightest", "in", "the", "dark", "night". No misspellings or missing words. Text is clear against the off-white background. VERDICT: PASS

Hard: 100% vs 0%
openai/gpt-image-2: PASS
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator

All text is spelled correctly and fully readable. Headline: “SUMMER SALE”. Body: “Up to 50% off all items this weekend only”. Fine print: “Terms and conditions apply. See store for details.” No misspellings or illegible parts. VERDICT: PASS

openai/gpt-image-2: PASS
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator

All text is spelled correctly and fully readable. “GRAND OPENING” matches the title. The subtitle “Join us for food, music, and fun” is accurate. The disclaimer “RSVP required. Limited seating available.” is present and legible. No misspellings, missing words, or illegible parts detected. VERDICT: PASS

openai/gpt-image-2: PASS
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator

All text matches the intended message exactly: “ANNUAL GALA 2025”, “An evening of fine dining and live jazz”, and “Black tie optional. Doors open at 6 PM.”. Every word is spelled correctly and fully readable in its respective font size and placement. No misspellings or illegible parts detected. VERDICT: PASS

Extreme: 100% vs 0%
openai/gpt-image-2: PASS
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator

All text matches the intended content exactly: “GOLDEN HARVEST ORGANIC HONEY”, “Raw & Unfiltered”, “12 oz (340g)”, “Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.”, and “Best before: 06/2026. Store in cool, dry place. Product of USA.”. No misspellings or illegible parts. All words are fully readable and accurate. VERDICT: PASS

openai/gpt-image-2: PASS
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator

All text matches the intended content exactly: “NATIONAL EXPLORER”, “The Hidden Temples of Cambodia — page 34”, “Interview: Dr. Maria Santos on Climate Action”, “Top 10 Trails You Must Hike This Year”, and “Issue 247 | March 2025 | $6.99”. No misspellings, missing words, or illegible parts. All text is fully readable and accurate. VERDICT: PASS

openai/gpt-image-2: PASS
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator

All text matches the intended content exactly: tagline, credits, director, cast, music composer, studio, and release date are fully spelled correctly and clearly readable. No misspellings or illegible parts detected. VERDICT: PASS

Spatial Reasoning: 97% vs 49%

Attributes Binding: 100% vs 44%
Easy: 100% vs 100%
openai/gpt-image-2: PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator

The image displays a green apple on the left and an orange pumpkin on the right, both resting on a wooden cutting board. The colors, shapes, and relative sizes are correctly assigned to their respective objects as requested.
- **Green Apple:** Observed as green, round, and smaller.
- **Orange Pumpkin:** Observed as orange, ribbed, and larger.
- **Surface:** Observed as a wooden cutting board.
VERDICT: PASS

openai/gpt-image-2: PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator

The image displays a red sphere (ball) on the left and a blue cube on the right, placed on a white background. The colors, shapes, and relative positions match the prompt perfectly.
* **Object 1:** Red ball (Sphere, Red color).
* **Object 2:** Blue cube (Cube, Blue color).
VERDICT: PASS

openai/gpt-image-2: PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator

The image contains a yellow taxi (Toyota Camry) in the foreground and a black limousine (Lincoln Town Car stretch) parked directly behind/next to it along the curb. The colors and vehicle types match the prompt perfectly with no attribute swaps. VERDICT: PASS

Medium: 100% vs 33%
openai/gpt-image-2: PASS
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator

The image contains three distinct shapes on a white background.
1. **Left Object:** A large red circle.
2. **Middle Object:** A small blue triangle.
3. **Right Object:** A medium green star.
All attributes (color, size, shape) are correctly bound to the corresponding objects as requested in the prompt. VERDICT: PASS

openai/gpt-image-2: PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator

The image contains a large brown horse, a small white rabbit, and a black cat in a grassy meadow. All attributes (color, size, object type) match the prompt perfectly without any swaps. VERDICT: PASS

openai/gpt-image-2: PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator

The image contains exactly what was requested:
1. **Tall thin white candle:** Present on the left in a holder.
2. **Short fat red candle:** Present in the middle.
3. **Medium spiral yellow candle:** Present on the right.
All colors, sizes, and shapes match the prompt perfectly with no attribute swapping. VERDICT: PASS

Hard: 100% vs 0%
openai/gpt-image-2: PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator

The image correctly depicts all requested objects with their specific attributes bound correctly:
1. **Fire Truck:** It is pink, matching the prompt "pink fire truck".
2. **Banana:** It is blue, matching the prompt "blue banana".
3. **Sun:** It is green, matching the prompt "green sun".
There are no attribute swaps (e.g., the banana is not pink, the truck is not blue). The surreal landscape context is also present. VERDICT: PASS

openai/gpt-image-2: PASS
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator

The image contains three distinct objects arranged on a wooden surface.
1. **Elephant:** It is orange in color and has the correct shape.
2. **Penguin:** It is yellow (with a white belly) and has the correct shape.
3. **Strawberry:** It is silver/metallic in color and has the correct shape.
All requested attributes (color, object type) are correctly bound to the respective objects with no swaps. VERDICT: PASS

openai/gpt-image-2: PASS
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator

The image contains all the requested objects with the correct attributes bound to them:
1. **Purple carrot:** A carrot with green tops and a distinct purple root is visible on the left.
2. **Red tree with red leaves:** A large tree with vibrant red foliage dominates the background.
3. **White watermelon sliced open:** A white-fleshed watermelon with black seeds is shown whole and sliced on the right.
VERDICT: PASS

Compositionality: 100% vs 89%
Easy: 100% vs 67%
openai/gpt-image-2: PASS
A tabby cat wearing oversized red sunglasses

Evaluator

- **Tabby cat:** Present. The image clearly shows a cat with distinct tabby markings (stripes and an 'M' on the forehead).
- **Oversized red sunglasses:** Present. The cat is wearing large, bright red sunglasses that fit the description perfectly.
All described elements are present and coherently combined. VERDICT: PASS

openai/gpt-image-2: PASS
A golden retriever wearing a tiny cowboy hat

Evaluator

- **Golden retriever:** Present. The main subject is clearly a Golden Retriever with characteristic golden fur and facial features.
- **Wearing:** Present. The dog has the hat positioned on its head.
- **Tiny cowboy hat:** Present. The dog is wearing a brown cowboy hat with a star-studded band, sized appropriately for the animal.
All described elements are present and combined coherently within a western-themed setting. VERDICT: PASS

openai/gpt-image-2: PASS
A penguin holding a colorful umbrella

Evaluator

- **Penguin:** Present (a Gentoo penguin standing on rocks).
- **Holding:** Present (the penguin's flipper is gripping the umbrella handle).
- **Colorful umbrella:** Present (a rainbow-colored umbrella with visible raindrops).
All elements are clearly visible and combined in a coherent, realistic style despite the unusual subject matter. VERDICT: PASS

Medium: 100% vs 100%
openai/gpt-image-2: PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator

- **Medieval knight:** Present.
- **Full armor:** Present (helmet, chainmail, plate armor).
- **Sitting at a desk:** Present (wooden table).
- **Typing:** Present (gauntleted hands on keyboard).
- **Modern laptop:** Present (silver laptop).
All elements are clearly visible and combined coherently to create the requested anachronistic scene. VERDICT: PASS

openai/gpt-image-2: PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator

- **Roman gladiator:** Present (man in armor).
- **Taking a selfie:** Present (holding phone up, looking at lens).
- **Smartphone:** Present (modern black phone visible).
- **Colosseum:** Present (background architecture matches the Roman amphitheater).
All elements are clearly visible and combined to depict the requested anachronistic scene. VERDICT: PASS

openai/gpt-image-2: PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator

- **Astronaut in a spacesuit:** Present. The figure is wearing a full white NASA-style suit with a helmet and backpack.
- **Riding a bicycle:** Present. The astronaut is seated on a black bicycle, holding the handlebars.
- **Through a park:** Present. The background features green trees, a paved path, a body of water, and a sign explicitly reading "Riverside Park."
All described elements are clearly visible and coherently combined in the scene. VERDICT: PASS

Hard: 100% vs 100%
openai/gpt-image-2: PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator

- **Octopus:** Present (central figure).
- **Top hat:** Present (black hat on head).
- **Monocle:** Present (gold-rimmed eyewear with chain).
- **Playing chess:** Present (chessboard with pieces, holding a piece).
- **Surface of the moon:** Present (grey, cratered terrain).
- **Earth visible in background:** Present (large planet in the sky).
All described elements are clearly visible and combined coherently in a surreal scene. VERDICT: PASS

openai/gpt-image-2: PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator

- **Dinosaur:** Present (A T-Rex character).
- **Business suit:** Present (Blue suit, white shirt, tie).
- **Giving a PowerPoint presentation:** Present (Pointing at a screen labeled "Q2 BUSINESS REVIEW" with charts).
- **Modern office:** Present (Conference room, city view, laptops).
- **Drinking coffee:** Present (Holding a mug labeled "CAFFEINATED & FOCUSED").
All described elements are clearly visible and integrated into a coherent scene. VERDICT: PASS

openai/gpt-image-2: PASS
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator

The image contains all the requested elements:
1. **Robot:** Present (central figure).
2. **Dressed as a pirate:** Present (pirate hat, eye patch, coat, flag).
3. **Paper boat:** Present (robot is sitting inside one).
4. **Lake of lava:** Present (background and foreground show flowing lava).
5. **Juggling three kittens:** Present (three kittens are floating/jumping in the air above the robot's outstretched hands).
The concepts are combined coherently into a single scene. VERDICT: PASS

Counting: 89% vs 11%
Easy: 100% vs 0%
openai/gpt-image-2: PASS
Exactly 3 red apples arranged on a wooden table

Evaluator

I see exactly 3 red apples arranged on a wooden surface. The count matches the prompt's request perfectly. VERDICT: PASS

openai/gpt-image-2: PASS
Exactly 4 white candles standing on a dark surface

Evaluator

I see exactly 4 white candles standing in a row on a dark surface. The count matches the request perfectly. VERDICT: PASS

openai/gpt-image-2: PASS
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator

I see exactly 5 yellow rubber ducks lined up in a row on the white shelf. The count matches the request perfectly. VERDICT: PASS

Medium: 100% vs 33%
openai/gpt-image-2: PASS
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator

I see exactly 2 orange cats and 3 black dogs sitting on the grass. The counts match the prompt's request perfectly. VERDICT: PASS

openai/gpt-image-2: PASS
4 red roses and 2 white daisies in a glass vase on a table

Evaluator

The image contains exactly 4 red roses (one top center, one left, one center, one right) and 2 white daisies (one left, one right). These are arranged in a clear glass vase sitting on a wooden table. The counts and objects perfectly match the requested description. VERDICT: PASS

openai/gpt-image-2: PASS
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator

I see exactly 1 blue butterfly on the far left and 3 yellow butterflies to its right, all resting on the wooden fence posts. The count matches the requested description perfectly. VERDICT: PASS

Hard: 67% vs 0%
openai/gpt-image-2: PASS
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator

The image shows a fruit bowl with the following counts:
- **Bananas:** 3
- **Oranges:** 4
- **Apples:** 2
The counts exactly match the requested specifications. VERDICT: PASS

openai/gpt-image-2: FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator

I count 10 balloons in the image: one purple, one red, one orange, one blue, one yellow, one green, one dark pink, one teal, one light pink, and one white. The prompt requested exactly 11 balloons. Since there are only 10, the count does not match the request. VERDICT: FAIL

openai/gpt-image-2: PASS
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator

I count 9 goldfish in the image.
1. Top left
2. Top middle
3. Top right
4. Middle left
5. Middle center
6. Middle right
7. Bottom left
8. Bottom center
9. Bottom right
The count is exactly 9, matching the request. VERDICT: PASS
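The Counting figures for openai/gpt-image-2 can be reproduced from the nine verdicts listed in this subcategory (eight PASS, one FAIL on the 11-balloon prompt). The sketch below is illustrative only: the function names are hypothetical, and rounding to the nearest whole percent is an assumption, since the page does not state how its percentages are computed.

```python
from collections import defaultdict

# (difficulty, passed) pairs transcribed from the Counting verdicts above.
counting_results = [
    ("Easy", True), ("Easy", True), ("Easy", True),
    ("Medium", True), ("Medium", True), ("Medium", True),
    ("Hard", True), ("Hard", False), ("Hard", True),
]


def pass_rate(outcomes):
    """Percentage of True values, rounded to the nearest whole percent (assumed)."""
    outcomes = list(outcomes)
    return round(100 * sum(outcomes) / len(outcomes))


def rates_by_group(results):
    """Map each group label to its rounded pass rate."""
    groups = defaultdict(list)
    for label, passed in results:
        groups[label].append(passed)
    return {label: pass_rate(values) for label, values in groups.items()}


by_difficulty = rates_by_group(counting_results)
overall = pass_rate(passed for _, passed in counting_results)
print(by_difficulty)  # {'Easy': 100, 'Medium': 100, 'Hard': 67}
print(overall)        # 89
```

Under that rounding assumption, 2 of 3 gives the 67% shown for Hard and 8 of 9 gives the 89% shown for Counting overall.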

Negation: 89% vs 22%
Easy: 100% vs 67%
openai/gpt-image-2: PASS
A cat lying on a couch without a collar

Evaluator

The image shows a tabby cat lying on a grey couch. I have examined the cat's neck area closely. There is no collar visible around its neck; the fur is uninterrupted. The excluded element is absent. VERDICT: PASS

openai/gpt-image-2: PASS
A tropical beach scene with no people

Evaluator

The image depicts a pristine tropical beach scene featuring a large leaning palm tree, white sand, turquoise water, and a blue sky with clouds. I have carefully scanned the beach, the water, the rocks, and the distant shoreline. There are absolutely no people visible anywhere in the scene. The exclusion criterion is met. VERDICT: PASS

openai/gpt-image-2: PASS
A classroom with desks and a chalkboard but no students

Evaluator

The image depicts a tidy, empty classroom featuring rows of wooden desks and blue chairs facing a green chalkboard with inspirational writing. There are educational posters on the walls, a teacher's desk, and bookshelves. Crucially, there are no people visible in the room—no students and no teacher. The excluded element is completely absent. VERDICT: PASS

Medium: 100% vs 0%
openai/gpt-image-2: PASS
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator

The image displays a sandy beach scene featuring palm trees, scattered seashells in the foreground, and a blue-and-white striped beach umbrella with a lounge chair. The background consists of dense green vegetation and more palm trees under a blue sky. There is absolutely no water, ocean, or sea visible in the frame; the horizon is blocked by the tropical foliage. VERDICT: PASS

openai/gpt-image-2: PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator

The image displays a kitchen scene with pots, pans, a stove, and a cutting board as requested. The cutting board is empty. The pots on the stove are covered, hiding their contents. While there are condiments (oil bottle, pepper grinder) and a potted herb plant on the counter, there is no prepared food or loose ingredients visible. The excluded element (food) is effectively absent in the context of a meal or cooking ingredients. VERDICT: PASS

openai/gpt-image-2: PASS
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator

The image depicts a traditional office workspace featuring a wooden desk, a black leather executive chair, a beige guest chair, a tall bookshelf filled with books and decor, and a lamp on a side cabinet. I have carefully scanned the desk surface, shelves, and walls, and there are absolutely no computers, monitors, laptops, or digital screens present. The workspace is entirely analog. VERDICT: PASS

Hard: 67% vs 0%
openai/gpt-image-2: PASS
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator

The image displays a pizza with a golden-brown crust, a base of red tomato sauce, numerous pepperoni slices, and fresh basil leaves scattered on top. Crucially, there is no visible layer of melted cheese (like mozzarella) covering the sauce or toppings; the pepperoni and basil sit directly on the sauce. The excluded element is absent. VERDICT: PASS

openai/gpt-image-2: PASS
A car body with windows, doors, and headlights but no wheels

Evaluator

The image displays a silver car body featuring windows, doors, and headlights as requested. Crucially, looking at the wheel arches, the wheels are completely missing; the wheel wells are empty, showing only the internal structure where tires would normally be. The excluded element (wheels) is truly absent. VERDICT: PASS

openai/gpt-image-2: FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator

The image displays a human face with eyes, nose, and mouth. However, contrary to the exclusion criteria, the face clearly has eyebrows. They are visible above the eyes, appearing natural and well-defined. Since the excluded element (eyebrows) is present, the image does not meet the specific constraint. VERDICT: FAIL

Relative Position: 100% vs 75%
Easy: 100% vs 100%
openai/gpt-image-2: PASS
A cat sitting on top of a cardboard box

Evaluator

The image shows a tabby cat sitting directly on the upper surface of a brown cardboard box. The cat's paws are resting on the box, confirming the "on top of" relationship described in the prompt. No other specific spatial relationships were requested, and the primary relationship is accurate. VERDICT: PASS

openai/gpt-image-2: PASS
A red ball underneath a wooden chair

Evaluator

The image shows a wooden chair standing on a hardwood floor. A red ball is resting on the floor directly beneath the seat of the chair, positioned centrally between the chair's legs. The spatial relationship described in the prompt ("underneath") is accurately depicted in the image. VERDICT: PASS

openai/gpt-image-2: PASS
A coffee mug next to an open book on a desk

Evaluator

The image shows a white coffee mug positioned to the left of an open book. Both objects are resting on a wooden surface, which appears to be a desk. The spatial arrangement matches the description perfectly: the mug is next to the book, and both are on the desk. VERDICT: PASS

Medium: 100% vs 100%
openai/gpt-image-2: PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator

The image shows a tabby cat sitting directly on top of a beige pillow. This pillow is resting on the seat of a wooden chair. The spatial hierarchy described in the prompt (cat on pillow, pillow on chair) is accurately represented in the visual content. VERDICT: PASS

openai/gpt-image-2: PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator

The image shows a lit candle sitting directly on top of a stack of three books. To the right of this stack, there is a potted plant with green leaves. The spatial arrangement perfectly matches the description: the candle is on the books, and the books are next to the plant. VERDICT: PASS

openai/gpt-image-2: PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator

The image shows a bird perched on a branch that extends from a large tree trunk on the right side. Directly below and next to this tree is a park bench. The spatial relationships described in the prompt—bird on branch, branch from tree, tree next to bench—are all accurately represented in the scene. VERDICT: PASS

Hard: 100% vs 0%
openai/gpt-image-2: PASS
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator

The image displays five white chess pieces arranged in a cross formation on a wooden chessboard.
- **Center:** A King piece is positioned in the middle.
- **Left:** A Queen piece is to the left of the King.
- **Right:** A Bishop piece is to the right of the King.
- **Above:** A Rook piece is directly above the King.
- **Below:** A Knight piece is directly below the King.
All specified spatial relationships are correctly depicted. VERDICT: PASS

openai/gpt-image-2: PASS
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator

The image displays five animals standing in a row in a grassy field. From left to right, the arrangement is: a brown horse on the far left, a black and white cow immediately to its right, a white sheep in the center, a brown and white goat to the right of the sheep, and a golden retriever dog on the far right. All specified positional relationships are correct. VERDICT: PASS

openai/gpt-image-2: PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator

The image displays a desk scene with a laptop centrally positioned. To the left of the laptop sits a black coffee mug. To the right lies a smartphone. A desk lamp is positioned above the scene (specifically top-left), illuminating the area. In the foreground, below the laptop, rests a spiral notebook. All specified spatial relationships (center, left, right, above, below) are accurately represented. VERDICT: PASS

Extreme: 100% vs 100%
openai/gpt-image-2: PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator

The image displays a red barn prominently in the foreground on the left side. To its right, slightly further back in the midground, stands a windmill. In the background, behind both structures, are snow-capped mountains. The scale and positioning (barn large/close, windmill medium/mid, mountains small/far) create a consistent depth perspective. All described spatial relationships are accurate. VERDICT: PASS

openai/gpt-image-2: PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator

The image perfectly matches the described spatial arrangement. The sandcastle is clearly positioned in the immediate foreground on the sand. The surfer is riding a wave in the midground, situated behind the castle but in front of the horizon. The cruise ship is located in the background near the horizon line. The visual sizes correspond to the depth, with the foreground object appearing largest. VERDICT: PASS

openai/gpt-image-2: PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator

The image perfectly matches the described spatial arrangement. In the immediate foreground on the left, there is a park bench with several pigeons scattered on the pavement nearby. In the midground, centered in the frame, is a large fountain surrounded by greenery. In the background, rising above the tree line, is a skyline of skyscrapers. The layers are distinct and correctly ordered from front to back. VERDICT: PASS

Scale & Proportions 100% vs 44%
Easy 100% vs 67%
openai/gpt-image-2: PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator

The image shows a young man standing next to a standard yellow school bus. The relative scale is realistic: the bus is significantly larger and taller than the person, consistent with real-world dimensions where a school bus is roughly 10-12 feet tall. The person's height relative to the bus door and windows appears accurate. VERDICT: PASS

openai/gpt-image-2: PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator

The image shows a ladybug resting on a green leaf. The ladybug appears small relative to the leaf, which occupies most of the frame. This size relationship is consistent with real-world proportions, where a ladybug is a small insect compared to a typical leaf. The visual scale is natural and realistic. VERDICT: PASS

openai/gpt-image-2: PASS
A child holding a basketball, both at correct relative sizes

Evaluator

The image shows a young boy standing on a basketball court holding a basketball. The size of the basketball relative to the child's hands and torso appears realistic and consistent with real-world proportions. The ball is large enough to be a standard youth basketball but fits naturally in his grip. The background elements like the hoop and door also maintain correct scale. VERDICT: PASS

Medium 100% vs 67%
openai/gpt-image-2: PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator

The image accurately depicts a miniature figure standing on the rim of a coffee cup, looking down into the liquid. The scale is consistent, with the person appearing appropriately tiny relative to the cup, saucer, and background objects like the book and coffee beans. The proportions match the prompt's description perfectly. VERDICT: PASS

openai/gpt-image-2: PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator

The image depicts a massive tabby cat towering over skyscrapers, consistent with the "kaiju" description. The cat's head is higher than the surrounding buildings, and its paws are enormous compared to the street level and vehicles. Helicopters flying nearby appear tiny relative to the cat, reinforcing the correct scale. The proportions are consistent with the prompt's request for a giant cat in a city setting. VERDICT: PASS

openai/gpt-image-2: PASS
A miniature house sitting on the palm of someone's hand

Evaluator

The image clearly shows a small, detailed model house resting comfortably on an open human palm. The scale is consistent with a "miniature" object, fitting the description perfectly without looking like a full-sized building or an impossibly tiny speck. The proportions between the hand and the house are realistic for a miniature model. VERDICT: PASS

Hard 100% vs 0%
openai/gpt-image-2: PASS
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator

The image displays the animals in a line from left to right: elephant, horse, dog, cat, and mouse. Their relative sizes are accurate to real-world proportions. The elephant is the largest, followed by the horse, then the dog, the cat, and finally the tiny mouse. The scaling is consistent and realistic. VERDICT: PASS

openai/gpt-image-2: PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator

The image displays a standard dinner table setting. The plate is central and appropriately sized. The fork and knife flanking it are standard cutlery sizes relative to the plate. The wine glass is a typical size for a wine glass, sitting comfortably next to the plate. The salt shaker and pepper grinder (filled with peppercorns) are positioned to the left and have realistic proportions relative to each other and the other items. All objects maintain consistent and realistic relative sizes. VER

openai/gpt-image-2: PASS
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator

The image displays a farm scene where the relative sizes of the objects are consistent with reality. The tractor is appropriately large compared to the farmer, the cow is a realistic size relative to the farmer and barn, and the chicken is small in the foreground. The barn and silo are large structures in the background, fitting the scale of the scene. VERDICT: PASS
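
Each evaluator response in this listing closes with a "VERDICT: PASS" or "VERDICT: FAIL" marker, and the dinner table response above is cut off at "VER". The following is a minimal sketch of how such a marker could be read from a response. It assumes a complete response always ends with the marker. The name extract_verdict and the choice to return None for truncated text are illustrative assumptions, not ImageBench's actual parsing logic.

```python
import re
from typing import Optional

# Assumption: a complete evaluator response ends with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(evaluator_text: str) -> Optional[str]:
    """Return "PASS" or "FAIL" if the response ends with a verdict marker, else None.

    None signals a truncated or malformed response, so the caller can fall back
    to another source (for example the status shown on the result card).
    """
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    complete = "The scaling is consistent and realistic. VERDICT: PASS"
    truncated = "All objects maintain consistent and realistic relative sizes. VER"
    print(extract_verdict(complete))
    print(extract_verdict(truncated))
```

Returning None rather than guessing keeps a truncated transcript from being counted as either a pass or a failure.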

Human realism 98% vs 67%

Faces & Expressions 100% vs 58%
Easy 100% vs 100%
openai/gpt-image-2: PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator

The face is symmetric with natural, correctly proportioned features. The expression is calm and serious, with subtle facial cues (slightly downturned mouth, focused gaze) conveying the intended emotion. No distortions are visible. Lighting and texture are realistic. VERDICT: PASS

openai/gpt-image-2: PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator

The face is symmetric with natural, correctly proportioned features. The teenager’s expression is relaxed and neutral, with no strong emotion. No distortions are visible. The lighting and composition suit a professional headshot. VERDICT: PASS

openai/gpt-image-2: PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator

Face is symmetric with natural features; smile appears warm and genuine, conveying happiness. No obvious distortions. Lighting and detail look authentic. The portrait successfully captures the requested emotion and aesthetic. VERDICT: PASS

Medium 100% vs 33%
openai/gpt-image-2: PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator

The face is symmetric with natural features. Disgust is clearly conveyed: nose is wrinkled, lips are curled, and eyebrows are furrowed. No significant distortions; expression is authentic and readable. VERDICT: PASS

openai/gpt-image-2: PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator

Face is symmetric with natural features. Eyes are wide, eyebrows raised, mouth slightly open — all clearly conveying genuine surprise. No noticeable distortions. Lighting and detail are realistic. Emotion is unmistakable and well-executed. VERDICT: PASS

openai/gpt-image-2: PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator

Face is symmetric with natural, correctly rendered features. Downcast eyes, slightly downturned mouth, and distant gaze clearly convey quiet melancholy. No noticeable distortions. Lighting and composition enhance the emotional tone. VERDICT: PASS

Hard 100% vs 67%
openai/gpt-image-2: PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator

The face is symmetric with natural, aged features and deep wrinkles. White, curly hair frames the face appropriately. The warm smile and kind eyes are clearly visible, conveying genuine warmth. No distortions are present. The image accurately fulfills the prompt’s request. VERDICT: PASS

openai/gpt-image-2: PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator

The child’s face is symmetric with natural, proportionate features including round cheeks and large eyes. Tears are clearly streaming down both cheeks, and the crying expression—furrowed brow, open mouth, and visible tears—is unmistakable. No distortions are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

openai/gpt-image-2: PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator

Face is symmetric with natural aging features: age spots, thinning gray hair, and pronounced wrinkles. Expression conveys skepticism via furrowed brow and narrowed eyes behind reading glasses. No distortions or unnatural artifacts. Matches prompt precisely. VERDICT: PASS

Extreme 100% vs 33%
openai/gpt-image-2: PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator

All three faces show clear, distinct emotions matching the prompt: left woman laughing with head back, center woman shocked with wide eyes, right woman smirking while rolling eyes. Facial features are symmetrical and anatomically correct with no distortions. Expressions are vivid and easily readable. VERDICT: PASS

openai/gpt-image-2: PASS
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator

Faces are symmetric with natural features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored (hand on cheek, neutral expression), child grins excitedly. No distortions or unnatural artifacts. All four individuals match the prompt’s emotional descriptions. VERDICT: PASS

openai/gpt-image-2: PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator

All three faces show clear, symmetric features with no distortions. The man on the left displays frustration (furrowed brows, tense mouth). The woman center shows pleased expression (smile, relaxed eyes). The woman right shows confusion (tilted head, furrowed brow). Emotions are clearly readable and match the prompt. VERDICT: PASS

Full Body 100% vs 58%
Easy 100% vs 100%
openai/gpt-image-2: PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator

The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions are visible. All elements match the prompt accurately. VERDICT: PASS

openai/gpt-image-2: PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator

The man stands straight, facing forward, with hands in pockets — perfectly matching the prompt. Limb count is correct (two arms, two legs). Proportions are natural and anatomically accurate. No distortions are visible. The pose is clearly recognizable and typical for a casual portrait. All criteria are satisfied. VERDICT: PASS

openai/gpt-image-2: PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator

The person’s limbs are correctly counted and proportioned. The neutral standing pose is anatomically accurate, with hands in pockets and slight weight distribution. The park setting and full-body framing match the prompt. No distortions are visible. The pose is clearly recognizable as a relaxed, casual stance. VERDICT: PASS

Medium 100% vs 67%
openai/gpt-image-2: PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator

Limb count and proportions are correct. The pose—bent knees, mid-air jump over a puddle—is anatomically plausible and clearly recognizable. No significant distortions; muscles, joints, and clothing fold naturally. The jump’s timing and posture align with the prompt. VERDICT: PASS

openai/gpt-image-2: PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator

Limb count and proportions are anatomically correct. Pose is a natural, recognizable running stride with dynamic motion in hair and clothing. No significant distortions. The scene matches the prompt’s description of a woman running through a park. VERDICT: PASS

openai/gpt-image-2: PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator

The dancer’s pose is a recognizable ballet turn (fouetté or similar), with one leg extended and the other en pointe. Limb count and proportions are anatomically correct. The dress flows dynamically with motion, matching the prompt. No distortions are visible; musculature, posture, and balance appear natural. The action is clearly a dynamic spin with extended leg. VERDICT: PASS

Hard 100% vs 33%
openai/gpt-image-2: PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator

The person has two legs and correct proportions. The tree pose is accurately depicted: standing on one leg, the other foot pressed into the inner thigh, arms raised overhead in prayer position. No anatomical distortions are visible. The pose is clearly recognizable and executed correctly. VERDICT: PASS

openai/gpt-image-2: PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator

Limb count and proportions are correct. The pose is anatomically plausible for rock climbing — arms and legs extended to different holds, with body tension and gear visible. The action is clearly recognizable as dynamic climbing. No significant distortions; musculature and joint angles appear natural. Gear placement and rope are realistic. VERDICT: PASS

openai/gpt-image-2: PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator

The cellist has correct limb count and proportions. Her posture is anatomically plausible for playing: seated, legs around the instrument, bow arm extended. The cello is properly positioned with the bridge and strings visible. No major distortions; the pose is recognizable as standard cello playing. Minor artifacts: the bow appears slightly unnaturally thin near the frog, and the cello’s tailpiece seems misaligned with the strings. VERDICT: PASS

Extreme 100% vs 33%
openai/gpt-image-2: PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator

Limb count and proportions are accurate. The pose is a recognizable tango dip with intertwined legs — the woman’s leg hooks over the man’s thigh, and their bodies are in close embrace. Anatomical structure is plausible, though slight exaggeration in the woman’s leg curvature and the man’s torso angle adds dramatic flair. No major distortions. VERDICT: PASS

openai/gpt-image-2: PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator

Both players show correct limb count and proportions. The sliding player’s body is low with realistic motion blur and grass interaction. The jumping player’s pose is dynamic and anatomically plausible, with arms extended for balance. The ball is at the sliding player’s feet, matching the prompt. No distortions; action is clearly recognizable as a mid-tackle. VERDICT: PASS

openai/gpt-image-2: PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate, with legs straight, feet together, and hands supporting weight. Alignment is nearly identical, with minor natural asymmetries. No distortions; muscles and joints appear realistic. The action is clearly recognizable as a synchronized handstand. VERDICT: PASS

Hands 92% vs 92%
Easy 100% vs 100%
openai/gpt-image-2: PASS
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator

Both hands clearly show five fingers each — thumb, index, middle, ring, and pinky — with natural, relaxed positioning. The anatomy is accurate: no extra or missing digits, no unnatural deformities. Fingers are slightly spread, consistent with a relaxed stance. No anomalies detected. VERDICT: PASS

openai/gpt-image-2: PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator

Both hands show all ten fingers clearly — five per hand. The gesture is anatomically correct: palms up, fingers spread naturally. No anomalies like extra or missing fingers, fused digits, or unnatural deformities. Fingernails and skin texture appear normal. The pose matches the prompt precisely. VERDICT: PASS

openai/gpt-image-2: PASS
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator

Both hands show five distinct fingers each. The gesture is anatomically plausible: hands rest on opposite upper arms, fingers slightly spread, nails visible. No anomalies like extra or missing fingers, or unnatural joint angles. Fingers are correctly positioned for the pose. VERDICT: PASS

Medium 100% vs 100%
openai/gpt-image-2: PASS
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator

The visible right hand holds the mug with a natural, anatomically correct grip: thumb on top, fingers curled around the handle. All five fingers are clearly visible and intact. No anomalies. The gesture matches the prompt perfectly. VERDICT: PASS

openai/gpt-image-2: PASS
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator

Right hand shows all five fingers gripping the pen in a natural, anatomically correct writing posture. Left hand is partially visible, resting on the table, also showing five fingers. No anomalies detected. Grip matches standard pen-holding technique. VERDICT: PASS

openai/gpt-image-2: PASS
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator

Left hand on fretboard shows 5 fingers clearly (thumb, index, middle, ring, pinky) in proper pressing position. Right hand strumming shows 5 fingers (thumb, index, middle, ring, pinky) — though thumb is partially obscured, all 5 are anatomically present and positioned correctly for strumming. No anomalies. Fingers are proportionate and joints look natural. VERDICT: PASS

Hard 67% vs 67%
openai/gpt-image-2: PASS
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator

The right hand clearly shows five fingers: index and middle extended (peace sign), ring, pinky, and thumb curled. The gesture is anatomically correct with natural finger positioning. No anomalies or missing fingers. Left hand is not visible, but prompt only required evaluation of the right hand. VERDICT: PASS

openai/gpt-image-2: PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator

The hand shows five fingers: thumb extended upward, and four fingers curled into a fist. The gesture is anatomically correct — thumb opposes the curled fingers naturally. No anomalies in finger count or structure. Skin texture and joint alignment appear realistic. VERDICT: PASS

openai/gpt-image-2: FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator

The visible hand shows exactly three extended fingers (index, middle, ring), with thumb and pinky folded. No other hand is visible. The gesture is anatomically correct for counting “three.” The prompt asked for “five per hand,” which is not met. Only one hand is shown, and it has three fingers extended, not five. VERDICT: FAIL

Extreme 100% vs 100%
openai/gpt-image-2: PASS
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator

Both hands show five clearly defined fingers each, with all fingers extended as requested. The high-five gesture is anatomically correct — palms meet squarely, fingers aligned naturally, no unnatural bending or extra digits. No anomalies detected. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

openai/gpt-image-2: PASS
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator

Both hands show five clearly defined fingers each. The handshake is anatomically correct: palms facing slightly inward, fingers interlocked, thumbs resting on the opposite hand’s palm. No anomalies like extra or missing fingers, malformed joints, or unnatural positioning. The gesture is natural and professional. VERDICT: PASS

openai/gpt-image-2: PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator

Both hands show five fingers each, with no anomalies. The grip is anatomically correct: thumbs support the sides, fingers cradle the box securely. The gesture is natural for handing over an object. All fingers are clearly defined and positioned realistically. VERDICT: PASS

Multi-Subject 100% vs 50%
Easy 100% vs 33%
openai/gpt-image-2: PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator

Person 1: Short blonde hair, wearing a red blazer over a black top — matches description. Person 2: Long black hair, wearing a blue dress — matches description. Both are visually distinct and correctly portrayed. VERDICT: PASS

openai/gpt-image-2: PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator

Person 1: Bearded, wears glasses, green sweater — matches description. Person 2: Clean-shaven, wears black hoodie — matches description. Both individuals are visually distinct and clearly identifiable. VERDICT: PASS

openai/gpt-image-2: PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator

Person 1: Tall, curly red hair, denim overalls — matches description. Person 2: Short, straight brown hair, yellow sundress — matches description. Both are visually distinct and pose together. VERDICT: PASS

Medium 100% vs 67%
openai/gpt-image-2: PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator

Tall woman: pink short hair, playing bass, tattoos visible — matches. Bearded man: bandana, playing drums — matches. Slim man: glasses, playing keyboard — matches. All three are visually distinct and correctly positioned. VERDICT: PASS

openai/gpt-image-2: PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator

Man: Sunhat, backpack — matches. Woman: Braided hair, hiking poles, backpack — matches. Teenager: Baseball cap, water bottle — matches. All three are visually distinct and accurately represent the described features. VERDICT: PASS

openai/gpt-image-2: PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator

All four individuals are present and visually distinct, matching the prompt: - Bald man: wears suit and tie. - Woman with silver hair: red blouse, earrings. - Young man with dreadlocks: denim jacket over white tee. - Woman with hijab: green dress with embroidery. VERDICT: PASS
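
The percentages shown for openai/gpt-image-2 in the Human realism category can be reproduced from the verdicts listed above. The following is a rough sketch, assuming each rate is the share of PASS verdicts pooled over prompts and rounded to the nearest whole percent. That pooling rule is an assumption (the page does not state its formula), and the names pass_rate, hands, and human_realism are illustrative.

```python
from typing import Dict, List


def pass_rate(verdicts: List[str]) -> int:
    """Share of PASS verdicts, as a whole-number percentage."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    passed = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passed / len(verdicts))


# Hands verdicts per difficulty tier, as listed in this section.
hands: Dict[str, List[str]] = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "FAIL"],  # the "counting to three" prompt failed
    "Extreme": ["PASS", "PASS", "PASS"],
}

# Human realism subcategories; counts match the result cards in this section.
human_realism: Dict[str, List[str]] = {
    "Faces & Expressions": ["PASS"] * 12,
    "Full Body": ["PASS"] * 12,
    "Hands": [v for tier in hands.values() for v in tier],
    "Multi-Subject": ["PASS"] * 6,
}

if __name__ == "__main__":
    for tier, verdicts in hands.items():
        print(f"Hands / {tier}: {pass_rate(verdicts)}%")
    print(f"Hands: {pass_rate(human_realism['Hands'])}%")
    pooled = [v for vs in human_realism.values() for v in vs]
    print(f"Human realism: {pass_rate(pooled)}%")
```

Under this assumption the single Hands failure gives 2 of 3 (67%) for the Hard tier, 11 of 12 (92%) for Hands, and 41 of 42 (98%) for Human realism, which agrees with the figures displayed.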

Truthfulness 89% vs 48%

Photorealism 100% vs 100%
Medium 100% vs 100%
openai/gpt-image-2: PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator

The image successfully renders all requested materials with high fidelity. The laptop exhibits the matte, metallic sheen of brushed aluminum. The notebook displays a convincing leather texture with visible grain and stitching. The coffee mug has the speckled, matte finish characteristic of ceramic stoneware. The paperweight is a clear glass sphere showing realistic refraction, internal bubbles, and reflections of the surrounding environment. The wooden desk surface also shows appropriate grain

openai/gpt-image-2: PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator

The image successfully renders the requested materials with accurate visual properties. The stainless steel pot exhibits sharp, realistic reflections of the window and surrounding environment. The wooden cutting board shows natural grain and appropriate specular highlights on its surface. The glass bottle demonstrates correct translucency and refraction, with the olive oil visible inside and distinct highlights on the glass. The cotton dish towel displays a soft, woven texture with realistic f

openai/gpt-image-2: PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator

The image successfully renders all requested materials with high fidelity. The glass shows accurate refraction and transparency. The silver spoon exhibits sharp, distorted reflections consistent with polished metal. The wooden table features realistic grain and texture. The linen napkin displays a convincing woven fabric texture with natural folds. The apple has a realistic waxy skin with appropriate specular highlights. VERDICT: PASS

Physics & Reflections 83% vs 75%
Easy 100% vs 67%
openai/gpt-image-2: PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator

The image displays a red sphere with a specular highlight on the upper-left side, indicating a light source from that direction. Consequently, a soft, diffuse shadow is cast to the right and slightly downward, which is physically consistent with the lighting. The shadow's gradient and color are realistic for a white surface. VERDICT: PASS

openai/gpt-image-2: PASS
A person standing outdoors at midday with a short shadow directly below them

Evaluator

The shadow is short and falls directly beneath and slightly to the side of the subject, consistent with a high sun position at midday. The lighting on the subject's face and clothing aligns with this overhead light source. The shadow's color and opacity are realistic for a sunny day on concrete. VERDICT: PASS

openai/gpt-image-2: PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator

The lighting source is clearly the window on the right, illuminating the right side of the mug. Consequently, the shadow is cast to the left, which is physically correct. The shadow is sharp and distinct, consistent with direct sunlight. The shape of the shadow accurately reflects the mug's cylindrical form and handle. The overall lighting and shadow interaction are realistic and plausible. VERDICT: PASS

Medium 67% vs 67%
openai/gpt-image-2: FAIL
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator

The reflection is physically implausible. The woman in the foreground is facing towards the mirror (showing the front of her dress), yet the reflection shows her back. In a real mirror, if a person faces the mirror, their reflection faces them (showing the front). To see the back of the outfit in the mirror, the woman would have to be facing away from the mirror, which contradicts her pose in the foreground. The image attempts to show both views simultaneously in a way that violates optical ph

openai/gpt-image-2: PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator

The reflection is physically plausible. It is correctly oriented as an inverted vertical mirror image directly beneath the cat. The distortion increases with distance from the contact point, consistent with a polished marble surface. The brightness and color saturation are slightly reduced in the reflection, which is realistic. The reflection of the background elements (door, plant) also aligns correctly with the perspective. VERDICT: PASS

openai/gpt-image-2: PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator

The reflection of the vase and the stack of books on the glass table is physically plausible. The reflection is correctly inverted vertically, aligned with the base of the objects, and exhibits appropriate transparency and slight distortion consistent with a glass surface. The lighting and shadows are consistent with the window light source on the left. VERDICT: PASS

Hard 100%vs67%
openai/gpt-image-2: PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator

The reflections are physically plausible. They are correctly inverted vertically and aligned with the trees above. The "gently rippled" effect is accurately depicted through horizontal distortion of the vertical tree trunks and canopy. The color matches the sunset sky, and the reflections are appropriately slightly darker and less sharp than the direct view, consistent with light interacting with a water surface. VERDICT: PASS

openai/gpt-image-2: PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator

The reflection is physically plausible. It is a vertically inverted mirror image of the mountain range and treeline, aligning perfectly with the objects above. The water surface is calm enough to produce a sharp reflection, though slightly darker and less saturated than the actual landscape, which accurately mimics how light behaves when reflecting off water. The lighting on the peaks is consistent in both the real scene and the reflection. VERDICT: PASS

openai/gpt-image-2: PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator

The reflections are physically plausible. The buildings are correctly inverted, pointing downwards into the water. The distortion is consistent with a rippling puddle surface, breaking up the vertical lines of the skyscrapers. The colors accurately reflect the warm city lights and the cool, twilight sky tones, with a natural shift caused by the water's surface. VERDICT: PASS

Extreme 67%vs100%
openai/gpt-image-2: ERROR
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

openai/gpt-image-2: PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator

The reflections on the chrome sculpture are physically plausible, showing distorted environmental mapping of the windows, sky, and room consistent with the object's curvature. The glass floor correctly reflects the sculpture's base and lower form. Shadows are cast in a direction consistent with the strong natural light coming from the large windows on the left, and self-shadowing within the sculpture's folds aligns with this light source. VERDICT: PASS

openai/gpt-image-2: PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator

The shadows and reflections are physically plausible. The shadow cast by the wine glass falls to the left, consistent with the light source coming from the window on the right. The reflection on the polished marble is directly beneath the glass, maintaining correct orientation and distortion. The refraction through the cut crystal glass distorts the background view realistically, and the lake outside shows a natural reflection of the sky and clouds. VERDICT: PASS
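
The per-difficulty percentages in this category look like simple pass fractions over three prompts each. A minimal sketch of that arithmetic, assuming an ERROR counts as a non-pass (this matches the 67% shown for the Extreme group, but the page does not confirm it); `pass_rate` is a hypothetical helper, not part of the benchmark:

```python
def pass_rate(verdicts):
    """Return the share of PASS verdicts as a whole-number percentage."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(1 for v in verdicts if v == "PASS")
    # Assumption: FAIL and ERROR both count against the model.
    return round(100 * passed / len(verdicts))


extreme = ["ERROR", "PASS", "PASS"]  # verdicts listed in the Extreme group
medium = ["FAIL", "PASS", "PASS"]    # verdicts listed in the Medium group
print(pass_rate(extreme))  # 67
print(pass_rate(medium))   # 67
```

Under the same assumption, a group with three PASS verdicts scores 100, which agrees with the Hard group.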

World Knowledge 92%vs8%
Easy 67%vs0%
openai/gpt-image-2: PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator

The image depicts the Eiffel Tower accurately, featuring its iconic iron lattice structure, three distinct viewing platforms, and the characteristic tapering shape. The setting includes the Seine river with a tour boat and a stone bridge (likely the Pont d'Iéna) in the foreground, consistent with a view from Paris. The sky is blue with scattered clouds, fitting the "clear day" description. The architectural details are faithful to the real-world landmark. VERDICT: PASS

openai/gpt-image-2: PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator

The image depicts the Statue of Liberty with high accuracy. The statue features the correct green patina, a golden torch held high, a spiked crown, and a tablet held against the left arm. The architectural details of the pedestal, including the central tower and the surrounding star-fort walls, are factually correct. The surrounding environment, including the water, trees, and American flag, is consistent with the real-world location. VERDICT: PASS

openai/gpt-image-2: FAIL
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator

The image depicts the Taj Mahal, but it contains significant architectural inaccuracies. While the symmetry and general layout (reflecting pool, cypress trees) are correct, the minarets are wrong. The Taj Mahal has four minarets at the corners of the plinth. This image shows two large minarets in the foreground and two smaller minarets flanking the main structure where smaller domes (chhatris) should be. Additionally, the finial on the main dome is simplified and lacks the traditional crescent

Medium 100%vs33%
openai/gpt-image-2: PASS
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator

The image depicts a shamisen that is highly recognizable and largely accurate to the prompt. It features the correct three-stringed design, a wooden body (dai) covered with a white membrane, and the traditional tuning pegs (kama) at the headstock. The bachi (plectrum) is present and accurately shaped with a tortoiseshell pattern. While the body shape is slightly more rectangular than some traditional curved styles, it remains a faithful representation of the instrument's core components. VERD

openai/gpt-image-2: PASS
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator

The image depicts a highly realistic brass astrolabe. It features the key components requested: a central rete with star pointers (labeled with names like Lyra, Cygnus, Draco), a tympan underneath with concentric circles, and an outer mater with clear engraved degree markings. The inclusion of Arabic script alongside Latin/English star names is historically plausible for such instruments. The lighting, texture, and mechanical details (like the alidade rule) are convincing and accurate to the r

openai/gpt-image-2: PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator

The image depicts a ceramic bowl with visible fractures repaired using gold lacquer, perfectly matching the description of a Japanese kintsugi bowl. The texture of the glaze, the natural-looking fracture patterns, and the application of the gold repair are all realistic and factually accurate to the art form. VERDICT: PASS

Hard 100%vs0%
openai/gpt-image-2: PASS
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator

The image is a clear, accurate, and well-labeled diagram of the water cycle. It correctly depicts all requested stages: evaporation from the ocean (Step 1), condensation into clouds (Step 2), precipitation as rain (Step 3), collection in rivers (Step 4), and the return flow to the sea (Step 5). The visual elements, including the sun, clouds, mountains, and water bodies, are realistic and appropriately arranged to illustrate the scientific process. VERDICT: PASS

openai/gpt-image-2: PASS
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator

The image is a highly accurate and detailed educational diagram depicting an oceanic-continental convergent boundary. It clearly labels all requested elements: the subducting oceanic plate, the trench, the volcanic arc on the continental plate, and the distinct layers of the Earth (lithosphere, asthenosphere, lower mantle, outer core). The geological processes shown, such as water release lowering the melting point to generate magma, are scientifically correct. VERDICT: PASS

openai/gpt-image-2: PASS
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator

The image is a highly accurate and standard anatomical illustration of the human heart. It correctly depicts the four chambers (Right/Left Atrium and Ventricle) with appropriate color coding for deoxygenated (blue) and oxygenated (red) blood. The major vessels (Aorta, Pulmonary Arteries/Veins, Vena Cavae) and valves (Tricuspid, Pulmonary, Mitral, Aortic) are labeled and positioned correctly according to anatomical standards. VERDICT: PASS

Extreme 100%vs0%
openai/gpt-image-2: PASS
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator

The image accurately depicts a *Morpho menelaus* butterfly. The top and bottom-right panels show the characteristic iridescent blue dorsal wings with dark borders and white spots. The bottom-left panel correctly displays the brown ventral wing pattern featuring prominent eyespots, which is the underside of the same species. The body morphology, including antennae and wing venation, appears anatomically correct. VERDICT: PASS

openai/gpt-image-2: PASS
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator

The image accurately depicts the interior of the Hagia Sophia. It correctly features the massive central dome with its characteristic windows, the pendentives adorned with large angel figures (cherubim), and the semi-domes extending outward. The large Islamic calligraphy medallions hanging from the piers are present and correctly styled, alongside visible Byzantine mosaics in the apse area. The architectural layout and decorative elements are factually consistent with the real-world structure.

openai/gpt-image-2: PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator

The image depicts a highly realistic bismuth crystal. It features the classic cubic hopper formation with hollowed-out centers and distinct, stepped "staircase" edges. The surface displays vibrant, shifting iridescent colors (gold, green, blue, purple) caused by the thin oxide layer, which is characteristic of real bismuth specimens. The geometry and texture are factually accurate to the mineral. VERDICT: PASS

Professional Studio 93%vs74%

Camera & Lighting 100%vs75%
Easy 100%vs100%
openai/gpt-image-2: PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator

The image successfully matches the prompt's requirements. It features a wide-angle perspective that captures an expansive living room area. Most notably, the image exhibits significant barrel distortion, visible in the curved edges of the ceiling and walls, which creates the "fisheye" effect requested. The lighting is even and natural, and the depth of field is deep, keeping both the foreground seating and the background staircase in clear focus. VERDICT: PASS

openai/gpt-image-2: PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator

The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a red rose petal. The depth of field is extremely shallow, with the background and foreground heavily blurred, focusing entirely on the droplet and the immediate texture of the petal. The lighting is soft and natural, highlighting the transparency and refraction within the dewdrop. VERDICT: PASS

openai/gpt-image-2: PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator

The image perfectly matches the prompt. It provides a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is top-down, and the lighting is even, consistent with daylight. The scene includes the requested elements: a roundabout, cars, and surrounding trees. The depth of field is consistent with a high-altitude aerial shot, maintaining clarity across the entire scene. VERDICT: PASS

Medium 100%vs33%
openai/gpt-image-2: PASS
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator

The image perfectly matches the prompt. The lighting is a classic Rembrandt style, characterized by a distinct, small triangle of light on the shadowed cheek (the subject's right cheek, viewer's left). The atmosphere is dark and moody, with a shallow depth of field that keeps the focus on the subject's face against a dark, indistinct background. The perspective is a standard portrait angle. VERDICT: PASS

openai/gpt-image-2: PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator

The image perfectly matches the prompt. The lighting is a classic "golden hour" scene, featuring a bright, low sun that creates a warm orange sky and a soft glow across the landscape. Long, warm shadows are visible along the cliffs and the path, and the light reflecting off the water enhances the atmospheric depth. The perspective and lighting patterns are consistent with a high-quality landscape photograph. VERDICT: PASS

openai/gpt-image-2: PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator

The image perfectly matches the prompt. It features a high-key portrait with a bright, seamless white background. The lighting is even and luminous, creating a soft, airy feel with virtually no harsh shadows on the subject's face or the background. The perspective is a standard close-up portrait, and the depth of field is shallow, keeping the focus entirely on the subject's expression. VERDICT: PASS

Hard 100%vs67%
openai/gpt-image-2: PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator

The image perfectly matches the prompt. The perspective is a classic portrait shot. The depth of field is shallow, creating a very creamy, soft bokeh background that obscures the garden details. Most importantly, the lighting features a strong, warm rim light (backlighting) that creates a glowing halo around the subject's hair and shoulders, effectively separating her from the background as requested. VERDICT: PASS

openai/gpt-image-2: PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator

The image successfully matches the prompt's technical requirements. The subject is sharp and in focus, while the background features a shallow depth of field with soft, out-of-focus city lights (bokeh). The lighting pattern is consistent with a night-time street photograph, and the perspective effectively mimics a 50mm lens at a wide aperture (f/1.8) by isolating the subject from the background. VERDICT: PASS

openai/gpt-image-2: PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator

The image is a top-down (flat lay) shot, which matches the "shot from above" instruction. The lighting comes from the upper left, creating soft shadows and highlights consistent with "natural window light from the left." The color temperature is warm, and there is a subtle vignette around the edges. The depth of field is relatively shallow, consistent with an f/2.8 aperture. All technical specifications are met. VERDICT: PASS

Extreme 100%vs100%
openai/gpt-image-2: PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator

The image perfectly captures the tilt-shift effect. It features a strong selective focus strip (shallow depth of field) that blurs the top and bottom of the frame, creating the miniature model illusion. The colors are highly saturated and vibrant, and the high-angle perspective enhances the toy-like appearance. The lighting is bright and even, consistent with a daylight street scene. VERDICT: PASS

openai/gpt-image-2: PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator

The image perfectly matches the prompt. It features a cinematic, dark, and moody scene with a shallow depth of field focusing on the subject. Most importantly, it displays prominent blue and orange anamorphic lens flares streaking horizontally across the frame, exactly as described. The lighting and perspective align with the requested cinematic aesthetic. VERDICT: PASS

openai/gpt-image-2: PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator

The image perfectly matches the prompt. The lighting is a textbook example of dramatic split lighting/chiaroscuro: one side of the face is brightly illuminated while the other is cast in deep, heavy shadow. The high contrast between the light and dark areas is sharp, and the close-up portrait perspective creates an intimate, dramatic effect. The depth of field is shallow, keeping the focus on the facial features. VERDICT: PASS

Color Precision 92%vs92%
Easy 100%vs100%
openai/gpt-image-2: PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color is intense, saturated, and unambiguous, perfectly matching the prompt's description. The road is a neutral gray, and the surrounding environment maintains color accuracy without any deviation from the specified hues. VERDICT: PASS

openai/gpt-image-2: PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator

The woman is wearing a vibrant, deep blue evening dress that perfectly matches the "royal blue" description. The background is a solid, light gray/beige tone, which qualifies as a "neutral background." All color requirements specified in the prompt are accurately represented in the image. VERDICT: PASS

openai/gpt-image-2: PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator

The image features a vast field of bright, vibrant yellow sunflowers under a solid, clear blue sky. The colors perfectly match the prompt's requirement for "pure sunflower yellow flowers" and a "clear blue sky." There are no deviations in color saturation or hue from the described natural tones. VERDICT: PASS

Medium 67%vs67%
openai/gpt-image-2: PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator

The wall color is a vibrant, saturated orange. The hex code #FF6B35 represents a bright, slightly reddish-orange (often called "burnt orange" or "sunset orange"). The image accurately reflects this specific hue and intensity, matching the prompt's requirement for a burnt orange wall. The white furniture and natural daylight are also correctly depicted. VERDICT: PASS

openai/gpt-image-2: FAIL
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator

The foliage in the image consists of various shades of green, but it does not strictly adhere to the specific hex code #2D5F2D. The colors are more vibrant and lighter in several areas than the requested "dark muted green." While the scene is a forest, the lack of precision regarding the specific color value requested in the prompt results in a failure to meet the exact color specification. VERDICT: FAIL
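
One way to make "does not strictly adhere to #2D5F2D" measurable is to compare a sampled pixel color with the target in RGB space. A rough sketch under stated assumptions: the sample color below is invented for illustration, the helper names are hypothetical, and nothing on this page indicates that the evaluator performs a pixel-level check.

```python
import math


def hex_to_rgb(code):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a 6-digit hex color")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two RGB triples (0 means identical)."""
    return math.dist(a, b)


target = hex_to_rgb("#2D5F2D")
sample = (90, 160, 70)  # hypothetical lighter, more vivid foliage pixel
print(target)  # (45, 95, 45)
print(round(rgb_distance(target, sample), 1))
```

Plain RGB distance is only a crude proxy for perceived difference; a perceptual color space would be a better fit for a strict check.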

openai/gpt-image-2: PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator

The walls in the image are a vibrant, medium blue that aligns well with the description of #4A90D9 (a medium blue). The floor is a warm wooden tone, and the trim around the window, door, and ceiling is clearly white, matching the prompt's requirements for color accuracy. VERDICT: PASS

Hard 100%vs100%
openai/gpt-image-2: PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator

The image accurately reflects the requested colors. The walls are a warm cream color (#F5E6CC), the sofa is a dark navy blue (#2C3E50), and the accent pillows are a bright, vibrant red (#E74C3C). All elements align perfectly with the specified hex values and color names provided in the prompt. VERDICT: PASS

openai/gpt-image-2: PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator

The image features a deep dark blue background, which aligns with #1A1A2E. The watch is a soft gold color, matching the #F0C27F description. The earbuds are pure white, consistent with #FFFFFF. All specified colors and their corresponding hex values are accurately represented in the flat-lay composition. VERDICT: PASS

openai/gpt-image-2: PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator

The image matches the specified colors accurately. The background is a near-white color (#FAFAFA), the large main shape is a vibrant orange-red (#FF4500), and the text and accent lines are a dark charcoal color (#333333). The color palette is consistent with the hex values provided in the prompt. VERDICT: PASS

Extreme 100%vs100%
openai/gpt-image-2: PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator

The image shows a smooth gradient starting with a vibrant red on the left and transitioning through a purple middle section to a deep blue on the right. The colors align perfectly with the specified hex values: #FF0000 (pure red) on the left and #0000FF (pure blue) on the right, with an even transition through purple in between. VERDICT: PASS
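
For reference, a per-channel linear blend between the two endpoints does pass through purple at the midpoint. A minimal sketch, assuming a plain linear interpolation in RGB (the page does not say how "evenly" was judged); `lerp_color` is a hypothetical helper:

```python
def lerp_color(start, end, t):
    """Blend two RGB tuples per channel; t=0 gives start, t=1 gives end."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    # Adding 0.5 before truncating rounds halves up, avoiding banker's rounding.
    return tuple(int(s + (e - s) * t + 0.5) for s, e in zip(start, end))


RED = (255, 0, 0)    # #FF0000
BLUE = (0, 0, 255)   # #0000FF
for t in (0.0, 0.5, 1.0):
    print(t, lerp_color(RED, BLUE, t))
# 0.0 (255, 0, 0)
# 0.5 (128, 0, 128)
# 1.0 (0, 0, 255)
```

The midpoint (128, 0, 128) is the color usually named purple, so a generated gradient whose center sits near that value is consistent with the prompt.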

openai/gpt-image-2: PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator

The image displays a vertical gradient. The bottom color is a bright orange, which is a close approximation of #FF6B35. The top color is a deep, dark indigo/purple, which aligns with the description of #1B1464. The transition between the two colors is smooth and follows the specified gradient direction from orange at the bottom to deep indigo at the top. VERDICT: PASS

openai/gpt-image-2: PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator

The image displays a horizontal gradient transitioning from a bright cyan color on the left to a light mint green on the right. The left side closely matches the #00C9FF (cyan) hue, and the right side matches the #92FE9D (mint green) hue. The transition is smooth and follows the specified color direction and hex values accurately. VERDICT: PASS

Photorealism 67%vs0%
Easy 67%vs0%
openai/gpt-image-2: PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator

The image is a high-quality product shot that successfully mimics professional studio photography. The lighting is soft, the shadows are realistic, and the texture of the fabric and suede appears consistent. There are no obvious AI artifacts, such as warped geometry, nonsensical stitching, or melting textures. The white background is clean, and the single red sneaker looks like a legitimate commercial photograph. VERDICT: PASS

openai/gpt-image-2: FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator

The image fails to meet the criteria for a real photograph due to significant AI artifacts. The most glaring issue is the "glitch" or distorted character at the bottom right of the reflection, which appears as a nonsensical, jagged shape. Additionally, the internal straw (dip tube) is unnaturally thick and lacks the physical realism of a real object. The overall geometry of the glass edges is slightly inconsistent, indicating it is AI-generated. VERDICT: FAIL

openai/gpt-image-2: PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator

The image successfully meets the criteria for a studio-quality product shot. The lighting is soft and natural, creating a subtle shadow that grounds the object. The texture of the green apple is realistic, and the seamless white background is clean. There are no obvious AI artifacts, such as unnatural warping or distorted textures, and the stem appears anatomically correct. It looks like a professional catalog photograph. VERDICT: PASS

Graphical design 100%vs38%

Layout & Design 100%vs0%
Easy 100%vs0%
openai/gpt-image-2: PASS
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator

All described elements are present: “MOVIE NIGHT” at the top in bold, a large popcorn bucket centered, and “EVERY FRIDAY AT 8 PM” at the bottom. Text is readable and correctly placed with clear visual hierarchy. The layout matches the prompt precisely. VERDICT: PASS

openai/gpt-image-2: PASS
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator

The poster includes a prominent guitar illustration, the headline “LIVE MUSIC FEST” in large, bold text with clear visual hierarchy, and “TICKETS AVAILABLE NOW” as a tagline below, correctly placed. All text is readable and properly positioned. The layout matches the prompt’s requirements. VERDICT: PASS

openai/gpt-image-2: PASS
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator

All required elements are present: open book illustration, title “JOIN THE BOOK CLUB” with correct visual hierarchy (large, centered), and tagline “Meets every Wednesday” on a banner at the bottom. Text is readable and correctly placed. Layout is cohesive and follows the prompt’s structure. VERDICT: PASS

Medium 100%vs0%
openai/gpt-image-2: PASS
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator

Hero header image present at top. Headline “Discover. Learn. Grow.” and subtitle below are centered and readable. “Sign Up” button is clearly placed below. Footer contains three social media icons (Facebook, Twitter, Instagram). Visual hierarchy is correct: image → text → CTA → footer. All required elements are present and correctly positioned. VERDICT: PASS

openai/gpt-image-2: PASS
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator

All specified layout zones are present: logo header, three category sections (appetizers, mains, desserts) with items and prices, and a footer with address and hours. Visual hierarchy is clear, with consistent typography and spacing. Text is fully readable and correctly placed. No elements are missing or misaligned. VERDICT: PASS

openai/gpt-image-2: PASS
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator

All layout zones are present: banner header, two-column body (text left, image right), quote block, and footer with unsubscribe link. Visual hierarchy is clear, text is readable and correctly placed. The quote block is appropriately styled and attributed. All elements align with the prompt’s structure and intent. VERDICT: PASS

Hard 100%vs0%
openai/gpt-image-2: PASS
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator

Left page: Large hero photo with headline and subhead — present. Right page: Two-column article, pull quote, author bio sidebar, page number — all correctly placed with clear visual hierarchy. Text is readable and properly formatted. Layout matches prompt exactly. VERDICT: PASS

openai/gpt-image-2: PASS
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator

All specified UI elements are present: top navigation bar (with time, menu, and notifications), search field, 2x2 feature cards with icons and text, recent activity list, and bottom tab bar. Text is readable and correctly placed. Visual hierarchy is clear and consistent with mobile app conventions. Layout matches the prompt precisely. VERDICT: PASS

openai/gpt-image-2 PASS
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator

Full-bleed left-page photo present. Right page has three-column layout, headline “New Directions,” body copy, inset image, and styled page numbers (42/43). Text is readable and correctly placed with clear visual hierarchy. All specified layout zones and elements are accurately rendered. VERDICT: PASS

Data Visualisation 100% vs 0%
Easy 100% vs 0%
openai/gpt-image-2 PASS
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator

All requested elements are present and accurate: correct percentages (35%, 25%, 20%, 20%), proper labels, distinct colors, and readable text. The chart type is a pie chart as requested. No errors in data or presentation. VERDICT: PASS
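
A minimal sketch of the kind of consistency check the verdict above implies: pulling the labelled percentages out of the prompt's data clause and confirming they sum to 100. The helper name `parse_slices` and the regular expression are illustrative assumptions, not part of the benchmark's evaluator.

```python
import re

# Data clause copied from the pie-chart prompt above.
prompt_data = "Company A 35%, Company B 25%, Company C 20%, Others 20%"

def parse_slices(text):
    """Return {label: percent} for every 'Label NN%' pair in text (hypothetical helper)."""
    pairs = re.findall(r"([A-Z][A-Za-z ]*?)\s+(\d+)%", text)
    return {label: int(value) for label, value in pairs}

slices = parse_slices(prompt_data)
total = sum(slices.values())
print(slices, total)
```

A well-formed pie-chart prompt should give a total of 100. A rendered chart whose labels do not reproduce these values would fail the "correct percentages" criterion cited by the evaluator.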

openai/gpt-image-2 PASS
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator

The chart correctly displays a line graph with the requested title "Weekly Traffic", axis labels ("Visitors" and "Day of the Week"), and grid lines. Data points align with the specified values: Mon (1200), Tue (1500), Wed (1100), Thu (1800), Fri (2000), Sat (1700), Sun (1400). All elements are clearly readable and the chart type matches the request. VERDICT: PASS

openai/gpt-image-2 PASS
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator

All requested elements are present and accurate: bar chart type, labeled axes (Month and Sales in Thousands of USD), title "Monthly Sales 2025", color-coded bars, and exact sales values (12K, 18K, 15K, 22K, 19K, 25K) clearly labeled on bars. Readability is high. VERDICT: PASS

Style Diversity 100% vs 75%
Easy 100% vs 33%
openai/gpt-image-2 PASS
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator

The image clearly depicts a golden retriever in a garden, matching the subject. It features large, expressive eyes and a stylized, clean aesthetic consistent with Japanese anime. However, the fur has subtle shading and texture, and the background has depth and lighting that deviate from strict flat-color anime. The style is recognizable but not perfectly aligned with “flat colors.” VERDICT: PASS

openai/gpt-image-2 PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator

The image clearly depicts a golden retriever in a garden setting. It exhibits visible brushstrokes, textured surfaces, and a painterly quality consistent with oil painting. The style matches the prompt’s request for rich texture and brushwork. VERDICT: PASS

openai/gpt-image-2 PASS
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator

The image clearly depicts a golden retriever in a garden, rendered in a distinct pixel art style with visible 32x32 grid blocks and a limited, blocky color palette. Key characteristics — blocky forms, low-resolution textures, and simplified shapes — are present. The scene matches the prompt’s requirements. VERDICT: PASS

Medium 100% vs 100%
openai/gpt-image-2 PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator

The image perfectly embodies Art Deco: geometric facades, bold gold-and-black palette, ornamental symmetry, sunburst motifs, and stylized signage. Classic elements like the Chrysler Building-inspired skyscraper, vintage cars, and period attire reinforce the style. All requested characteristics are unmistakably present. VERDICT: PASS

openai/gpt-image-2 PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator

The image clearly embodies Impressionism: visible brushstrokes, dappled light, and a pastel palette dominate. The woman holds a parasol, and the scene captures fleeting outdoor moments with soft focus and atmospheric color. All requested stylistic elements are unmistakably present. VERDICT: PASS

openai/gpt-image-2 PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator

The image clearly embodies Pop Art: bold black outlines, Ben-Day dot patterns, and flat primary colors (red, yellow, blue). The subject’s stylized face and comic-book background (starburst) reinforce the style. All requested visual characteristics are unmistakably present. VERDICT: PASS

Hard 100% vs 67%
openai/gpt-image-2 PASS
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator

The image blends photorealistic detail with visible watercolor bleeds and paper texture, matching the prompt. Lighting and depth resemble photography, while edges show soft watercolor washes and texture. Key characteristics are clearly present. VERDICT: PASS

openai/gpt-image-2 PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator

The image successfully blends cyberpunk’s neon-lit, tech-heavy cityscape with Art Nouveau’s flowing organic lines and floral motifs. The woman’s ornate, floral-adorned attire and the decorative border frame echo Art Nouveau, while glowing neon signs, cybernetic elements, and flying vehicles embody cyberpunk. Key visual characteristics are clearly recognizable and well-integrated. VERDICT: PASS

openai/gpt-image-2 PASS
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator

The image successfully blends ukiyo-e style (flat planes, bold outlines, woodblock texture, vertical text) with photorealistic sky and lighting (detailed clouds, atmospheric depth, realistic reflections). The castle’s structure is medieval, and the composition mirrors traditional ukiyo-e framing. Key characteristics are clearly present and harmoniously combined. VERDICT: PASS

Extreme 100% vs 100%
openai/gpt-image-2 PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator

The image clearly embodies soft brutalism: raw concrete walls, ceiling, and staircase contrast with rounded, pastel pink furniture. Warm, diffused light from the skylight and window enhances the serene, minimalist atmosphere. Textures are visible, and the composition balances industrial rawness with organic curves. All key elements of the prompt are faithfully rendered. VERDICT: PASS

openai/gpt-image-2 PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator

The image delivers “anxious minimalism” with sparse composition: only two objects on a table, vast negative space, and a dark, muted palette. The wall’s stark, empty rectangle adds visual tension. Subtle textures and restrained lighting enhance unease without overt drama. All key elements of the prompt are clearly present. VERDICT: PASS

openai/gpt-image-2 PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator

The image clearly embodies retro-futurism with 1960s space-age aesthetics: flying saucers, domed architecture, and sleek towers. Warm, aged tones and a melancholic patina enhance the nostalgic feel. Figures in period attire and vintage signage (e.g., “CONTINENTAL HOTEL”) reinforce the era. Visual cues are unmistakably aligned with the prompt. VERDICT: PASS
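
The Style Diversity category score (100% vs 75%) is consistent with pooling the per-tier results. The sketch below assumes three prompts per tier, as listed in this section, and infers the second model's pass counts from its tier percentages (33% as 1 of 3, 67% as 2 of 3, 100% as 3 of 3), since its per-prompt verdicts are not shown here. The function name `pooled_rate` is hypothetical.

```python
PROMPTS_PER_TIER = 3  # assumption: three prompts per difficulty tier, as listed here

# (passes for openai/gpt-image-2, inferred passes for the compared model)
style_diversity = {
    "Easy": (3, 1),
    "Medium": (3, 3),
    "Hard": (3, 2),
    "Extreme": (3, 3),
}

def pooled_rate(tiers, model_index):
    """Pass rate in percent over all prompts in all tiers, rounded to an integer."""
    passed = sum(counts[model_index] for counts in tiers.values())
    total = PROMPTS_PER_TIER * len(tiers)
    return round(100 * passed / total)

rate_first = pooled_rate(style_diversity, 0)   # 12 of 12 prompts
rate_second = pooled_rate(style_diversity, 1)  # 9 of 12 prompts
print(f"Style Diversity {rate_first}% vs {rate_second}%")
```

With equal tier sizes, pooling gives the same figure as averaging the four tier percentages. The two would diverge if tiers held different numbers of prompts.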