ImageBench

ImageBench V1

192 evaluations across 6 categories

Pass: 145
Fail: 47
Pass Rate: 76%
Avg Latency: 130.7s
Min Latency: 123.3s
Max Latency: 146.4s
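The headline pass rate follows directly from the pass and fail counts above. A minimal sketch of that arithmetic, where `pass_rate` is a hypothetical helper written for illustration and not part of ImageBench:

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the pass rate as a percentage of all evaluations."""
    total = passed + failed
    if total == 0:
        raise ValueError("no evaluations recorded")
    return 100.0 * passed / total


# Counts taken from the summary above: 145 passes, 47 fails.
rate = pass_rate(145, 47)
print(f"{145 + 47} evaluations, pass rate {rate:.0f}%")
```

With the reported counts this gives 192 evaluations and a rate of about 75.5%, which rounds to the 76% shown.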
Text Rendering 73%
Spatial Reasoning 75%
Human realism 81%
Truthfulness 74%
Professional Studio 89%
Graphical design 54%
Latency 0%
Text Rendering | Typography Style | Easy | local/z-image-6b

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Text Rendering 73%

Typography Style 100%
Easy 100%
PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with uneven, slightly textured ink strokes and distressed edges. Background shows aged paper texture with vignette. The medium and style convincingly match the prompt’s description of typewriter output on aged paper. VERDICT: PASS
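Every evaluator response in this report closes with a `VERDICT: PASS` or `VERDICT: FAIL` marker. A minimal sketch of how such a response could be reduced to a boolean, assuming the marker is the only signal used; `VERDICT_RE` and `parse_verdict` are hypothetical names, not the benchmark's actual code:

```python
import re
from typing import Optional

# Matches the closing marker used in the evaluator responses.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(response: str) -> Optional[bool]:
    """Return True for PASS, False for FAIL, None if no full marker is found."""
    matches = VERDICT_RE.findall(response)
    if not matches:
        # Covers responses that were cut off before the verdict was complete.
        return None
    # Use the last marker in case the evaluator restates it earlier in the text.
    return matches[-1] == "PASS"
```

Returning `None` rather than guessing keeps truncated responses, such as one ending in `VERDICT: P`, from being silently counted as a pass or a fail.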

PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text "TODAY'S SPECIAL" is rendered in white chalk on a dark green chalkboard, with visible chalk dust texture and smudges. Spelling is correct. The style convincingly mimics chalk writing, including uneven strokes and a pink chalk checkmark. VERDICT: PASS

PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The text "OPEN" is rendered in glowing red LED pixels, matching the described medium. Letters are clearly formed with individual light elements, and spelling is correct. The pixelated, illuminated style convincingly simulates an LED sign. VERDICT: PASS

Writing accuracy 67%
Easy 67%
PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is spelled correctly and fully readable in bold navy font. It is centered on a white background, though framed by a decorative border. No misspellings or illegible parts exist. The text matches the intended specification exactly. VERDICT: PASS

FAIL
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is split into two lines: "ADVENT" and "TURE". The letter 'U' is missing between "ADVENT" and "TURE", making it illegible as a single word. The text is readable but incorrectly formed. VERDICT: FAIL

PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and is fully readable, centered on a solid light blue background. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

Medium 67%
PASS
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

All words are spelled correctly and fully readable: "The", "quick", "brown", "fox", "jumps", "high". Text is clean black on white background as intended. No misspellings or illegible parts. VERDICT: PASS

PASS
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

Text reads: "Every sunset brings a new dawn ahead" — all words are spelled correctly and fully legible against the pale yellow background. No misspellings or missing words detected. Font is clear and consistent. VERDICT: PASS

FAIL
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The main text “Stars shine brightest in the dark night” is fully readable and correctly spelled. However, the small cursive text above contains misspellings and is illegible. Since the core evaluation requires all text to match exactly, the presence of errors in the secondary text fails the criteria. VERDICT: FAIL

Hard 100%
PASS
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

All text matches the intended message exactly. "SUMMER SALE", "Up to 50% off all items this weekend only", and "Terms and conditions apply. See store for details." are fully legible and correctly spelled. No misspellings or missing words detected. VERDICT: PASS

PASS
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "GRAND OPENING", "Join us for food, music, and fun", and "RSVP required. Limited seating available." No misspellings, missing words, or illegible parts. Text is fully readable and correctly rendered. VERDICT: PASS

PASS
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM." Every word is spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS

Extreme 33%
FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Text is mostly correct but contains errors: "SOFIA REYES" is misspelled as "SOFIA REYED NATON" (extra "NATON"), and "MUSIC BY THOMAS GRANT" is incorrectly rendered as "MUSIC BIY THOMAS GRANT by THOMAS GRANT.(ATLUCNE4)" (typo "BIY", extra text "(ATLUCNE4)"). Key names and phrases are illegible or corrupted. VERDICT: FAIL

FAIL
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

Text “Top 10 Trails You Must Hike This Year” is garbled; “You” and “Hike” are misrendered. “Cambodia” appears correctly but is misaligned. All other text matches intended content. Misspellings and illegibility prevent full accuracy. VERDICT: FAIL

PASS
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "GOLDEN HARVEST ORGANIC HONEY", "Raw & Unfiltered", "Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.", "12 oz (340g)", and "Best before: 06/2026. Store in cool, dry place. Product of USA." No misspellings or illegible parts. VERDICT: PASS
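The percentages in this section are consistent with pooling pass counts across difficulty tiers: Writing accuracy passes 8 of 12 prompts (67%), and together with Typography Style's 3 of 3 the category reaches 11 of 15 (73%). The sketch below reproduces that arithmetic. It assumes pooled counting rather than an average of tier percentages, which the page does not state, and `TEXT_RENDERING` and `pooled_percent` are illustrative names only:

```python
from typing import Dict, Iterable, Tuple

Tally = Tuple[int, int]  # (passed, total) for one difficulty tier

# Tallies read off the Text Rendering section of this report.
TEXT_RENDERING: Dict[str, Dict[str, Tally]] = {
    "Typography Style": {"Easy": (3, 3)},
    "Writing accuracy": {
        "Easy": (2, 3),
        "Medium": (2, 3),
        "Hard": (3, 3),
        "Extreme": (1, 3),
    },
}


def pooled_percent(tallies: Iterable[Tally]) -> int:
    """Pool (passed, total) pairs and return a rounded percentage."""
    tallies = list(tallies)
    passed = sum(p for p, _ in tallies)
    total = sum(t for _, t in tallies)
    return round(100 * passed / total)


# Per-subcategory scores, then the pooled category score.
for name, tiers in TEXT_RENDERING.items():
    print(name, pooled_percent(tiers.values()))

all_tiers = [t for tiers in TEXT_RENDERING.values() for t in tiers.values()]
print("Text Rendering", pooled_percent(all_tiers))
```

Under this assumption the script prints 100 for Typography Style, 67 for Writing accuracy, and 73 for Text Rendering, matching the figures displayed above.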

Spatial Reasoning 75%

Attributes Binding 100%
Easy 100%
PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface: 1. A red sphere (ball) — correct color and shape. 2. A blue cube — correct color and shape. Both objects are side by side, as requested. No attributes are swapped; colors and shapes match the prompt exactly. Size is not specified in the prompt, so relative sizing is acceptable. VERDICT: PASS

PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows: - One orange pumpkin (correct color, round shape, appropriate size relative to apples) - Two green apples (correct color, round shape, appropriate size) - All objects are on a wooden cutting board (correct surface) All attributes are correctly bound: the pumpkin is orange and large; the apples are green and smaller. No attribute swaps observed. The prompt asked for "a green apple" (singular) but two are present — however, this doesn’t violate core attribute binding. VERDICT: P

PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows a yellow taxi in the foreground and a black limousine parked behind it on a street. The taxi is yellow with a "T" logo, and the limousine is black, elongated, and has multiple windows. Both vehicles are correctly identified with their respective attributes (color, size, shape) as described in the prompt. There are no swaps or incorrect bindings of attributes to objects. VERDICT: PASS

Medium 100%
PASS
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image shows a white canvas with three shapes: a large red circle (top left), a medium blue triangle (top right), and a medium green star (bottom center). The prompt requested a “small” blue triangle, but the triangle appears medium-sized relative to the other shapes — not clearly small. However, since “small” is subjective without scale reference, and all other attributes (color, shape, arrangement) are correct, we accept it as passable. VERDICT: PASS

PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a shelf: 1. Left: Short, fat, red candle — matches description. 2. Center: Tall, thin, white candle — matches description. 3. Right: Medium-height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. The spatial arrangement and visual properties align precisely with the prompt. VERDICT: PASS

PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a large brown horse with a lighter mane and a white marking on its forehead, standing in a grassy meadow. A small white rabbit is positioned near the horse’s front legs. A medium-sized black cat stands nearby, facing forward. All attributes (color, size, shape) are correctly bound to each object — no swaps or mismatches observed. The scene matches the prompt precisely. VERDICT: PASS

Hard 100%
PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains a pink fire truck (correct color and object), a blue banana (correct color and object), and a green sun (correct color and object). All attributes are correctly bound to their respective objects with no swaps. The surreal landscape is represented by the grassy field and sky. All requested elements are present and accurately depicted. VERDICT: PASS

PASS
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains: - A purple carrot (correct color and shape) - A red tree with red leaves (correct color and foliage) - A white watermelon, whole and sliced open (correct color and state) All attributes are correctly bound to their respective objects with no swaps. The objects match the prompt’s specifications in color, form, and presentation. VERDICT: PASS

PASS
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows three objects on a wooden surface: 1. An orange elephant — correct color, shape, and size relative to others. 2. A yellow penguin — correct color, shape, and size; white belly and black feet match typical penguin depiction. 3. A silver strawberry — correct metallic color and textured shape resembling a strawberry. All attributes (color, shape, object identity) are correctly bound with no swaps. VERDICT: PASS

Compositionality 78%
Easy 100%
PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all required elements: - Tabby cat: Present (striped fur pattern visible) - Oversized red sunglasses: Present (large red frames with dark lenses) - Coherent combination: The sunglasses sit naturally on the cat’s face, with reflections in lenses showing the cat’s eyes All prompt elements are accurately represented and logically combined. VERDICT: PASS

PASS
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image contains: - Golden retriever: Present and clearly depicted - Cowboy hat: Present on the dog's head - "Tiny" descriptor: The hat appears proportionate to the dog's head, not notably tiny The core elements are present and coherently combined. While the hat isn't dramatically small, it fits the concept appropriately. VERDICT: PASS

PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image shows a penguin standing upright on a paved surface, holding a rainbow-colored umbrella in its flipper. The umbrella has distinct panels of red, orange, yellow, green, blue, and purple. The penguin is rendered with realistic texture and anatomy, and the umbrella is held naturally as if by a human. All described elements — penguin, colorful umbrella, and their coherent combination — are present and visually consistent. VERDICT: PASS

Medium 67%
PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Medieval knight in full armor - PRESENT (full plate armor with helmet and chainmail) 2. Sitting at a desk - PRESENT (wooden desk/table) 3. Typing on a modern laptop - PRESENT (Apple MacBook visible with hands positioned as if typing) The concepts are coherently combined in a single scene with appropriate lighting and perspective. The juxtaposition of medieval and modern elements is clear and intentional. VERDICT: PASS

FAIL
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains: - Roman gladiator: Present (helmet, armor, muscular build) - Selfie pose: Present (arm extended toward camera) - Smartphone: Present (floating above helmet, not held in hand) - Colosseum: Present (background structure) The smartphone is not being held by the gladiator — it’s floating unnaturally above his helmet, breaking coherence with the “taking a selfie” action. While all elements are visually present, their combination lacks logical interaction. VERDICT: FAIL

PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image contains all required elements: 1. Astronaut in a spacesuit - PRESENT (full white suit with helmet, American flag patch, life support pack) 2. Riding a bicycle - PRESENT (astronaut seated on bike, hands on handlebars, feet on pedals) 3. Through a park - PRESENT (paved path, green grass, trees, sunny outdoor setting) All elements are coherently combined in a single, logical scene. VERDICT: PASS

Hard 67%
PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Octopus: Present (orange, with tentacles) - Top hat: Present (black, on octopus head) - Monocle/Goggles: Present (worn on hat) - Chess: Present (board with pieces) - Moon surface: Present (cratered gray terrain) - Earth in background: Present (visible in upper right) All elements are coherently combined in a single scene. VERDICT: PASS

PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image contains all required elements: 1. Dinosaur (T-Rex head/body) - Present 2. Business suit - Present (worn by dinosaur) 3. PowerPoint presentation - Present (on projector screen) 4. Modern office - Present (conference room setting) 5. Drinking coffee - Present (holding coffee cup) All elements are coherently combined in a single scene. The dinosaur is dressed professionally, presenting to colleagues in a corporate environment while holding a coffee cup. VERDICT: PASS

FAIL
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

Let’s examine each element from the prompt: 1. **Robot dressed as a pirate** – ✅ Present: Robot wears a tricorn hat, red bandana, and has glowing eyes; armor-like body with straps. 2. **Sailing a paper boat** – ✅ Present: The robot sits in a brown, folded-paper-style boat. 3. **On a lake of lava** – ✅ Present: Glowing orange molten liquid surrounds the boat, with rocky terrain and fire effects. 4. **Juggling three kittens** – ❌ Not juggling: Three kittens are present (two held in hands, one in

Counting 56%
Easy 100%
PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image shows three red apples on a wooden table: two whole apples and one apple cut in half, revealing its interior. The count of apples is exactly three, matching the prompt’s request. All are red and placed on a wooden surface. The arrangement and object types align with the description. VERDICT: PASS

PASS
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image displays exactly five yellow rubber ducks arranged in a horizontal row on a white bathroom shelf or sink edge. Each duck is identical in appearance, with orange beaks and black eyes with lashes. The count matches the prompt’s request precisely: five ducks, yellow, rubber-like, in a row, on a bathroom surface. No extra or missing objects. VERDICT: PASS

PASS
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays four white pillar candles arranged in a row on a dark surface. All candles are upright and unlit, with visible wicks. The count of candles is exactly four, matching the prompt’s request. No other objects are present. The description “white candles standing on a dark surface” accurately reflects the visual content. VERDICT: PASS

Medium 67%
PASS
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows three black dogs and two orange cats on a grassy lawn. The dogs are sitting, and one cat is lying down while the other sits. The count of animals matches the prompt exactly: 3 black dogs and 2 orange cats. All are together on grass. VERDICT: PASS

FAIL
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows 4 red roses and 3 white daisies in a glass vase on a table. The prompt requested 4 red roses and 2 white daisies. While the rose count is correct, the daisy count exceeds the request by one. Since the core criteria (exact match) is not met, this is a failure. VERDICT: FAIL

PASS
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image displays four butterflies resting on a wire fence. There is one large blue butterfly at the top center. Below it, there are three yellow butterflies: one partially visible on the left, one on the right, and one at the bottom center. The count matches the prompt exactly: 1 blue and 3 yellow butterflies. VERDICT: PASS

Hard 0%
FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round fishbowl containing goldfish. Counting them: one on the far left, one top-left, one bottom-center, one center-right (large), one behind it, one top-right, one far right, and two more near the center — totaling 8 clearly visible goldfish. There may be a partial ninth obscured, but only 8 are distinctly countable. The prompt requested exactly 9. Since the count does not match precisely, this fails the criterion. VERDICT: FAIL

FAIL
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with 3 bananas, 2 apples, and 3 oranges. The prompt requested 3 bananas, 4 oranges, and 2 apples. While the banana and apple counts are correct, the orange count is incorrect (3 instead of 4). Therefore, the total object count does not exactly match the request. VERDICT: FAIL

FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image displays colorful balloons against a clear blue sky. Counting them: there are 9 balloons total — 2 red, 2 yellow, 2 pink, 1 green, 1 blue, and 1 patterned (confetti-style). The prompt requested exactly 11 balloons. Since only 9 are present, the count does not match. VERDICT: FAIL
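The counting verdicts above all apply an exact-match rule: any object type whose observed count differs from the requested count fails the prompt. A small sketch of that comparison, using counts quoted in the evaluator responses; `count_mismatches` is a hypothetical helper, and in the benchmark the observed counts come from the evaluator model rather than from code like this:

```python
from typing import Dict, Tuple


def count_mismatches(
    requested: Dict[str, int], observed: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Map each object whose counts differ to (requested, observed)."""
    keys = set(requested) | set(observed)
    return {
        key: (requested.get(key, 0), observed.get(key, 0))
        for key in keys
        if requested.get(key, 0) != observed.get(key, 0)
    }


# Fruit bowl prompt: 3 bananas, 4 oranges, 2 apples requested;
# the evaluator reported 3 bananas, 3 oranges, 2 apples.
requested = {"banana": 3, "orange": 4, "apple": 2}
observed = {"banana": 3, "orange": 3, "apple": 2}

mismatches = count_mismatches(requested, observed)
print("PASS" if not mismatches else f"FAIL {mismatches}")
```

An empty result corresponds to a pass; here the orange count (4 requested, 3 observed) is the single mismatch, in line with the FAIL verdict for that prompt.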

Negation 67%
Easy 67%
PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image shows a tropical beach scene with palm trees, turquoise water, white sand, and distant mountains under a blue sky. There are no people visible anywhere in the frame — not on the beach, in the water, or in the background. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS

FAIL
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a cat lying on a beige couch. The cat has white and tabby markings. Upon close inspection, the cat is wearing a dark-colored collar around its neck. Since the prompt explicitly excluded “a collar,” and one is clearly present, the excluded element is not absent. VERDICT: FAIL

PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows a classroom with desks, chairs, a chalkboard, and a bulletin board. There are no students visible in the scene. The excluded element (students) is truly absent. VERDICT: PASS

Medium 100%
PASS
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image displays a sandy beach foreground scattered with various seashells. In the mid-ground, there are several palm trees and a colorful beach umbrella. The background shows more greenery and a blue sky with clouds. Crucially, there is no water, ocean, or sea visible anywhere in the frame; the view is strictly of the sand and the vegetation behind it. The excluded element is absent. VERDICT: PASS

PASS
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows an office workspace with a wooden bookshelf filled with books and binders, a light-colored desk, a black mesh office chair, and a black desk lamp. There are no computers, monitors, tablets, or any type of screen visible in the scene. The excluded elements are truly absent. VERDICT: PASS

PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen scene with a stainless steel pot with a lid, a smaller stainless steel saucepan, a black frying pan, and a wooden cutting board on a countertop next to a gas stove. The stove has a visible blue flame under the saucepan. There is no food visible in any of the cookware or on the cutting board. The excluded element (food) is absent. VERDICT: PASS

Hard 33%
FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with visible melted cheese covering the surface, along with tomato sauce, basil leaves, and a golden crust. Since the prompt explicitly excluded cheese and the image clearly contains melted cheese, the excluded element is present. VERDICT: FAIL

PASS
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a white car body shell viewed from above. It includes the roof, door frames, and window openings. The prompt explicitly excluded wheels, and indeed, no wheels are visible in this top-down view of the bare chassis. The excluded element (wheels) is absent. VERDICT: PASS

FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image shows a human face with clearly visible eyes, nose, and mouth. However, the face also has well-defined eyebrows above each eye — they are natural-looking, arched, and filled in with hair-like texture. Since the prompt explicitly excluded eyebrows, and they are present, this violates the constraint. VERDICT: FAIL

Relative Position 83%
Easy 100%
PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat with white paws and chest sitting directly on the top surface of a closed cardboard box. The box is positioned on a wooden floor. The spatial relationship described in the prompt — “a cat sitting on top of a cardboard box” — is accurately represented in the image. All positional relationships are correct. VERDICT: PASS

PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on a wooden floor. The spatial relationship described in the prompt — “a red ball underneath a wooden chair” — is accurately represented. The ball is clearly under the chair’s structure, and no conflicting positional claims are present. All described relationships are correct. VERDICT: PASS

PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a white coffee mug and an open book on a wooden surface, likely a desk. The mug is positioned to the left of the book. The book is open and lies flat on the desk. The spatial relationships described in the prompt — "a coffee mug next to an open book on a desk" — are accurately represented: both objects are on the desk, and the mug is beside (next to) the book. All positional relationships are correct. VERDICT: PASS

Medium 100%
PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a branch that extends from a tree trunk on the left side of the frame. Below and to the right of the branch, a park bench is visible in the foreground. The spatial relationships described — bird on branch, branch extending from tree, tree next to bench — are all accurately represented. All positional cues (on, next to, below) are correct. VERDICT: PASS

PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat sitting directly on a patterned pillow. The pillow is placed on the seat of a wooden chair. All spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. No positional errors are present. VERDICT: PASS

PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a candle placed on top of a stack of two books. To the left of the books is a potted plant in a terracotta pot. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors (on, next to) match the visual layout. VERDICT: PASS

Hard 67%
FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five chess pieces on a board, but the spatial relationships do not match the prompt. The king is centered, but the queen is not to its left — instead, a bishop is there. A knight is to the left (not below), and a pawn is below the king (not a rook). The piece to the right is a pawn, not a bishop. No rook is visible above the king. Multiple positional errors exist. VERDICT: FAIL

PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a desk scene with a laptop centered, a coffee mug to its left, a phone to its right, a lamp above casting light, and an open notebook with a pen below the laptop. All specified spatial relationships — left, right, above, below — are accurately represented as described in the prompt. VERDICT: PASS

PASS
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. The horse is on the far left. The cow is to the right of the horse. The sheep is in the center. The goat is to the right of the sheep. The dog is on the far right. All specified spatial relationships are accurately represented. VERDICT: PASS

Extreme 67%
PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn in the foreground at large scale, consistent with the prompt. Two windmills appear in the midground — one on a hill to the left and another closer to the right — both at medium scale. Snow-capped mountains are in the background at small scale, maintaining depth perspective. All spatial relationships (foreground/midground/background) and relative positions (left/right hills, above/below elevation) are correctly represented. VERDICT: PASS

PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image shows a beach scene with a sandcastle in the foreground, a surfer riding a wave in the midground, and a cruise ship on the horizon in the background. The spatial arrangement matches the prompt: sandcastle closest (foreground), surfer in middle distance (midground), ship farthest (background). Sizes decrease appropriately with distance. All positional relationships are correctly represented. VERDICT: PASS

FAIL
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image shows a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background — matching the prompt’s layering. However, the pigeons are not “on” or “under” the bench as implied by typical spatial phrasing; some are on the ground near it, one is perched on the bench arm. The phrase “each layer progressively smaller” is visually accurate due to perspective. But since the prompt implies specific positional relationships (e.g., pigeons “on” bench) that a
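Each evaluator response on this page ends with a "VERDICT: PASS" or "VERDICT: FAIL" marker, but some responses, such as the one just above, are cut off before the marker appears. The following is a minimal sketch of how a verdict could be pulled out of a response while treating cut-off text as undecided. The function name `extract_verdict` and the regular expression are illustrative assumptions; the page does not say how ImageBench itself parses evaluator output.

```python
import re
from typing import Optional

# Pattern for the verdict marker as it appears in the evaluator responses on this page.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def extract_verdict(response: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL' from the last verdict marker, or None if there is none."""
    matches = VERDICT_RE.findall(response)
    return matches[-1] if matches else None


# A complete response and a cut-off one (shortened excerpts, for illustration only).
complete = "All specified spatial relationships are accurately represented. VERDICT: PASS"
cut_off = "But since the prompt implies specific positional relationships that a"

print(extract_verdict(complete))
print(extract_verdict(cut_off))
```

Returning `None` rather than guessing keeps a cut-off response from being counted as either a pass or a fail; in that case the PASS/FAIL label shown next to the prompt is the only record of the result.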

Scale & Proportions 67%
Easy 67%
PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The image shows a person standing next to a yellow school bus. The person’s height relative to the bus appears realistic — approximately one-third to one-quarter of the bus’s height, which is consistent with real-world proportions. The bus’s length, wheel size, and window spacing also align with typical school bus dimensions. No obvious scaling distortions are present. VERDICT: PASS

PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears appropriately sized relative to the leaf, consistent with real-world proportions. The leaf’s veins and texture are detailed and scale correctly with the insect. No obvious distortions or size mismatches are present. The composition reflects a natural, realistic relationship between the two objects. VERDICT: PASS

FAIL
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a child holding a basketball. The ball appears disproportionately large relative to the child’s torso and hands — it nearly spans the width of their chest and is held with both hands in a way that suggests it’s too big for a typical child. In reality, a standard basketball (size 7) would be smaller relative to a child’s body; even a youth-sized ball (size 5 or 6) wouldn’t appear this oversized. This violates the “correct relative sizes” criterion. VERDICT: FAIL

Medium 67%
FAIL
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a tiny figurine of a person sitting on the rim of a coffee cup, not standing. The person’s size relative to the cup is plausible for a miniature figure, but the pose contradicts the prompt’s description of “standing.” Since the core requirement includes both size relationship and posture, and the posture is incorrect, the criteria are not fully met. VERDICT: FAIL

PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a giant tabby cat striding between skyscrapers, with its paws resting on building rooftops and its body towering over the urban landscape. The scale is consistent with the “kaiju” description: the cat’s head reaches mid-building height, and its spine spikes rise above surrounding structures. Proportions are internally consistent: limbs, tail, and facial features match a tabby cat, just massively scaled. No anatomical or spatial inconsistencies violate the prompt’s intent. VERDICT: PASS

PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, appearing small enough to be held yet detailed enough to show windows, doors, and roof shingles. The proportions between the hand and the house are realistic and consistent with the prompt’s description of a “miniature house sitting on the palm.” Lighting and shadows also support spatial coherence. VERDICT: PASS

Hard 67%
PASS
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, horse, dog, cat, and mouse arranged in a line. The elephant is largest, followed by the horse, then the dog, then the cat, and finally the mouse — which matches real-world size proportions. All animals are standing upright and appear correctly scaled relative to each other. No obvious distortions or inversions in scale are present. VERDICT: PASS

PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a table setting with a plate, two forks, a knife, a wine glass, and salt and pepper shakers. The relative sizes appear realistic: the plate is appropriately large compared to the cutlery, the wine glass is proportionate to the plate, and the shakers are correctly sized relative to the other items. All objects maintain consistent and believable proportions for a dining scene. VERDICT: PASS

FAIL
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image displays a farm scene with a barn, tractor, farmer, cow, chicken, and fence post. However, the proportions are inconsistent: the chicken in the foreground is unnaturally large relative to the cow and tractor, appearing almost as tall as the cow’s legs. The farmer also appears disproportionately small compared to the tractor and barn. These scale errors violate the requirement for correct proportional relationships. VERDICT: FAIL

Human realism 81%

Faces & Expressions 100%
Easy 100%
PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

Face is nearly symmetrical with natural features. Smile appears warm and genuine, with clear crinkles around eyes and even teeth. Slight AI artifacts: overly smooth skin texture and unnaturally perfect teeth. Emotion is clearly readable as joyful. Minor distortions don’t detract from core intent. VERDICT: PASS

PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The expression is calm and serious, with neutral lips and steady gaze. No distortions are visible. Lighting and texture appear realistic. The portrait accurately fulfills the prompt’s requirements. VERDICT: PASS

PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with correctly proportioned features. Eyes, nose, and mouth are aligned naturally. The expression is relaxed and neutral, conveying calmness without distortion. Skin texture and lighting are realistic. No visible artifacts or anatomical errors. VERDICT: PASS

Medium 100%
PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face shows clear disgust: nose wrinkled, lips curled, and brows furrowed. Features are symmetric and anatomically plausible. No significant distortions. The emotion is unmistakably readable. All requested elements are accurately rendered. VERDICT: PASS

PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is symmetric with correctly rendered features. Eyes are wide, eyebrows raised, mouth slightly open — all clearly conveying genuine surprise. No significant distortions. Skin texture and lighting are natural. The expression is unmistakably the intended emotion. VERDICT: PASS

PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly rendered features. Downcast eyes and slightly drawn mouth convey quiet melancholy clearly. Distant gaze is evident through lowered eyelids and gaze direction. No distortions or unnatural artifacts present. Emotion is authentic and readable. VERDICT: PASS

Hard 100%
PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

Face is symmetric with natural, aged features and deep wrinkles. White hair is visible. Warm, genuine smile with kind eyes clearly conveys emotion. No significant distortions. Facial structure and expression align well with the prompt. VERDICT: PASS

PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows and a downturned mouth. No significant distortions are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with realistic, aged features: visible age spots, thinning gray hair, and reading glasses. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No distortions; details are natural and well-defined. VERDICT: PASS

Extreme 100%
PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces show clear, symmetrical features with no distortions. Emotions are vividly readable: left man laughs with head back, center man looks shocked with wide eyes, right man rolls eyes with smirk. All three expressions match the prompt precisely. Lighting and anatomy are natural and consistent. VERDICT: PASS

PASS
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces are symmetrical with correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored, child grins excitedly. No distortions or unnatural artifacts. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS

PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. Emotions are clearly readable: left man shows frustration (furrowed brows), center woman shows pleased nodding (smile), right man shows confusion (tilted head, squinted eyes). No distortions. All three expressions match the prompt accurately. VERDICT: PASS

Full Body 92%
Easy 67%
FAIL
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The image shows a torso and upper legs, but the full body is cut off. Limbs visible (arms, hands) are anatomically correct and at sides, consistent with a relaxed upright pose. However, the prompt specified a full-body view, which is not present. The pose is recognizable from the visible portion. VERDICT: FAIL

PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight facing forward with hands in pockets, matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The pose is clearly recognizable as described. All elements align with the requested description. VERDICT: PASS

PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The image shows a full-body view of a person standing in a park, from head to feet. Limb count and proportions are anatomically correct. The pose is neutral, with arms at sides and feet flat on grass. No significant distortions are visible. The scene matches the prompt description accurately. VERDICT: PASS

Medium 100%
PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs are correctly proportioned and the limb count is accurate. Her running pose is anatomically plausible, with one leg forward, the other back, arms bent for momentum, and hair flowing backward, all consistent with mid-stride motion. No significant distortions are visible. The action described in the prompt is clearly recognizable and well-executed. VERDICT: PASS

PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms extended for balance—is anatomically plausible for a jump. The action is clearly recognizable: mid-air leap over a puddle. No significant distortions are visible. The image accurately depicts the requested scene with natural motion and detail. VERDICT: PASS

PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s pose is anatomically plausible: one leg extended upward, the other supporting weight on the ball of the foot. Limb count and proportions are correct. The flowing dress captures motion realistically. The dynamic spin is recognizable, with fabric swirling naturally around the body. No significant distortions or anatomical errors are present. VERDICT: PASS

Hard 100%
PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person stands on one leg, with the other foot pressed against the inner thigh, arms raised overhead in prayer position. Limb count and proportions are correct. The pose is clearly recognizable as Tree Pose (Vrksasana). No anatomical distortions are visible. The body alignment appears natural and balanced. VERDICT: PASS

PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has two arms and two legs, with proportions and joint angles consistent with human anatomy. The pose is recognizable as dynamic rock climbing, with limbs extended to grips. No distortions are visible. The action matches the prompt precisely. VERDICT: PASS

PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, bow arm is extended, and hand position is accurate for playing. Proportions are natural, with no visible distortions. The pose clearly matches the prompt’s description. VERDICT: PASS

Extreme 100%
PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly counted and proportioned. The woman’s legs are intertwined with the man’s, and her body is arched backward in a classic tango dip. The pose is anatomically plausible and matches the prompt. No significant distortions are visible. The action is clearly recognizable as a tango embrace with a dip. VERDICT: PASS

PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s airborne pose are anatomically plausible and match the described action. The ball is near the jumping player’s feet. No significant distortions are visible. The dynamic interaction is recognizable as a mid-tackle scenario. VERDICT: PASS

PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts have correct limb count and proportions. Their handstand poses are anatomically plausible, though not perfectly symmetrical — one has legs straight, the other slightly bent. The action is recognizable as a synchronized handstand. Minor distortions: one gymnast’s head/neck angle appears slightly unnatural. Overall, the core anatomical correctness and pose recognition are met. VERDICT: PASS

Hands 42%
Easy 33%
FAIL
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The left hand (viewer’s left) has 5 fingers, all appearing anatomically normal. The right hand (viewer’s right) also has 5 fingers, but the thumb is positioned unusually — it appears to emerge from the side of the palm rather than the base, and its orientation is awkward. Additionally, the pinky finger on the right hand seems slightly fused or misshapen at the tip. These anomalies suggest AI generation artifacts. VERDICT: FAIL

FAIL
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

Upon close examination, the left hand (viewer’s left) has six fingers: thumb, index, middle, ring, pinky, and an extra digit between index and middle. The right hand appears to have five fingers normally. This violates anatomical correctness and the prompt’s requirement of “exactly 5 per hand.” Anomaly: polydactyly on left hand. VERDICT: FAIL

PASS
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The image shows a man with arms crossed, hands resting on opposite upper arms. Left hand (on right shoulder): 5 fingers visible, anatomically correct. Right hand (on left shoulder): 5 fingers visible, also anatomically correct. No anomalies in finger count or grip. Gesture is natural and consistent with the prompt. VERDICT: PASS

Medium 67%
FAIL
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The image shows a right hand gripping a white mug by the handle. The thumb is positioned on top of the handle, and four fingers are wrapped around it — but upon close inspection, there are only three visible fingers (index, middle, ring) plus the thumb; the pinky appears fused or missing. Additionally, the fingers show unnatural curvature and slight merging at the joints. Anatomical correctness fails due to incorrect finger count and structure. VERDICT: FAIL

PASS
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand holding the pen shows 5 fingers with a natural, anatomically correct grip. The left hand resting on the notebook also displays 5 fingers in a relaxed, realistic position. No anomalies or distortions are visible in either hand. The anatomy and gesture align with the prompt’s requirements. VERDICT: PASS

PASS
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The left hand on the fretboard shows 4 fingers clearly visible, with the thumb likely behind the neck — anatomically plausible. The right hand is blurred due to motion (strumming), but appears to have 5 fingers in a natural strumming position. No obvious anatomical anomalies or extra/missing fingers are evident despite motion blur. Grip and gesture are consistent with playing guitar. VERDICT: PASS

Hard 67%
PASS
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The image shows a single right hand making a peace sign. The index and middle fingers are extended upward, while the ring finger, pinky, and thumb are curled inward — forming a natural, anatomically correct gesture. All five digits are present and positioned appropriately for this pose. No anomalies or extra/missing fingers are visible. The hand structure, knuckles, and nail details appear realistic and consistent with human anatomy. VERDICT: PASS

PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and four fingers are curled into the palm — anatomically correct for this gesture. All five digits (thumb + 4 fingers) are present and properly positioned. No anomalies or extra/missing fingers. The anatomy is accurate. VERDICT: PASS

FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image shows two hands. The right hand (in foreground) has index, middle, and ring fingers extended — matching the prompt’s gesture. However, the left hand is gripping the right index finger unnaturally, with thumb and index finger pinching it mid-shaft — not a natural counting pose. Also, only 4 fingers are clearly visible on the left hand; the pinky is obscured or missing. Anatomical grip is incorrect for “counting to three.” Anomaly: unnatural finger pinch and incomplete finger visibility.

Extreme 0%
FAIL
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

Upon close examination: The left hand (in suit sleeve) shows 4 fingers clearly visible — thumb, index, middle, and ring; pinky is obscured or missing. The right hand (bare arm) also shows only 4 fingers — thumb, index, middle, and ring; pinky is not visible. While this may be due to occlusion from the grip, the prompt requires “all fingers of both hands anatomically correct” — implying full visibility or at least plausible anatomy. The grip itself appears natural for a handshake. However, since

FAIL
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two hands pressed together in a prayer-like or namaste gesture, not a high-five. The left hand (viewer’s left) has 5 fingers visible; the right hand also has 5 fingers. However, the gesture is anatomically incorrect for a high-five — palms are flat against each other vertically, not slapping horizontally. Fingers are extended but aligned in a vertical plane, inconsistent with a high-five. Anomaly: wrong gesture entirely. VERDICT: FAIL

FAIL
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

The image shows four hands (two pairs) interacting with a gift box. The left pair appears to be one person’s hands, and the right pair another’s — but anatomically, it’s implausible for two people to each use both hands to grip the same small box from opposite sides without overlapping or awkward positioning. More critically: the left “hand” has only 4 visible fingers (thumb + 3), and the right hand has 5, but the left hand’s thumb is positioned unnaturally high, and the right hand’s pinky is pa

Multi-Subject 100%
Easy 100%
PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, red blazer over white shirt. Right woman: long black hair, blue sleeveless dress. Both visually distinct and match prompt descriptions precisely. No missing or mismatched elements. VERDICT: PASS

PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Man on right: beard, glasses, green sweater — matches description. Man on left: clean-shaven, black hoodie — matches description. Both individuals are visually distinct and present. All features align with prompt. VERDICT: PASS

PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls over green top. Short person: straight brown bob, yellow sundress. Both visually distinct, matching descriptions. No extra or missing individuals. All features accurately rendered. VERDICT: PASS

Medium 100%
PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

All four individuals are present and visually distinct. Left: bald man in suit. Next: woman with silver hair, red blouse. Center: young man with dreadlocks, denim jacket. Right: woman in green hijab and dress. Features match prompt exactly. VERDICT: PASS

PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Tall woman: short pink hair, playing bass, wearing red top and black pants. Man on drums: beard, bandana, playing drums. Slim man: glasses, playing keyboard, wearing black shirt and jeans. All three are visually distinct and match descriptions. VERDICT: PASS

PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat, large backpack, and hiking attire. Woman: has braided hair, uses hiking poles, wears shorts. Teenager: wears baseball cap, holds water bottle, carries backpack. All three are visually distinct and match described features precisely. VERDICT: PASS
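The percentages shown next to each subcategory and next to the Human realism heading can be checked against the PASS/FAIL labels listed above. The sketch below assumes that each percentage is simply passes divided by evaluations, rounded to the nearest whole percent, and that the category figure pools all evaluations rather than averaging the subcategory percentages. Those rules are assumptions, not something the page states; the counts in the dictionary were tallied by hand from the labels in this section.

```python
from typing import Dict, Tuple

# (passes, total) per Human realism subcategory, counted from the PASS/FAIL labels above.
human_realism: Dict[str, Tuple[int, int]] = {
    "Faces & Expressions": (12, 12),
    "Full Body": (11, 12),
    "Hands": (5, 12),
    "Multi-Subject": (6, 6),
}


def pass_rate(passes: int, total: int) -> int:
    """Pass percentage rounded to the nearest whole percent (assumed rounding rule)."""
    return round(100 * passes / total)


# Per-subcategory rates.
per_subcategory = {name: pass_rate(p, t) for name, (p, t) in human_realism.items()}

# Pooled rate: sum all passes and all evaluations before dividing.
total_passes = sum(p for p, _ in human_realism.values())
total_evals = sum(t for _, t in human_realism.values())
overall = pass_rate(total_passes, total_evals)

print(per_subcategory)
print(f"Human realism: {total_passes}/{total_evals} = {overall}%")
```

Under these assumptions the pooled figure is 34 of 42, which rounds to the 81% shown for Human realism, and the per-subcategory values match the 100%, 92%, 42% and 100% shown above. Averaging the four subcategory percentages instead would give roughly 83%, so pooling appears to be the better fit for the displayed number.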

Truthfulness 74%

Photorealism 100%
Medium 100%
PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image demonstrates high-fidelity material rendering. The glass exhibits correct refraction and transparency, distorting the background as expected. The metal spoon shows sharp, accurate specular highlights and reflections. The wooden table has a natural grain and warm tone, while the linen napkin displays a convincing woven texture with soft light interaction. The apple’s skin reflects light realistically with subtle subsurface scattering effects. All materials behave physically correctly un

PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders distinct materials with high fidelity. The brushed aluminum laptop shows appropriate metallic sheen and reflections. The leather notebook exhibits realistic texture and grain. The ceramic mug displays a matte, speckled glaze with correct light interaction. The glass elements (paperweight and sheets) demonstrate accurate translucency, refraction, and edge reflections. The wood desk surface has a natural grain and subtle gloss. All materials are visually distinct and

PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image successfully renders the distinct visual properties of each requested material. The stainless steel pot exhibits sharp, accurate reflections and metallic sheen. The wooden cutting board shows natural grain and matte texture. The glass bottle demonstrates translucency and refraction with the oil inside, while the cotton towel displays a soft, fibrous texture. All surfaces interact with light realistically, meeting the prompt's criteria for correct material rendering. VERDICT: PASS

Physics & Reflections 92%
Easy 100%
PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white surface. The shadow is cast to the right and slightly downward, which is consistent with a light source positioned to the upper left. The shadow's shape is an elongated oval, correctly distorted by perspective and the angle of the light. The highlight on the sphere also aligns with the upper-left light source. The physics of the shadow and reflection are accurate. VERDICT: PASS

PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The shadows are physically plausible. The light source is clearly coming from the right (the window), casting sharp, distinct shadows to the left of both the mug and the pitcher. The shadow shapes accurately correspond to the objects' forms, and the intensity suggests strong, direct sunlight. The reflections on the glossy surfaces also align with the lighting direction. VERDICT: PASS

PASS
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast directly beneath the person, consistent with midday lighting. The direction and orientation are physically plausible, and there’s no distortion or unnatural coloration in the shadow. The length of the shadow is short, as expected at midday. Overall, the shadow aligns well with the lighting conditions described in the prompt. VERDICT: PASS
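
The expectation of a short midday shadow follows from basic trigonometry: on flat ground, shadow length equals the object's height divided by the tangent of the sun's elevation angle. A minimal sketch, assuming a 1.7 m person and two illustrative elevation angles (neither value comes from the benchmark, and `shadow_length` is a hypothetical helper):

```python
import math

def shadow_length(height_m: float, sun_elevation_deg: float) -> float:
    """Length of the shadow cast on flat ground by a vertical object."""
    if not 0 < sun_elevation_deg <= 90:
        raise ValueError("sun elevation must be in (0, 90] degrees")
    return height_m / math.tan(math.radians(sun_elevation_deg))

# Hypothetical 1.7 m person at a high (midday) and a low (evening) sun angle.
for elevation in (80, 20):
    print(f"{elevation} deg: {shadow_length(1.7, elevation):.2f} m")
```

With the sun at 80 degrees the shadow is about 0.3 m long; at 20 degrees it exceeds 4.5 m, which is why a long shadow would contradict a midday prompt.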

Medium 100%
PASS
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. The mirror accurately captures the woman’s back, with correct reversed orientation (e.g., pockets and seams align symmetrically). Lighting and shadows match the room’s ambient illumination — no directional mismatches or unnatural distortions. Colors are consistent between subject and reflection. The mirror’s frame and edge reflections also behave realistically. No artifacts or physics violations detected. VERDICT: PASS

PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted and positioned directly beneath the cat, consistent with a polished floor. The distortion increases toward the foreground due to perspective, which is physically accurate. Colors and lighting in the reflection match the subject. The shadow under the cat aligns with the light source direction. No anomalies in orientation or physics are present. VERDICT: PASS

PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted and correctly positioned directly beneath the vase, consistent with a glass surface. The distortion and slight blurring in the reflection are physically plausible for a glossy table. The lighting and shadows align with the scene’s illumination, and the colors in the reflection match the original objects. No anomalies in direction or orientation are observed. VERDICT: PASS

Hard 67%
PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The orientation is correct (inverted vertically), and the direction aligns with the light source. The gentle ripples distort the reflection naturally, creating a wavy, slightly darker appearance consistent with water physics. The colors match the sky and trees, with appropriate attenuation in the water. No anomalies detected. VERDICT: PASS

FAIL
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible in orientation and direction, correctly mirroring the buildings and sky. However, the image exhibits severe chromatic aberration (RGB splitting) across the entire frame, not just as a subtle "color shift" in the reflection. This heavy artifacting creates a glitch-art aesthetic that overrides natural physics, making the visual result unrealistic rather than a faithful depiction of a puddle reflection. VERDICT: FAIL

PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly inverted, mirroring the mountain range and shoreline with appropriate orientation. The water surface shows minimal distortion, consistent with a still lake, and the reflection is subtly muted in color and contrast compared to the actual landscape, which aligns with real-world optical behavior. No directional or structural inconsistencies are present. VERDICT: PASS

Extreme 100%
PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image demonstrates high physical plausibility. The glass sphere correctly refracts the background scene, inverting it as expected. The mirror surface provides a sharp, accurate reflection of the sphere and the sky. The water ripples distort the surrounding reflections naturally, and the lighting is consistent across all elements. The interaction between the sphere, the mirror, and the water is coherent and realistic. VERDICT: PASS

PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The chrome sculptures accurately mirror the environment, including the ceiling lights and window frames, with appropriate distortion based on their curved surfaces. The glass floor creates a clear, inverted reflection of the sculptures and the room above, consistent with a highly reflective surface. The lighting and shadows align with the natural light coming from the large windows, enhancing the realism of the scene. VERDICT: PASS

PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image displays physically plausible optics. The reflection on the marble is correctly oriented and positioned directly beneath the glass, with appropriate distortion. The refraction through the crystal bowl accurately distorts the background scenery, creating realistic caustic patterns and color dispersion. The lighting direction is consistent across the scene, casting soft shadows and highlights that align with the window's position. The lake’s surface in the background also shows natural r

World Knowledge 50%
Easy 100%
PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy, showcasing its iconic iron lattice structure, three distinct levels (including the top platform and antenna), and characteristic arched base. The setting—a clear blue sky with Parisian buildings and greenery at the base—matches the prompt’s description. No significant inaccuracies are visible; proportions, details, and context align with reality. VERDICT: PASS

PASS
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high architectural accuracy, including the central dome, four minarets, and symmetrical layout. The reflection in the pool is rendered realistically. However, the people in the foreground appear slightly distorted or "melty," a common artifact in AI generation. Despite these minor human figure anomalies, the core subject—the monument itself—is recognizable and factually accurate. VERDICT: PASS

PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The pedestal’s architectural details — including the colonnade, stone base, and surrounding structures — align with real-world references. Minor stylization or smoothing may be present, but no factual or structural inaccuracies are evident. Core criteria are fully met. VERDICT: PASS

Medium 33%
FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a shamisen, but it contains significant factual inaccuracies. The tuning pegs are incorrectly shaped like shuttlecocks (badminton birdies), which is not a real-world design. The body’s edge inlay appears garbled and nonsensical rather than traditional wood or lacquer patterns. While the three strings and bachi plectrum are present, the overall construction lacks authenticity and realism, failing to meet the “accurate” requirement. VERDICT: FAIL

FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass astrolabe, but the text and symbols are nonsensical gibberish rather than accurate Latin or Arabic inscriptions typical of medieval instruments. The rete (star map) lacks recognizable constellations, and the degree markings appear inconsistent or decorative rather than functional. While the overall form resembles an astrolabe, the lack of factual accuracy in engravings and astronomical details fails the prompt’s requirement for an “accurate” medieval astrolabe. VERDICT
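
Evaluator responses on this page end with a `VERDICT: PASS` or `VERDICT: FAIL` marker. A minimal sketch of extracting that marker from a response string, returning `None` when it is missing or incomplete. The function name `parse_verdict` is hypothetical, and this is not the benchmark's own tooling:

```python
import re
from typing import Optional

# Matches a complete marker at the very end of the response text.
_VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")

def parse_verdict(response: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL', or None if the marker is missing or cut off."""
    match = _VERDICT_RE.search(response)
    return match.group(1) if match else None

print(parse_verdict("No anomalies detected. VERDICT: PASS"))
print(parse_verdict("fails the requirement. VERDICT"))
```

Returning `None` rather than guessing keeps an incomplete response from being counted as either outcome.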

PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The texture of the glaze, the specific pattern of the cracks (crazing), and the application of the gold repair are visually consistent with real-world examples of this art form. The lighting and depth of field suggest a high-quality photograph of a physical object. There are no architectural or factual inaccuracies regarding the subject matter. VERDICT: PASS

Hard 0%
FAIL
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts the water cycle with recognizable elements like evaporation, condensation, and precipitation. However, it contains significant factual errors in its text labels. The title is gibberish, "EVAPORATION" is misspelled as "EVAPORATIO," and "PRECIPITATION" is incorrectly labeled as "RAINI." These inaccuracies make the diagram factually incorrect. VERDICT: FAIL

FAIL
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts the correct geological concept of a convergent boundary with subduction, a trench, and a volcanic arc. However, it fails significantly on factual accuracy due to severe text hallucinations. Labels are garbled nonsense (e.g., "SUDULTION" for Subduction, "RENCH" for Trench, "VOCANIC ACR" for Volcanic Arc, "ACOOI" for Oceanic Crust). These errors make the diagram scientifically inaccurate and confusing. VERDICT: FAIL

FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image is a stylized diagram of a heart, but it fails the accuracy criteria. The labels are nonsensical gibberish (e.g., "Neltuill," "Aigtim," "Priruptus Valve") instead of correct anatomical terms. Furthermore, the anatomical placement is incorrect; for instance, the vessel labeled "Arta" (presumably Aorta) is depicted emerging from the right side of the heart (blue side), whereas the aorta connects to the left ventricle. The pulmonary artery is also misplaced. VERDICT: FAIL

Extreme 67%
PASS
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a Morpho butterfly with the characteristic iridescent blue dorsal wings and brown ventral patterns with eyespots. The body morphology, including the antennae and thorax, appears correct. The coloration and pattern are consistent with the species description. The presentation in a display case is realistic. There are no obvious factual or architectural inaccuracies regarding the subject matter. VERDICT: PASS

FAIL
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image depicts the Hagia Sophia interior with recognizable elements: the massive central dome, pendentives with angelic figures, semi-domes, and large Islamic calligraphy medallions. However, the Byzantine mosaics are not clearly visible or accurately rendered — instead, generic decorative patterns dominate. The architectural structure is mostly correct but lacks fine historical detail. Some lighting and texture appear AI-enhanced or stylized. VERDICT: FAIL

PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a mineral specimen that closely resembles bismuth, featuring the characteristic geometric hopper crystal formation and staircase structure. The iridescent oxide layers are vividly rendered with accurate rainbow hues. However, the surrounding matrix appears to be composed of quartz-like crystals, which is not typical for pure bismuth specimens but could represent a composite or artistic interpretation. Despite this minor deviation, the core features of the bismuth crystal are re
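
The World Knowledge figures can be reproduced from the verdict badges: three prompts per difficulty tier, with 3, 1, 0, and 2 passes respectively. A small sketch of that roll-up, assuming the page pools all prompts in the category and rounds to the nearest whole percent (the rounding rule is an assumption; the page does not state it):

```python
# (passed, total) per difficulty tier, read off the PASS/FAIL badges.
tiers = {"Easy": (3, 3), "Medium": (1, 3), "Hard": (0, 3), "Extreme": (2, 3)}

def pct(passed: int, total: int) -> int:
    """Pass rate as a whole-number percentage."""
    return round(100 * passed / total)

for name, (passed, total) in tiers.items():
    print(f"{name} {pct(passed, total)}%")

# Pool every prompt in the category for the headline figure.
overall = pct(sum(p for p, _ in tiers.values()),
              sum(t for _, t in tiers.values()))
print(f"World Knowledge {overall}%")
```

Six passes out of twelve prompts gives the 50% shown for the category.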

Professional Studio 89%

Camera & Lighting 83%
Easy 100%
PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image successfully implements a wide-angle perspective with a distinct circular vignette and heavy barrel distortion at the edges, creating a fisheye-like effect. The lighting shows a modern interior with recessed ceiling lights and natural light from the large glass doors, matching the "modern living room" description. The depth of field is deep, keeping the entire room in focus as expected for a wide-angle architectural shot. VERDICT: PASS

PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, prominent dewdrop on a pink rose petal. The depth of field is shallow, with the central dewdrop and the immediate petal in sharp focus while the background and surrounding petals are softly blurred. The lighting is natural and even, highlighting the texture of the petal and the clarity of the water droplet. VERDICT: PASS

PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is top-down, showing the circular road, the central island with trees, and various cars moving through the intersection. The lighting is even and natural, consistent with a high-angle daylight shot. VERDICT: PASS

Medium 67%
FAIL
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image features a portrait with a moody atmosphere and a shallow depth of field. However, it fails the specific lighting requirement. Rembrandt lighting is characterized by a small triangle of light on the shadowed cheek. In this image, the light is a broad, bright patch on the lit side of the face, and there is no distinct light triangle on the shadowed side. VERDICT: FAIL

PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a landscape with a deep sense of depth through layered hills. The lighting is a textbook example of golden hour: the sky is a vibrant orange, there is a soft, hazy glow throughout the atmosphere, and long, dramatic shadows are cast across the foreground by the low-angle sunlight. VERDICT: PASS

PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright, white, and overexposed background. The lighting is soft and airy, creating a luminous effect with minimal shadows on the subject's face. The depth of field is shallow, focusing on the eyes while the rest of the image remains soft and bright, adhering to the requested aesthetic. VERDICT: PASS

Hard 67%
PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the technical specifications. The portrait features a shallow depth of field with a very creamy, blurred bokeh background, consistent with an 85mm f/1.4 lens. The lighting is clearly backlit, evidenced by the bright, warm rim light highlighting the subject's hair and shoulders, effectively separating her from the blurred garden background. VERDICT: PASS
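
To put a number on what "consistent with an 85mm f/1.4 lens" implies, the standard thin-lens depth-of-field formulas can be applied. A rough sketch, assuming a 0.03 mm circle of confusion (a common full-frame convention) and a subject 2 m away. Both values are assumptions, since the prompt gives only focal length and aperture, and `depth_of_field_mm` is a hypothetical helper:

```python
def depth_of_field_mm(focal_mm, f_number, subject_mm, coc_mm=0.03):
    """Return (near, far) limits of acceptable sharpness in millimetres."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (subject_mm * (hyperfocal - focal_mm)
            / (hyperfocal + subject_mm - 2 * focal_mm))
    if subject_mm >= hyperfocal:
        # Focused at or beyond the hyperfocal distance: sharp to infinity.
        return near, float("inf")
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far

# Assumed shooting distance of 2 m (2000 mm).
near, far = depth_of_field_mm(85, 1.4, 2000)
print(f"in focus from {near:.0f} mm to {far:.0f} mm ({far - near:.0f} mm deep)")
```

Under these assumptions the zone of sharpness is only about 4 to 5 cm deep, so a garden a few metres behind the subject should render as heavy blur.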

FAIL
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image fails to meet the core requirement of the prompt. While it features a shallow depth of field and bokeh, the prompt specifically requested a "subject sharp" with a blurred background. In this image, the entire scene, including the subjects in the foreground and midground, is completely out of focus and blurry. There is no sharp subject to anchor the composition. VERDICT: FAIL

PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches the prompt's requirements. The perspective is a top-down shot (flat lay). The lighting is directional, coming from the upper-left, creating soft shadows on the right side of the food. The color temperature is warm, and there is a subtle vignette around the edges of the frame. The depth of field is relatively shallow, consistent with an f/2.8 aperture. VERDICT: PASS

Extreme 100%
PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a distinct selective focus strip in the center of the street, with heavy blurring at the top and bottom to create a miniature model appearance. The colors are highly saturated, and the high-angle perspective enhances the toy-like aesthetic described in the prompt. VERDICT: PASS

PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image features prominent blue and orange light streaks that run horizontally across a dark background, accurately capturing the "blue and orange anamorphic lens flare" described. The streaks create a sense of depth and cinematic atmosphere within the dark, moody scene. The lighting pattern and horizontal orientation perfectly match the prompt's requirements. VERDICT: PASS

PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up portrait with dramatic split lighting. One side of the face is brightly lit, while the other side is cast in deep, heavy shadow, creating a high-contrast chiaroscuro effect. The lighting pattern is sharp and distinct, dividing the face vertically as requested. VERDICT: PASS

Color Precision 100%
Easy 100%
PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color is highly saturated, intense, and unambiguous, perfectly matching the prompt's description. The road is a clear, textured gray. All color requirements specified in the prompt are met accurately. VERDICT: PASS

PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The woman is wearing a vibrant, deep blue dress that perfectly matches the description of "royal blue." The background is a solid, light beige/off-white color, which qualifies as a "neutral background." All color requirements from the prompt are accurately represented in the image. VERDICT: PASS

PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a field of bright yellow sunflowers and a clear, vibrant blue sky. The colors perfectly match the prompt's description of "pure sunflower yellow flowers" and a "clear blue sky." There are no deviations in the color-coding or the requested hues. VERDICT: PASS

Medium 100%
PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The image accurately reflects the prompt's requirements. The wall is a vibrant burnt orange that matches the hex code #FF6B35. The furniture, including the bed frame, nightstands, and lamp, is white. The scene is illuminated by natural daylight coming from the window. VERDICT: PASS

PASS
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage in the image consists of a muted, dark green color that aligns well with the description of #2D5F2D. The tree trunks are a consistent reddish-brown. The image also explicitly includes the text "#2D5F2D" in the bottom left corner, confirming the intended color match. VERDICT: PASS

PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image accurately reflects the prompt's requirements. The walls are a medium blue that matches the hex code #4A90D9 (a bright, sky-blue shade). The floor is a warm-toned wood, and the trim (baseboard and crown molding) is white. All color specifications are met perfectly. VERDICT: PASS

Hard 100%
PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the specified colors. The walls are a warm cream color matching #F5E6CC, the sofa is a dark navy color matching #2C3E50, and the accent pillows are a bright red matching #E74C3C. The hex codes are even explicitly labeled on the objects within the image to confirm the match. VERDICT: PASS

PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image follows the color requirements well. The background is a deep dark blue, consistent with #1A1A2E. The watch features a soft gold/beige color, matching the #F0C27F description. The earbuds and their charging case are pure white (#FFFFFF). While the watch face contains a text artifact ("#00C27F"), the actual visual colors of the objects accurately reflect the requested hex values and color names. VERDICT: PASS

PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested color palette. The background is a near-white color matching #FAFAFA. The main central shape is a vibrant orange-red, consistent with #FF4500. The accent lines are a dark charcoal/dark grey, matching the #333333 specification. The text overlays within the image also confirm these specific hex codes, and the visual colors align perfectly with the prompt's requirements. VERDICT: PASS

Extreme 100%
PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image shows a smooth gradient transitioning from a bright red on the left to a bright blue on the right, passing through a purple middle section. The left side matches the red color (#FF0000) and the right side matches the blue color (#0000FF) specified in the prompt. The transition is even and follows the requested color scheme perfectly. VERDICT: PASS
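
One way to read "transitioning evenly through purple" is straight linear interpolation of the 8-bit RGB channels between the two endpoints. This is an assumed interpretation, as the prompt does not define the interpolation space, and `lerp_rgb` is a hypothetical helper. A short sketch listing the reference colors along such a gradient:

```python
def lerp_rgb(start, end, t):
    """Linearly interpolate two (r, g, b) byte triples; t runs from 0 to 1."""
    # Adding 0.5 before truncating rounds each channel to the nearest integer.
    return tuple(int(a + (b - a) * t + 0.5) for a, b in zip(start, end))

RED = (255, 0, 0)    # #FF0000, left edge
BLUE = (0, 0, 255)   # #0000FF, right edge

# Five evenly spaced samples from the left edge to the right edge.
for i in range(5):
    r, g, b = lerp_rgb(RED, BLUE, i / 4)
    print(f"#{r:02X}{g:02X}{b:02X}")
```

The midpoint comes out as #800080, a medium purple, which matches the evaluator's description of a purple middle section.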

PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image features a vertical gradient. At the bottom, there is a vibrant orange color, which aligns well with the requested #FF6B35. Moving upwards, the colors transition through pink and purple tones before reaching a bright, saturated blue at the top. While the top color is a bright blue rather than the much darker, deep indigo (#1B1464) requested, the overall color scheme and the bottom color match the prompt's intent. VERDICT: PASS
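
The gap between a requested hex value and what was rendered can be quantified rather than judged by eye. A minimal sketch using Euclidean distance in RGB space, which is a crude measure (perceptual metrics such as CIEDE2000 track human vision more closely). The sampled color below is a hypothetical stand-in for "bright, saturated blue", not a value measured from the image:

```python
import math

def hex_to_rgb(code: str):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a 6-digit hex color")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def rgb_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colors in RGB space (0 to ~441.7)."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))

requested_top = "#1B1464"   # deep indigo, from the prompt
sampled_top = "#1E50FF"     # HYPOTHETICAL bright blue, not measured from the image
print(round(rgb_distance(requested_top, sampled_top), 1))
```

A fixed distance threshold would make a pass on this kind of mismatch an explicit choice instead of a matter of "overall intent".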

PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image displays a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the hex code #00C9FF. On the right, the color is a bright lime/mint green, which matches the hex code #92FE9D. The transition between these two colors is a smooth gradient as requested. VERDICT: PASS

Photorealism 67%
Easy 67%
PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt's requirements. The red sneaker is presented on a clean white background with soft, professional studio lighting and a natural, gentle shadow underneath. The texture of the fabric and the details of the laces and sole appear realistic and consistent. There are no obvious AI artifacts, such as warped geometry or nonsensical textures, and it successfully passes as a high-quality product photograph. VERDICT: PASS

FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image contains significant AI artifacts that prevent it from passing as a real photograph. The most obvious issue is the illogical geometry of the glass bottle and the cap; the edges are warped, inconsistent, and physically impossible. The internal straw (dip tube) is strangely shaped and does not align naturally with the bottle's structure. Additionally, the reflections and light refractions through the glass are nonsensical and lack the precision of a real studio shot. VERDICT: FAIL

PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully captures the requested studio aesthetic. The lighting is soft and directional, creating a natural-looking shadow beneath the apple. The texture of the skin, including the lenticels (small white spots), appears realistic and consistent. There are no obvious AI artifacts, such as unnatural warping or nonsensical textures. It looks like a high-quality, professional catalog photograph. VERDICT: PASS

Graphical design 54%

Data Visualisation 0%
Easy 0%
FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart misrepresents the requested data: it shows multiple overlapping bars per month (not a simple bar chart), incorrect month labels (e.g., “Orodg” for March), and distorted values (e.g., 18K for January, not 12K). The legend and title are present but misleading. The chart type does not match the requested bar chart format. VERDICT: FAIL

FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The pie chart misrepresents data: "Company B" is incorrectly labeled on two slices (25% and 20%), while "Company A" appears only on a blue slice (35%) but is mislabeled as "Company B" on the legend. "Others" is 20% but mislabeled as "Company B" in one callout. The chart type is correct, but critical labeling errors render it inaccurate. VERDICT: FAIL

FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart misrepresents data: y-axis labels are reversed and illegible, x-axis labels are garbled (e.g., “Weny” for Mon, “Fur” for Wed), and the title is corrupted. Values shown (1200, 1500, 1100, 1800, 2000, 1700, 1400) are incorrect — the prompt specified 1200, 1500, 1100, 1800, 2000, 1700, 1400, but the y-axis scale is inverted. The chart type is correct, but critical elements are unreadable and inaccurate. VERDICT: FAIL
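The evaluation above quotes two lists of values: the ones it says the chart shows and the ones the prompt specified. A position-by-position comparison, sketched below, makes it easy to see whether they actually differ. The helper `value_mismatches` is a hypothetical name introduced for illustration only. Both lists are copied from the evaluator text, not measured from the image.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Values specified in the 'Weekly Traffic' prompt (Mon-Sun).
PROMPT_VALUES = [1200, 1500, 1100, 1800, 2000, 1700, 1400]
# Values the evaluator text reports as shown in the generated chart.
REPORTED_VALUES = [1200, 1500, 1100, 1800, 2000, 1700, 1400]


def value_mismatches(expected, observed):
    """Return (day, expected, observed) for every position where the lists differ."""
    return [
        (day, exp, obs)
        for day, exp, obs in zip(DAYS, expected, observed)
        if exp != obs
    ]


mismatches = value_mismatches(PROMPT_VALUES, REPORTED_VALUES)
print(mismatches)  # [] : the two quoted lists are identical
```

Because the two quoted lists match, the FAIL verdict appears to rest on the inverted y-axis, the garbled axis labels, and the corrupted title rather than on the plotted numbers.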

Layout & Design 33%
Easy 67%
PASS
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image displays "MOVIE NIGHT" in bold at the top, a large popcorn graphic centered below it, and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with title dominant, popcorn central, and tagline at base. All required layout elements are present and correctly positioned. VERDICT: PASS

PASS
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The poster includes a guitar illustration, the headline "LIVE MUSIC FEST" in large, bold text, and the tagline "Tickets available now" below. Text is readable and correctly placed. Visual hierarchy is clear with headline dominating, tagline secondary, and guitar as a supporting graphic. All required elements are present and correctly formatted. VERDICT: PASS

FAIL
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster displays "JOIN THE BOOK CLUB" prominently at the top with correct hierarchy. An open book illustration is centered below. The tagline "Meets every Wednesday" is present but below it, placeholder text is garbled and unreadable. The core layout elements are present, but critical text is corrupted. VERDICT: FAIL

Medium 0%
FAIL
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

All layout zones are present: hero header, headline/subtitle, “Sign Up” button, and footer with three social icons. Visual hierarchy is correct. However, all text is gibberish and unreadable, violating the requirement for readable text. The core requirement (readable text) is not met. VERDICT: FAIL

FAIL
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with a header banner, two-column body (text left, image right), a highlighted quote block, and a footer with an unsubscribe link. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL

FAIL
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with hours and address. Text is readable and correctly placed per layout. However, all item names and prices are gibberish, indicating AI-generated placeholder text. The core layout structure is intact, but content is invalid. VERDICT: FAIL

Hard 33%
FAIL
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The layout matches the prompt: left page has a large hero photo, right page has two columns, a pull quote, sidebar with author bio, and page number. Text is present but illegible due to AI artifacts—no readable content. Visual hierarchy is structurally correct, but textual elements fail functional readability. VERDICT: FAIL

PASS
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI with a top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar — all present. Visual hierarchy is clear. Text is readable and correctly placed, though some Chinese characters appear garbled or nonsensical. Core layout elements are intact and correctly structured. VERDICT: PASS

FAIL
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

Left page: full-bleed photo of model — present. Right page: three-column layout — present. Headline, body copy, inset image, and page number (22) — all visible. However, all text is gibberish, unreadable placeholder. Visual hierarchy is structurally correct but text fails functional requirement. VERDICT: FAIL

Style Diversity 83%
Easy 67%
PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden setting. Visible brushstrokes and textured application of paint are evident across the dog’s fur, grass, and foliage, mimicking traditional oil painting techniques. Color blending and impasto-like effects enhance the rich texture. The style aligns precisely with the requested oil painting aesthetic. VERDICT: PASS

FAIL
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, rendered in a Japanese anime style with large, expressive eyes and flat color areas. However, the fur has subtle shading and texture, deviating from the requested "flat colors." The style is recognizable but not fully aligned with the prompt’s specificity. VERDICT: FAIL

PASS
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered in a distinct pixel art style with visible 32x32 grid blocks and a limited, blocky color palette. Key characteristics—blocky forms, simplified textures, and low-resolution detail—are present. The subject and setting match the prompt’s description. VERDICT: PASS

Medium 100%
PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies Art Deco: bold geometric shapes, symmetrical composition, and a dominant gold-and-black palette. Buildings feature stepped forms, sunburst motifs, and stylized ornamentation. The street scene is framed by ornamental arches, and the foreground displays intricate Art Deco patterns. All requested stylistic elements are unmistakably present. VERDICT: PASS

PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image clearly depicts a woman with a parasol rendered in Impressionism. Soft, visible brushstrokes, dappled light effects in sky and dress, and a pastel palette (blues, yellows, greens) are all present. The style is unmistakably recognizable as Impressionist, matching the prompt’s key visual requirements. VERDICT: PASS

PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns, and flat, vivid primary colors (pink, blue, yellow). Facial features are stylized with high contrast and saturated hues. All requested visual characteristics are unmistakably present. VERDICT: PASS

Hard 67%
PASS
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image shows a forest with visible watercolor bleeds, soft gradients, and textured paper grain. Tree details are rendered with painterly softness, yet retain photographic depth and lighting—sunlight filtering through canopy with realistic shadows. The style blends photorealistic composition with watercolor’s fluid, translucent qualities. All requested characteristics are clearly present. VERDICT: PASS

PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image clearly blends cyberpunk (neon lights, tech gear, futuristic city) with Art Nouveau (flowing organic lines, floral borders, golden decorative frame). The woman’s flowing hair and ornate frame match the requested style. Neon colors and tech elements are prominent. All key visual characteristics are present and well-integrated. VERDICT: PASS

FAIL
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image blends ukiyo-e style with photorealistic sky: bold outlines, flat colors, and dramatic gradient sky. However, the castle’s architecture is distinctly Japanese (castle-style), not medieval. The prompt requested a medieval castle, which is absent. The style fusion is visually present but misaligned with the core subject. VERDICT: FAIL

Extreme 100%
PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly shows raw concrete structures with exposed aggregate and organic forms, matching soft brutalism. Furniture is rounded, pastel-colored, and minimalist. Warm, diffused lighting enhances the cozy, sculptural feel. All requested elements are distinctly present and harmoniously integrated. VERDICT: PASS

PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image displays a sparse still life with a single object in the lower-left corner, surrounded by vast, muted beige negative space. Colors are desaturated and restrained, creating visual tension through isolation and implied imbalance. The composition is minimalist, and the object’s stark placement evokes unease. The style aligns with “anxious minimalism” as requested. VERDICT: PASS

PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image displays unmistakable retro-futurism: sleek, curved, and spire-topped architecture evoking 1960s space-age design. Warm, faded sepia tones and soft lighting create a melancholic patina. Key visual elements — organic forms, verticality, and nostalgic color grading — are clearly present and align with the prompt’s aesthetic. VERDICT: PASS
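The percentages shown for Graphical design and its subcategories can be reproduced from the individual verdicts above. The sketch below assumes the rates are pooled (total passes divided by total evaluations) and rounded to the nearest whole percent. The report does not state its aggregation rule, so this is an inference that happens to match the displayed figures. The pass/fail lists are transcribed from this section.

```python
# Pass (True) / fail (False) outcomes transcribed from the Graphical design results.
RESULTS = {
    "Data Visualisation": {"Easy": [False, False, False]},
    "Layout & Design": {
        "Easy": [True, True, False],
        "Medium": [False, False, False],
        "Hard": [False, True, False],
    },
    "Style Diversity": {
        "Easy": [True, False, True],
        "Medium": [True, True, True],
        "Hard": [True, True, False],
        "Extreme": [True, True, True],
    },
}


def pass_rate(outcomes):
    """Percentage of passing outcomes, rounded to the nearest whole percent."""
    return round(100 * sum(outcomes) / len(outcomes))


def pooled(groups):
    """Flatten several outcome lists into one so every evaluation counts equally."""
    return [outcome for outcomes in groups for outcome in outcomes]


subcategory_rates = {
    name: pass_rate(pooled(tiers.values())) for name, tiers in RESULTS.items()
}
category_rate = pass_rate(pooled(pooled(t.values()) for t in RESULTS.values()))

print(subcategory_rates)
# {'Data Visualisation': 0, 'Layout & Design': 33, 'Style Diversity': 83}
print(category_rate)  # 54
```

Averaging the three subcategory rates instead would give about 39%, not the 54% shown, which is why pooling over all 24 evaluations is the assumption used here.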