ImageBench

ImageBench V1 —

192 evaluations across 6 categories

Pass: 162
Fail: 30
Pass Rate: 84%
Avg Latency: 14.1s
Min Latency: 7.7s
Max Latency: 38.0s
Text Rendering: 93%
Spatial Reasoning: 79%
Human realism: 91%
Truthfulness: 82%
Professional Studio: 78%
Graphical design: 92%
Latency: 22%
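The headline figures are internally consistent. 162 passes plus 30 fails gives the 192 evaluations, and 162/192 = 84.375%, which rounds to the reported 84%. A minimal sketch of that arithmetic follows; the `pass_rate` helper is illustrative and not part of ImageBench.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return passed evaluations as a percentage of all evaluations."""
    total = passed + failed
    if total == 0:
        raise ValueError("no evaluations to score")
    return 100.0 * passed / total

total = 162 + 30
rate = pass_rate(162, 30)
print(total, f"{rate:.3f}%", f"{round(rate)}%")  # 192 84.375% 84%
```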
Text Rendering | Typography Style | Easy | fal/bytedance/seedream-v4

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Text Rendering 93%

Typography Style 100%
Easy 100%
PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads "CHAPTER ONE" correctly. Font mimics vintage typewriter style with uneven, slightly smudged ink and distressed paper texture. Visible typewriter keys and carriage reinforce the medium. All prompt elements are convincingly rendered. VERDICT: PASS
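Every evaluator rationale on this page ends with a `VERDICT: PASS` or `VERDICT: FAIL` token. A harness could recover the binary label with a regular expression, as sketched below. The function name is hypothetical, and the sketch assumes the verdict is always the final token of the rationale.

```python
import re
from typing import Optional

# Matches the trailing "VERDICT: PASS" / "VERDICT: FAIL" token seen in the rationales.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")

def parse_verdict(rationale: str) -> Optional[bool]:
    """Return True for PASS, False for FAIL, or None when no verdict token is found."""
    match = VERDICT_RE.search(rationale.strip())
    if match is None:
        return None
    return match.group(1) == "PASS"

print(parse_verdict('Text reads "CHAPTER ONE" correctly. VERDICT: PASS'))  # True
print(parse_verdict("The count does not match. VERDICT: FAIL"))           # False
print(parse_verdict("no verdict here"))                                   # None
```

Returning `None` rather than `False` lets a malformed judge response be flagged for review instead of being silently counted as a failure.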

PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads "TODAY'S SPECIAL" correctly. Letters show chalk texture, smudges, and dust particles on dark green board. Chalk dust is visible around letters and on ledge. Style convincingly mimics hand-written chalk. All elements match prompt. VERDICT: PASS

PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is rendered in glowing red pixel-style letters on a matrix display, matching the LED sign description. The spelling is correct. The pixelated, blocky font convincingly simulates an LED screen. VERDICT: PASS

Writing accuracy 92%
Easy 100%
PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and clearly readable in bold black letters on a white background. All five letters are intact and legible. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. All letters are intact and legible. No misspellings or missing parts. The text matches the intended specification exactly. VERDICT: PASS

PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is fully spelled correctly and clearly legible in white, centered on a solid light blue background. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

Medium 100%
PASS
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

All words are spelled correctly and fully readable: "The", "quick", "brown", "fox", "jumps", "high". No misspellings or illegible parts. Text matches intended phrase exactly. Contrast is clean, font is legible. VERDICT: PASS

PASS
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

The text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully readable against the pale yellow background. No misspellings or illegible parts detected. The phrase matches the intended text exactly. VERDICT: PASS

PASS
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

Text reads: "Stars shine brightest in the dark night" — matches intended phrase exactly. All words are spelled correctly, fully visible, and legible. Background is clean off-white as specified. No errors detected. VERDICT: PASS

Hard 100%
PASS
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly. "SUMMER SALE", "Up to 50% off all items this weekend only", and "Terms and conditions apply. See store for details." are all spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS

PASS
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

All text matches the intended message exactly: "GRAND OPENING", "Join us for food, music, and fun", and "RSVP required. Limited seating available." Every word is spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS

PASS
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM.". No misspellings, missing words, or illegible parts detected. Text is fully readable and accurate. VERDICT: PASS

Extreme 67%
FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Text is mostly correct but contains errors: “Music by Thomas Grant. A Paramount Grant.” incorrectly repeats “Grant” instead of “Pictures.” All other text matches the intended script. The misspelling “Voss” (correctly spelled) is present, but “Grant” is duplicated. VERDICT: FAIL

PASS
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "NATIONAL EXPLORER", "The Hidden Temples of Cambodia — page 34", "Interview: Dr. Maria Santos on Climate Action", "Top 10 Trails You Must Hike This Year", and "Issue 247 | March 2025 | $6.99". No misspellings, missing words, or illegible parts detected. Text is fully readable and accurate. VERDICT: PASS

PASS
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

All text matches the intended content exactly. Spelling is correct, all words are present, and every line is fully readable. No misspellings, missing words, or illegible parts detected. The label accurately reproduces the specified text with perfect fidelity. VERDICT: PASS
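The section scores above are consistent with pooling individual evaluations rather than averaging subcategory percentages:

- Writing accuracy is (3 + 3 + 3 + 2)/12 = 11/12 ≈ 92%.
- Text Rendering as a whole is (3 + 11)/(3 + 12) = 14/15 ≈ 93%.
- Averaging the two subcategory scores (100% and 91.7%) would instead give about 96%.

The sketch below reproduces both computations from the per-bucket counts listed on this page. How ImageBench actually aggregates scores is an assumption.

```python
# (passed, total) per difficulty bucket, read off this page.
typography_style = {"Easy": (3, 3)}
writing_accuracy = {"Easy": (3, 3), "Medium": (3, 3), "Hard": (3, 3), "Extreme": (2, 3)}

def pooled(*groups: dict[str, tuple[int, int]]) -> float:
    """Pass percentage over every individual evaluation in the given groups."""
    passed = sum(p for g in groups for p, _ in g.values())
    total = sum(t for g in groups for _, t in g.values())
    return 100.0 * passed / total

def mean_of_groups(*groups: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of each group's own pooled percentage."""
    return sum(pooled(g) for g in groups) / len(groups)

print(round(pooled(writing_accuracy)))                            # 92
print(round(pooled(typography_style, writing_accuracy)))          # 93
print(round(mean_of_groups(typography_style, writing_accuracy)))  # 96
```

Pooling weights each subcategory by its number of evaluations, so the 12-item Writing accuracy subcategory counts four times as much as the 3-item Typography Style one.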

Spatial Reasoning 79%

Attributes Binding 100%
Easy 100%
PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface. On the left is a red sphere (ball). On the right is a blue cube. Both objects are correctly colored and shaped as specified. They are positioned side by side. All attributes (color, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS

PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin placed on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is orange, ribbed, and has a short stem. Both objects are appropriately sized relative to each other and the board. No attribute swaps or misidentifications are present. VERDICT: PASS

PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows two vehicles parked on a street. The vehicle on the left is a yellow taxi, matching the description in color and type. The vehicle on the right is a long black limousine, also matching the description in color and type. Both are parked next to each other as requested. No attributes are swapped or misassigned. VERDICT: PASS

Medium 100%
PASS
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image contains three distinct shapes on a white background. The large red circle is correctly rendered. The small blue triangle is correctly rendered. The medium green star is correctly rendered. All attributes (color, size, shape) are correctly bound to their respective objects with no swaps or errors. VERDICT: PASS

PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a wooden shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium-height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS

PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a large brown horse standing in a meadow, a small white rabbit sitting on the grass, and a medium-sized black cat standing nearby. All attributes (color, size, shape) are correctly bound to each object with no swaps. The horse is big and brown, the rabbit is small and white, and the cat is medium-sized and black. VERDICT: PASS

Hard 100%
PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains a pink fire truck with correct shape and features (ladder, wheels, cab). A large blue banana is present in the foreground with correct curved shape and stem. A green spiky sun appears in the sky with correct radial shape. All three requested objects are present with their specified colors and forms correctly bound — no attribute swaps observed. The surreal landscape context is also fulfilled. VERDICT: PASS

PASS
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image shows: 1. A purple carrot with green leaves — correct color and object. 2. A red tree with red leaves — correct color and object. 3. A sliced watermelon with white flesh and green rind — matches “white watermelon sliced open” (though typically watermelons are red inside, the prompt specified white, so this is compliant). All attributes are correctly bound to their respective objects with no swaps. VERDICT: PASS

PASS
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows three objects on a wooden table: 1. An orange elephant figurine with white tusks and feet — matches “orange elephant” 2. A fluffy yellow penguin chick with black eyes, beak, and wingtips — matches “yellow penguin” 3. A metallic silver strawberry with textured surface and green calyx — matches “silver strawberry” All attributes (color, shape, object type) are correctly bound. No swaps or mismatches observed. VERDICT: PASS
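Attribute binding is stricter than attribute presence. An image containing a blue ball and a red cube has every requested color and every requested shape, yet binds them to the wrong objects. A minimal sketch of that distinction follows. It assumes a hypothetical detector that returns (object, color) pairs; the function names are illustrative.

```python
from collections import Counter

def attributes_present(requested, detected) -> bool:
    """Loose check: each requested object and each requested color appears somewhere."""
    objects = {obj for obj, _ in detected}
    colors = {color for _, color in detected}
    return all(obj in objects and color in colors for obj, color in requested)

def attributes_bound(requested, detected) -> bool:
    """Strict check: each requested (object, color) pair appears as that exact pair."""
    return not (Counter(requested) - Counter(detected))

requested = [("ball", "red"), ("cube", "blue")]
swapped = [("ball", "blue"), ("cube", "red")]  # hypothetical detector output

print(attributes_present(requested, swapped))  # True: all colors and shapes exist
print(attributes_bound(requested, swapped))    # False: the colors are swapped
```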

Compositionality 89%
Easy 67%
PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Tabby cat - Present (brown/black striped fur pattern) 2. Oversized red sunglasses - Present (large red frames with dark lenses) 3. Coherent combination - Present (glasses fit naturally on cat's face) All prompt requirements are met with proper integration of concepts. VERDICT: PASS

FAIL
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image shows a golden retriever sitting in a grassy field. The dog is wearing a cowboy hat. However, the hat is not "tiny" — it fits proportionally on the dog’s head, appearing normal-sized for a canine, not miniature or comically small as “tiny” would imply. While the core concept is present, the specific descriptor “tiny” is inaccurate. VERDICT: FAIL

PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image shows a penguin standing on snow with a blue sky and clouds in the background. The penguin has colorful wings (yellow, blue, red) and is holding a large, colorful umbrella (red, blue, yellow panels). All described elements — penguin, colorful umbrella, and their combination — are clearly present and coherently integrated. The unusual concept is fully realized. VERDICT: PASS

Medium 100%
PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image shows a medieval knight in full armor sitting at a wooden desk. The knight is typing on a modern silver laptop (with an Apple logo). Scrolls and parchment are also present on the desk, enhancing the medieval setting. All described elements — knight, armor, desk, laptop — are clearly visible and coherently combined in a single scene. VERDICT: PASS

PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Roman gladiator - Present (wearing helmet, armor, and leather skirt) 2. Taking a selfie - Present (holding phone up in selfie pose) 3. Smartphone - Present (modern smartphone in hand) 4. Colosseum - Present (ancient Roman amphitheater architecture) All elements are coherently combined in a single scene with proper lighting and perspective. The anachronistic combination is executed logically. VERDICT: PASS

PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image shows an astronaut in a full white spacesuit with blue stripes, complete with helmet and backpack, riding a silver bicycle. The setting is a sunny park with green grass, trees, and colorful flowerbeds. All described elements — astronaut, spacesuit, bicycle, and park — are clearly present and coherently combined. The scene is visually consistent and logically assembled. VERDICT: PASS

Hard 100%
PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Octopus: Present, large and detailed - Top hat: Present, black top hat on octopus - Monocle: Present, gold-rimmed eyewear on octopus - Chess: Present, wooden chessboard with one piece - Moon surface: Present, cratered lunar terrain - Earth in background: Present, blue planet visible in space All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS

PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image contains all specified elements: 1. Dinosaur - Present (T-Rex head/body) 2. Business suit - Present (dark pinstripe suit with tie) 3. PowerPoint presentation - Present (projector screen with charts/graphs) 4. Modern office - Present (glass walls, conference table, laptop) 5. Drinking coffee - Present (holding steaming coffee mug) All elements are coherently combined in a single scene. The dinosaur is anthropomorphized appropriately for the business context. VERDICT: PASS

PASS
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Robot dressed as a pirate: Present (wearing pirate hat, eye patch, striped shirt) - Paper boat: Present (white origami-style boat) - Lake of lava: Present (glowing orange/red molten surface with bubbles) - Juggling three kittens: Present (three fluffy kittens floating above robot’s hands in juggling motion) All components are coherently combined in a surreal, visually consistent scene. VERDICT: PASS

Counting 56%
Easy 100%
PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays exactly three red apples placed on a wooden table surface. The apples are arranged in a triangular formation with one in the foreground and two slightly behind. The count matches the requested quantity precisely. VERDICT: PASS

PASS
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows exactly five yellow rubber ducks arranged in a row on a white bathroom shelf. Each duck is identical in appearance, with orange beaks and black eyes. The count matches the prompt’s request precisely. No extra or missing objects are present. The scene is clean, well-lit, and accurately reflects the described composition. VERDICT: PASS

PASS
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays exactly four white candles arranged in a row on a dark, reflective surface. The count matches the requested quantity precisely. All candles are unlit and appear identical in shape and color. The dark background and surface provide strong contrast, highlighting the candles as intended. No extra or missing objects are present. VERDICT: PASS

Medium 67%
PASS
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows three black dogs sitting in a row in the background and two orange cats sitting in front of them on a grassy lawn. The count of 3 black dogs and 2 orange cats exactly matches the requested prompt. All animals are clearly visible and correctly identified. VERDICT: PASS

PASS
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows exactly 4 red roses and 2 white daisies arranged in a clear glass vase, placed on a wooden table. All requested objects are present in the correct quantities and context. The composition matches the prompt precisely. VERDICT: PASS

FAIL
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows a garden fence with butterflies. I can clearly count 1 blue butterfly perched on the leftmost visible fence post. There are 4 yellow butterflies visible along the fence line, not 3 as requested. The prompt specified "1 blue butterfly and 3 yellow butterflies", but the image contains 1 blue and 4 yellow butterflies. VERDICT: FAIL

Hard 0%
FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round glass fishbowl containing goldfish. Counting the fish: there are 8 clearly visible goldfish swimming inside the bowl. The prompt requested exactly 9 goldfish. Since only 8 are present, the count does not match the requirement. VERDICT: FAIL

FAIL
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with: - 3 bananas (visible, clustered together) - 3 oranges (clearly visible; one is partially hidden behind another) - 2 apples (both red, clearly visible) The prompt requested exactly 4 oranges, but only 3 are present. Therefore, the count does not match. VERDICT: FAIL

FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image shows a cluster of colorful balloons against a blue sky. Counting them: 1 yellow (top left), 1 blue, 1 pink, 1 orange, 1 red (center), 1 light pink, 1 green, 1 purple, 1 teal, 1 red (bottom left), 1 white, and 1 yellow (bottom right). That’s 12 balloons total. The prompt requested exactly 11. Therefore, the count does not match. VERDICT: FAIL
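The counting rubric applied in these rationales is strict equality per object type. The fruit-bowl image fails because 3 oranges appeared where 4 were requested, even though the bananas and apples match. A sketch of that rule follows. The observed counts are copied from the evaluator's rationale above; in a real harness they would come from the VLM judge. The helper name is illustrative.

```python
def count_mismatches(requested: dict[str, int], observed: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {object: (requested, observed)} for each requested type whose count differs.

    Only requested object types are checked; extra, unrequested objects are ignored.
    """
    return {
        name: (want, observed.get(name, 0))
        for name, want in requested.items()
        if observed.get(name, 0) != want
    }

requested = {"banana": 3, "orange": 4, "apple": 2}
observed = {"banana": 3, "orange": 3, "apple": 2}  # as reported by the evaluator

mismatches = count_mismatches(requested, observed)
print(mismatches)                             # {'orange': (4, 3)}
print("PASS" if not mismatches else "FAIL")   # FAIL
```

Reporting the mismatching types, rather than a bare boolean, mirrors how the rationales name the specific count that was off.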

Negation 56%
Easy 67%
PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image depicts a serene tropical beach scene with golden sand, turquoise water, and palm trees leaning over the shore. Footprints are visible in the sand, but there are no people present anywhere in the frame — no figures on the beach, in the water, or in the background foliage. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS

FAIL
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a fluffy orange cat lying on a couch in warm sunlight. Upon close inspection of the cat’s neck, a thin dark line with a small metallic bell is visible — indicating the presence of a collar. Since the prompt explicitly excluded “a cat lying on a couch without a collar,” and the cat here is wearing one, the excluded element is present. VERDICT: FAIL

PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with rows of wooden desks and chairs facing a green chalkboard mounted on a white wall. A wooden teacher’s podium sits centered beneath the board. No students, people, or personal items are visible. The scene is clean, orderly, and devoid of human presence — fully satisfying the exclusion criterion. VERDICT: PASS

Medium 67%
FAIL
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with palm trees in the background, numerous seashells scattered in the foreground, and a colorful beach umbrella. However, on the left side of the image, beyond the sand, there is clearly visible water — likely the ocean or sea — with gentle waves and a horizon line where water meets sky. Since the prompt explicitly excluded “water or ocean visible,” and it is present, the exclusion criterion is violated. VERDICT: FAIL

PASS
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows a wooden desk with a leather chair, a bookshelf filled with books, and a brass desk lamp. There is a notebook and pen on the desk. No computers, monitors, tablets, or any electronic screens are visible in the scene. The excluded elements are completely absent. VERDICT: PASS

PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen scene with a stainless steel gas stove, a wooden cutting board on the countertop, and several pots and pans stacked to the left. One large pot sits on the back burner. All cookware appears empty or covered; no food items are visible inside or around them. The excluded element (food) is truly absent. VERDICT: PASS

Hard 33%
FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with visible tomato sauce, fresh basil leaves, and a golden-brown crust. However, there is clearly melted cheese covering the surface beneath the toppings — it’s bubbly, golden in areas, and blends with the sauce. Since the prompt explicitly excluded cheese (“absolutely no cheese”), and cheese is visibly present, the core criterion is violated. VERDICT: FAIL

PASS
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows the front view of a silver car body. Visible elements include the hood, windshield with side windows, and two round headlights. The frame cuts off at the bumper level. There are no wheels visible in the image, which aligns with the exclusion criteria in the prompt. VERDICT: PASS

FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image clearly shows a human face with eyes, nose, and mouth. However, the excluded element — eyebrows — are visibly present above both eyes. They are well-defined, natural-looking, and fully formed. Since the prompt explicitly required “no eyebrows,” their presence violates the constraint. VERDICT: FAIL
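The rationales in this section turn mainly on whether an excluded element leaked into the image: cheese on the pizza, eyebrows on the face, water on the beach. A minimal sketch of that exclusion check follows. It assumes hypothetical sets of element labels detected in each image; the labels and function names are illustrative.

```python
def exclusion_violations(excluded: set[str], detected: set[str]) -> set[str]:
    """Return the excluded elements that nevertheless appear among the detections."""
    return excluded & detected

def negation_verdict(excluded: set[str], detected: set[str]) -> str:
    # Any leaked excluded element fails the prompt, regardless of what else is correct.
    return "FAIL" if exclusion_violations(excluded, detected) else "PASS"

# Hypothetical detector labels for two prompts from this section.
face = {"eyes", "nose", "mouth", "eyebrows"}
classroom = {"desks", "chalkboard", "podium"}

print(exclusion_violations({"eyebrows"}, face))   # {'eyebrows'}
print(negation_verdict({"eyebrows"}, face))       # FAIL
print(negation_verdict({"students"}, classroom))  # PASS
```

Exact label matching is the weak point of this sketch. Excluding "students" would not catch a detection labelled "person", which is one reason a VLM judge is used instead of a fixed vocabulary.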

Relative Position 92%
Easy 100%
PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a cat positioned directly on the upper surface of a cardboard box. The cat’s paws and tail rest on the box, confirming the “sitting on top of” relationship. No conflicting spatial descriptors are present. All positional relationships described in the prompt are accurately reflected in the image. VERDICT: PASS

PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a wooden chair with a red ball positioned directly underneath it, resting on the floor between the front legs. The ball is clearly below the seat and under the horizontal support bar of the chair. All spatial relationships described in the prompt — specifically “underneath” — are accurately represented. VERDICT: PASS

PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a white coffee mug positioned to the left of an open book, both resting on a wooden desk surface. The spatial arrangement matches the prompt: the mug is “next to” the book, and both are “on” the desk. No incorrect positional relationships (e.g., under, above, below) are present. All described spatial relations are accurate. VERDICT: PASS

Medium 100%
PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a branch that extends from a tree trunk on the left. The branch is positioned above and slightly behind a wooden park bench, which sits in the foreground. The spatial relationships described — bird on branch, branch extending from tree, tree next to bench — are all accurately represented. All positional cues (above, next to) are correct. VERDICT: PASS

PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a cat sitting on a fluffy beige pillow, which is placed on the seat of a wooden chair. The spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. All positional descriptors (on, under) align with the visual content. No contradictions or misplacements are evident. VERDICT: PASS

PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a lit candle placed directly on top of a stack of four books. To the right of the book stack is a potted plant in a terracotta pot. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors (on, next to) match the visual layout. VERDICT: PASS

Hard 67%
FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five chess pieces on a board: a king in the center, two pawns to its left (not a queen), a knight to its right (not a bishop), and a rook above the king. The prompt incorrectly identifies the queen and bishop, and misplaces the knight as “below” when it’s to the right. Spatial relationships are not fully accurate per the prompt’s description. VERDICT: FAIL

PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a desk scene with a laptop centered, a coffee mug to its left, and a phone to its right — all matching the prompt. A lamp is positioned above the laptop, casting shadows on the wall behind. A spiral notebook lies below the laptop, closer to the viewer. All specified spatial relationships (left, right, above, below) are accurately represented. VERDICT: PASS

PASS
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. The horse is on the far left. The cow is to the right of the horse. The sheep is in the center. The goat is to the right of the sheep. The dog is on the far right. All specified spatial relationships are accurately represented. VERDICT: PASS
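The multi-object layout prompts reduce to pairwise relations between object positions. A sketch follows for the desk-scene prompt. It assumes hypothetical normalized box centers in image coordinates, where x grows rightward and y grows downward; the coordinates and helper names are illustrative.

```python
Point = tuple[float, float]

def relation_holds(relation: str, a: Point, b: Point) -> bool:
    """True if the object centered at `a` stands in `relation` to the object centered at `b`."""
    ax, ay = a
    bx, by = b
    return {
        "left_of": ax < bx,
        "right_of": ax > bx,
        "above": ay < by,   # smaller y is higher in image coordinates
        "below": ay > by,
    }[relation]

# Hypothetical centers for the desk scene (laptop center, mug left, phone right, lamp above, notebook below).
centers = {
    "laptop": (0.5, 0.5),
    "mug": (0.2, 0.5),
    "phone": (0.8, 0.55),
    "lamp": (0.5, 0.15),
    "notebook": (0.5, 0.85),
}
constraints = [
    ("mug", "left_of", "laptop"),
    ("phone", "right_of", "laptop"),
    ("lamp", "above", "laptop"),
    ("notebook", "below", "laptop"),
]

failed = [c for c in constraints if not relation_holds(c[1], centers[c[0]], centers[c[2]])]
print(failed)  # []

# The chess image put the knight to the king's right instead of below it.
print(relation_holds("below", (0.8, 0.5), (0.5, 0.5)))  # False
```

The comparisons are strict and have no tolerance, so two objects at nearly the same height would flip between "above" and "below". A practical checker would likely add a margin or compare bounding-box extents rather than centers.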

Extreme 100%
PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a large red barn dominating the left foreground, with a windmill positioned to its right in the midground. Snow-capped mountains appear in the distant background on the right side. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). Depth perspective is consistent. All positional relationships are correctly represented. VERDICT: PASS

PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image correctly depicts a beach scene with a detailed sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease appropriately with distance, maintaining the specified spatial relationships. All positional relationships described in the prompt are accurately represented. VERDICT: PASS

PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image correctly depicts a city park scene with spatial layering as described. In the foreground, a wooden park bench is positioned on the left, with several pigeons scattered on the grass and pavement around it. The midground features a large, active fountain centered behind the bench. The background consists of tall skyscrapers rising above the tree line, progressively smaller due to perspective. All positional relationships align with the prompt. VERDICT: PASS

Scale & Proportions 78%
Easy 33%
PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The person’s height relative to the school bus appears realistic — approximately 1/3 to 1/4 the height of the bus’s side panel, which is consistent with real-world proportions. The bus’s windows, wheels, and lettering also scale appropriately to the human figure. No obvious distortions or size inconsistencies are visible. VERDICT: PASS

FAIL
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears disproportionately large relative to the leaf it is sitting on. In reality, a ladybug is small enough that many could fit across the width of such a leaf. Here, the beetle occupies a significant portion of the visible leaf surface, suggesting an unrealistic scale relationship between the insect and its environment. VERDICT: FAIL

FAIL
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a child holding a basketball. The ball appears disproportionately large relative to the child’s hands and torso — it nearly spans the width of his chest and is held with both hands in a way that suggests it’s too big for his grip. In reality, a standard basketball would be smaller relative to a young child’s body. This violates the prompt’s requirement for “correct relative sizes.” VERDICT: FAIL

Medium 100%
PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a miniature figurine of a person standing on the rim of a coffee cup, looking down into the liquid. The scale is consistent with the prompt: the person is tiny relative to the cup, and the cup’s proportions appear realistic for a standard coffee mug. The figurine’s posture and placement align with the description. No obvious distortions or inconsistencies in size relationships are present. VERDICT: PASS

PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a giant tabby cat walking down a city street, towering over cars and pedestrians, with skyscrapers flanking it on both sides. The cat’s size is exaggerated to kaiju-like proportions relative to the urban environment — its paws are larger than cars, and its head reaches mid-building height. This matches the prompt’s description of “giant” and “like a kaiju.” Proportions are internally consistent within the surreal context. VERDICT: PASS

PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a miniature house resting on an open human palm. The house is appropriately small relative to the hand, fitting comfortably within the palm’s width and length. The fingers and thumb frame the house naturally, and the scale appears realistic for a “miniature” object held in a hand. No proportional inconsistencies are visible. VERDICT: PASS

Hard 100%
PASS
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, two horses, a dog, a cat, and a mouse arranged in a line. The relative sizes are generally correct: the elephant is largest, followed by the horses, then the dog, cat, and mouse smallest. However, there are two horses instead of one as specified in the prompt. Despite this extra animal, the core requirement of correct proportional sizing among the listed animals is met. VERDICT: PASS

PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image displays a table setting with a plate, fork, knife, wine glass, salt and pepper shakers, and loose peppercorns. The relative sizes appear realistic: the plate is appropriately sized for a dinner setting, the cutlery is proportional to the plate, the wine glass is standard height relative to the other items, and the shakers are correctly sized as condiment containers. The loose peppercorns are tiny compared to the shakers, which is accurate. All proportions are consistent with a real-wo

PASS
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image displays a farm scene with a barn, tractor, farmer, cow, chickens, and fence post. The relative sizes appear consistent: the barn is the largest structure, the tractor is appropriately sized for a farm vehicle, the farmer is human-sized relative to the tractor, the cow is large compared to the chickens, and the fence post is in proper proportion to the animals and structures. All objects maintain realistic scale relationships. VERDICT: PASS

Human realism 91%

Faces & Expressions 100%
Easy 100%
PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The warm, natural smile is clearly readable, with genuine crinkles around the eyes. Lighting enhances the joyful expression without distortion. Skin texture and freckles appear authentic. No noticeable AI artifacts or anatomical errors. VERDICT: PASS

PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly rendered features. The expression is calm and serious, with subtle furrows indicating contemplation. Lighting enhances texture without distortion. No anatomical anomalies or unnatural artifacts are present. The prompt’s intent is clearly achieved. VERDICT: PASS

PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The teenager’s expression is relaxed and neutral, with no discernible distortions. Eyes are clear, lips are softly closed, and lighting is even. The prompt’s intent is fully met: a realistic, calm headshot of a teen. VERDICT: PASS

Medium 100%
PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face shows clear disgust: nose is wrinkled, lips are curled, and brows are furrowed. Features are symmetric and anatomically correct. No distortions are present. The emotion is unmistakably readable and strongly conveyed. All requested elements are accurately rendered. VERDICT: PASS

PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is symmetric with natural features. Eyes are wide, eyebrows raised, and mouth slightly open — clearly conveying surprise. Lighting creates realistic highlights and shadows. No visible distortions. All requested elements are accurately rendered. VERDICT: PASS

PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly rendered features. Downcast eyes and slightly drawn mouth convey quiet melancholy effectively. Distant gaze is implied by the downward look. No significant distortions. Lighting enhances the somber mood without compromising anatomical accuracy. VERDICT: PASS

Hard 100%
PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

The face is symmetric with natural, aged features: deep wrinkles, white hair, and warm, kind eyes. The smile is clearly readable and conveys genuine warmth. No distortions are present; details like skin texture and lighting are realistic. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and tear streams clearly visible. The crying expression is vividly rendered with furrowed brows and an open mouth. No distortions are present. The prompt’s key elements are accurately captured. VERDICT: PASS

PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with realistic aging features: age spots, thinning gray hair, and deep wrinkles. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No distortions; glasses sit naturally. All prompt elements are accurately rendered. VERDICT: PASS

Extreme 100%
PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces show clear, symmetric features with no distortions. Emotions are vividly readable: left man laughs with head back, center man looks shocked with wide eyes, right man rolls eyes with smirk. Lighting enhances expressions naturally. All three expressions match the prompt precisely. VERDICT: PASS

PASS
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces are symmetric with natural features. Father beams proudly, mother smiles gently, teenager looks bored, child grins excitedly—all emotions clearly readable. No distortions. Lighting and composition enhance realism. VERDICT: PASS

PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features and no distortions. Emotions are clearly readable: left man shows frustration (furrowed brows), center woman looks pleased (smiling), right man shows confusion (tilted head, raised eyebrows). All three expressions match the prompt precisely. VERDICT: PASS

Full Body 100%
Easy 100%
PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing a white t-shirt and jeans, matching the prompt. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions are visible. All elements align with the requested description. VERDICT: PASS

PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing the camera, with hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS

PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet in a neutral standing pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still in a park. No distortions are evident. The scene matches the prompt accurately. VERDICT: PASS

Medium 100%
PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limb count is accurate and her limbs are correctly proportioned. Her running pose is anatomically plausible, with dynamic motion blur in her hair and dress enhancing the sense of movement. No distortions in limb structure or proportions are visible. The action is clearly recognizable as running through a park. VERDICT: PASS

PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His crouched mid-air pose with bent knees is anatomically plausible for jumping over a puddle. The action is clearly recognizable, with water splashing realistically beneath him. No significant distortions are present. The composition and lighting are naturalistic. VERDICT: PASS

PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s limbs and proportions are anatomically correct. The pose—a dynamic spin with one leg extended and dress flowing outward—is clearly recognizable. Motion blur on the dress enhances the spinning effect. No distortions in limb count or proportions. The action matches the prompt precisely. VERDICT: PASS

Hard 100%
PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The figure shows a person in a full-body yoga tree pose: standing on one leg, the other foot pressed to the inner thigh, arms raised overhead in prayer position. Limb count and proportions are anatomically correct. The pose is clearly recognizable and executed with proper alignment. No significant distortions are visible. VERDICT: PASS

PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has correct limb count and proportions. Arms and legs are extended to different holds as described, with realistic muscle definition and joint angles. The pose is anatomically plausible for rock climbing. No significant distortions are visible. The action is clearly recognizable as dynamic climbing. VERDICT: PASS

PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture is anatomically correct: legs encircle the instrument, the bow arm is extended, and the hand positions are accurate for playing. Limb count and proportions are natural. The pose is clearly recognizable as a seated cello performance. No distortions are visible. All elements align with the prompt. VERDICT: PASS

Extreme 100%
PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly proportioned and anatomically accurate. The woman is dipped backward with intertwined legs, matching the prompt. The pose is a recognizable tango dip. No distortions are visible. All elements align with the requested action. VERDICT: PASS

PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s mid-air pose are anatomically plausible and match the described action. No distortions are visible. The ball is clearly at the jumping player’s feet. The scene accurately depicts a dynamic soccer tackle. VERDICT: PASS

PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate and recognizable. Alignment is nearly identical, with minor variations in hand placement and leg angle. No significant distortions are visible. The action is clearly depicted as a handstand with legs extended. VERDICT: PASS

Hands 67%
Easy 33%
FAIL
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The left hand (viewer’s right) shows 5 fingers, but the thumb appears fused or malformed with the index finger. The right hand (viewer’s left) also has 5 fingers, but the pinky is slightly curled unnaturally and the thumb lacks clear definition. Overall, while finger count is mostly correct, anatomical realism is compromised — especially in the left hand’s thumb/index region. Not fully natural or relaxed as prompted. VERDICT: FAIL

PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The image shows a woman with both hands extended forward, palms up. Each hand clearly displays five fingers — thumbs and four fingers — with natural curvature and positioning consistent with an open, palms-up gesture. No extra, missing, or malformed fingers are visible. The anatomy is correct and matches the prompt’s requirement for “all ten fingers clearly visible” and “exactly 5 per hand.” VERDICT: PASS

FAIL
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The left hand (on the right arm) shows 4 fingers clearly, with the thumb likely hidden — anatomically plausible. The right hand (on the left arm) is blurred and partially obscured; only 2–3 fingers are vaguely visible, making it impossible to confirm 5 fingers. The grip appears natural for crossed arms. However, due to insufficient visibility of the right hand’s fingers, we cannot verify “exactly 5 per hand” as required. VERDICT: FAIL

Medium 33%
FAIL
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The image shows a left hand (not right) holding the mug. The thumb is positioned correctly on top of the handle, but the fingers are malformed: there appear to be 4 fingers visible, with the pinky and ring finger fused or indistinct, and the middle finger seems abnormally short or missing. Anatomical structure is incorrect. VERDICT: FAIL

FAIL
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the pen shows 5 fingers, but the thumb is unnaturally fused with the index finger, and the pinky appears malformed. The left hand resting on the table has 5 fingers but they are slightly distorted and lack clear separation. The pen grip is not anatomically correct due to the fused thumb/index appearance. VERDICT: FAIL

PASS
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The left hand on the fretboard shows four fingers clearly; the thumb is not visible, but this is anatomically normal for guitar playing (the thumb often sits behind the neck). In the strumming area, the right hand shows only 3–4 fingers due to motion blur and camera angle; the pinky is obscured or out of frame. There are no clear anatomical anomalies: the fingers are proportionate and positioned correctly for guitar technique, and motion blur explains why some fingers are not visible. VERDICT: PASS

Hard 100%
PASS
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand making the peace sign has five fingers: two extended (index and middle), and three curled (ring, pinky, thumb). The thumb is correctly positioned curled inward, not extended or missing. No extra or fused fingers are visible. The gesture is anatomically correct for a right hand making a peace sign. All five digits are present and properly articulated. VERDICT: PASS

PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows a single hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt exactly. Anatomically, the hand appears correct: five fingers total (one thumb + four curled fingers), proper skin texture, natural creases, and realistic lighting/shadowing. No anomalies or extra/missing digits are visible. VERDICT: PASS

PASS
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image shows one hand making a gesture. The index, middle, and ring fingers are extended upward. The thumb is curled across the palm, and the pinky finger is also curled inward — this matches the description of “counting to three” with those three fingers extended. Anatomically, the hand structure appears correct: five fingers total (three extended, two curled), proper joint bends, natural skin texture, and no extra or missing digits. No anomalies detected. VERDICT: PASS

Extreme 100%
PASS
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

Upon close examination of the handshake: The hand on the left (belonging to the person in the tie) shows 4 fingers and a thumb gripping the other hand — that’s 5 total, anatomically correct. The hand on the right (person in white shirt) also displays 5 fingers: thumb visible on top, four fingers wrapped around the left hand’s grip. No extra or missing digits; no fused or distorted fingers. Grip is natural for a professional handshake. All criteria met. VERDICT: PASS

PASS
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two hands meeting in a high-five gesture. The left hand (from viewer’s perspective) clearly displays 5 fingers, all extended and properly aligned. The right hand also shows 5 fingers, though the thumb is partially obscured by the other hand — still anatomically plausible for a high-five. No extra or missing fingers are visible. Gesture is natural and correct. VERDICT: PASS

PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

The image shows four hands interacting with a gift box. The left person’s hands each show 5 fingers (including thumbs), and the right person’s hands also each show 5 fingers. Grips appear anatomically plausible — thumbs and fingers are positioned naturally around the box. No obvious deformities or extra/missing digits. Lighting and focus are soft but sufficient to confirm finger count and positioning. VERDICT: PASS

Multi-Subject 100%
Easy 100%
PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, wearing red leather jacket. Right woman: long black hair, wearing blue dress. Both visually distinct and match prompt descriptions precisely. No discrepancies found. VERDICT: PASS

PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Person 1: Bearded man with glasses, wearing a green sweater. Person 2: Clean-shaven man in a black hoodie, seen from the back. Both are visually distinct and match the prompt’s descriptions. All features are accurately rendered. VERDICT: PASS

PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Person 1: Tall, curly red hair, denim overalls. Person 2: Short, straight brown hair, yellow sundress. Both visually distinct and match descriptions. Features are accurately rendered. VERDICT: PASS

Medium 100%
PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

Bald man in suit: present, side profile. Silver-haired woman in red blouse: present, facing forward. Young man with dreadlocks in denim jacket: present, center-right. Woman in hijab and green dress: present, right side. All are visually distinct and match descriptions. VERDICT: PASS

PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Tall woman: pink bob haircut, playing bass guitar. Bearded man: bandana, playing drums. Slim man: glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS

PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat, backpack, no poles. Woman: braided hair, uses hiking poles. Teenager: baseball cap, holds water bottle. All three are visually distinct and match described features. VERDICT: PASS

Truthfulness 82%

Photorealism 100%
Medium 100%
PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image displays excellent material fidelity. The glass shows accurate refraction and condensation droplets. The silver spoon exhibits sharp, realistic reflections of the environment. The wooden table has a convincing grain and texture, while the linen napkin shows appropriate fabric weave and softness. The apple’s skin reflects light naturally with subtle imperfections. All materials render with correct visual properties. VERDICT: PASS

PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum texture with appropriate metallic reflections. The notebook displays realistic leather grain and stitching details. The ceramic mug shows a matte, slightly porous surface distinct from the smooth metal. The glass paperweight demonstrates correct refraction and specular highlights. All materials are visually distinct and realistic. VERDICT: PASS

PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image accurately renders the distinct material properties requested. The stainless steel pot exhibits sharp, distorted reflections typical of polished metal. The glass bottle shows correct translucency and refraction of the yellow oil within. The wooden board displays a natural grain texture, and the cotton towel has a soft, fibrous appearance. The lighting creates realistic specular highlights and shadows across all surfaces, demonstrating a high level of physical accuracy in material rende

Physics & Reflections 83%
Easy 67%
PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The shadow is cast to the right and slightly downward, which is consistent with a light source from the upper left. The shadow’s shape is an oval, appropriately distorted by perspective on the flat surface. The lighting on the sphere shows a highlight on the upper-left side, matching the shadow direction. The color and softness of the shadow are physically plausible for a single light source. VERDICT: PASS

PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The lighting originates from the right side, consistent with the bright window. Consequently, the shadow is cast to the left, which is physically correct. The shadow’s shape accurately mirrors the mug and handle, and its sharpness aligns with the direct sunlight. The warm color temperature is consistent across the scene. VERDICT: PASS

FAIL
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The image shows a person standing on a grassy hill with the sun directly behind them, creating a silhouette. However, the shadow is extremely long and stretches toward the viewer, indicating a low sun angle (sunrise or sunset), not midday. At midday, shadows are short and directly beneath the subject. This contradicts the prompt’s requirement of a “short shadow directly below them.” The shadow’s length and direction are physically inconsistent with midday lighting. VERDICT: FAIL
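The midday judgment rests on the relation between sun elevation and shadow length on flat ground: a vertical object of height h casts a shadow of length h / tan(elevation). The following is a minimal Python sketch of that relation; the function name `shadow_length`, the 1.75 m subject height and the 70° and 10° elevations are illustrative assumptions, since the actual midday sun angle depends on latitude and season.

```python
import math


def shadow_length(height_m: float, sun_elevation_deg: float) -> float:
    """Length of the shadow a vertical object casts on flat ground.

    From tan(elevation) = height / shadow_length.
    """
    if not 0 < sun_elevation_deg <= 90:
        raise ValueError("sun elevation must be in (0, 90] degrees")
    return height_m / math.tan(math.radians(sun_elevation_deg))


# Hypothetical 1.75 m person under a high sun (70 deg) and a low sun (10 deg).
for elevation in (70, 10):
    print(f"elevation {elevation:>2} deg -> shadow {shadow_length(1.75, elevation):.2f} m")
```

A high sun gives a shadow well under the subject's height, pooled near the feet; a sun near the horizon stretches it to several times that height, which is the low-angle case the evaluator describes.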

Medium 67%
FAIL
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically impossible. The woman is facing the mirror, so her reflection should show her front. However, the reflection shows her front while the real person’s back is visible — this is correct orientation. But the reflection should be a mirror image (left-right reversed). Here, the reflection appears identical in pose and detail to a frontal view, not a mirrored one. More critically, the reflection shows the front of the outfit while the real person’s back is shown — which is

PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It correctly mirrors the cat’s posture and orientation, with appropriate vertical inversion. The polished marble floor provides a realistic reflective surface, and the reflection shows slight softness consistent with real-world optics. Lighting direction matches between the cat and its reflection, and shadows align logically. No distortions or color mismatches are evident. VERDICT: PASS

PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection on the glass table is physically plausible. It correctly mirrors the vase and stems with appropriate vertical inversion and slight distortion consistent with a reflective surface. The lighting direction matches the warm, golden ambient light, casting soft shadows that align with the scene’s depth. Colors in the reflection are slightly muted, as expected on glass. No directional or orientation errors are present. VERDICT: PASS
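The reflection checks in this tier (vertical inversion on a floor or tabletop, and the orientation of a wall-mirror reflection) follow from one rule: a plane mirror maps a point p to p − 2((p − q)·n)n, where q is any point on the mirror and n is its unit normal. A minimal plain-Python sketch, using hypothetical coordinates with x to the right, y up and z toward the viewer:

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def reflect(point: Vec3, plane_point: Vec3, normal: Vec3) -> Vec3:
    """Mirror `point` across the plane through `plane_point` with normal `normal`."""
    length = dot(normal, normal) ** 0.5
    n = tuple(c / length for c in normal)  # unit normal, so d is a true distance
    offset = tuple(p - q for p, q in zip(point, plane_point))
    d = dot(offset, n)  # signed distance from the point to the plane
    return tuple(p - 2 * d * c for p, c in zip(point, n))


# Hypothetical cat's ear 0.3 above a floor at y = 0: only the height flips.
print(reflect((0.1, 0.3, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))

# Hypothetical right hand 1.0 in front of a wall mirror at z = 0: only depth flips,
# so the hand keeps the same x coordinate in the scene.
print(reflect((0.3, 1.2, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

Only the coordinate along the mirror's normal changes sign. A floor or tabletop therefore inverts the reflection vertically, while a wall mirror flips front to back, which a viewer reads as a left-right swap; the mirror shows whichever side of the subject faces it.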

Hard 100%
PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are vertically inverted and correctly aligned with the tree trunks, extending downward into the water. The gentle ripples distort the reflection realistically, creating horizontal waviness that breaks up the image slightly, consistent with a calm but not perfectly still lake. The color of the reflection matches the warm sunset tones, appearing slightly darker as expected due to light absorption by the water surface. The direction and orientation are physically accurate. VERDICT: PASS

PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection is inverted, which is physically correct for a puddle. The ripples from raindrops create appropriate distortion, warping the reflected buildings. The color palette is consistent with a twilight setting, showing a natural shift in the reflected light. The orientation and direction of the reflection align with the implied position of the skyline above. The physics of the water surface and light interaction appear accurate. VERDICT: PASS

PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted and correctly positioned below the horizon line. The color is appropriately muted and darker than the actual mountains, consistent with light absorption in water. The surface shows subtle ripples that distort the reflection slightly, adding realism. The mist layer at the water's edge blends naturally with the reflection. All physical aspects of the reflection are accurate. VERDICT: PASS

Extreme 100%
PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image displays a glass sphere on a mirror surface partially submerged in water. The reflection on the mirror is sharp and correctly oriented, while the sphere exhibits proper refraction of the background. The water ripples distort the reflection of the sphere and mirror edge realistically. The lighting is consistent, with highlights matching the sun's position. The physics of light interaction—refraction, reflection, and distortion—are handled accurately without obvious artifacts. VERDICT: PASS

PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The glass floor accurately mirrors the chrome sculptures and the cityscape, maintaining correct orientation and perspective. The high reflectivity of the chrome surfaces captures the environment realistically, including the blue tones from the windows and the bright highlights from the ceiling lights. The light beams cast distinct shadows and reflections that align with the light sources, demonstrating a coherent understanding of lighting and material properties.

PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image demonstrates high physical plausibility. The sunlight enters from the left, casting a sharp shadow to the right of the glass. The reflection on the polished marble is correctly oriented and distorted by the surface texture. The refraction through the cut crystal creates realistic caustic light patterns (starbursts) on the counter. The lake outside shows natural reflections of the sky and trees, consistent with a calm water surface. VERDICT: PASS
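The subcategory scores are consistent with a pooled pass rate, that is, the share of PASS verdicts across every prompt in the subcategory. For Physics & Reflections that is 10 of 12, or 83%. Because each tier here has three prompts, pooling gives the same result as averaging the exact tier rates. A minimal sketch, assuming this pooling rule (the page does not state it) and half-up rounding; `pass_rate` is a hypothetical helper name:

```python
def pass_rate(verdicts: list[str]) -> int:
    """Share of PASS verdicts as a whole percentage, rounding halves up."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passes = sum(1 for v in verdicts if v == "PASS")
    return int(100 * passes / len(verdicts) + 0.5)


# Verdicts for Physics & Reflections, copied from the results above.
physics = {
    "Easy": ["PASS", "PASS", "FAIL"],
    "Medium": ["FAIL", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "PASS"],
    "Extreme": ["PASS", "PASS", "PASS"],
}

for tier, verdicts in physics.items():
    print(tier, pass_rate(verdicts))

all_verdicts = [v for verdicts in physics.values() for v in verdicts]
print("Physics & Reflections", pass_rate(all_verdicts))
```

Under the same assumption, the Professional Studio score of 78% matches 21 passes out of 27 prompts (9 from Camera & Lighting, 10 from Color Precision, and 2 from the three Photorealism prompts listed), provided Photorealism has only those three prompts.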

World Knowledge 75%
Easy 100%
PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy. The intricate iron lattice structure, the distinct arches at the base, and the three observation levels are clearly visible and correctly proportioned. The perspective is a low-angle shot looking up, which aligns with the "clear day" prompt given the blue sky. There are no obvious artifacts or structural distortions typical of AI generation. VERDICT: PASS

PASS
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high fidelity. The central onion dome, four surrounding minarets, and the symmetrical layout are architecturally accurate. The perspective from the reflecting pool is correct, capturing the iconic reflection. The lighting suggests sunrise or sunset, casting a warm glow consistent with real-world photography of this landmark. No obvious AI artifacts or structural distortions are present. VERDICT: PASS

PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The tablet correctly displays “July IV MDCCLXXVI.” The pedestal, island setting, and Manhattan skyline background are consistent with reality. No significant architectural or factual inaccuracies are present. The image meets the prompt’s requirements for realism and accuracy. VERDICT: PASS

Medium 67%
FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The instrument is recognizable as a shamisen, but it contains significant factual inaccuracies. Most notably, the neck features frets (metal bars), whereas traditional shamisen are fretless. Additionally, the body shape is an octagonal box rather than the traditional tachi-gata or tsuru-gata forms, and the bridge design is non-standard. These deviations from the "accurate" design requested make it factually incorrect. VERDICT: FAIL

PASS
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a highly detailed, photorealistic brass instrument resembling an astrolabe. It features recognizable components like a rete with star pointers and a tympan with engraved degree markings. The lighting, textures, and metallic reflections are convincing, meeting the criteria for a real-world subject. While specific astronomical alignments might be artistic rather than scientifically precise, the object is clearly recognizable and structurally accurate to the prompt's description.

PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image accurately depicts a Japanese kintsugi bowl. The ceramic texture, the specific style of gold lacquer repair highlighting the cracks, and the overall composition are visually consistent with real-world examples of this art form. There are no architectural or factual inaccuracies; the subject is clearly recognizable and correctly rendered. VERDICT: PASS

Hard 33%
PASS
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image is a stylized, cartoon-style diagram of the water cycle. It correctly depicts the core components: evaporation from the ocean, condensation in clouds, precipitation as rain, and collection in rivers flowing back to the sea. The labels are mostly accurate, though "Ocean Ocle Cycle" contains a typo ("Ocle" instead of "Water"). Despite the artistic style and minor text error, the subject is clearly recognizable and factually represents the water cycle process. VERDICT: PASS

FAIL
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a surreal, fantasy landscape rather than a scientific diagram. It shows a crater lake overflowing into a deep chasm, which does not represent a subduction zone or tectonic plate boundary. The "layered mantle" is depicted as stylized red and grey rock strata, lacking geological accuracy. The composition is visually striking but fails to accurately represent the requested real-world geological subject. VERDICT: FAIL

FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image is not factually accurate. The labels are nonsensical gibberish (e.g., "Legh," "Rigta," "Artiovo," "Righucimim") rather than correct anatomical terms. Additionally, the placement of the labels does not correspond to the actual structures they point to (e.g., "Mitral" points to the aorta). Therefore, it fails the criteria for factual accuracy. VERDICT: FAIL

Extreme 100%
PASS
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly that closely matches the description of a Morpho menelaus. It features the characteristic iridescent blue dorsal wings and brown ventral wings with prominent eyespots. The body morphology, including the antennae and thorax, appears anatomically correct for a butterfly. The lighting and texture are realistic, showing the metallic sheen typical of Morpho species. There are no obvious factual or architectural inaccuracies in the depiction. VERDICT: PASS

PASS
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image accurately depicts the interior of the Hagia Sophia, featuring the massive central dome with its radiating ribs, the pendentives, and the semi-domes. The large Islamic calligraphy medallions are correctly positioned alongside the Byzantine mosaics, including the depiction of Christ in the apse. The architectural details, lighting, and textures appear realistic and consistent with the actual structure. VERDICT: PASS

PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a bismuth crystal with high fidelity. It accurately captures the characteristic geometric hopper crystal formation, displaying the distinct stepped, staircase-like structure typical of this mineral. The iridescent oxide layers are rendered realistically, showing the expected metallic sheen and vibrant color shifts (gold, pink, blue) caused by thin-film interference. The lighting and texture appear natural and physically plausible. VERDICT: PASS

Professional Studio 78%

Camera & Lighting 75%
Easy 67%
FAIL
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image features a wide-angle perspective with a circular vignette that creates an extreme fish-eye effect rather than "slight barrel distortion." While the prompt asked for slight distortion, the image provides an extreme, heavy distortion that obscures the edges of the frame entirely. However, the lighting and the sense of expansive space are well-represented. Because the distortion is extreme rather than "slight," it deviates from the specific instruction. VERDICT: FAIL

PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image successfully follows the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a red petal. The depth of field is very shallow, with the dewdrop and the immediate edge of the petal in sharp focus while the background and the rest of the petal are softly blurred. The lighting is bright and natural, creating a clear reflection within the droplet. VERDICT: PASS

PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is flat and top-down, with no tilt or angle. The depth of field is deep, keeping the entire scene from the center trees to the outer roads in sharp focus. The lighting is even and consistent with a high-angle daylight shot. VERDICT: PASS

Medium 67%
FAIL
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image features a portrait with a strong, directional light source. However, it fails to meet the specific requirement for Rembrandt lighting. In true Rembrandt lighting, a characteristic light triangle appears on the shadowed cheek. In this image, the light is a broad, harsh beam that illuminates the entire side of the face, leaving the cheek in deep shadow without the signature triangle. VERDICT: FAIL

PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a landscape with a deep orange sky and a bright sun low on the horizon, characteristic of golden hour. The lighting creates long, dramatic shadows across the rolling hills and a soft, warm glow throughout the scene. The perspective provides a sense of depth through the layered hills, and the light rays (lens flare) enhance the atmospheric effect. VERDICT: PASS

PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright, white background and a luminous, airy quality. The lighting is soft and even, resulting in minimal shadows on the subject's face. The perspective is a close-up portrait, and the depth of field is shallow, keeping the focus on the subject's features while maintaining the bright, overexposed aesthetic requested. VERDICT: PASS

Hard 67%
PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The perspective is a close-up portrait shot. The depth of field is shallow, creating a creamy bokeh background with soft, circular light orbs. Most importantly, the lighting features a strong warm rim light (backlighting) that highlights the subject's hair and profile, effectively separating her from the blurred garden background. VERDICT: PASS

PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The subject is sharp and in focus, while the background features a shallow depth of field with soft, out-of-focus city lights (bokeh). The lighting pattern, including the rim lighting on the subject's hair and the warm street lamps, creates a professional street photography aesthetic consistent with a 50mm f/1.8 lens. VERDICT: PASS

FAIL
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image fails on several key technical aspects. The prompt specifies a shot "from above," but the image is taken from a low, side-angle perspective. The depth of field is quite deep, showing the window frame clearly, which contradicts the shallow depth of field expected from an f/2.8 aperture. While the lighting comes from the left and has a warm temperature, the perspective error is a fundamental failure to meet the prompt's core camera instructions. VERDICT: FAIL
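How shallow a given aperture renders also depends on focal length and focus distance. The standard thin-lens approximation uses the hyperfocal distance H = f²/(Nc) + f, with near and far limits of acceptable sharpness s(H − f)/(H + s − 2f) and s(H − f)/(H − s). A minimal sketch of that formula; the 0.03 mm circle of confusion (a common full-frame value) and the focus distances are assumptions, not values taken from the prompts:

```python
import math


def depth_of_field(focal_mm: float, f_number: float, subject_mm: float,
                   coc_mm: float = 0.03) -> tuple[float, float, float]:
    """Near limit, far limit and total depth of field, all in millimetres.

    Thin-lens approximation based on the hyperfocal distance
    H = f^2 / (N * c) + f. The far limit is infinite once the subject
    is at or beyond H.
    """
    f, n, s, c = focal_mm, f_number, subject_mm, coc_mm
    hyperfocal = f * f / (n * c) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    far = math.inf if s >= hyperfocal else s * (hyperfocal - f) / (hyperfocal - s)
    return near, far, far - near


# Lenses named in the prompts, at assumed focus distances.
for label, args in [
    ("85mm f/1.4 at 2 m", (85, 1.4, 2000)),
    ("35mm f/2.8 at 0.6 m", (35, 2.8, 600)),
    ("35mm f/2.8 at 3 m", (35, 2.8, 3000)),
]:
    near, far, total = depth_of_field(*args)
    print(f"{label}: sharp from {near:.0f} mm to {far:.0f} mm ({total:.0f} mm deep)")
```

Under these assumptions, a 35mm f/2.8 lens focused at about 0.6 m keeps only a few centimetres sharp, while the same lens focused at 3 m holds more than a metre, so focus distance matters as much as the f-number.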

Extreme 100%
PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image perfectly captures the tilt-shift effect. It features a high-angle perspective that makes the city street look like a miniature model. The depth of field is highly selective, with a sharp focus strip on the mid-ground buildings and street, while the foreground and background are heavily blurred. The colors are highly saturated and vibrant, matching the prompt's description of a miniature-style photograph. VERDICT: PASS

PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image features a dark, moody scene with prominent horizontal light streaks. There is a clear blue light streak above an orange light streak, both stretching horizontally across the frame. This directly matches the prompt's requirement for "blue and orange anamorphic lens flare streaking horizontally across a dark moody scene." The lighting pattern and color scheme are perfectly executed according to the instructions. VERDICT: PASS

PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up portrait with a clear split lighting pattern: the left side of the face is brightly illuminated with a warm, golden light, while the right side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect. The perspective is a direct, centered portrait, and the lighting pattern is executed precisely as described. VERDICT: PASS

Color Precision 83%
Easy 100%
PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color is highly saturated and unambiguous, exactly as described in the prompt. The road surface is a consistent gray color. All color requirements are met perfectly. VERDICT: PASS

PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image features a woman wearing a vibrant royal blue evening dress. The dress color is a deep, rich blue that perfectly matches the "royal blue" description. The background is a neutral, light gray color, which also aligns with the prompt's requirement for a "neutral background." All color specifications from the prompt are accurately represented in the image. VERDICT: PASS

PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a field of bright, pure sunflower yellow flowers. The petals are a vibrant, saturated yellow, and the sky is a clear, bright blue. The colors perfectly match the descriptions provided in the prompt. VERDICT: PASS

Medium 33%
PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The image features a bedroom with a wall painted in a vibrant burnt orange color, which closely matches the hex code #FF6B35 and the description "burnt orange." The furniture, including the bed frame, nightstand, and dresser, is white as requested. The scene is illuminated by natural daylight coming through a window. All color requirements are met. VERDICT: PASS

FAIL
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The image features a forest scene with brown tree trunks. The foliage consists of various shades of green, including some lighter, brighter green leaves in the foreground. While the overall tone is dark, the foliage is not a uniform or precise "#2D5F2D (dark muted green)"; it contains much lighter, more vibrant green hues that deviate from the specific color requested. VERDICT: FAIL

FAIL
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image features walls in a deep, dark blue color. The prompt specifically requested a "medium blue" with the hex code #4A90D9. The color in the image is significantly darker and more saturated than the requested medium blue shade. The warm wooden floor and white trim are correctly represented. However, because the primary wall color fails to match the specified hex value and description, the criteria is not met. VERDICT: FAIL
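The evaluator judges hex adherence by eye. A pixel-level check is one possible complement: parse the requested hex code into RGB and measure how far a sampled pixel lies from it. A minimal sketch, assuming plain Euclidean distance in RGB; the sampled pixel and the 40-unit tolerance are hypothetical, and a perceptual metric such as CIELAB ΔE would track visible difference more closely.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Parse '#RRGGBB' into an (R, G, B) tuple of integers in 0-255."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got {code!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_distance(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """Euclidean distance in RGB space (simple, but not perceptually uniform)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def matches_hex(sampled: tuple[int, int, int], target_hex: str,
                tolerance: float = 40.0) -> bool:
    """True if a sampled pixel lies within `tolerance` of the target color."""
    return rgb_distance(sampled, hex_to_rgb(target_hex)) <= tolerance


target = "#4A90D9"        # the "medium blue" requested in the prompt
too_dark = (30, 60, 120)  # hypothetical pixel sampled from a darker wall
print(hex_to_rgb(target), round(rgb_distance(too_dark, hex_to_rgb(target)), 1),
      matches_hex(too_dark, target))
```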

Hard 100%
PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The walls are a warm cream/beige color, the sofa is a dark navy blue, and the accent pillows are a bright, vibrant red. These colors align perfectly with the hex values and descriptions provided in the prompt. VERDICT: PASS

PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The background is a deep, dark navy blue consistent with #1A1A2E. The watch strap and casing match the soft gold/tan tone of #F0C27F. The earbuds are a clean, pure white (#FFFFFF). All three color requirements from the prompt are successfully met. VERDICT: PASS

PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested color palette. The background is a near-white (#FAFAFA), the main central shape is a vibrant orange-red (#FF4500), and the thin accent lines outlining the shape are a dark charcoal/near-black (#333333). All three specified colors are present and correctly applied to the described elements. VERDICT: PASS

Extreme 100%
PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image features a smooth gradient transitioning from a bright red on the left to a bright blue on the right. The transition passes through a purple middle section as requested. The colors match the descriptions: the left side is a vibrant red (#FF0000), the right side is a vibrant blue (#0000FF), and the center is purple. VERDICT: PASS

PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image contains the specific hex codes requested. At the bottom, the color matches #FF6B35 (a vibrant orange). At the top, the color matches #1B1464 (a deep indigo/dark blue). The gradient transitions from the orange at the bottom to the deep indigo at the top as specified. VERDICT: PASS

PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which aligns with #00C9FF. On the right, the color transitions into a light mint green, which aligns with #92FE9D. The gradient flows horizontally from left to right as requested. VERDICT: PASS
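The three gradient prompts describe an even blend between two endpoint colors. A minimal sketch of per-channel linear interpolation on the hex values; the helper names are hypothetical, and blending gamma-encoded sRGB values this way is the naive approach (blending in linear light gives a lighter midpoint). For #FF0000 to #0000FF the midpoint is #800080, the CSS named color `purple`, which matches the "transitioning evenly through purple" wording.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Parse '#RRGGBB' into an (R, G, B) tuple of integers in 0-255."""
    digits = code.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an (R, G, B) tuple as an upper-case '#RRGGBB' string."""
    return "#" + "".join(f"{c:02X}" for c in rgb)


def lerp_color(start_hex: str, end_hex: str, t: float) -> tuple[int, int, int]:
    """Blend two colors channel by channel; t=0 gives start, t=1 gives end."""
    a, b = hex_to_rgb(start_hex), hex_to_rgb(end_hex)
    r, g, bl = (int(x + (y - x) * t + 0.5) for x, y in zip(a, b))  # round half up
    return r, g, bl


def gradient_stops(start_hex: str, end_hex: str, n: int) -> list[str]:
    """n evenly spaced colors from start_hex to end_hex, endpoints included."""
    if n < 2:
        raise ValueError("need at least two stops")
    return [rgb_to_hex(lerp_color(start_hex, end_hex, i / (n - 1))) for i in range(n)]


print(gradient_stops("#FF0000", "#0000FF", 5))  # red to blue through purple
print(gradient_stops("#FF6B35", "#1B1464", 3))  # orange to deep indigo
print(gradient_stops("#00C9FF", "#92FE9D", 3))  # cyan to mint green
```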

Photorealism 67%
Easy 67%
PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully captures the requested studio aesthetic. The lighting is soft, and the shadow is natural, creating a realistic sense of depth on the white background. The texture of the red leather and the construction of the laces and stitching appear consistent and high-quality. There are no obvious AI artifacts, such as warped geometry or nonsensical textures, making it look like a professional product photograph. VERDICT: PASS

FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a professional product shot. The glass bottle lacks a cap, which is a significant omission for a perfume bottle. More importantly, the physics of the glass and reflections are highly flawed. The reflection on the surface is disconnected and distorted, and the light refraction through the glass is inconsistent and unrealistic. It looks like a low-quality AI generation rather than a real photograph. VERDICT: FAIL

PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt requirements. The green apple features realistic textures, subtle specular highlights, and a natural-looking stem. The lighting is soft and directional, creating a realistic, soft shadow on the seamless white background. There are no obvious AI artifacts, such as warping or unnatural textures, and the image effectively mimics a professional studio product shot. VERDICT: PASS

Graphical design 92%

Data Visualisation 67%
Easy 67%
PASS
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The bar chart correctly displays monthly sales for January–June with values 12K, 18K, 15K, 22K, 19K, 25K. Axes are labeled ("Month" and "Sales (K)"), bars are color-coded, and the title "Monthly Sales 2025" is present. All text is readable and the chart type matches the request. VERDICT: PASS

FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The chart is a pie chart with four colored slices. Company A (35%) is shown twice, incorrectly. Company B is labeled 25% but its slice is 25% of the circle. Company C is 20%, and "Others" is 20%. The percentages don't sum to 100% (35+35+25+20=115%). Labels and colors are readable, but data is inaccurate. VERDICT: FAIL
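A pie chart is only consistent if its shares sum to 100%, with each slice spanning share × 360 / 100 degrees. A minimal sketch of that check using the shares from the prompt; `slice_angles` is a hypothetical helper name. The requested shares (35 + 25 + 20 + 20) pass, while the duplicated Company A slice the evaluator describes (35 + 35 + 25 + 20 = 115) is rejected.

```python
def slice_angles(shares: dict[str, float]) -> dict[str, float]:
    """Convert percentage shares into pie-slice angles in degrees.

    Raises ValueError when the shares do not add up to 100%.
    """
    total = sum(shares.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"shares sum to {total}%, not 100%")
    return {label: pct * 360 / 100 for label, pct in shares.items()}


requested = {"Company A": 35, "Company B": 25, "Company C": 20, "Others": 20}
print(slice_angles(requested))

try:
    slice_angles({"Company A": 35, "Company A (repeat)": 35, "Company B": 25, "Others": 20})
except ValueError as err:
    print("rejected:", err)
```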

A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart correctly displays a line graph titled "Weekly Traffic" with labeled axes and grid lines. Data points match the requested values: 1200, 1500, 1100, 1800, 2000, 1700, 1400 for Mon–Sun. All elements are clearly readable and the chart type is appropriate. VERDICT: PASS

Layout & Design 100%
Easy 100%
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image features "MOVIE NIGHT" in bold at the top, a large popcorn pile centered, and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear with title dominant, popcorn central, and tagline below. All required elements are present and correctly positioned. VERDICT: PASS

An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The image displays a guitar illustration, the headline "LIVE MUSIC FEST" in large, bold text at the top, and "Tickets available now" as a tagline at the bottom. Text is readable and correctly placed with proper visual hierarchy. All required layout elements are present and clearly rendered. VERDICT: PASS

A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster features an open book illustration, the title "JOIN THE BOOK CLUB" prominently at the top, and the tagline "Meets every Wednesday" correctly placed below it. Text is readable and hierarchy is clear. All required layout elements are present and correctly positioned. VERDICT: PASS

Medium 100%
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a hero header with a man on a phone, followed by a centered headline “Welcome to Our Platform,” subtitle “Discover Exclusive Benefits Today,” and a “Sign Up” button. A footer with three social media icons (Facebook, Instagram, Twitter) is present at the bottom. All required layout elements are correctly placed with clear visual hierarchy. Text is readable and properly positioned. VERDICT: PASS

A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with a header, two-column body (text left, image right), a highlighted quote block, and an unsubscribe footer. Text is readable and correctly placed. All required elements are present with appropriate visual hierarchy. The layout is clean and functional. VERDICT: PASS

A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a menu with a logo header (“The Garden Bistro”), three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address and hours. Text is readable, correctly placed, and visually hierarchized. All required layout elements are present and clearly defined. VERDICT: PASS

Hard 100%
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

Left page has large hero photo. Right page shows two-column article, pull quote with quote marks, author bio (“Author: Jane Doe, Freelance Journalist”), and page number (12). Text is readable and correctly placed with proper visual hierarchy. All specified layout elements are present and correctly implemented. VERDICT: PASS

A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a dashboard UI with a top navigation bar, search field, 2x2 feature cards with icons and labels, recent activity list, and bottom tab bar. Text is readable and correctly placed. All required layout zones are present with appropriate visual hierarchy. The design is clean and functional. VERDICT: PASS

A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

Left page has full-bleed photo. Right page shows headline, body copy in two columns, inset image, and page number. Text is readable and correctly placed. Layout hierarchy is clear. All specified elements are present and correctly arranged. VERDICT: PASS

Style Diversity 92%
Easy 67%
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered with thick, visible brushstrokes and rich texture characteristic of oil painting. The style is unmistakable, with impasto techniques and textured surfaces throughout. All key elements of the requested style are present and well-executed. VERDICT: PASS

A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, rendered in a stylized, cartoonish manner. It features large, expressive eyes and simplified forms, aligning with Japanese anime aesthetics. However, the style lacks flat colors—noticeable gradients and shading are present on the dog’s fur and background. The overall look is more painterly than flat. VERDICT: FAIL

A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden using a pixel art style. Visible square pixels form the dog and background, and a limited color palette is evident. However, the 32x32 grid resolution is not discernible due to image scaling. The core pixel art style and subject are recognizable. VERDICT: PASS

Medium 100%
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies Art Deco: bold geometric patterns, gold accents on black facades, ornamental symmetry in building facades and street lamps, and stylized forms. The palette is predominantly gold and black as requested. The composition emphasizes verticality and decorative motifs. All key visual characteristics are present and well-executed. VERDICT: PASS

A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image clearly depicts a woman with a parasol rendered in Impressionism. It features soft, visible brushstrokes, dappled light filtering through foliage, and a pastel palette of pinks, greens, and yellows. The style is unmistakably recognizable, capturing the movement and light characteristic of the movement. VERDICT: PASS

A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns for shading, and flat, vivid primary colors (red, yellow, blue). Facial features are stylized with high contrast and simplified forms. Background uses solid color blocks with halftone dots. All requested visual characteristics are unmistakably present. VERDICT: PASS

Hard 100%
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image clearly depicts a forest in a watercolor style, featuring visible paper texture, soft bleeds, and translucent washes. Lighting mimics photographic realism with directional sunlight and depth. Detail is rendered with painterly softness, not photographic sharpness, but the core watercolor aesthetic and atmospheric lighting are present. VERDICT: PASS

A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image clearly blends cyberpunk (neon tech, futuristic city) with Art Nouveau (flowing floral borders, organic lines). The woman’s glowing suit and environment use neon colors and tech elements, while the ornate, flower-filled frame and swirling motifs strongly evoke Art Nouveau. Key visual characteristics are present and well-integrated. VERDICT: PASS

A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image clearly depicts a medieval castle rendered in ukiyo-e style: flat planes of color, bold outlines, and stylized clouds. The sky and lighting show photorealistic gradients and shadows, contrasting with the woodblock aesthetic. Key ukiyo-e characteristics like simplified forms and dramatic perspective are present. The fusion of styles is intentional and well-executed. VERDICT: PASS

Extreme 100%
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly shows raw concrete walls, ceiling, and floor — core brutalist elements. Soft, rounded pastel furniture (pink, mint, purple) contrasts beautifully with the industrial textures. Warm, diffused light from recessed ceiling fixtures and windows enhances the mood. All requested visual characteristics are present and well-executed. VERDICT: PASS

A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

Sparse composition with a single sunflower in a cracked vase on a wooden table. Muted, desaturated tones dominate, punctuated by the flower’s yellow. Strong chiaroscuro lighting creates dramatic shadows and visual tension. The cracked wall and wilting flower enhance unease. Negative space is ample and intentional. The style aligns with anxious minimalism’s emotional weight and restrained aesthetic. VERDICT: PASS

A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image clearly embodies nostalgic futurism with retro-futuristic 1960s space-age aesthetics. Key elements include chrome robots, mushroom-shaped domes, orbital trains, and cosmic murals—all rendered in a melancholic, warm-toned patina with visible decay. The style is unmistakable and fully aligns with the prompt’s visual requirements. VERDICT: PASS
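The percentages in this section can be reproduced from the individual verdicts above, under two assumptions. First, the three subcategories shown here (Data Visualisation, Layout & Design, Style Diversity) make up the whole Graphical design category. Second, each score pools pass counts over prompts rather than averaging subcategory percentages. Pooling gives 22/24 ≈ 92%, which matches the header. An unweighted mean of the three subcategory rates would instead give about 86%. This suggests, but does not confirm, that the dashboard pools. The `RESULTS` table below was tallied by hand from the PASS/FAIL verdicts in this section. The helper names are hypothetical.

```python
# (passed, total) per difficulty tier, tallied from the verdicts in this section.
RESULTS = {
    "Data Visualisation": {"Easy": (2, 3)},
    "Layout & Design": {"Easy": (3, 3), "Medium": (3, 3), "Hard": (3, 3)},
    "Style Diversity": {"Easy": (2, 3), "Medium": (3, 3), "Hard": (3, 3), "Extreme": (3, 3)},
}


def subcategory_rate(tiers):
    """Pool pass/total counts across all difficulty tiers of one subcategory."""
    passed = sum(p for p, _ in tiers.values())
    total = sum(t for _, t in tiers.values())
    return passed, total


def section_rate(results):
    """Pool every prompt in the section, ignoring subcategory boundaries."""
    passed = total = 0
    for tiers in results.values():
        p, t = subcategory_rate(tiers)
        passed += p
        total += t
    return passed, total


def as_percent(passed, total):
    """Whole-number percentage, as shown on the dashboard."""
    return round(100 * passed / total)


for name, tiers in RESULTS.items():
    p, t = subcategory_rate(tiers)
    print(f"{name}: {p}/{t} = {as_percent(p, t)}%")
# Data Visualisation: 2/3 = 67%
# Layout & Design: 9/9 = 100%
# Style Diversity: 11/12 = 92%

p, t = section_rate(RESULTS)
print(f"Graphical design (pooled): {p}/{t} = {as_percent(p, t)}%")  # 22/24 = 92%

# Alternative aggregation for comparison: unweighted mean of subcategory rates.
rates = [100 * p / t for p, t in (subcategory_rate(ti) for ti in RESULTS.values())]
print(f"Mean of subcategory rates: {round(sum(rates) / len(rates))}%")  # 86%
```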