ImageBench

ImageBench V1 —

192 evaluations across 6 categories

Pass: 158
Fail: 34
Pass Rate: 82%
Avg Latency: 16.6s
Min Latency: 6.4s
Max Latency: 53.0s
Text Rendering: 93%
Spatial Reasoning: 77%
Human realism: 86%
Truthfulness: 78%
Professional Studio: 93%
Graphical design: 75%
Latency: 17%
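
The headline pass rate follows from the pass and fail counts in the summary above. A minimal Python sketch reproduces it; the function name `pass_rate` is illustrative and not part of ImageBench, and rounding to the nearest whole percent is an assumption about how the dashboard displays the figure.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the pass rate as a percentage of all evaluations."""
    total = passed + failed
    if total == 0:
        raise ValueError("no evaluations to score")
    return 100.0 * passed / total


# Counts taken from the summary: 158 passes and 34 fails.
rate = pass_rate(158, 34)
print(f"{158 + 34} evaluations, pass rate {rate:.0f}%")
```

With the reported counts this gives 192 evaluations and a rate that rounds to 82%, which agrees with the summary.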
Text Rendering | Typography Style | Easy | fal/ideogram/v4

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Text Rendering 93%

Typography Style 100%
Easy 100%
Result: PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with blocky, slightly uneven ink strokes and subtle smudges. Paper texture appears aged and slightly creased. Ink variation and character spacing convincingly simulate typewriter output. All elements align with the prompt’s description. VERDICT: PASS

Result: PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads "TODAY'S SPECIAL" correctly. Letters show white chalk texture with rough, uneven edges and smudges, convincingly mimicking chalk on a dark green board. Chalk dust effect is subtly implied in texture. Background is a plain chalkboard with minor wear. All elements align with the prompt. VERDICT: PASS

Result: PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is correctly spelled and rendered in a glowing, pixel-style LED matrix with distinct red light-emitting diodes. The style convincingly matches the described medium, showing individual illuminated dots forming each letter. The text is clear, properly formed, and visually consistent with an LED sign. VERDICT: PASS

Writing accuracy 92%
Easy 100%
Result: PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and is clearly legible in bold, uppercase letters. No misspellings, missing letters, or illegible parts are present. The text is centered on a white background as intended. All criteria are met. VERDICT: PASS

Result: PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a plain cream background. All letters are distinct and legible. No misspellings or missing parts. The design matches the intended specification exactly. VERDICT: PASS

Result: PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and fully readable, centered on a solid light blue background. All letters are distinct and legible. No misspellings or missing words. The text matches the intended specification exactly. VERDICT: PASS

Medium 100%
Result: PASS
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

Text reads exactly: "The quick brown fox jumps high". All words are spelled correctly and fully legible against the white background. No misspellings or missing words detected. Text matches the intended phrase precisely. VERDICT: PASS

Result: PASS
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

Text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully legible against the pale yellow background. No misspellings or missing words detected. The phrase matches the intended text exactly. VERDICT: PASS

Result: PASS
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

Text reads exactly: "Stars shine brightest in the dark night". All words are spelled correctly, fully present, and clearly legible against the off-white background. No misspellings or missing words detected. VERDICT: PASS

Hard 100%
Result: PASS
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly. "SUMMER SALE", "Up to 50% off all items this weekend only", and "Terms and conditions apply. See store for details." are all spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS

Result: PASS
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: “GRAND OPENING”, “Join us for food, music, and fun”, and “RSVP required. Limited seating available.”. Every word is spelled correctly and fully readable, with no missing or illegible parts. Font sizes and hierarchy are appropriate. VERDICT: PASS

Result: PASS
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM.". No misspellings, missing words, or illegible parts are present. Font sizes and hierarchy are appropriate. VERDICT: PASS

Extreme 67%
Result: FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Text contains multiple errors: “Elena Voss” → “Elenca Voxxs”, “Marcus Chen” → “vazcus Chen, Chen,”, “Thomas Grant” → “Thotmes Grant”, “Paramount Pictures Release” → “Plasyoc Reslanes”, “Music by Thomas Grant” → “Musit by Thotmes Grant”, “Coming Summer 2025” → “Coming Summer 2025.” Several names and credits are garbled or missing. VERDICT: FAIL

Result: PASS
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "NATIONAL EXPLORER", "The Hidden Temples of Cambodia — page 34", "Interview: Dr. Maria Santos on Climate Action", "Top 10 Trails You Must Hike This Year", and "Issue 247 | March 2025 | $6.99". All words are spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS

Result: PASS
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

All text matches the intended content exactly. Spelling, punctuation, and wording are correct and fully readable. No misspellings, missing words, or illegible parts detected. The layout and typography are clear and consistent with the specification. VERDICT: PASS
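
The subcategory and category percentages in the Text Rendering results above are consistent with pooling pass counts across difficulty tiers. The sketch below is a check under that assumption, not the benchmark's documented scoring method; the `TIERS` table and `pooled_rate` helper are hypothetical names, and the counts are transcribed from the items listed on this page.

```python
# Pass/total counts per (subcategory, difficulty), transcribed from the listings above.
TIERS = {
    ("Typography Style", "Easy"): (3, 3),
    ("Writing accuracy", "Easy"): (3, 3),
    ("Writing accuracy", "Medium"): (3, 3),
    ("Writing accuracy", "Hard"): (3, 3),
    ("Writing accuracy", "Extreme"): (2, 3),
}


def pooled_rate(tiers, subcategory=None):
    """Pool pass counts over tiers (optionally one subcategory); return a whole percent."""
    selected = [counts for (sub, _), counts in tiers.items()
                if subcategory in (None, sub)]
    passed = sum(p for p, _ in selected)
    total = sum(t for _, t in selected)
    return round(100 * passed / total)


print(pooled_rate(TIERS, "Typography Style"))  # 100
print(pooled_rate(TIERS, "Writing accuracy"))  # 92
print(pooled_rate(TIERS))                      # 93
```

Pooling 11 passes out of 12 gives 92% for Writing accuracy, and 14 out of 15 gives 93% for Text Rendering overall, matching the figures shown.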

Spatial Reasoning 77%

Attributes Binding 89%
Easy 100%
Result: PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface: a red sphere (ball) and a blue cube. The red ball is on the left, the blue cube on the right — matching the prompt’s description. Colors, shapes, and spatial arrangement are correctly bound with no swaps. Size appears comparable and appropriate for “side by side” placement. VERDICT: PASS

Result: PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin placed on a wooden cutting board. The apple is correctly colored green and has the typical round shape of an apple. The pumpkin is correctly colored orange and has the characteristic ribbed, rounded shape of a pumpkin. Both objects are appropriately sized relative to each other and the board. No attributes are swapped or misassigned. VERDICT: PASS

Result: PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows a yellow taxi on the left and a black limousine on the right, parked next to each other on a street. The taxi is correctly colored yellow and has taxi markings. The limousine is correctly colored black and has the elongated shape of a limousine. All attributes (color, size, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS

Medium 100%
Result: PASS
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image contains three distinct geometric shapes on a white background. The large red circle is present in the upper left. The small blue triangle is located on the right side. The medium green star is positioned at the bottom center. All colors, sizes, and shapes are correctly bound to their respective objects as described in the prompt. VERDICT: PASS

Result: PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a shelf. From left to right:
1. Tall, thin, white candle: matches description.
2. Short, fat, red candle: matches description.
3. Medium-height, spiral-shaped, yellow candle: matches description.
All attributes (color, size, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS

Result: PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a large brown horse with black lower legs and tail, standing in a grassy meadow. In front of it are a small white rabbit and a medium-sized black cat sitting on the grass. All objects match their described attributes: horse is big and brown, rabbit is small and white, cat is medium and black. No attribute swaps or misidentifications observed. VERDICT: PASS

Hard 67%
Result: PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains:
- A pink fire truck (correct color and object)
- A blue banana (correct color and object)
- A green sun (correct color and object)
All attributes are correctly bound to their respective objects with no swaps. The surreal landscape is also present with purple hills and other whimsical elements. VERDICT: PASS

Result: FAIL
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains a purple carrot with green leaves, correctly matching the prompt. The tree has red leaves and a brown trunk, also matching. However, the watermelon is not white — it’s green on the outside and pink/red inside, which contradicts the “white watermelon” requirement. Additionally, there’s a small orange fruit on top of the watermelon slice, which wasn’t requested. Attributes are mostly bound correctly except for the watermelon color. VERDICT: FAIL

Result: PASS
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows three objects on a blue surface:
1. A yellow penguin with black feet and beak, white chest: matches “yellow penguin”
2. An orange elephant with raised trunk: matches “orange elephant”
3. A silver strawberry with textured surface and green leaves: matches “silver strawberry”
All attributes (color, shape, object type) are correctly bound. No swaps or mismatches observed. VERDICT: PASS

Compositionality 100%
Easy 100%
Result: PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all required elements:
- Tabby cat: Present, with clear tabby markings and typical feline features
- Oversized red sunglasses: Present, large red-framed glasses covering the cat's eyes
- Coherent combination: The sunglasses fit naturally on the cat's face without distortion
All described elements are present and properly combined in a realistic manner. VERDICT: PASS

Result: PASS
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image contains all required elements:
1. Golden retriever - Present (clearly visible, golden fur, correct breed features)
2. Tiny cowboy hat - Present (small brown hat with characteristic brim and crown)
3. Coherent combination - Present (hat sits naturally on dog's head, proportions are appropriate)
All described elements are accurately represented and logically combined in the image. VERDICT: PASS

Result: PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image shows a penguin standing on snow/ice. The penguin is holding a colorful umbrella (red, blue, yellow panels) with its flipper. All described elements — penguin, colorful umbrella, and the act of holding it — are present and coherently combined. No contradictions or missing components. VERDICT: PASS

Medium 100%
Result: PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image shows a medieval knight in full armor sitting at a wooden desk, typing on a modern laptop. All described elements are present and coherently combined:
- Medieval knight in full armor: Present
- Sitting at a desk: Present
- Typing on a modern laptop: Present
The scene is well-integrated, with the knight’s hands positioned as if typing on the laptop, and the setting (stone walls, chandelier) complements the medieval theme. VERDICT: PASS

Result: PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains all required elements:
1. Roman gladiator - Present (wearing armor, skirt, greaves)
2. Selfie with smartphone - Present (holding phone up in selfie pose)
3. Colosseum - Present (ancient Roman amphitheater structure visible)
All three core concepts are clearly depicted and coherently combined in a single scene. The anachronistic combination of ancient gladiator with modern smartphone is intentionally portrayed as requested. VERDICT: PASS

Result: PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image contains all required elements:
1. Astronaut in a spacesuit - PRESENT (full white suit with helmet, American flag patch, and backpack)
2. Riding a bicycle - PRESENT (astronaut is seated on and pedaling a bicycle)
3. Through a park - PRESENT (green grass, trees, paved path, and park setting visible)
All elements are coherently combined in a single scene. VERDICT: PASS

Hard 100%
Result: PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements:
- Octopus: Present (center, orange)
- Top hat: Present (on octopus head)
- Monocle: Present (on octopus eye)
- Chessboard with pieces: Present (in front of octopus)
- Moon surface: Present (gray, cratered terrain)
- Earth in background: Present (large, blue/white planet)
All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS

Result: PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

Analysis of elements:
1. Dinosaur: Present (T-Rex).
2. Business suit: Present (grey suit, tie, shirt).
3. Giving a PowerPoint presentation: Partially present. There is a screen showing a chart titled "Q3 GROWTH PROJECTED", but the dinosaur is not actively presenting (e.g., pointing or gesturing at it); he is standing away from it.
4. Modern office: Present (glass walls, city view, wood floors).
5. Drinking coffee: Present (holding a white mug).
The dinosaur is holding a coffee mug and wearing a

Result: PASS
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

The image contains all described elements:
- Robot dressed as a pirate: Present (metallic robot with pirate hat and sash)
- Paper boat: Present (white origami-style boat)
- Lake of lava: Present (glowing red-orange molten surface)
- Three kittens: Present (orange, black-and-white, and gray tabby kittens appearing to float/juggle around the robot)
All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS

Counting 22%
Easy 33%
Result: PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays exactly three red apples positioned on a wooden surface. The count matches the requested quantity precisely. The apples are arranged in a triangular formation, and the wooden texture is consistent with a table. No extra or missing objects are present. VERDICT: PASS

Result: FAIL
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows six yellow rubber ducks arranged in a row on a white shelf. The prompt requested exactly five ducks. Since there are six visible ducks (including one partially cut off on the far left), the count does not match the requested number. VERDICT: FAIL

Result: FAIL
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image shows exactly 4 white candles, which matches the requested count. However, they are standing on a transparent/checkered background (not a dark surface), and there is no visible surface — only reflections suggesting one. The core object count is correct, but the surface condition fails the prompt’s requirement. VERDICT: FAIL

Medium 33%
Result: PASS
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows 2 orange cats and 3 black dogs sitting together on a grassy lawn. The count exactly matches the requested objects: two orange cats (one on each end) and three black dogs (in the center). All animals are present and correctly identified. VERDICT: PASS

Result: FAIL
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows a bouquet with 6 red roses and at least 3 white daisies (possibly more partially hidden). The prompt requested exactly 4 red roses and 2 white daisies. The counts do not match — there are more roses and daisies than specified. Therefore, the core requirement of exact object count is not met. VERDICT: FAIL

Result: FAIL
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows 1 blue butterfly on the top rail. Below, there is 1 yellow/black swallowtail, 1 yellow butterfly, and 2 pale yellow/green butterflies on the lower rail. This totals 1 blue and 4 yellow butterflies. The prompt requested 1 blue and 3 yellow butterflies. The count does not exactly match. VERDICT: FAIL

Hard 0%
Result: FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round fishbowl with water and green plants. Counting the goldfish: there are 10 visible fish (not 9). The prompt requested exactly 9 goldfish, so the count does not match. Even though the bowl is large and round, the fish count is incorrect. VERDICT: FAIL

Result: FAIL
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with 3 bananas, 3 oranges, and 2 apples. The prompt requested exactly 3 bananas, 4 oranges, and 2 apples. While the banana and apple counts are correct, the orange count is off by one (3 instead of 4). Therefore, the total object count does not exactly match the request. VERDICT: FAIL

Result: FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image contains 20 colorful balloons floating against a clear blue sky. The prompt requested exactly 11 balloons, but the actual count is nearly double that. Since the number of objects does not match the specified quantity, the core criterion is not met. VERDICT: FAIL

Negation 56%
Easy 67%
Result: PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image depicts a serene tropical beach with white sand, turquoise water, and palm trees under a clear blue sky. There are no visible people in the scene — no individuals on the beach, in the water, or in the background foliage. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS

Result: FAIL
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat lying on a beige couch. Upon close inspection of the cat’s neck, there is a thin, dark band visible — consistent with a collar. Since the prompt explicitly excluded “a cat lying on a couch without a collar,” and the cat in the image appears to be wearing one, the excluded element is present. VERDICT: FAIL

Result: PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with wooden desks and chairs arranged in rows. A large chalkboard is mounted on the front wall with "WELCOME BACK TO SCHOOL" written on it. There are maps on the right wall, windows on the left letting in sunlight, and a speaker mounted high on the wall. No students are visible anywhere in the scene. VERDICT: PASS

Medium 100%
Result: PASS
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with several palm trees, a pile of large seashells at the base of the trees, and a yellow-and-white striped beach umbrella on the left. The background consists of more sand and a clear sky — there is no visible water, ocean, or sea horizon line. The excluded element (water/ocean) is truly absent. VERDICT: PASS

Result: PASS
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows a traditional office workspace featuring a wooden desk, a leather chair, a tall bookshelf filled with books, and a green banker’s lamp. There are no computers, monitors, tablets, or any electronic screens visible in the scene. The excluded elements are fully absent. VERDICT: PASS

Result: PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen with pots, pans, a stove, and a cutting board. The excluded element — “no food” — is correctly absent: there are no visible food items on the counter, stove, or cutting board. Bottles of oil/vinegar, spice jars, and kitchen tools are present, but none constitute food. The scene adheres to the constraint. VERDICT: PASS

Hard 0%
Result: FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil leaves, and a crust. However, there is clearly melted white cheese visible beneath and around the pepperoni slices and basil leaves. The prompt explicitly excluded cheese, but cheese is present in the image. VERDICT: FAIL

Result: FAIL
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a gray sedan with windows, doors, and headlights clearly visible. However, the wheels are not fully absent — both front and rear wheel wells expose brake rotors and suspension components, indicating wheels have been removed. Since the prompt explicitly excluded “no wheels,” and the image still shows wheel assemblies (rotors, hubs), the excluded element is partially present. The core intent of “no wheels” is violated. VERDICT: FAIL

Result: FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image shows a close-up of a human face with clearly visible eyes, nose, and mouth. Importantly, the person has well-defined eyebrows above each eye — they are dark, shaped, and natural-looking. Since the prompt explicitly excluded eyebrows (“no eyebrows”), and they are present in the image, the core criterion is violated. VERDICT: FAIL

Relative Position 92%
Easy 100%
Result: PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows an orange tabby cat sitting directly on top of a brown cardboard box. The box is placed on a wooden floor, and the cat is centered on the box’s upper surface. All spatial relationships described in the prompt — specifically “on top of” — are accurately represented. No positional errors are present. VERDICT: PASS

Result: PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball positioned on the floor directly beneath the seat of a wooden chair. The ball is centered between the front legs of the chair, clearly “underneath” it as specified. All spatial relationships described in the prompt are accurately represented. VERDICT: PASS

Result: PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a white coffee mug filled with dark liquid, positioned to the right of an open book on a wooden desk. The book is open and lies flat on the desk surface. The spatial relationship described — “a coffee mug next to an open book on a desk” — is accurately represented: the mug is adjacent (to the right) of the book, and both rest on the desk. No positional errors are present. VERDICT: PASS

Medium 100%
Result: PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a branch that extends from a tree. The branch is positioned above and slightly in front of a park bench, which sits on grass below. The spatial relationships described — bird “on” branch, branch “next to” bench (with implied proximity and adjacency), and bench “under” the branch — are all visually accurate. No contradictions exist. VERDICT: PASS

Result: PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows an orange tabby cat sitting directly on a floral-patterned pillow. The pillow is placed on the seat of a wooden chair. All spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. The cat is above the pillow, the pillow is above the chair seat, and no conflicting left/right/under/above relationships are violated. VERDICT: PASS

Result: PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a candle placed directly on top of a stack of three books. To the right of the books is a potted plant in a terracotta pot. All specified spatial relationships — “candle on top of books” and “books next to potted plant” — are accurately represented. The arrangement matches the prompt’s description precisely. VERDICT: PASS

Hard 100%
Result: PASS
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five light-colored wooden chess pieces on a checkered board. The king is centrally located. To its left is a queen. To its right is a bishop. Above the king (in the background) is a rook. Below the king (in the foreground) is a knight. All described spatial relationships — left, right, above, below, and center — are accurately represented in the image. VERDICT: PASS

Result: PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a top-down view of a desk with a laptop centered. To the left of the laptop is a coffee mug (and a small container of coffee beans). To the right is a smartphone. Above the laptop is a black desk lamp. Below the laptop is a dark notebook. All specified spatial relationships — left, right, above, below, and center — are accurately represented. VERDICT: PASS

Result: PASS
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field arranged left to right: a brown horse, a black-and-white cow, a white sheep, a light-brown goat, and a sitting German Shepherd dog. The spatial relationships described — horse on far left, cow to its right, sheep in center, goat to sheep’s right, dog on far right — are all accurately reflected in the image. All positional descriptors match the visual layout. VERDICT: PASS

Extreme 67%
PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn in the foreground at large scale, a windmill slightly behind and to the left (midground, medium scale), and snow-capped mountains in the background at smaller scale — consistent with depth perspective. All positional relationships (foreground/midground/background) are correctly rendered. No contradictions in spatial layout. VERDICT: PASS

PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image shows a beach scene with a detailed sandcastle in the foreground, two surfers riding waves in the midground, and a cruise ship on the horizon in the background. The spatial arrangement matches the prompt: sandcastle closest, surfers further back on waves, ship farthest away. Sizes decrease appropriately with distance. All positional relationships are correctly represented. VERDICT: PASS

FAIL
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image shows a city park with a bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background — matching the prompt’s spatial layering. However, the prompt specifies “each layer progressively smaller,” which is not visually accurate: the fountain appears larger than the distant skyscrapers due to perspective, but the skyscrapers themselves are massive structures that would be larger than the fountain if viewed at same scale. The phrase “progressively smal
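Evaluator notes on this page normally close with "VERDICT: PASS" or "VERDICT: FAIL", but some notes are cut off before that line. A minimal sketch of how a status could be recovered in both cases, assuming that trailing-verdict format holds for every complete note; `parse_verdict` and `resolve_status` are hypothetical helper names, not part of ImageBench:

```python
import re
from typing import Optional

# Pattern assumed from the notes shown on this page, which end with
# "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(note: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL', or None when the note has no trailing verdict."""
    match = VERDICT_RE.search(note.strip())
    return match.group(1) if match else None


def resolve_status(note: str, card_status: str) -> str:
    """Prefer the verdict inside the note; fall back to the card's status label."""
    return parse_verdict(note) or card_status
```

With this approach a truncated note, such as the city park one, takes its status from the label shown on the card.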

Scale & Proportions 100%
Easy 100%
PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The image shows a young boy standing next to a large yellow school bus. The relative scale appears realistic: the bus is significantly larger than the child, with its front wheel alone being nearly as tall as the boy. The perspective and proportions of both the person and the vehicle are consistent with real-world expectations. VERDICT: PASS

PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears appropriately sized relative to the leaf surface it rests on, consistent with real-world proportions. The leaf’s vein structure and texture further support realistic scaling. No exaggerated or distorted size relationships are evident. VERDICT: PASS

PASS
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child’s hands are holding the basketball, and the ball appears appropriately sized relative to the child’s torso and arms — consistent with a real-world basketball held by a young boy. No anatomical or scale distortions are visible. The proportions are natural and physically plausible. VERDICT: PASS

Medium 100%
PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a tiny person standing on the rim of a coffee cup, looking down into it. The person is appropriately scaled relative to the cup — small enough to be “tiny” in context, yet clearly visible and proportionally consistent with the cup’s dimensions. The perspective and lighting support the surreal scale without visual contradictions. Core size relationship is accurately represented. VERDICT: PASS

PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat that is disproportionately large compared to the surrounding skyscrapers, fitting the "giant" and "kaiju" description. The cat towers over the buildings, with its head reaching near the top of the frame while its paws are positioned at street level. The perspective and scale are intentionally exaggerated to match the prompt's requirement for a giant creature among city structures. VERDICT: PASS

PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a small wooden house model resting in the palm of a human hand. The house is appropriately sized to fit comfortably within the palm, with its dimensions appearing consistent with a miniature or toy-scale object relative to the hand’s anatomy. The proportions between the house and the hand are realistic and visually coherent. VERDICT: PASS

Hard 100%
PASS
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image displays an elephant, horse, dog, cat, and mouse arranged in a line from largest to smallest. The elephant is the largest, followed by the horse, then the dog, cat, and finally the mouse as the smallest. The relative sizes are consistent with real-world proportions. VERDICT: PASS

PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image displays a top-down view of a table setting. The plate is centrally located with a fork to its left and a knife to its right. A salt shaker is positioned above the fork, and a wine glass is above the knife. Scattered peppercorns are near the top center. All objects appear to be at realistic relative sizes for a dinner table scene. The proportions between the plate, cutlery, glass, and shaker are consistent with real-world expectations. VERDICT: PASS

PASS
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image displays a farm scene with a red barn, green tractor, farmer, cows, chickens, and fence posts. The relative sizes appear consistent: the barn is largest, followed by the tractor and cows, then the farmer, with chickens being smallest. Fence posts are appropriately scaled relative to the animals and structures. No obvious distortions or scale inconsistencies are visible. VERDICT: PASS

Human realism 86%

Faces & Expressions 100%
Easy 100%
PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

Face is symmetric with natural features, no distortions. Warm, genuine smile clearly conveys friendliness. Lighting and composition enhance the natural look. All elements align with the prompt’s request for a young woman with a warm natural smile. VERDICT: PASS

PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly proportioned features. Expression is calm and serious, with neutral lips and steady gaze. No visible distortions or artifacts. Lighting and texture appear realistic. Meets all criteria for the prompt. VERDICT: PASS

PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly proportioned features. Expression is neutral and relaxed, matching the prompt. No visible distortions or artifacts. Lighting and focus are professional. The teenager’s gaze is steady and composed. VERDICT: PASS

Medium 100%
PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

Face is symmetric with natural features. Disgust is clearly conveyed: nose wrinkled, lips curled, brows furrowed. No significant distortions. Skin texture and lighting are realistic. Emotion is unambiguous and well-executed. VERDICT: PASS

PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is symmetric with natural features. Eyes are wide, eyebrows raised, mouth slightly open — clearly conveying genuine surprise. No visible distortions. Skin texture and hair look authentic. All key elements of the prompt are accurately rendered. VERDICT: PASS

PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly rendered features. Eyes are downcast, lips slightly drawn, gaze distant — clearly conveying quiet melancholy. No visible distortions. Lighting and texture are realistic. Emotion is unmistakable and aligns perfectly with the prompt. VERDICT: PASS

Hard 100%
PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

The face is symmetric with natural, aged features: deep wrinkles, white hair, and warm, kind eyes. The smile is clearly readable and genuine, conveying warmth. No significant distortions are present. Facial structure and proportions are realistic and accurate. VERDICT: PASS

PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is nearly symmetrical with natural features: round cheeks, large eyes, and a tear streaming down the right cheek. The crying expression is clearly readable through furrowed brows, downturned mouth, and tear. No significant distortions are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with natural wrinkles, age spots, and thinning gray hair. Reading glasses are present. Skeptical expression is clearly readable via furrowed brow and downturned lips. No significant distortions. Features align well with prompt. VERDICT: PASS

Extreme 100%
PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. Emotions are clearly readable: left woman laughs with head back, center man looks shocked, right woman smirks while rolling eyes. No distortions. All three expressions match the prompt. VERDICT: PASS

PASS
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces are symmetric with correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored, child grins excitedly. No distortions or unnatural artifacts. All four individuals match the described expressions and roles. VERDICT: PASS

PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct anatomical features. Emotions are clearly readable: left man shows frustration (furrowed brows, clenched fist), center woman shows pleased (smiling, nodding), right man shows confusion (tilted head, raised eyebrows). No distortions or artifacts. All three expressions match the prompt accurately. VERDICT: PASS

Full Body 100%
Easy 100%
PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions or anomalies are visible. All elements match the prompt precisely. VERDICT: PASS

PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing forward, with hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS

PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet in a neutral standing pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still in a park. No distortions are apparent. All elements align with the prompt. VERDICT: PASS

Medium 100%
PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs are correctly proportioned and anatomically accurate. Her running pose is dynamic and recognizable, with arms bent and legs in mid-stride. Hair and clothing show motion blur consistent with running. No distortions or anatomical errors are visible. The scene matches the prompt’s description. VERDICT: PASS

PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His bent-knee jump over the puddle is anatomically plausible and recognizable. No distortions are visible. The pose matches the prompt accurately. VERDICT: PASS

PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s limbs are correctly proportioned and anatomically plausible. The pose captures a dynamic spin with one leg extended and the dress flowing outward as described. No distortions in limb count or proportions are visible. The action is clearly recognizable as a spinning motion. VERDICT: PASS

Hard 100%
PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person stands on one leg, with the other foot pressed against the inner thigh, arms raised overhead — accurately depicting the tree pose. Limb count and proportions are correct. No anatomical distortions are visible. The pose is clearly recognizable and executed with proper form. VERDICT: PASS

PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has correct limb count and proportions. His arms and legs are extended to different holds, matching the prompt. The pose is anatomically plausible for rock climbing, showing tension and balance. No significant distortions are visible. The action is clearly recognizable as dynamic bouldering or sport climbing. VERDICT: PASS

PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, bow arm is extended, and the cello rests on a stand. Proportions are natural, and the playing pose is clearly recognizable. No distortions are visible. VERDICT: PASS

Extreme 100%
PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly counted and proportioned. The man dips the woman backward with their legs intertwined, matching the prompt. The pose is anatomically plausible and recognizable as a tango move. No distortions are visible. VERDICT: PASS

PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s pose is anatomically plausible, and the jumping player’s posture is dynamic and recognizable as a tackle evasion. The ball is at the jumping player’s feet as described. No significant distortions are visible. The action matches the prompt accurately. VERDICT: PASS

PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their handstands are anatomically plausible, though slight asymmetry exists—right gymnast’s legs are slightly more spread. Pose is recognizable as synchronized handstands. Minor distortions: one gymnast’s head is slightly tilted, and limbs appear subtly elongated. Overall, the core requirements are met. VERDICT: PASS

Hands 58%
Easy 33%
FAIL
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The left hand (viewer’s right) shows 5 fingers, all appearing anatomically correct and relaxed. The right hand (viewer’s left) also displays 5 fingers, but the thumb is partially obscured or merged with the palm — not clearly separated or naturally spread. While not severely distorted, it fails the “naturally spread” criterion for that hand. Core requirement of 5 clearly visible, relaxed fingers per hand is not fully met. VERDICT: FAIL

PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The image shows a woman with both hands extended forward, palms facing up. Each hand clearly displays five fingers — left hand: thumb, index, middle, ring, pinky; right hand: same. All ten fingers are visible and anatomically correct in positioning and structure. No anomalies such as extra, missing, or fused fingers. Gesture is natural and consistent with the prompt. VERDICT: PASS

FAIL
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The left hand (on the right side of image) shows 5 fingers correctly positioned. The right hand (on left side of image) also displays 5 fingers, but the thumb appears unnaturally fused or missing its distinct separation from the index finger — a subtle anatomical anomaly. Overall gesture is mostly correct, but minor distortion in thumb anatomy reduces realism. VERDICT: FAIL

Medium 67%
FAIL
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The visible right hand gripping the mug handle shows only four fingers — thumb, index, middle, and ring — with no pinky visible or implied. The grip appears anatomically plausible for holding a small mug, but the missing fifth finger violates the “exactly 5 per hand” criterion. No other hands are visible to evaluate. Since the prompt requires exactly 5 fingers per hand and this is not met, it fails. VERDICT: FAIL

PASS
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The woman’s right hand gripping the pen shows 5 fingers, and the left hand resting on the notebook also shows 5 fingers. The grip appears anatomically correct — thumb and index finger hold the pen, middle finger supports it, ring and pinky are curled naturally. No visible anomalies in finger count or positioning. Both hands look realistic and consistent with natural writing posture. VERDICT: PASS

PASS
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The left hand (on the fretboard) shows 5 fingers with a natural grip. The right hand (strumming area) also shows 5 fingers, though the thumb is partially obscured by the guitar body — still anatomically plausible. No obvious deformities or extra/missing digits. Grip and posture are consistent with real guitar playing. Minor occlusion doesn’t invalidate anatomy. VERDICT: PASS

Hard 67%
PASS
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The image shows a single right hand making a peace sign. The index and middle fingers are extended upward, while the thumb, ring, and pinky fingers are curled inward — consistent with the gesture described. All five fingers are present and anatomically positioned correctly for this gesture. No anomalies or extra/missing digits are visible. The grip is natural and matches the prompt’s description. VERDICT: PASS

PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt exactly. Anatomically, the hand appears correct: five fingers total (one thumb + four curled fingers), with natural skin texture, knuckles, and proportions. No anomalies or extra/missing digits are visible. The gesture is standard and realistic. VERDICT: PASS

FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image shows one hand with four fingers extended (index, middle, ring, and pinky) — not three as prompted. The thumb is curled inward, making it appear as if only four fingers are visible. Anatomically, the hand has five digits total, but the gesture does not match “counting to three” with index, middle, and ring fingers extended — the pinky is also up. This misalignment with the prompt’s specific finger count constitutes a failure. VERDICT: FAIL

Extreme 67%
PASS
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

The handshake shows two hands clasped. The left hand (in the suit sleeve) has 5 fingers visible: thumb, index, middle, ring, and pinky — all appear anatomically correct. The right hand (in blue shirt) also shows 5 fingers: thumb on top, and four fingers wrapped around — all present and correctly positioned. No extra or missing fingers; grip is natural and professional. Anomalies: none detected. VERDICT: PASS

FAIL
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two hands meeting in a high-five gesture. The left hand (on the blue sleeve) has 5 fingers clearly visible and anatomically correct. The right hand (on the white sleeve) also has 5 fingers, but the thumb is positioned unusually — it appears to be fused or merged with the index finger, lacking clear separation. This is an anatomical anomaly. While the gesture is recognizable, the hand structure is not fully correct. VERDICT: FAIL

PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

The image shows four hands interacting with a gift box. The left hand (with ring) has 5 fingers visible and anatomically correct grip. The right hand also has 5 fingers and proper grip. However, the two background hands are partially obscured — one appears to have only 3–4 visible fingers due to occlusion, not necessarily missing digits. No clear anatomical anomalies or extra/missing fingers are evident in fully visible hands. Grip is natural and correct. VERDICT: PASS
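The tier percentages and the 58% Hands figure can be reproduced from the twelve card statuses above. A small sketch, assuming each rate is the share of passing evaluations rounded to a whole percent; the `pass_rate` helper is illustrative, not ImageBench code:

```python
# Outcomes transcribed from the Hands cards above (True = PASS).
hands = {
    "Easy": [False, True, False],
    "Medium": [False, True, True],
    "Hard": [True, True, False],
    "Extreme": [True, False, True],
}


def pass_rate(outcomes):
    """Percentage of passing evaluations, rounded to a whole number."""
    return round(100 * sum(outcomes) / len(outcomes))


# Rate per difficulty tier.
tier_rates = {tier: pass_rate(results) for tier, results in hands.items()}

# Rate for the whole subcategory, pooling all twelve evaluations.
overall = pass_rate([r for results in hands.values() for r in results])

print(tier_rates)
print(overall)
```

Seven of the twelve evaluations pass, which matches the 58% shown for Hands.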

Multi-Subject 83%
Easy 67%
PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, wearing red blazer over white shirt. Right woman: long dark hair, wearing navy blue dress. Both visually distinct and match prompt descriptions precisely. No mismatches or omissions. VERDICT: PASS

PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Person 1: Bearded, wears glasses, green sweater. Person 2: Clean-shaven, wears black hoodie. Both individuals are visually distinct and match described features exactly. All prompt details are accurately represented. VERDICT: PASS

FAIL
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Image shows three women, not two. Central woman has curly red hair and denim overalls (matches tall description). Two others wear yellow dresses; one has straight brown hair (matches short description). However, the second person in yellow dress is not clearly short or tall relative to the central figure, and the third person’s hair is not distinctly straight brown. The prompt’s specific pairings are not fully realized. VERDICT: FAIL

Medium 100%
PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

All four individuals are present and visually distinct. Left: young man with dreadlocks in denim jacket. Next: woman with silver hair in red blouse. Center: bald man in suit. Right: woman in green hijab and dress. Features match prompt exactly. VERDICT: PASS

PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Left: Tall woman with short pink hair, playing bass guitar. Center: Man with beard and bandana, on drums. Right: Slim man with glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS

PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat and backpack. Woman: has braided hair, uses hiking poles. Teenager: wears red baseball cap, carries water bottle. All three are visually distinct and match descriptions. VERDICT: PASS
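The 86% Human realism figure is consistent with pooling every evaluation in the category rather than averaging the four subcategory percentages. The sketch below checks both options using counts tallied from the cards above; the pooling rule itself is an inference from the numbers, not something the page states:

```python
# (passed, total) per subcategory, tallied from the cards in this category.
subcategories = {
    "Faces & Expressions": (12, 12),
    "Full Body": (12, 12),
    "Hands": (7, 12),
    "Multi-Subject": (5, 6),
}

passed = sum(p for p, _ in subcategories.values())
total = sum(n for _, n in subcategories.values())

# Pooled rate: every evaluation counts equally.
pooled = round(100 * passed / total)

# Macro rate: every subcategory counts equally, whatever its size.
macro = round(100 * sum(p / n for p, n in subcategories.values()) / len(subcategories))

print(passed, total, pooled, macro)
```

The two rules differ here because Multi-Subject has only six evaluations, so it weighs less in the pooled figure than in the macro average.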

Truthfulness 78%

Photorealism 100%
Medium 100%
PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The glass exhibits correct refraction and transparency, distorting the background elements behind it. The silver spoon displays sharp, accurate specular highlights consistent with polished metal. The wooden table shows a convincing grain texture and matte finish, while the linen napkin captures the soft, woven texture of fabric. The apple’s skin reflects light naturally. All visual properties align with the prompt’s requi

PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The brushed aluminum laptop shows appropriate metallic sheen and subtle reflections. The leather notebook displays realistic texture and stitching. The ceramic mug has a matte, opaque finish, while the glass paperweight exhibits correct transparency, refraction, and specular highlights. The wooden desk features natural grain patterns and surface texture. All materials are visually distinct and realistic. VERDICT: PASS

PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image accurately renders the distinct material properties requested. The stainless steel pot exhibits sharp, metallic reflections. The wooden cutting board shows realistic grain and end-grain texture. The glass bottle demonstrates correct translucency and refraction of the oil within, while the cotton towel displays appropriate fabric folds and matte texture. The lighting interacts naturally with each surface, creating believable highlights and shadows. VERDICT: PASS

Physics & Reflections 92%
Easy 100%
PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white surface. The shadow is cast to the right and slightly behind the object, which is consistent with a light source positioned to the upper left. The shadow's softness and gradient are physically plausible for a studio lighting setup. Additionally, there is a subtle reflection on the surface beneath the sphere, indicating a slightly glossy floor, which adds to the realism. The lighting and shadows are coherent. VERDICT: PASS

PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The lighting originates from the right side, consistent with the window frame visible on the right. Consequently, the mug casts a distinct, sharp shadow to the left, which aligns perfectly with the prompt's requirement. The shadow’s intensity and angle are physically plausible for direct sunlight. The reflections on the coffee surface and the subtle highlights on the mug’s rim are also consistent with the light source. VERDICT: PASS

PASS
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast directly beneath the subject, consistent with a midday sun position. The shape accurately mirrors the person’s stance and clothing folds, showing no distortion or misalignment. The shadow’s darkness and soft edges are realistic for bright outdoor lighting. No reflections are expected in this scene, and none appear. All shadow characteristics align with physical plausibility. VERDICT: PASS

Medium 67%
FAIL
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically impossible. The woman is standing with her back to the mirror, yet the reflection shows her front. A mirror cannot reflect the front of a person when their back is facing it. Additionally, the reflection’s orientation and lighting do not match the real-world geometry. These are fundamental violations of optical physics. VERDICT: FAIL

PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It correctly mirrors the cat’s orientation and position, appearing directly beneath it on the polished marble floor. The slight blur and distortion align with real-world reflections on glossy surfaces. The color matches the cat’s dark fur, and the lighting direction is consistent—no contradictory shadows or misaligned highlights. The reflection fades naturally toward the edges, mimicking how light interacts with reflective flooring. Overall, it adheres to
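Evaluator responses on this page normally end with a `VERDICT:` marker, but some are cut off before it. A minimal sketch of how a harness might extract the verdict and flag cut-off responses; the regular expression and the `parse_verdict` name are assumptions, not the benchmark's actual parser:

```python
import re
from typing import Optional

# Matches the closing marker used in the evaluator responses on this page.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(response: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL', or None when no complete verdict is present."""
    matches = VERDICT_RE.findall(response)
    # Use the last match in case the evaluator mentions the marker earlier.
    return matches[-1] if matches else None


print(parse_verdict("The lighting and shadows are coherent. VERDICT: PASS"))
print(parse_verdict("Overall, it adheres to"))  # truncated response: None
```

Returning `None` instead of guessing lets a truncated response be re-run or reviewed by hand.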

PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection on the glass table is physically plausible. It correctly mirrors the vase and stems with appropriate vertical inversion and slight blurring due to the glass surface. The orientation matches the object above, and the distortion is consistent with a reflective plane. Shadows are soft and ambient, fitting the bright indoor lighting. No directional inconsistencies or unnatural color shifts are present. VERDICT: PASS

Hard 100%
PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are vertically inverted and correctly positioned below the waterline. The gentle ripples distort the reflection slightly, consistent with a calm lake surface. The colors in the reflection match the sky and trees above, appearing slightly darker as expected due to light absorption by the water. The orientation and distortion are physically plausible for this scene. VERDICT: PASS

PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The orientation is correctly inverted, mirroring the buildings and lights above. The distortion is consistent with a wet, uneven surface, showing slight waviness and elongation typical of puddle reflections. The color shift is accurate, with warm yellow and cool blue lights reflecting appropriately. The intensity and spread of the light streaks match the physics of light on water. VERDICT: PASS

PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is perfectly inverted and mirrors the mountain range accurately, consistent with a still alpine lake. The colors in the reflection are slightly darker and more muted than the actual mountains, which is physically plausible due to light absorption and scattering in water. The orientation and lack of distortion align with calm water conditions. Shadows and highlights match the source lighting. VERDICT: PASS

Extreme 100%
PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The reflections and refractions are physically plausible. The mirror surface correctly inverts the spheres vertically, maintaining proper orientation and perspective. The water pool shows realistic distortion and refraction effects consistent with liquid surfaces. Light paths through the glass spheres demonstrate accurate bending and focusing. The dark shadow beneath the spheres aligns with the light source direction. Color rendering appears natural with appropriate transparency effects. VERDICT: PASS

PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections on the chrome sculpture are physically plausible, showing distorted environmental mapping of the windows and cityscape consistent with its curved geometry. The glass floor reflection is sharp and correctly oriented, mirroring the sculpture’s position. Shadows are minimal but consistent with bright, diffuse daylight from large windows. No directional or color inconsistencies are evident. VERDICT: PASS

PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The wine glass casts a shadow to the right, consistent with light coming from the left window. The reflection on the marble is vertically inverted and slightly blurred, which is physically accurate for a polished surface. The refraction through the glass distorts the background view appropriately. The colors in the reflection match the scene’s lighting. All elements align with real-world optics. VERDICT: PASS
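The tier and category percentages for Physics & Reflections can be reproduced from the twelve verdicts listed above. A minimal sketch, assuming the dashboard rounds to the nearest whole percent (the page does not state its rounding rule):

```python
# Verdicts for Physics & Reflections, in the order they appear on the page.
results = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["FAIL", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "PASS"],
    "Extreme": ["PASS", "PASS", "PASS"],
}


def pass_rate(verdicts):
    """Percentage of PASS verdicts, rounded to the nearest whole number."""
    return round(100 * sum(v == "PASS" for v in verdicts) / len(verdicts))


for tier, verdicts in results.items():
    print(tier, f"{pass_rate(verdicts)}%")

all_verdicts = [v for verdicts in results.values() for v in verdicts]
print("Physics & Reflections", f"{pass_rate(all_verdicts)}%")  # 11 of 12 passed
```

With one failure in the Medium tier, this gives 67% for that tier and 92% for the category, matching the figures shown.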

World Knowledge 58%
Easy 100%
PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy. The iconic iron lattice structure, the four arched legs, and the three distinct levels (including the top spire) are clearly visible and correctly proportioned. The background shows a clear blue sky and Parisian buildings, matching the prompt's description of a clear day. There are no obvious distortions or hallucinations in the structure. VERDICT: PASS

PASS
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high architectural accuracy, including the central dome, four minarets, and symmetrical layout. The perspective from the reflecting pool is correct, showing the iconic reflection. Minor details like the smaller domes and arches are well-rendered. The presence of visitors adds realism. No significant inaccuracies are observed in the structure or composition. VERDICT: PASS

PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable and accurately depicted with correct proportions, green patina, raised torch, spiked crown, and tablet. The pedestal and base structure align with real-world architecture. Minor details like the American flag and surrounding island environment are plausible. No significant factual or architectural inaccuracies are present. VERDICT: PASS

Medium 33%
FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The depicted instrument is not a shamisen. It appears to be a Chinese erhu or similar bowed string instrument, characterized by its long neck, two tuning pegs, and small square resonator. A shamisen has three strings, a larger body, and is played with a plectrum (bachi), none of which are present here. The bachi shown is unrelated to the instrument’s actual design. This is a clear factual inaccuracy. VERDICT: FAIL

FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass instrument resembling an armillary sphere rather than a traditional flat astrolabe. While the materials and general aesthetic fit the "medieval" theme, the specific components mentioned in the prompt are inaccurate. A real astrolabe consists of a flat disk (mater) with interchangeable plates (tympans) and a star map (rete) that rotates on a pivot; this object is a 3D wireframe sphere. The engraved markings are nonsensical gibberish, lacking the precise mathematical accu

PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The bowl’s shape, glaze texture, and the application of gold along the fracture lines are consistent with traditional Japanese kintsugi aesthetics. The subject is clearly recognizable and factually accurate to the prompt. VERDICT: PASS

Hard 33%
FAIL
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts the water cycle with recognizable elements like mountains, ocean, and clouds. However, it contains significant factual and typographical errors. The label "PREOIPITATION" is misspelled (should be Precipitation). Furthermore, the arrows for "EVAPORATION" and "CONDENSATION" are confusingly placed and overlapping, making the diagram scientifically inaccurate and difficult to interpret. VERDICT: FAIL

PASS
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image accurately depicts the geological features requested: an oceanic plate subducting beneath a continental plate, forming a trench and a volcanic arc. The labels are correctly placed, and the cross-section clearly illustrates the layered mantle and crustal structures. The visual representation aligns with scientific understanding of convergent boundaries. No significant inaccuracies are present. VERDICT: PASS

FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image is not factually accurate. The text labels are nonsensical gibberish (e.g., "ONDINNE," "FPRCINA," "LINGIFAGT") rather than correct anatomical terms. Furthermore, the anatomical placement is incorrect; for instance, the label "AORTA" points to the left atrium/ventricle area, and the "PULMONARY ARCTERY" label points to the coronary arteries on the ventricle surface. The structure of the heart itself appears distorted and unrealistic. VERDICT: FAIL

Extreme 67%
FAIL
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly with blue and brown wings, but it is not a Morpho menelaus. Morpho butterflies have solid, iridescent blue dorsal wings without the complex brown patterns, eyespots, or white markings seen here. The depicted subject resembles a different species (possibly a Blue Morpho hybrid or another Nymphalid), failing the specific accuracy requirement for Morpho menelaus morphology. VERDICT: FAIL

PASS
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image accurately depicts the interior of the Hagia Sophia, showcasing its iconic central dome, semi-domes, and pendentives. The massive Islamic calligraphy medallions are correctly positioned on the piers, coexisting with the visible Byzantine mosaics in the upper arches. The architectural details, including the marble columns, galleries, and lighting, are factually consistent with the real-world structure. VERDICT: PASS

PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a bismuth crystal with high fidelity. It accurately captures the characteristic geometric hopper crystal formation, the distinct staircase structure on the faces, and the vibrant, multi-colored iridescent oxide layers resulting from thin-film interference. The lighting and texture appear realistic, matching the physical properties of real bismuth specimens. VERDICT: PASS

Professional Studio 93%

Camera & Lighting 100%
Easy 100%
PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image successfully implements a wide-angle shot of a modern living room. The perspective shows an expansive space with a clear sense of depth. Most importantly, the image features significant barrel distortion at the edges (visible in the curved lines of the walls and ceiling), which directly matches the prompt's specific requirement. The lighting is bright and natural, consistent with the large windows shown. VERDICT: PASS

PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a red rose petal. The depth of field is extremely shallow, with the dewdrop and the immediate edge of the petal in sharp focus, while the background and the rest of the petal are heavily blurred. The lighting is soft and natural, highlighting the transparency and reflections within the droplet. VERDICT: PASS

PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is top-down, showing the circular road, the central island with trees, and several cars. The lighting is even and consistent with a high-angle shot, and the depth of field is deep, keeping the entire scene in sharp focus. VERDICT: PASS

Medium 100%
PASS
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image successfully demonstrates Rembrandt lighting. There is a clear light source from the side, creating a distinct triangle of light on the shadowed cheek (the subject's right cheek, viewer's left). The lighting creates a moody, high-contrast atmosphere with deep shadows. The perspective is a standard portrait shot, and the depth of field is shallow, with the background being a soft, out-of-focus neutral tone. VERDICT: PASS

PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The lighting is a classic golden hour, featuring a vibrant orange sky and a soft, warm glow across the landscape. The low angle of the light creates long, soft shadows across the rolling hills and highlights the texture of the grass. The depth of field is wide, maintaining clarity from the foreground to the distant horizon, which is appropriate for a landscape photograph. VERDICT: PASS

PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright white background. The lighting is soft, even, and luminous, creating a bright, airy feel with minimal shadows on the subject's face and clothing. The perspective is a standard eye-level portrait, and the depth of field is appropriately shallow, keeping the focus on the person. VERDICT: PASS

Hard 100%
PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The depth of field is shallow, creating a creamy bokeh background with soft, circular light orbs. The subject features a distinct warm rim light along her hair and shoulders, effectively separating her from the blurred garden background. The perspective is a classic portrait shot consistent with an 85mm lens. VERDICT: PASS

PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's technical requirements. The subject is sharp and in focus, while the background features a shallow depth of field with soft, circular bokeh from city lights, characteristic of a 50mm f/1.8 lens. The perspective and lighting pattern accurately reflect a street photograph taken at night with a wide aperture. VERDICT: PASS

PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches the prompt's requirements. The perspective is a high-angle shot (from above) consistent with a 35mm lens. The depth of field is shallow (f/2.8), with the avocado toast in sharp focus and the juice carafe and background slightly blurred. The lighting clearly originates from the left, creating soft shadows to the right. The color temperature is warm, and a subtle vignette is present around the edges. VERDICT: PASS
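The three Hard prompts name a focal length and aperture, and the evaluator judges whether the blur looks right for that pairing. A minimal sketch of the standard thin-lens depth-of-field approximation, useful for comparing how shallow each setup should be. The 2 m focus distance and the 0.03 mm circle of confusion (a common full-frame value) are assumptions; the prompts specify neither:

```python
import math


def depth_of_field(focal_mm, f_number, focus_mm, coc_mm=0.03):
    """Return (near_mm, far_mm) limits of acceptable sharpness."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = focus_mm * (hyperfocal - focal_mm) / (hyperfocal + focus_mm - 2 * focal_mm)
    if focus_mm >= hyperfocal:
        return near, math.inf  # everything beyond the near limit is sharp
    far = focus_mm * (hyperfocal - focal_mm) / (hyperfocal - focus_mm)
    return near, far


# The lens settings named in the three Hard prompts, all focused at 2 m.
for focal, n in [(85, 1.4), (50, 1.8), (35, 2.8)]:
    near, far = depth_of_field(focal, n, focus_mm=2000)
    print(f"{focal}mm f/{n}: sharp from {near:.0f} mm to {far:.0f} mm")
```

Under these assumptions the 85mm f/1.4 setup has by far the thinnest zone of sharpness, which is why its prompt expects the creamiest background blur.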

Extreme 100%
PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a high-angle perspective and a distinct selective focus strip (shallow depth of field) that blurs the foreground and background, creating the "miniature model" illusion. The colors are highly saturated, and the lighting is bright and even, consistent with the prompt's requirements for a tilt-shift aesthetic. VERDICT: PASS

PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a dark, moody, and cinematic scene with a strong sense of depth. The lighting pattern consists of prominent blue and orange light streaks that stretch horizontally across the frame, accurately simulating the look of anamorphic lens flares. The perspective creates a sense of a deep, dark corridor or alleyway, and the light streaks are the primary focus of the lighting design. VERDICT: PASS

PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up portrait with a clear split lighting pattern: the left side of the subject's face is brightly illuminated, while the right side is cast in deep, heavy shadow. This creates the high-contrast chiaroscuro effect requested. The perspective is a direct frontal portrait, and the lighting successfully divides the face into light and dark halves. VERDICT: PASS

Color Precision 92%
Easy 100%
PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a sports car in a bright, vivid, and unambiguous red color. The car is parked on a gray asphalt road, which matches the prompt's description perfectly. The colors are accurate to the specified requirements. VERDICT: PASS

PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image features two women wearing evening dresses in a deep, saturated blue that accurately matches the description of "royal blue." The background is a light, neutral grey/off-white, which also aligns with the prompt's requirement for a neutral background. VERDICT: PASS

PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image perfectly matches the color descriptions provided in the prompt. The field is filled with vibrant, pure sunflower yellow flowers, and the sky above is a clear, solid blue. There are no deviations in color or hue from the specified requirements. VERDICT: PASS

Medium 67%
PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The wall is a vibrant, saturated orange that aligns well with the description of #FF6B35 (burnt orange). The furniture, including the nightstand and the tall shelving unit, is clearly white. The lighting appears to be natural daylight, creating soft shadows and highlights on the wall and floor. All color requirements specified in the prompt are met. VERDICT: PASS

FAIL
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage in the image consists of a variety of green tones. While there are darker areas, the overall color palette is quite diverse, featuring many lighter, more vibrant greens and yellowish-greens that do not strictly adhere to the specific, muted dark green color #2D5F2D. The colors are more natural and varied than the single, precise hex value requested. VERDICT: FAIL
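The evaluator judges the hex match by eye. A numeric check is one way to make "does not strictly adhere to #2D5F2D" concrete. A minimal sketch, assuming foliage pixels have already been sampled as RGB tuples; the sample values, the Euclidean RGB metric, and the tolerance of 30 are all illustrative assumptions, not ImageBench's scoring method:

```python
import math


def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def fraction_near(pixels, target_hex, tolerance):
    """Fraction of pixels within `tolerance` (Euclidean RGB distance) of the target."""
    target = hex_to_rgb(target_hex)
    close = sum(math.dist(p, target) <= tolerance for p in pixels)
    return close / len(pixels)


# Hypothetical foliage samples: one on target, one dark, two bright yellow-greens.
samples = [(45, 95, 45), (50, 100, 52), (120, 180, 60), (170, 200, 80)]
print(hex_to_rgb("#2D5F2D"))
print(fraction_near(samples, "#2D5F2D", tolerance=30.0))
```

A low fraction would correspond to the varied, brighter greens the evaluator describes. A perceptual color space such as CIELAB would be a better metric than raw RGB distance.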

PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The walls are a medium blue color that closely aligns with the specified hex code #4A90D9. The floor is a warm-toned wood, and the trim is white, matching all the color descriptions provided in the prompt. VERDICT: PASS

Hard 100%
PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The walls are a warm cream (#F5E6CC), the sofa is a dark navy (#2C3E50), and the accent pillows are a bright red (#E74C3C). All three specific color requirements are met precisely as described in the prompt. VERDICT: PASS

PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The background is a deep, dark blue consistent with #1A1A2E. The watch features a soft gold tone (#F0C27F) rather than a bright yellow gold. The earbuds are a clean, pure white (#FFFFFF). All three color requirements are met. VERDICT: PASS

PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested color palette. The background is a near-white color consistent with #FAFAFA. The central square is a vibrant orange-red, matching the #FF4500 specification. The thin intersecting lines are a dark charcoal color, consistent with #333333. All three specified colors are present and correctly applied to the designated elements. VERDICT: PASS

Extreme 100%
PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image shows a smooth horizontal gradient. On the far left, the color is a vibrant red (#FF0000). On the far right, the color is a bright blue (#0000FF). The transition between these two colors passes through a purple/magenta middle section, exactly as described in the prompt. The colors and the transition are accurate to the specified hex values and names. VERDICT: PASS
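The "transitioning evenly through purple" requirement can be made concrete by interpolating between the two endpoints. A minimal sketch, assuming a straight per-channel blend of the 8-bit RGB values; the prompt does not say which color space "evenly" refers to, and `lerp_hex` is an illustrative name:

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def lerp_hex(start, end, t):
    """Blend two hex colors per channel; t = 0 gives start, t = 1 gives end."""
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    # int(v + 0.5) rounds half up for the non-negative channel values.
    channels = (int(x + (y - x) * t + 0.5) for x, y in zip(a, b))
    return "#{:02X}{:02X}{:02X}".format(*channels)


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, lerp_hex("#FF0000", "#0000FF", t))
```

The midpoint comes out as `#800080`, a purple, which is what the evaluator reports seeing in the middle of the wall.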

PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image shows a vertical gradient. The bottom color is a vibrant orange, which closely matches the requested #FF6B35. The top color is a very dark, deep indigo/navy, which aligns with the requested #1B1464. The transition between the two colors is smooth and follows the specified direction (orange at the bottom, deep indigo at the top). VERDICT: PASS

PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the description of #00C9FF. On the right, the color transitions into a light mint green, which matches the description of #92FE9D. The gradient flows smoothly from the cyan color to the mint green color as specified. VERDICT: PASS

Photorealism 67%
Easy 67%
PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully captures the requested studio aesthetic. The red canvas texture, white laces, and rubber sole look highly realistic. The lighting is soft, and the gentle shadow beneath the shoe is well-rendered, creating a convincing sense of depth. There are no obvious AI artifacts, such as warped textures or nonsensical geometry, in the shoe's structure or the background. It passes as a high-quality professional product photograph. VERDICT: PASS

FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the quality of a real photograph. There are significant AI artifacts regarding the physics of light and glass. The liquid inside the bottle has nonsensical, floating white lines that do not correspond to any realistic refraction or straw. Additionally, the reflection on the surface is inconsistent with the bottle's shape, and the glass edges appear warped and unnatural. VERDICT: FAIL

PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully captures a studio-quality look. The lighting is soft and directional, creating a natural, subtle shadow on the white seamless background. The texture of the apple skin, including the slight color variations and the stem, appears realistic and free of obvious AI artifacts or "uncanny" textures. It passes as a professional product photograph for a catalog. VERDICT: PASS

Graphical design 75%

Data Visualisation 33%
Easy 33%
FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart shows correct months and values (12K, 18K, 15K, 22K, 19K, 25K) with a title and labeled axes. However, the Y-axis label "Thousands" is misaligned with the scale, and "S/per colde" is a garbled X-axis label. The June value is incorrectly labeled as "25%" (should be "25K"). VERDICT: FAIL
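The "25%" versus "25K" mistake is the kind of error a mechanical label check can catch. A minimal sketch, assuming the bar labels have already been transcribed from the image (for example by OCR). The `OBSERVED` dictionary is a hypothetical transcription built from the evaluator's description, not real output from the benchmark:

```python
import re

# Values requested in the prompt, in thousands.
EXPECTED = {"Jan": 12, "Feb": 18, "Mar": 15, "Apr": 22, "May": 19, "Jun": 25}

# Hypothetical transcription of the labels drawn on the generated chart.
OBSERVED = {"Jan": "12K", "Feb": "18K", "Mar": "15K", "Apr": "22K", "May": "19K", "Jun": "25%"}

LABEL_RE = re.compile(r"^(\d+(?:\.\d+)?)K$")


def label_errors(expected, observed):
    """List months whose label is missing, uses the wrong unit, or has the wrong value."""
    errors = []
    for month, value in expected.items():
        label = observed.get(month)
        match = LABEL_RE.match(label or "")
        if match is None:
            errors.append(f"{month}: unreadable or wrong unit: {label!r}")
        elif float(match.group(1)) != value:
            errors.append(f"{month}: expected {value}K, got {label}")
    return errors


print(label_errors(EXPECTED, OBSERVED))
```

Under this transcription only June is flagged, which matches the value error the evaluator reports. The garbled axis label would need a separate text check.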

PASS
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The pie chart correctly displays all four categories with exact percentages: Company A (35%), Company B (25%), Company C (20%), and Others (20%). Each slice has a distinct color and clear, readable labels with percentages. The chart type matches the requested pie chart visualization. All elements are accurately rendered and legible. VERDICT: PASS

FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart is a line graph titled "Weekly Traffic" with correct axis labels (Mon-Sun) and grid lines. However, the y-axis values do not match the requested data (1200, 1500, 1100, 1800, 2000, 1700, 1400). The plotted points visually approximate the data but lack exact numerical labels for each point, making precise verification impossible. The chart type is correct. VERDICT: FAIL

Layout & Design 67%
Easy 100%
PASS
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image shows a popcorn bucket centered, with "MOVIE NIGHT" in bold yellow at the top and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the popcorn as the focal point and text framing it appropriately. All required layout elements are present and correctly positioned. VERDICT: PASS

PASS
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The image displays a guitar illustration centered on a blue background. The headline "LIVE MUSIC FEST" is prominently placed above the guitar in large white text, and "Tickets available now" appears below it in smaller text. All required elements are present with correct visual hierarchy and readable placement. VERDICT: PASS

PASS
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The image displays an open book illustration centered below the title "JOIN THE BOOK CLUB," which is prominently placed at the top. The tagline "Meets every Wednesday" is correctly positioned at the bottom. All text is readable, and the visual hierarchy is clear: title > illustration > tagline. Layout and text placement match the prompt exactly. VERDICT: PASS

Medium 67%
PASS
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

Image shows hero header image, headline “Unlock Your Potential,” subtitle, “Sign Up” button, and three social icons (Instagram, Twitter, LinkedIn) at bottom. All elements are present, correctly placed, and text is readable. Visual hierarchy is clear with proper spacing and contrast. Layout matches prompt exactly. VERDICT: PASS

FAIL
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with a header banner, two-column body (text left, image right), a highlighted quote block, and a footer with unsubscribe link. However, all text is gibberish placeholder text, unreadable and incorrectly placed. Visual hierarchy is present but content is invalid. VERDICT: FAIL

PASS
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a clean menu with a logo header (“THE FORAGER KITCHEN”), three clearly labeled sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address and hours. Text is readable, well-spaced, and correctly placed with appropriate visual hierarchy. All required layout elements are present and accurately rendered. VERDICT: PASS

Hard 33%
FAIL
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The left page features a large hero photo as required. The right page has a two-column article, a pull quote, and a sidebar with author bio, all correctly placed. The page number “42” is visible. However, all text is gibberish placeholder text, making it unreadable and violating the requirement for readable content. VERDICT: FAIL

FAIL
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI with a top navigation bar, search field, 2x2 feature cards with icons, "RECENT ACTIVITY" list, and bottom tab bar — all present. However, placeholder text is garbled and unreadable, violating the requirement for readable, correctly placed text. Visual hierarchy is mostly correct, but illegible text undermines functionality. VERDICT: FAIL

PASS
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

Left page has full-bleed photo. Right page features headline, body copy, inset image, and page number — all correctly placed with clear visual hierarchy. Text is readable, though minor typos exist (e.g., “geomtric”). Layout matches prompt description precisely. VERDICT: PASS

Style Diversity 92%
Easy 100%
PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered with visible brushstrokes and textured paint application characteristic of oil painting. Colors are rich and layered, with impasto effects in fur and foliage. The style matches the requested oil painting aesthetic precisely. VERDICT: PASS

PASS
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a sitting pose, rendered with large, expressive eyes and simplified, flat color shading typical of Japanese anime style. While fur texture is somewhat detailed, the overall aesthetic matches the requested style. The garden setting is absent, but the core visual characteristics are present. VERDICT: PASS

PASS
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image displays a golden retriever in a garden, rendered in a clear 32x32 pixel grid with a limited color palette, matching the pixel art style requested. The dog is sitting, and background elements like flowers are also pixelated. However, the image contains nonsensical text overlay, which is not part of the requested style. The core visual style is present. VERDICT: PASS

Medium 100%
PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly depicts an Art Deco city street scene with geometric forms, symmetrical composition, and a gold-and-black palette. Ornamental details, stylized skyscrapers, and the word “METROPOLIS” in Art Deco font reinforce the requested style. The central lamppost adds period-appropriate detail. All key visual characteristics are present and well-executed. VERDICT: PASS

PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image clearly exhibits Impressionism: visible brushstrokes, dappled light effects, and a soft pastel palette. The woman holds a parasol, and the scene captures fleeting outdoor moments with atmospheric color. These core stylistic elements are unmistakably present and well-executed. VERDICT: PASS

PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns on skin tones, and flat, vivid primary colors (blue shirt, yellow skin). Facial features are simplified and stylized, consistent with the requested aesthetic. All key visual characteristics are present and unmistakable. VERDICT: PASS

Hard 67%
PASS
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image blends photorealistic forest detail with visible watercolor bleeds and paper texture, especially around edges and in background haze. Lighting mimics a photo with sunbeams and depth. Watercolor effects are present but not uniformly applied, sometimes obscuring realism. Core style elements are recognizable, though execution is inconsistent. VERDICT: PASS

PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image clearly depicts a cyberpunk scene with neon lighting and tech elements. It features flowing, organic Art Nouveau floral borders framing the composition and ornate, curvilinear mechanical designs on the woman’s armor. The color palette uses vibrant neon tones against dark backgrounds. All requested stylistic elements are visibly present and well-integrated. VERDICT: PASS

FAIL
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image blends ukiyo-e style (bold outlines, flat areas of color, woodblock texture) with photorealistic sky and lighting. The castle’s architecture is Japanese, not medieval, which misrepresents the prompt. However, the requested stylistic fusion is visually present. The core stylistic elements are recognizable despite the architectural mismatch. VERDICT: FAIL

Extreme 100%
PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly shows raw concrete walls and floor, matching soft brutalism. Rounded pastel furniture (pink sofa, pale green table, pink mirror) and warm, diffused light from the arched window are present. The style is unmistakably recognizable and aligns with the prompt’s key visual elements. VERDICT: PASS

PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

Sparse composition with a single object, muted gray tones, and significant negative space create an anxious minimalism. The dark pitcher contrasts subtly with the plain background, and its isolated placement evokes visual tension. The red mark adds a small, unsettling detail. All key elements of the requested style are clearly present. VERDICT: PASS

PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image clearly depicts retro-futuristic 1960s space-age aesthetics with domed structures, flying saucer-like buildings, and elevated transit. Warm, melancholic lighting and a patina effect are present, evoking nostalgia. The style is unmistakably recognizable and aligns with the prompt’s core visual requirements. VERDICT: PASS
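
The report does not state how the subcategory and difficulty-tier percentages roll up into the "Graphical design 75%" headline. The sketch below transcribes the pass and total counts from the entries above and checks one plausible reading: that the headline pools all 24 evaluations rather than averaging the three subcategory rates. The `results` dictionary and the helper names are illustrative assumptions, not ImageBench code.

```python
# (passed, total) per difficulty tier, transcribed from the verdicts above.
results = {
    "Data Visualisation": {"Easy": (1, 3)},
    "Layout & Design": {"Easy": (3, 3), "Medium": (2, 3), "Hard": (1, 3)},
    "Style Diversity": {
        "Easy": (3, 3), "Medium": (3, 3), "Hard": (2, 3), "Extreme": (3, 3),
    },
}


def pooled_counts(tiers):
    """Sum passed and total evaluations across all tiers of one subcategory."""
    passed = sum(p for p, _ in tiers.values())
    total = sum(t for _, t in tiers.values())
    return passed, total


def percent(passed, total):
    """Pass rate as a whole-number percentage."""
    return round(100 * passed / total)


subcategory_rates = {
    name: percent(*pooled_counts(tiers)) for name, tiers in results.items()
}

all_passed = sum(pooled_counts(t)[0] for t in results.values())
all_total = sum(pooled_counts(t)[1] for t in results.values())
category_rate = percent(all_passed, all_total)

# Alternative reading: unweighted mean of the three subcategory pass rates.
mean_of_rates = round(
    sum(100 * p / t for p, t in map(pooled_counts, results.values())) / len(results)
)

print(subcategory_rates)              # per-subcategory percentages
print(category_rate, mean_of_rates)   # pooled rate vs. mean of rates
```

With these counts the pooled rate is 18 of 24, or 75%, which matches the headline, while the unweighted mean of the subcategory rates comes to 64%. The headline therefore appears to weight each evaluation equally, so the twelve Style Diversity entries count for half of it.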