ImageBench

ImageBench V1

192 evaluations across 6 categories

Pass: 125
Fail: 67
Pass Rate: 65%
Avg Latency: 64.7s
Min Latency: 64.5s
Max Latency: 65.1s
Text Rendering: 47%
Spatial Reasoning: 61%
Human realism: 79%
Truthfulness: 59%
Professional Studio: 85%
Graphical design: 46%
Latency: 0%
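The headline pass rate is passes divided by total evaluations. Here is a minimal sketch of that arithmetic. It assumes every evaluation ends in exactly one PASS or FAIL, and `pass_rate` is an illustrative name, not part of ImageBench:

```python
def pass_rate(passed: int, failed: int) -> float:
    """Percentage of evaluations that passed; assumes each one is either PASS or FAIL."""
    total = passed + failed
    if total == 0:
        raise ValueError("no evaluations to score")
    return 100.0 * passed / total


# Figures from the summary above: 125 passes and 67 failures.
rate = pass_rate(125, 67)
print(f"{125 + 67} evaluations, pass rate {rate:.1f}%")
```

This prints 192 evaluations and a rate of 65.1%. The dashboard shows that figure rounded to 65%.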
Category: Text Rendering | Subcategory: Typography Style | Difficulty: Easy | Model: local/prxpixel-t2i-7b

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Text Rendering 47%

Typography Style 100%
Easy 100%
PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads “CHAPTER ONE” correctly. Font resembles vintage typewriter style with bold, slightly uneven strokes and speckled ink texture. Paper shows aged, creased, and torn edges. However, the ink lacks the authentic smudged or inconsistent pressure typical of real typewriter output. VERDICT: PASS

PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads “TODAY’S SPECIAL” correctly. Letters show chalk-like texture, smudges, and dust particles on dark green surface, convincingly mimicking chalk on a chalkboard. Style matches prompt precisely. VERDICT: PASS

PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is correctly spelled and rendered in glowing red LED-style letters with visible pixel-like dots, matching the described medium. The lighting and texture convincingly simulate an LED sign. VERDICT: PASS
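Every evaluator response ends with a line of the form `VERDICT: PASS` or `VERDICT: FAIL`. A harness can turn that free text into a boolean with a small parser. The sketch below is hypothetical, because ImageBench's own parsing code is not shown. It uses the last verdict in the response, so a verdict mentioned earlier in the reasoning is ignored:

```python
import re

# Matches "VERDICT: PASS" or "VERDICT: FAIL", tolerating extra spaces after the colon.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(response: str) -> bool:
    """Return True for PASS and False for FAIL, using the last verdict in the text."""
    matches = VERDICT_RE.findall(response)
    if not matches:
        raise ValueError("evaluator response contains no VERDICT line")
    return matches[-1] == "PASS"


example = "The word OPEN is correctly spelled. VERDICT: PASS"
print(parse_verdict(example))  # True
```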

Writing accuracy 33%
Easy 100%
PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and is clearly readable. All five letters are intact, properly formed, and centered on the white background. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly, fully visible, and clearly readable in bold, centered black letters on a plain cream background. No misspellings, missing words, or illegible parts are present. All text matches the intended specification. VERDICT: PASS

PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is fully spelled correctly and clearly legible in bold, dark blue font. It is centered on a solid light blue background, matching the intended design. No misspellings, missing words, or illegible parts are present. All text elements align with the specification. VERDICT: PASS

Medium 33%
FAIL
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

The word "jumps" is misspelled as "breps". All other words are present and readable. The text is clean black on white background as intended. The misspelling constitutes a failure to meet the required accuracy. VERDICT: FAIL

PASS
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

The text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully legible. The phrase matches the intended text exactly, with no missing or extra words. The cursive font is clear against the pale yellow background. VERDICT: PASS

FAIL
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The text reads "Stars shine biggest in the night," misspelling "brightest" as "biggest." All words are fully readable, but the intended word is incorrect. The background is clean off-white as specified. VERDICT: FAIL
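The Writing accuracy failures come down to word-level substitutions and omissions. Suppose you have the expected text and a transcription of the rendered text. This sketch assumes the transcription comes from the evaluator or from an OCR step. Python's standard `difflib.SequenceMatcher` can then list the mismatched words. `word_errors` is an illustrative helper, not ImageBench code:

```python
import difflib


def word_errors(expected: str, rendered: str) -> list[tuple[str, str]]:
    """Return (expected, rendered) pairs for every word span that differs.

    An empty string on either side marks a missing or an extra span of words.
    """
    exp_words = expected.split()
    got_words = rendered.split()
    matcher = difflib.SequenceMatcher(a=exp_words, b=got_words, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((" ".join(exp_words[i1:i2]), " ".join(got_words[j1:j2])))
    return errors


print(word_errors("The quick brown fox jumps high", "The quick brown fox breps high"))
# [('jumps', 'breps')]
```

Take the 'Stars shine brightest in the dark night' prompt above and compare it with the transcription the evaluator reports ("Stars shine biggest in the night"). The helper returns [('brightest', 'biggest'), ('dark', '')]. It therefore also flags the dropped word "dark".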

Hard 0%
FAIL
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

The headline "SUMMER SALE" is correct. Body text reads "Up to 50% off weenkems this only" — "weekends" is misspelled. Fine print reads "Ternes and corddisnoirns for dpils." — "Terms and conditions apply. See store for details." is misspelled. All text is readable but contains multiple errors. VERDICT: FAIL

FAIL
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

Text errors: "renitted" (should be "required"), "avaialble" (should be "available"). "RSVP" is correct but misaligned with intent. All words are readable, but key phrases are misspelled. Core message is distorted. VERDICT: FAIL

FAIL
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

Text contains errors: “fine din” (missing “ing”), “jazzz.” (extra ‘z’), and “opitall6M” (misspelled “optional” with “6” instead of “6 PM”). All words are readable but not correctly spelled. Core intent is visible but inaccurate. VERDICT: FAIL

Extreme 0%
FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Text contains multiple errors: "bak" (should be "back"), "Eleced" (should be "Directed"), "Elema Vowss" (should be "Elena Voss"), "Saheil Chen" (should be "Marcus Chen"), "Chenes" (should be "and"), "Soolda Bynosts Reayes" (should be "Sofia Reyes"), "Sarmect Musicby" (should be "Music by"), "Pananurt Sumuer" (should be "Paramount Pictures"), "Coming Summer 2025" is missing. VERDICT: FAIL

FAIL
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

Text contains multiple errors: “Ipase 41 Mara d’ Santes” (incorrect), “The Hidden Temipes of Acmabce.” (misspelled), “Page Iverrisx on: Top Trails Actiiton” (misspelled), “Clnortwt You Mlust or This Year” (misspelled). Barcode text “Issue 247 | March 2025 | $6.99” is correct. Most cover lines are unreadable or incorrect. VERDICT: FAIL

FAIL
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

Text contains multiple errors: "ORANIC" (misspelled), "Unflerted" (misspelled), "Purr & wild forveer" (incorrect), "hors rouurebg..." (garbled), "puutfndeve..." (garbled), "12 00Z) nos!(340g" (incorrect format), "Best breeaty: 2006" (misspelled), "Prouct Croorest of USA" (misspelled). All text is illegible or incorrect. VERDICT: FAIL
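The 47% Text Rendering score matches pooling every evaluation in the category, not averaging the subcategory percentages. This assumes the three Typography Style entries shown are the whole subcategory. On that basis, the counts read off above are:

- Typography Style: 3 of 3
- Writing accuracy: 4 of 12 (3 Easy, 1 Medium, 0 Hard, 0 Extreme)

A quick check of both aggregation methods:

```python
# (passed, total) per subcategory, read off the Text Rendering entries above.
text_rendering = {
    "Typography Style": (3, 3),
    "Writing accuracy": (4, 12),
}


def pooled_rate(counts: dict[str, tuple[int, int]]) -> float:
    """Pass rate over all evaluations combined."""
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100.0 * passed / total


def mean_of_rates(counts: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of the per-subcategory pass rates."""
    return sum(100.0 * p / t for p, t in counts.values()) / len(counts)


print(round(pooled_rate(text_rendering)))    # 47
print(round(mean_of_rates(text_rendering)))  # 67
```

Only the pooled figure reproduces the dashboard's 47%. Under pooling, subcategories with more evaluations carry more weight in the category score.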

Spatial Reasoning 61%

Attributes Binding 78%
Easy 100%
PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface:
1. Left object: Red sphere (ball) — correct color and shape.
2. Right object: Blue cube — correct color and shape.
Both objects are side by side, as requested. No attribute swaps observed. Colors, shapes, and spatial arrangement match the prompt exactly. VERDICT: PASS

PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is correctly orange, round with vertical ridges, and has a dark stem. Both objects are appropriately sized relative to each other and placed on the board as described. No attribute swaps or misidentifications are present. VERDICT: PASS

PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows a yellow taxi and a black limousine parked side by side on a street. The taxi is correctly colored yellow with typical taxi markings and roof sign. The limousine is correctly colored black, elongated in shape, and positioned next to the taxi. All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS

Medium 100%
PASS
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image shows three objects on a white background:
- A large red circle (correct)
- A small blue triangle (correct)
- A medium green star (correct)
All attributes are correctly bound to their respective objects with no swaps. The sizes, colors, and shapes match the prompt exactly. VERDICT: PASS

PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a wooden shelf:
1. Left: Tall, thin, white candle — matches description.
2. Center: Short, fat, red candle — matches description.
3. Right: Medium-sized, spiral-shaped, yellow candle — matches description.
All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS

PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a big brown horse with a white star on its forehead, standing in a meadow. In front of the horse are a small white rabbit sitting and a medium-sized black cat standing. All attributes (color, size, shape) are correctly bound to each object with no swaps. The scene matches the prompt exactly. VERDICT: PASS

Hard 33%
FAIL
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image shows a pink fire truck, which matches the prompt. However, the banana is green (not blue), and the sun is yellow/gold (not green). The attributes are swapped or incorrect relative to the prompt’s requirements. Specifically:
- Fire truck: pink (correct)
- Banana: green (should be blue)
- Sun: yellow (should be green)
VERDICT: FAIL

PASS
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains:
1. A purple carrot — correct color, shape, and object.
2. A red tree with red leaves — correct color and foliage; trunk is brown, which is natural.
3. A sliced watermelon — but it’s white inside, not red/pink as typical; however, the prompt specified “white watermelon,” so this matches. Seeds and rind are accurate.
All attributes are correctly bound to objects with no swaps. VERDICT: PASS

FAIL
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows an orange elephant, a yellow penguin, and a red strawberry on a table. The prompt requested a silver strawberry, but the strawberry is red. All other attributes (orange elephant, yellow penguin) are correctly bound. Since the strawberry’s color is incorrect, the core criteria of accurate attribute binding is not fully met. VERDICT: FAIL
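An attribute-binding check compares, for each object, the attribute the prompt asked for with the one observed. This sketch assumes the observed colors have already been extracted into a dictionary, either by a vision model or by hand. `binding_mismatches` is hypothetical. The example uses the pink fire truck prompt above, with the colors the evaluator reported:

```python
from typing import Optional


def binding_mismatches(
    expected: dict[str, str], observed: dict[str, str]
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each object whose observed attribute differs from the prompt to (expected, observed).

    Objects missing from `observed` are reported with None.
    """
    return {
        obj: (want, observed.get(obj))
        for obj, want in expected.items()
        if observed.get(obj) != want
    }


prompt = {"fire truck": "pink", "banana": "blue", "sun": "green"}
seen = {"fire truck": "pink", "banana": "green", "sun": "yellow"}  # as reported by the evaluator
print(binding_mismatches(prompt, seen))
# {'banana': ('blue', 'green'), 'sun': ('green', 'yellow')}
```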

Compositionality 89%
Easy 67%
PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all described elements:
1. Tabby cat - Present (distinctive striped fur pattern)
2. Oversized red sunglasses - Present (large red frames covering eyes)
3. Coherent combination - Present (glasses fit naturally on cat's face)
All prompt requirements are met with proper integration of concepts. VERDICT: PASS

FAIL
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image contains:
- Golden retriever: Present and clearly visible.
- Cowboy hat: Present, but not tiny — it’s proportionally large on the dog’s head.
- Coherent combination: The hat sits naturally on the dog, though size contradicts “tiny.”
Since “tiny” is a key descriptor in the prompt and the hat is clearly not tiny, the core criteria are not fully met. VERDICT: FAIL

PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image shows a penguin standing upright on a snowy surface. The penguin is holding a large, colorful umbrella with rainbow-colored panels (red, orange, yellow, green, blue, purple). The umbrella’s handle is curved and black, and the penguin grips it with one flipper. All described elements — penguin, colorful umbrella, and the act of holding it — are clearly present and coherently combined. VERDICT: PASS

Medium 100%
PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image shows a figure in full medieval plate armor, including helmet and gauntlets, seated at a wooden desk. The knight’s hands are positioned on a modern laptop keyboard as if typing. The setting includes an arched window and stone walls, suggesting a medieval interior. All described elements — knight, armor, desk, laptop, typing action — are present and coherently combined. VERDICT: PASS

PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains all required elements:
1. Roman gladiator: Present (wearing armor, helmet, red tunic)
2. Selfie action: Present (holding phone up, looking at screen)
3. Smartphone: Present (modern black smartphone with triple camera)
4. Colosseum: Present (architectural background matches Roman amphitheater)
All elements are coherently combined in a single scene. The anachronistic combination is intentional and executed clearly. VERDICT: PASS

PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image shows an astronaut in a full white spacesuit with helmet and backpack, riding a black bicycle along a paved path in a green park with trees and grass. All three core elements — astronaut, bicycle, park — are clearly present and coherently combined. The scene is logically consistent and visually unified. VERDICT: PASS

Hard 100%
PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements:
- Octopus: Present, central figure
- Top hat: Present, worn by octopus
- Monocle: Present, on octopus's eye
- Chess game: Present, board with pieces being played
- Moon surface: Present, cratered ground
- Earth in background: Present, visible in upper left
All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS

PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

Analysis of elements:
1. Dinosaur: Present (T-Rex type).
2. Business suit: Present (jacket, trousers, tie).
3. Giving a PowerPoint presentation: Present (holographic chart/graph projected in front of it).
4. Modern office: Present (desks, chairs, large windows, city view).
5. Drinking coffee: Present (holding a steaming cup).
All elements are present and coherently combined. VERDICT: PASS

PASS
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

The image contains all described elements: a robot dressed as a pirate (with hat, bandana, and skull emblem), sailing in a white paper boat, on water that reflects lava flows from surrounding rocky terrain (suggesting “lake of lava”), and juggling three kittens — one on its head, two held aloft in each hand. The scene is surreal but coherent. All prompt elements are present and integrated. VERDICT: PASS

Counting 22%
Easy 33%
FAIL
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image shows four red apples on a wooden table, not three as requested. There is also a notebook in the background, which was not mentioned in the prompt. Since the count of apples does not match the specified “exactly 3,” the core requirement is unmet. VERDICT: FAIL

PASS
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows five yellow rubber ducks arranged in a row on a white bathroom shelf against a tiled wall. Each duck is identical in appearance, with orange beaks and black eyes. The count of ducks matches the requested number exactly. All objects are clearly visible and correctly positioned as described. VERDICT: PASS

FAIL
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays five white candles standing upright on a dark surface, each with a lit flame. The prompt requested exactly four candles. Since the actual count is five, the requirement is not met. VERDICT: FAIL

Medium 33%
FAIL
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows four animals sitting on grass: two orange tabby cats on the left and two black dogs (likely Labrador Retrievers) on the right. The prompt requested “2 orange cats and 3 black dogs.” While the cat count is correct, the dog count is off by one — only two dogs are present, not three. Therefore, the object counts do not exactly match the prompt. VERDICT: FAIL

FAIL
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows a bouquet in a glass vase on a wooden table. Counting the flowers: there are 6 red roses and 2 white daisies. The prompt requested “4 red roses and 2 white daisies.” While the daisy count matches, the rose count exceeds the request by 2. Since the core criterion (exact match) is not met, this fails the specification. VERDICT: FAIL

PASS
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows 1 blue butterfly and 3 yellow butterflies resting on a wooden garden fence. The count exactly matches the requested prompt: one blue and three yellow butterflies. All objects are clearly visible and correctly positioned on the fence as described. VERDICT: PASS

Hard 0%
FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round fishbowl containing goldfish. Counting the visible fish: there are 11 goldfish clearly depicted inside the bowl. The prompt requested exactly 9 goldfish. Since 11 ≠ 9, the count does not match the requirement. VERDICT: FAIL

FAIL
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with 3 bananas, 4 oranges, and 1 red apple. The prompt requested exactly 2 apples, but only 1 is present. Therefore, the count does not match the specification. VERDICT: FAIL

FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image shows a cluster of colorful balloons against a clear blue sky. Counting them: there are 13 balloons visible — pink, light green, dark blue, yellow, purple, orange, red, lime green, and two more blues and purples. The prompt requested exactly 11 balloons. Since 13 ≠ 11, the count does not match. VERDICT: FAIL
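Each counting prompt passes only if every requested count matches exactly. This sketch assumes object detections are available as a list of labels. The detector itself is outside the sketch, and `count_mismatches` is a hypothetical helper. With those inputs, `collections.Counter` does most of the work. Only the labels named in the prompt are checked, so unrequested objects are ignored. An example is the notebook noted in the apples entry.

```python
from collections import Counter


def count_mismatches(expected: dict[str, int], detections: list[str]) -> dict[str, tuple[int, int]]:
    """Return {label: (expected, observed)} for every label whose count is not exact."""
    observed = Counter(detections)
    # Counter returns 0 for labels that were never detected.
    return {label: (n, observed[label]) for label, n in expected.items() if observed[label] != n}


# Fruit bowl prompt: exactly 3 bananas, 4 oranges, and 2 apples.
expected = {"banana": 3, "orange": 4, "apple": 2}
detections = ["banana"] * 3 + ["orange"] * 4 + ["apple"]  # what the evaluator counted
print(count_mismatches(expected, detections))  # {'apple': (2, 1)}
```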

Negation 33%
Easy 67%
PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image shows a pristine tropical beach with white sand, turquoise water, and a clear blue sky with scattered clouds. There are no people visible anywhere in the scene — no figures on the sand, in the water, or along the horizon. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS

FAIL
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat lying on a gray couch. The cat is wearing a black collar with a small metallic tag attached. Since the prompt explicitly excluded “a cat lying on a couch without a collar,” and the cat in the image clearly has a collar, the excluded element is present. VERDICT: FAIL

PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with wooden desks and chairs arranged in rows facing a large chalkboard mounted on the wall. The chalkboard has faint chalk smudges but no writing. There are no students visible anywhere in the scene — the room is completely unoccupied. The excluded element (students) is fully absent. VERDICT: PASS

Medium 33%
FAIL
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with palm trees, seashells in the foreground, and a yellow beach umbrella. However, the ocean and water are clearly visible in the background — waves are breaking on the shore, and the sea extends to the horizon under a blue sky. Since the prompt explicitly excluded “no water or ocean visible,” and both are present, the core constraint is violated. VERDICT: FAIL

FAIL
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image clearly shows a computer monitor (an Apple iMac) on the desk, along with a keyboard and mouse. The prompt explicitly excluded “computers or screens,” yet a screen is prominently present. This violates the exclusion criterion. VERDICT: FAIL

PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a clean, modern kitchen with a stainless steel stove, two pots on the burners, a frying pan on a wooden cutting board, and utensils in a holder. No food is visible anywhere — no ingredients, cooked meals, or raw items. The excluded element (“food”) is truly absent. All other elements (pots, pans, stove, cutting board) are present as described. VERDICT: PASS

Hard 0%
FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil, and crust — but it clearly includes melted white cheese covering the surface beneath the pepperoni and basil. The prompt explicitly excluded cheese (“absolutely no cheese”), yet cheese is visibly present. Therefore, the excluded element is not absent. VERDICT: FAIL

FAIL
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a vintage-style van or small bus with a metallic, weathered body. It clearly has windows along the side and front, doors (including a sliding door on the side), and illuminated headlights. Crucially, it also has four visible wheels — front and rear — mounted on axles with tires. Since the prompt explicitly excluded “no wheels,” and wheels are present, the excluded element is not absent. VERDICT: FAIL

FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image clearly shows a human face with eyes, nose, and mouth — as expected. However, the prompt explicitly excluded eyebrows, and in this image, prominent, well-defined eyebrows are clearly visible above both eyes. Since the excluded element (eyebrows) is present, the image fails to meet the exclusion criterion. VERDICT: FAIL
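Negation prompts add an exclusion constraint on top of the usual inclusion check. The named elements must be present and the negated ones absent. If detections are available as a set of labels, both checks become set operations. That input is an assumption, since the benchmark specifies neither a detector nor a label vocabulary. The helper name is illustrative, and the detected set below is modeled on the evaluator's description of the office image:

```python
def check_inclusion_exclusion(
    required: set[str], excluded: set[str], detected: set[str]
) -> tuple[set[str], set[str]]:
    """Return (missing required elements, excluded elements that are present)."""
    return required - detected, excluded & detected


# Office prompt: desk, chair, bookshelf, and lamp, but no computers or screens.
missing, forbidden = check_inclusion_exclusion(
    required={"desk", "chair", "bookshelf", "lamp"},
    excluded={"computer", "screen"},
    detected={"desk", "chair", "bookshelf", "lamp", "screen", "keyboard", "mouse"},
)
print(missing, forbidden)  # set() {'screen'}
```

An image passes only when both returned sets are empty.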

Relative Position 83%
Easy 100%
PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat sitting directly on top of a closed cardboard box. The cat’s paws are placed on the upper surface of the box, and its body is elevated above it. The tail extends off the right side of the box. All spatial relationships described — specifically “on top of” — are accurately represented. No contradictions or misplacements are visible. VERDICT: PASS

PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball positioned on the floor directly beneath the seat of a wooden chair. The ball is centered between the front legs and under the chair’s frame, satisfying the spatial relationship “underneath a wooden chair.” All positional descriptors in the prompt are accurately reflected. VERDICT: PASS

PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a coffee mug positioned to the left and slightly behind an open book, both resting on a wooden desk surface. The spatial relationships described in the prompt — “next to” and “on a desk” — are accurately represented. The mug is adjacent to the book, and both are clearly placed on the desk. No positional errors are present. VERDICT: PASS

Medium 100%
PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a branch that extends from a tree trunk on the right side of the frame. The branch stretches leftward across the upper portion of the image. Below and to the left of the branch, in the mid-ground, is a park bench sitting on grass near a paved path. All spatial relationships described — bird “on” branch, branch “from” tree, tree “next to” bench (with bench below and left) — are accurately represented. VERDICT: PASS

PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat sitting directly on a white pillow. The pillow is placed on the seat of a wooden chair. All spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. The cat is above the pillow, the pillow is above the chair seat, and no conflicting left/right/under/above relationships are present. VERDICT: PASS

PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a lit candle placed directly on top of a stack of books, which matches the prompt’s “candle on top of a stack of books.” The potted plant is positioned to the left and slightly behind the book stack — consistent with “books next to a potted plant.” All specified spatial relationships (on, next to) are accurately represented. No contradictions or misplacements detected. VERDICT: PASS

Hard 33%
FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows four chess pieces: a golden king in the center, two black knights (one to the left, one to the right), and a wooden pawn in front. There is no queen, bishop, or rook visible. The prompt specified five pieces with specific roles and positions — none of which match the actual content beyond the king’s centrality. Spatial relationships described (queen left, bishop right, etc.) are entirely absent or incorrect. VERDICT: FAIL

PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a desk scene with a laptop centered, a coffee mug to its left, a phone to its right, a lamp hanging above, and an open notebook positioned below the laptop. All specified spatial relationships — left, right, above, below, and center — are accurately represented in the composition. Lighting and shadows support the arrangement’s realism. VERDICT: PASS

FAIL
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field, but the spatial arrangement does not match the prompt. From left to right: a brown horse, a black horse (not a cow), a sheep, a goat, and a dog. The prompt incorrectly identifies the second animal as a cow. All other positional relationships are correct. VERDICT: FAIL
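The Hard layout prompts specify left, right, above, and below relative to a central object. One common way to verify such relations automatically is to compare bounding-box centers. This sketch assumes boxes come from some detector, which is not part of this page. The coordinates for the desk scene are made up, and `Box` and `holds` are hypothetical. In a photographed desk, "above" may mean "farther back" rather than higher in the frame, and a center comparison cannot tell these apart.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def holds(a: Box, relation: str, b: Box) -> bool:
    """Check a 2D relation between the centers of box a and box b."""
    (ax, ay), (bx, by) = a.center(), b.center()
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by  # smaller y is higher in the image
    if relation == "below":
        return ay > by
    raise ValueError(f"unsupported relation: {relation}")


# Hypothetical boxes for the desk scene prompt.
laptop = Box(300, 200, 500, 350)
mug = Box(120, 250, 200, 340)
phone = Box(560, 280, 620, 340)
lamp = Box(350, 20, 450, 150)
notebook = Box(320, 380, 480, 470)

checks = [
    (mug, "left of", laptop),
    (phone, "right of", laptop),
    (lamp, "above", laptop),
    (notebook, "below", laptop),
]
print(all(holds(a, rel, b) for a, rel, b in checks))  # True
```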

Extreme 100%
PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale. The spatial arrangement matches the prompt: barn is closest and largest, windmill is farther and smaller, mountains are farthest and smallest. Depth perspective is consistent. All positional relationships (foreground/midground/background) are correctly represented. VERDICT: PASS

PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image correctly depicts a beach scene with a sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease appropriately with distance, maintaining proper spatial relationships as described. All positional relationships (foreground, midground, background) are accurate. VERDICT: PASS

PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image correctly shows pigeons and a bench in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial layering is accurate: foreground objects are largest and closest, background objects smaller and farther away. All described positional relationships (e.g., pigeons on ground, fountain behind bench, buildings above trees) are visually consistent with the prompt. No contradictions or misplacements detected. VERDICT: PASS

Scale & Proportions 56%
Easy 67%
PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The image shows a person standing next to a yellow school bus. The person’s height relative to the bus appears realistic — approximately 1/3 to 1/4 of the bus’s height, which is consistent with real-world proportions. The bus’s wheels, windows, and overall structure also scale appropriately to the human figure. No obvious distortions or size inconsistencies are present. VERDICT: PASS

FAIL
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears disproportionately large relative to the leaf structure and vein details. In reality, a ladybug is small enough that multiple could fit on a single leaf segment; here, the insect dominates the frame in a way that suggests macro photography or digital scaling rather than natural proportion. The leaf’s texture and serrated edges appear oversized compared to the bug, breaking realistic size relationships. VERDICT: FAIL

PASS
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child’s hand proportions and grip on the basketball appear anatomically plausible for a young child. The ball size is consistent with standard youth basketballs relative to the child’s torso and arm length. No obvious scaling errors or distortions are present in the object-to-body relationships. VERDICT: PASS

Medium 67%
FAIL
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a tiny person standing on the rim of a coffee cup, but he is not looking down into it — he is facing away and looking to the side. Additionally, his feet are submerged in the coffee, suggesting he’s standing *in* the liquid rather than on the rim. The size relationship (tiny person vs. cup) is plausible, but the pose and positioning contradict the prompt’s description. VERDICT: FAIL

PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image shows a giant tabby cat walking between skyscrapers, consistent with the "kaiju" description. The cat’s size relative to the buildings is exaggerated as intended — it towers over street-level elements while fitting within the urban canyon. Proportions are internally consistent: paws, tail, and head scale appropriately with its massive body. No anatomical or spatial contradictions are visible. VERDICT: PASS

PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, with proportions that suggest it is a small model or toy. The fingers and palm appear natural in size relative to the house, maintaining realistic spatial relationships. No distortions or inconsistencies in scale are evident. VERDICT: PASS

Hard 33%
FAIL
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, horse, cat, and two mice — but no dog. The prompt explicitly requires a dog to be present. Additionally, while the elephant and horse are proportionally large relative to the cat and mice, the absence of the dog violates the prompt’s requirement. Since the core criterion (correct objects + correct proportions) is not fully met due to missing subject, this fails. VERDICT: FAIL

PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image displays a dinner table setting with a plate, fork, knife, wine glass, salt shakers, and peppercorns. The relative sizes appear realistic: the plate is appropriately large compared to the cutlery, the wine glass is proportionate to the plate, and the salt shakers are correctly sized relative to the other items. The peppercorns are small and scattered naturally. All objects maintain consistent and plausible proportions for a real-world dining scene. VERDICT: PASS

FAIL
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image shows a farm scene with a barn, tractor, farmer, cow, chicken, and fence post. However, the proportions are inconsistent: the rooster perched on the fence post is nearly as tall as the farmer’s head, which is unrealistic. The chicken near the farmer is also disproportionately large compared to the cow and tractor. These size distortions violate the prompt’s requirement for correct proportional scale. VERDICT: FAIL
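The subcategory figures on this page look like plain pooled pass rates. Scale & Proportions has 5 passes out of 9 evaluations (Easy 2/3, Medium 2/3, Hard 1/3), and 5/9 is about 55.6%, which rounds to the 56% shown; because every tier here holds three items, averaging the tier percentages gives the same number. The sketch below assumes that pooling rule (the page does not state how rates are aggregated) and assumes each evaluator response ends with a "VERDICT: PASS" or "VERDICT: FAIL" token. The helpers `parse_verdict`, `pass_rate`, and `summarize` are hypothetical, not part of any published ImageBench tooling.

```python
import re
from typing import Optional

# Assumed format: each evaluator response closes with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str) -> Optional[bool]:
    """Return True for PASS, False for FAIL, or None if the token is missing or truncated."""
    match = VERDICT_RE.search(evaluator_text.strip())
    if match is None:
        return None
    return match.group(1) == "PASS"


def pass_rate(verdicts: list[bool]) -> float:
    """Percentage of True entries."""
    return 100 * sum(verdicts) / len(verdicts)


def summarize(tiers: dict[str, list[bool]]) -> tuple[dict[str, int], int]:
    """Rounded per-tier rates plus one rate pooled over every item (assumed aggregation rule)."""
    per_tier = {name: round(pass_rate(v)) for name, v in tiers.items()}
    pooled = [v for verdicts in tiers.values() for v in verdicts]
    return per_tier, round(pass_rate(pooled))


# Scale & Proportions, transcribed from the verdicts above.
scale = {
    "Easy": [True, False, True],
    "Medium": [False, True, True],
    "Hard": [False, True, False],
}
print(summarize(scale))  # ({'Easy': 67, 'Medium': 67, 'Hard': 33}, 56)
```

The same tally reproduces the Hands figure below: 6 passes out of 12 evaluations gives 50%.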

Human realism 79%

Faces & Expressions 75%
Easy 100%
PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

The face is symmetric with natural, well-defined features. The warm, genuine smile is clearly readable, with crinkled eyes and relaxed cheeks. No significant distortions are present. Lighting and skin texture appear realistic. The portrait successfully captures the requested emotion and aesthetic. VERDICT: PASS

PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly proportioned features. Expression conveys calm seriousness with neutral mouth and direct gaze. No visible distortions or anomalies. Lighting and texture are realistic. Meets all criteria for the prompt. VERDICT: PASS

PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with natural features: eyes, nose, and mouth are proportionate and centered. Skin texture and freckles appear realistic. The expression is a subtle, relaxed smile conveying neutrality. No visible distortions or unnatural artifacts. Meets all criteria for a teenager’s relaxed, neutral headshot. VERDICT: PASS

Medium 67%
PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face shows strong disgust: brows furrowed, nose wrinkled, lips curled. Features are mostly symmetric and anatomically plausible. Minor distortions include slightly unnatural skin texture and a small, odd bump below the chin. The intended emotion is clearly readable and exaggerated. VERDICT: PASS

PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

The face is symmetric with correctly rendered features. The emotion of surprise is clearly conveyed through raised eyebrows, wide eyes, and an open mouth. No significant distortions are present. Facial details like eyelashes, skin texture, and hair appear natural and well-defined. VERDICT: PASS

FAIL
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

The face is largely symmetric with correctly rendered features. The downcast eyes and slightly drawn mouth convey quiet melancholy. However, excessive forehead wrinkles and unnatural skin texture suggest AI artifacts. The emotion is readable but distorted by these imperfections. VERDICT: FAIL

Hard 67%
PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

The face is symmetric with natural, aged features: deep wrinkles, white hair, and a warm smile. Eyes crinkle naturally, conveying genuine kindness. No significant distortions are present. The emotion is clearly readable as joyful and warm. All prompt elements are accurately rendered. VERDICT: PASS

PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with large, expressive eyes and prominent round cheeks. Tears stream clearly down both cheeks, and the crying expression—furrowed brow, open mouth—is vividly rendered. No significant distortions are present. Features align well with the prompt’s request for a crying 5-year-old. VERDICT: PASS

FAIL
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

The face shows asymmetrical age spots concentrated on one side, unnatural skin texture, and distorted features like an oddly shaped chin and uneven ear. The skeptical expression is partially readable but undermined by these artifacts. Overall, the image fails to meet the prompt’s anatomical and emotional clarity requirements. VERDICT: FAIL

Extreme 67%
PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces are mostly symmetrical with correct anatomical features. Emotions are clearly readable: left person laughs with head tilted, center looks shocked with wide eyes, right person rolls eyes with smirk. No significant distortions. All three expressions match the prompt accurately. VERDICT: PASS

FAIL
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces show symmetrical features with realistic anatomy. Emotions are clearly readable: father beams, mother smiles gently, child grins excitedly. Teenager’s expression is ambiguous—neither clearly bored nor smiling. Slight facial distortions (e.g., eyes/eyelids) suggest AI artifacts. Core emotional intent for three is met; teenager’s emotion is unclear. VERDICT: FAIL

PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

All three faces show clear, symmetrical features with no distortions. The left man displays frustration (furrowed brows, clenched jaw). The center woman shows confusion (tilted head, furrowed brow). The right man shows concern or frustration (hand on cheek, downturned mouth). All emotions are clearly readable and match the prompt. VERDICT: PASS

Full Body 100%
Easy 100%
PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions or anomalies are visible. All elements match the prompt precisely. VERDICT: PASS

PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing forward, with hands in pockets — perfectly matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The pose is clearly recognizable and natural. All criteria are satisfied. VERDICT: PASS

PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet, standing in a neutral pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still, hands at sides. No significant distortions are present. The park setting matches the prompt. VERDICT: PASS

Medium 100%
PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs are correctly proportioned and count is accurate. Her running pose is anatomically plausible, with dynamic motion in hair and clothing. No distortions are visible. The action is clearly recognizable as mid-stride running. All elements align with the prompt. VERDICT: PASS

PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms out—matches the jump over a puddle. The splash and reflection are physically accurate. No anatomical distortions are visible. The action is clearly recognizable as mid-air jump. All elements align with the prompt. VERDICT: PASS

PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s pose is dynamic, with one leg extended and arms raised, matching the prompt. Limb count and proportions are anatomically correct. The dress flows outward realistically, suggesting motion. No distortions are visible. The action is clearly recognizable as a spinning dance move. VERDICT: PASS

Hard 100%
PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person stands on one leg, with the other foot pressed against the inner thigh, arms raised overhead in prayer position. Limb count and proportions are accurate. The pose is clearly recognizable as Tree Pose (Vrksasana). No anatomical distortions are visible. The execution is precise and natural. VERDICT: PASS

PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has two arms and two legs, with proportions and pose consistent with realistic rock climbing. Both hands grip holds, feet are on footholds, and body position reflects dynamic climbing. No anatomical distortions are visible. The action matches the prompt precisely. VERDICT: PASS

PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, bow arm is extended, and the cello is properly supported. Proportions are natural, and the pose is clearly recognizable as playing. No significant distortions are visible. VERDICT: PASS

Extreme 100%
PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly counted and proportioned. The woman’s body is arched backward in a dip, with legs intertwined as described, though her right leg appears slightly bent unnaturally. The man supports her securely. The pose is recognizable as a tango dip. Minor distortion in leg alignment, but core anatomy and action are accurate. VERDICT: PASS

PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s airborne pose are anatomically plausible and match the described action. The ball is at the feet of the jumping player. No significant distortions are visible. The scene accurately depicts a mid-tackle scenario. VERDICT: PASS

PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate and recognizable. Limbs are aligned symmetrically, with feet together and hands firmly planted. No distortions are visible. The pose matches the prompt precisely. VERDICT: PASS

Hands 50%
Easy 67%
FAIL
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The left hand (viewer’s right) shows 5 fingers, all anatomically plausible. The right hand (viewer’s left) also displays 5 fingers, but the thumb appears abnormally short and fused near the palm, and the index finger is slightly malformed or merged with the thumb base — suggesting an AI artifact. While finger count is correct, anatomical correctness fails on the right hand. VERDICT: FAIL

PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The image shows a woman with both hands extended forward, palms facing up. Each hand clearly displays five fingers — left hand: thumb, index, middle, ring, pinky; right hand: same. All ten fingers are visible and anatomically positioned correctly. No extra, missing, or malformed fingers. Gesture is natural and consistent with the prompt. VERDICT: PASS

PASS
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The left hand (on the right side of image) shows 5 fingers clearly — thumb, index, middle, ring, pinky — all anatomically positioned. The right hand (on left side of image) also displays 5 fingers, though the pinky is partially obscured by the arm; however, its presence is implied by knuckle alignment and hand structure. No extra or missing digits. Grip is natural for crossed arms. Minor occlusion doesn’t violate anatomical correctness. VERDICT: PASS

Medium 33%
PASS
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The hand gripping the mug handle shows 4 fingers wrapped around it, with the thumb positioned on top — anatomically plausible for a right-hand grip. However, the fingers appear slightly fused or indistinct at the knuckles, and the pinky finger is not clearly separated. While not grossly distorted, subtle anatomical inconsistencies suggest AI generation artifacts. Core criteria (5 fingers, correct grip) are mostly met but with minor flaws. VERDICT: PASS

FAIL
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the pen shows 5 fingers, but the thumb and index finger appear fused or malformed at the tip. The left hand resting on the page has 5 fingers, but they are unnaturally elongated and slightly misaligned. While the count is technically five per hand, the grip and finger anatomy show clear distortions inconsistent with realistic human hands. VERDICT: FAIL

FAIL
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

Upon close inspection, the right hand (strumming) shows only four fingers clearly visible, with the thumb partially obscured — but more critically, the left hand (on fretboard) has six fingers: five normal fingers plus an extra digit near the pinky. This is anatomically impossible and indicates AI generation artifact. Grip gestures are otherwise plausible, but the finger count anomaly violates basic human anatomy. VERDICT: FAIL

Hard 33%
FAIL
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand making the peace sign has five fingers total, but the gesture is anatomically incorrect. The thumb is curled inward and appears fused or malformed with the index finger’s base. The middle and ring fingers are curled together unnaturally, and the pinky is partially visible but awkwardly positioned. The grip doesn’t match a natural peace sign — it looks distorted and AI-generated. VERDICT: FAIL

PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and four fingers are curled into a fist — this matches the prompt. Anatomically, the hand appears correct: four fingers curled, thumb up, no extra or missing digits. No anomalies detected. The hand structure, knuckles, and skin texture appear natural and consistent with human anatomy. VERDICT: PASS

FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image shows three hands, not two, which contradicts the prompt’s implication of a single person counting to three. Each hand displays only one extended finger (index, middle, ring), but anatomically, each hand should have five fingers — here, only four are visible per hand (thumb + three curled fingers). The gesture is also inconsistent: three separate hands are used instead of one hand showing three fingers. This is a clear anatomical and compositional anomaly. VERDICT: FAIL

Extreme 67%
FAIL
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

The handshake shows two hands clasped. The left hand (from viewer’s perspective) has 5 fingers visible, correctly positioned. The right hand also shows 5 fingers, but the thumb placement and finger overlap appear slightly unnatural — one finger seems to merge or be obscured unnaturally under the other hand’s grip. While mostly anatomical, subtle distortions in finger alignment and pressure points suggest AI artifacting. Not fully correct. VERDICT: FAIL

PASS
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two hands meeting in a high-five gesture. The left hand (woman’s) clearly displays 5 fingers, all extended and properly aligned. The right hand (man’s) also appears to have 5 fingers, though the thumb is slightly curled inward — still anatomically plausible for a high-five. No extra or missing fingers are visible. The gesture is natural and consistent with human anatomy. VERDICT: PASS

PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

Upon close examination, the hands appear anatomically correct with no visible anomalies. The left person’s hands show 5 fingers each (thumb + 4 fingers), gripping the box naturally. The right person’s hands also display 5 fingers per hand, with proper thumb placement and finger curvature around the gift. No extra, missing, or fused digits are present. Grip posture is realistic for handing over a small box. VERDICT: PASS

Multi-Subject 100%
Easy 100%
PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, wearing red jacket. Right woman: long black hair, wearing blue dress. Both are visually distinct and match descriptions precisely. No discrepancies found. VERDICT: PASS

PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Left man: beard, glasses, green sweater. Right man: clean-shaven (bald head), black hoodie. Both visually distinct and match descriptions perfectly. No discrepancies in features or presence. VERDICT: PASS

PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls, shirtless. Short person: straight brown bob haircut, yellow sundress. Both visually distinct and match descriptions precisely. VERDICT: PASS

Medium 100%
PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

Bald man in suit: present, distinct. Silver-haired woman in red blouse: present, distinct. Young man with dreadlocks in denim jacket: present, distinct. Woman with hijab in green dress: present, distinct. All four match descriptions precisely. VERDICT: PASS

PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

All three musicians are present and visually distinct. Left: tall woman with short pink hair playing bass. Center: bearded man with bandana on drums. Right: slim man with glasses playing keyboard. Features match prompt exactly. VERDICT: PASS

PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat, backpack, and hiking boots. Woman: has braided hair, uses hiking poles, wears backpack. Teenager: wears baseball cap, holds water bottle, carries backpack. All three are visually distinct and match described features. VERDICT: PASS

Truthfulness 59%

Photorealism 100%
Medium 100%
PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The glass shows accurate refraction and transparency, the spoon exhibits sharp metallic reflections, the apple has a realistic glossy skin texture, the napkin displays a convincing woven fabric structure, and the table surface shows appropriate wood grain and sheen. Lighting and shadows are consistent across all objects. VERDICT: PASS

PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum texture with appropriate metallic reflections. The notebook displays realistic leather grain and stitching details. The ceramic mug shows correct opacity and glaze texture, while the glass paperweight demonstrates accurate refraction and transparency. The wooden desk surface has a subtle, believable grain. All materials are visually distinct and physically plausible. VERDICT: PASS

PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image successfully renders the distinct material properties requested. The stainless steel pot exhibits sharp, accurate reflections on the countertop and its own surface. The wooden cutting board shows natural grain and matte texture. The glass bottle demonstrates translucency with the green oil visible inside, while the cotton towel displays a soft, fibrous texture. All surfaces interact with light realistically, fulfilling the prompt’s requirements for correct visual properties. VERDICT: PASS

Physics & Reflections 75%
Easy 67%
PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white surface. The shadow is cast to the right and slightly downward, which is consistent with a light source positioned in the upper left. The shadow's shape is an ellipse, correctly representing the perspective distortion of a sphere's shadow on a flat plane. The lighting on the sphere itself also matches, with the highlight on the upper left and shading on the lower right. The physics are accurate. VERDICT: PASS

PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The light source is clearly coming from the right (the window), which correctly casts the shadow to the left. The shadow's shape accurately mimics the mug and handle, and its sharpness aligns with the hard sunlight depicted. The lighting on the mug itself is consistent with this direction. VERDICT: PASS

FAIL
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast directly beneath the subject, which is physically plausible for midday lighting. However, the shadow exhibits significant distortion and artifacts; it appears to merge with the ground texture unnaturally and lacks the crisp definition expected from a hard light source like the midday sun. The edges are blurry and the shape is somewhat amorphous, suggesting a rendering error rather than a natural optical phenomenon. VERDICT: FAIL

Medium 33%
FAIL
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically impossible. The woman is facing the mirror, so her reflection should show her front. Instead, the reflection shows her front while the real woman shows her back, but the reflection is not a mirror image—it’s a duplicate of the front view, violating basic optics. Additionally, the reflection’s orientation and lighting don’t match the real scene’s perspective or shadow direction. This breaks physical plausibility. VERDICT: FAIL

FAIL
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted, which is physically correct for a horizontal surface. However, the reflection is unnaturally sharp and clear, lacking the expected light scattering or distortion from the marble’s veining. Additionally, the reflection extends into a completely white, featureless area below the floor line, breaking the illusion of a continuous surface. This suggests an artificial or composited effect rather than a natural optical reflection. VERDICT: FAIL

PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It correctly mirrors the vase and stems with appropriate vertical inversion. The glass table’s transparency allows the reflection to appear slightly dimmer and softer than the real object, which aligns with real-world optics. No directional or color distortions are present. The reflection’s position directly beneath the vase confirms accurate spatial alignment. Overall, the rendering respects physical laws of reflection on a smooth, transparent surface. VERDICT: PASS

Hard 100%
PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are vertically inverted and correctly positioned beneath the trees, matching the expected behavior of a calm water surface. The gentle ripples distort the reflection slightly, which is physically accurate for a lake with minor surface movement. The reflection appears darker than the original scene, consistent with light absorption and scattering in water. The sunset’s glow is also mirrored appropriately. All aspects align with real-world optics. VERDICT: PASS

PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection in the puddle is physically plausible. It correctly mirrors the skyline above, showing an inverted orientation. The mild distortion is consistent with water surface tension and ripples. The color shift is subtle but present, with the reflection appearing slightly darker and more saturated than the actual buildings, which aligns with how water reflects light. The direction of the reflection matches the perspective of the scene. VERDICT: PASS

PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is perfectly symmetrical and inverted, matching the mountain range above. The water surface is extremely still, creating a mirror-like effect with minimal distortion. The reflected colors are slightly darker and more muted than the actual mountains, which is physically accurate for reflections on water. The lighting and shadows align correctly between the real scene and its reflection. VERDICT: PASS

Extreme 100%
PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image displays a glass sphere resting on a reflective surface, creating a clear reflection below it. The sphere also refracts the background scenery (trees and sky), which is inverted as expected through a lens. The reflection on the surface mirrors the sphere accurately. However, the prompt specifies "near a pool of water," yet the surface appears to be a solid mirror rather than water, given the perfect stillness and lack of ripples except for artificial-looking concentric circles around t

PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The chrome sculpture exhibits accurate environmental mapping, reflecting the surrounding windows and room structure with appropriate distortion based on its curved geometry. The glass floor provides a clear, undistorted reflection of the sculpture and the room’s grid lines, consistent with a polished surface. Lighting is diffuse and even, matching the overcast exterior view, and no directional inconsistencies or impossible shadows are present. VERDICT: PASS

PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image displays physically plausible optics. The reflection on the marble is correctly oriented and distorted by the glass base. The refraction through the glass bowl inverts and distorts the background lake scene, consistent with lens physics. The lighting direction is coherent, casting soft shadows and highlights that match the window light source. The colors are natural, and the depth of field is appropriate for a macro-style shot. VERDICT: PASS

World Knowledge 33%
Easy 67%
PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy, correctly showing its iron lattice structure, three distinct levels, and iconic arched base. The perspective and details align with real-world photography. Minor background elements like trees and distant buildings are plausible. No significant inaccuracies are present. VERDICT: PASS

FAIL
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal, but it contains significant architectural inaccuracies. The minarets are incorrectly designed with multiple balconies and a different structural style than the actual four slender minarets. The central dome appears distorted and lacks the characteristic finial. The reflection in the pool is unnaturally perfect and includes a strange floating object not present in reality. These deviations from factual architecture make it inaccurate. VERDICT: FAIL

PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The pedestal and island setting are architecturally plausible. Minor details like the flame’s shape or base texture may show slight AI artifacts, but overall fidelity to real-world structure and iconography is strong. No major factual or architectural errors detected. VERDICT: PASS

Medium 33%
FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a shamisen, but it contains significant factual inaccuracies. The bridge is incorrectly designed with a horizontal bar that does not match traditional shamisen bridges. The tailpiece and tuning pegs are also stylized in ways that deviate from authentic instruments. While the overall shape and three-stringed design are recognizable, these structural errors undermine its accuracy as a representation of a real-world shamisen. VERDICT: FAIL

FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass navigational instrument, but it fails the accuracy test. The markings are nonsensical gibberish rather than accurate degree scales or zodiac symbols. The structure is a chaotic hybrid of a compass rose and astrolabe components, lacking the distinct, functional layers (mater, tympan, rete) of a real medieval astrolabe. The "engraved degree markings" are illegible and inconsistent, failing the prompt's requirement for factual accuracy. VERDICT: FAIL

PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The subject is clearly recognizable and accurately represents the art form described in the prompt. The lighting, texture, and composition appear realistic and consistent with a real-world photograph. No factual or architectural inaccuracies are present. VERDICT: PASS

Hard 33%
PASS
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts a stylized, cartoonish diagram of the water cycle. It clearly shows evaporation (rising from the ocean), condensation (clouds forming), precipitation (rain falling), and collection (water returning to the sea). While not photorealistic, it accurately represents the core components and flow of the water cycle as requested. The subject is recognizable and factually accurate in its simplified representation. VERDICT: PASS

FAIL
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a rift valley with divergent plate movement, evidenced by the central fissure and spreading crust, rather than a convergent boundary with subduction. There is no visible subducting slab, trench, or overriding plate typical of convergent zones. The glowing blue stream resembles energy or water, not mantle convection or magma flow. While visually striking, it misrepresents the requested tectonic setting. VERDICT: FAIL

FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image is not factually accurate. It depicts a surreal, AI-generated hybrid rather than a correct anatomical illustration. Key errors include: the aorta is rendered as a thick, ribbed red tube with multiple unnatural branches; the pulmonary artery is blue but incorrectly shaped and positioned; there are no visible chambers or valves; and strange, non-anatomical structures (like a spiral valve-like object and bulbous masses) replace real anatomy. The coronary vessels are overly stylized and mi

Extreme 0%
FAIL
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly with the characteristic iridescent blue dorsal wings of a Morpho, but the ventral side is incorrectly shown simultaneously. In reality, butterflies cannot display both dorsal and ventral patterns at once; the ventral side is only visible when wings are closed. Additionally, the eyespots on the hindwings are more typical of species like Junonia or Morpho peleides, not Morpho menelaus, which lacks such prominent eyespots. The body morphology also appears slightly off.

FAIL
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image depicts the interior of the Hagia Sophia with recognizable elements like the central dome, pendentives, and calligraphy medallions. However, it contains significant architectural inaccuracies: the floor is a modern tiled pattern rather than historic marble; the lower structures (small domed pavilions) are not present in the actual building; and the calligraphy medallions are placed symmetrically in a way that doesn’t match the real layout. These deviations make it factually inaccurate.

FAIL
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a cluster of hexagonal prismatic crystals, which are characteristic of quartz, not bismuth. Bismuth crystals are known for their cubic hopper structure and distinct stepped "staircase" formations, neither of which are present here. The iridescence is also inconsistent with bismuth’s oxide layer appearance. Therefore, the subject is not recognizable as bismuth and fails factual accuracy. VERDICT: FAIL

Professional Studio 85%

Camera & Lighting 83%
Easy 67%
FAIL
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image features a wide-angle perspective of a modern living room. However, instead of the requested "slight barrel distortion at the edges," the image utilizes a heavy vignette/circular mask that creates a "fisheye" or "peephole" effect, obscuring the edges entirely. The distortion is not a subtle lens characteristic but a heavy framing element. The lighting and depth of field are appropriate for a modern interior. VERDICT: FAIL

PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a pink petal. The depth of field is shallow, with the foreground and background petals softly blurred, drawing focus entirely to the droplet. The lighting is bright and clear, highlighting the texture of the petal and the refraction within the water drop. VERDICT: PASS

PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view, looking straight down at a circular traffic roundabout. The perspective is top-down, with no tilt or angle. The lighting is consistent with a bright, sunny day, casting clear shadows from the trees and cars onto the pavement. The depth of field is deep, keeping the entire scene from the center to the edges in sharp focus. VERDICT: PASS

Medium 100%
PASS
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image successfully implements the requested Rembrandt lighting. There is a distinct, small triangle of light on the shadowed cheek (the subject's left cheek), which is the hallmark of this lighting style. The high contrast between the bright side of the face and the deep shadows creates the requested moody atmosphere. The perspective is a standard portrait shot, and the lighting pattern is executed accurately. VERDICT: PASS

PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly captures the requested golden hour aesthetic. The sky is a warm orange/yellow hue, and the sun provides a soft glow. Most notably, the lighting pattern features prominent, long, dark shadows stretching across the landscape, which directly matches the prompt's requirement for "long warm shadows." The perspective and depth of field are consistent with a professional landscape photograph. VERDICT: PASS

PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright white background and a very bright, luminous aesthetic. The lighting is soft and even, resulting in minimal shadows on the subject's face and clothing. The depth of field is shallow, keeping the focus on the person while the background remains a seamless, bright white, creating an airy and ethereal feel. VERDICT: PASS

Hard 67%
PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a portrait shot with a shallow depth of field (creamy bokeh) in the background. The lighting includes a distinct warm rim light along the subject's hair and shoulders, creating a clear separation from the blurred garden background. The perspective and lighting pattern are highly accurate to the technical specifications provided. VERDICT: PASS

FAIL
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image fails to meet the core requirement of the prompt. While the background features soft, bokeh-style city lights, the subject is significantly out of focus. The prompt explicitly requested a "subject sharp" look characteristic of a 50mm f/1.8 lens. Instead, the subject is blurry and lacks any sharp detail, failing the primary instruction regarding depth of field and subject focus. VERDICT: FAIL

PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches all the specified criteria. The perspective is a top-down shot (shot from above). The lighting is clearly coming from the window on the left, creating a natural light pattern. The color temperature is warm, and there is a subtle vignette effect around the edges of the bowl. The depth of field is consistent with a 35mm f/2.8 setting, with a slight soft focus on the edges. VERDICT: PASS

Extreme 100%
PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a clear selective focus strip in the mid-ground, with the foreground and background heavily blurred to create a miniature model aesthetic. The colors are highly saturated, and the perspective effectively mimics a macro-photography view of a small-scale model. The lighting is bright and even, consistent with the described effect. VERDICT: PASS

PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image features a horizontal light streak with blue and orange tones, matching the color description. The light streaks horizontally across a dark, moody background as requested. While the "anamorphic" effect is somewhat stylized and lacks a complex cinematic scene, the core elements of the lighting pattern and horizontal orientation are present. VERDICT: PASS

PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's requirements. It features a close-up portrait with dramatic split lighting. One side of the subject's face is brightly illuminated, while the other side is cast in deep, heavy shadow, creating a high-contrast chiaroscuro effect. The lighting pattern is sharp and distinct, effectively dividing the face into light and dark halves as requested. VERDICT: PASS

Color Precision 92%
Easy 100%
PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is highly saturated, bright, and unambiguous, exactly as described in the prompt. The road is a clear gray color. All color requirements are met perfectly. VERDICT: PASS

PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's color requirements. The woman is wearing a vibrant, deep royal blue evening dress. The background is a solid, neutral beige/tan color, which aligns with the "neutral background" description. All specified colors are accurately represented. VERDICT: PASS

PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a vast field of bright, sunflower-yellow flowers. The petals are a vibrant, saturated yellow, and the centers are a dark brownish-yellow. Above the field, there is a clear, solid blue sky. The colors perfectly match the prompt's requirement for "pure sunflower yellow flowers" and a "clear blue sky." VERDICT: PASS

Medium 100%
PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The image features a bedroom wall painted in a vibrant burnt orange color, which closely matches the hex code #FF6B35 and the description "burnt orange." The furniture, including the bed and the nightstand, is white as requested. The scene is illuminated by natural daylight coming from the window on the left. All color requirements are accurately met. VERDICT: PASS

PASS
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage in the image consists of various shades of green. The leaves in the upper canopy are a dark, muted green, which aligns well with the hex code #2D5F2D. The tree trunks are a consistent brown color as requested. The overall color palette of the forest scene adheres to the specific color requirements provided in the prompt. VERDICT: PASS

PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image features walls in a medium blue color that closely aligns with the hex code #4A90D9. The floor is a warm wooden texture, and the trim is white, matching all the specific color requirements provided in the prompt. VERDICT: PASS

Hard 100%
PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the specified colors. The walls are a warm cream (#F5E6CC), the sofa is a dark navy (#2C3E50), and the accent pillows are a bright red (#E74C3C). All three color-coded elements match the prompt's requirements perfectly. VERDICT: PASS

PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The background is a deep, dark blue consistent with #1A1A2E. The watch features a soft gold tone (#F0C27F) for both the casing and the face. The earbuds are a pure, bright white (#FFFFFF). All three color requirements specified in the prompt are correctly represented in the final image. VERDICT: PASS

PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image follows the color specifications accurately. The background is a near-white (#FAFAFA), the main square shape features a vibrant orange-red gradient consistent with #FF4500, and the thin accent lines are a dark charcoal color (#333333). The colors are correctly applied to the specified elements. VERDICT: PASS

Extreme 67%
PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image displays a smooth gradient transitioning from red on the left to blue on the right, passing through purple in the center. The leftmost color is a vibrant red, and the rightmost color is a deep blue, which aligns with the specified hex values #FF0000 and #0000FF. The transition is even and follows the requested color sequence perfectly. VERDICT: PASS

FAIL
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image features a gradient from a bright yellow/white sun at the bottom to a deep purple/blue at the top. The bottom color is a bright yellow-orange, which is significantly lighter and more yellow than the requested #FF6B35 (a saturated orange). The top color is a deep indigo/violet, which aligns well with #1B1464. However, because the bottom color deviates significantly from the specific orange hex code, it fails the exact color requirement. VERDICT: FAIL

PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image features a horizontal gradient. On the left, the color is a bright cyan, which aligns with #00C9FF. On the right, the color transitions into a light mint green, which aligns with #92FE9D. The gradient flows smoothly from left to right as requested. VERDICT: PASS

Photorealism 67%
Easy 67%
PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully captures the requested studio aesthetic. The lighting is soft and even, creating a natural-looking shadow beneath the shoe. The texture of the material and the laces appear realistic, and the background is a clean, seamless white. There are no obvious AI artifacts, such as warped shapes or nonsensical geometry, in the shoe's structure or the way it interacts with the surface. It passes as a high-quality product photograph. VERDICT: PASS

FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a high-quality studio product shot. There are significant AI artifacts regarding the physics of the bottle. The liquid level is inconsistent with the internal straw, and the straw itself is nonsensical, appearing to bend through the liquid in a way that defies physical reality. Additionally, the reflection on the surface is blurry and lacks the sharp detail expected in a professional studio photograph. VERDICT: FAIL

PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt requirements. It depicts a ripe green apple on a seamless white background with soft side lighting and a subtle shadow. The texture of the skin, the stem, and the lighting transitions are highly realistic, appearing as a professional studio product shot. There are no obvious AI artifacts or distortions visible; the surface details and lighting look natural and high-quality. VERDICT: PASS

Graphical design 46%

Data Visualisation 0%
Easy 0%
FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart is a bar chart with a title, axes, and color-coded bars. However, the title has a typo (“Moithaly” instead of “Monthly”). The y-axis labels are incorrect (e.g., “12” appears twice, “2s” is invalid). The x-axis values do not match the requested sequence (12K, 18K, 15K, 22K, 19K, 25K). The chart type is correct, but data and labels are inaccurate. VERDICT: FAIL

FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The chart is a pie chart with correct percentages (35%, 25%, 20%, 20%) and distinct colors. However, the label "A Othhers" is misspelled and ambiguous. The "Others" slice lacks a clear label, and "A" is incorrectly placed. The chart type matches, but critical labeling errors invalidate accuracy. VERDICT: FAIL

FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart is a line graph with grid lines and axis labels, but the data values are wildly incorrect (e.g., 112,000 vs. 1200). Axis labels are misaligned and inconsistent. The title 'Weekly Traffic' is missing. The chart type matches, but data and labels are fundamentally wrong and unreadable. VERDICT: FAIL

Layout & Design 22%
Easy 67%
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image features a large popcorn pile centered, with "MOVIE NIGHT" in bold white text at the top and "Every Friday at 8PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear: title > popcorn > tagline. All required layout elements are present and accurately rendered. VERDICT: PASS

An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The image shows a guitar illustration, "LIVE MUSIC" headline (though not "FEST"), and "Tickets available now" tagline. The text is readable and correctly placed with proper visual hierarchy: headline top, guitar center, tagline bottom. The prompt’s "FEST" is missing, but all other elements are present and correctly formatted. VERDICT: PASS

A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster includes an open book illustration and the tagline “Meets every Wnnerday,” but the title contains errors: “THEIR” is misspelled as “THEK” and “BOOK” is misspelled as “BLOOK.” The text hierarchy is visually correct, with the title prominent above the tagline and illustration. However, critical text errors invalidate the layout’s accuracy. VERDICT: FAIL

Medium 0%
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a hero header with headline and subtitle, a "Sign Up" button, and footer social icons. However, the text is gibberish and unreadable, violating the requirement for readable, correctly placed text. The layout structure is present but the textual content fails the core criteria. VERDICT: FAIL

A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image lacks a two-column body section with text on the left and image on the right. The quote block is present but not highlighted as intended. No footer unsubscribe link is visible. Text is unreadable due to gibberish. Layout elements are missing or misaligned with the prompt’s requirements. VERDICT: FAIL

A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a menu with a logo header, three columns labeled “Appetiiers,” “cttanslens,” and “Descriires,” and a footer with “Aodurre. Address Hours.” However, category labels are misspelled or gibberish, item names are nonsensical, and prices are inconsistently formatted. Text is unreadable due to errors. Layout hierarchy is present but content is invalid. VERDICT: FAIL

Hard 0%
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The left page shows a large hero photo with a name above it, matching the prompt. The right page is mostly blank, lacking the two-column article, pull quote, sidebar, or page number. Text elements are present but illegible or misaligned. Layout fails to meet the specified structure and visual hierarchy. VERDICT: FAIL

A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile UI with a top navigation bar, search field, two feature cards, recent activity list, and bottom tab bar — matching the prompt’s layout. However, all text is garbled and unreadable, violating the requirement for readable, correctly placed text. Visual hierarchy is present but text illegibility undermines functionality. VERDICT: FAIL

A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The left page features a full-bleed photo as required. The right page has a three-column layout with a headline, body copy, and a smaller inset image. Page numbers are styled correctly at the bottom. However, all text is gibberish and unreadable, violating the requirement for readable text. Layout structure is correct, but content fails. VERDICT: FAIL

Style Diversity 75%
Easy 67%
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered with visible, textured brushstrokes and rich color saturation characteristic of oil painting. The fur, foliage, and background show distinct impasto-like strokes and painterly texture. All key elements of the requested style are present and recognizable. VERDICT: PASS

A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, rendered in a Japanese anime style with large, expressive eyes and flat colors. However, the fur has subtle shading and texture, deviating from pure flat color. The style is recognizable but not fully adherent to the prompt’s strict requirements. VERDICT: FAIL

A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered in a distinct pixel art style. It features a 32x32 grid appearance with blocky pixels and a limited color palette. Outlines are sharp, and the scene uses few colors for foliage, sky, and flowers. The style is unmistakably pixel art, matching the prompt’s core requirements. VERDICT: PASS

Medium 100%
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies Art Deco: bold geometric shapes, symmetrical composition, and a dominant gold-and-black palette. Ornamental details like sunburst motifs and stylized patterns adorn buildings and the sky. Reflections enhance symmetry. All requested stylistic elements are unmistakably present. VERDICT: PASS

A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image displays soft, visible brushstrokes, a pastel color palette, and dappled light effects on the parasol and dress—core Impressionist traits. The composition and texture align well with the requested style. VERDICT: PASS

A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns in background and shading, and flat, vivid primary colors (yellow, pink, blue). Facial features are stylized with high contrast and simplified forms. All requested visual characteristics are present and unmistakable. VERDICT: PASS

Hard 33%
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image shows a forest with visible watercolor bleeds, paper texture, and soft, diffused lighting. However, it lacks photorealistic detail—textures appear overly smooth and stylized, and depth/complexity of foliage is simplified. The “photorealistic” component of the prompt is not convincingly achieved. VERDICT: FAIL

A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image clearly blends cyberpunk (neon lights, tech suit, futuristic city) with Art Nouveau (flowing floral border, organic curves). Neon colors and tech elements are prominent, and the floral frame integrates seamlessly. The style is unmistakably recognizable and meets all requested visual characteristics. VERDICT: PASS

A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image shows a Japanese ukiyo-e style castle with bold outlines, flat planes of color, and a gradient sky. However, it depicts a Japanese castle, not a medieval European one. The photorealistic sky and lighting requested are absent — the sky is stylized, not realistic. The core subject is incorrect. VERDICT: FAIL

Extreme 100%
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly shows raw concrete walls, ceiling, and floor — core brutalist elements. The furniture is rounded and pastel pink/purple, matching the prompt. Warm, diffused light from the window illuminates the scene. All requested visual characteristics are present and well-executed. VERDICT: PASS

A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image presents a sparse still life with a single vase, ample negative space, and muted earthy tones. Subtle visual tension arises from the vase’s shadow and the textured, slightly cracked surface. The composition is minimalist, and the subdued palette aligns with “anxious minimalism.” The style is clearly recognizable and its key characteristics are present. VERDICT: PASS

A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image clearly depicts retro-futuristic 1960s space-age architecture with sleek, bulbous domes, spires, and curved forms. Warm, sepia-toned lighting and a vintage film grain enhance the nostalgic, melancholic patina. Key visual characteristics are unmistakably present, matching the prompt’s aesthetic requirements. VERDICT: PASS
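The Graphical design percentages can be cross-checked against the individual verdicts listed above. The following is a minimal sketch that assumes each figure is the unweighted share of PASS verdicts, rounded to the nearest whole percent, with no weighting by difficulty tier. The names `RESULTS`, `flatten` and `pass_rate` are hypothetical and do not come from ImageBench. Under that assumption, the tallies reproduce the displayed values: Data Visualisation 0%, Layout & Design 22%, Style Diversity 75%, and Graphical design 46% (11 of 24).

```python
# Hypothetical tally of the Graphical design verdicts listed above (True = PASS).
RESULTS = {
    "Data Visualisation": {"Easy": [False, False, False]},
    "Layout & Design": {
        "Easy": [True, True, False],
        "Medium": [False, False, False],
        "Hard": [False, False, False],
    },
    "Style Diversity": {
        "Easy": [True, False, True],
        "Medium": [True, True, True],
        "Hard": [False, True, False],
        "Extreme": [True, True, True],
    },
}


def flatten(tiers):
    """Concatenate the verdict lists of every difficulty tier."""
    return [v for verdicts in tiers.values() for v in verdicts]


def pass_rate(verdicts):
    """Unweighted share of PASS verdicts, rounded to a whole percent (assumed formula)."""
    return round(100 * sum(verdicts) / len(verdicts))


if __name__ == "__main__":
    for subcategory, tiers in RESULTS.items():
        print(f"{subcategory}: {pass_rate(flatten(tiers))}%")
        for tier, verdicts in tiers.items():
            print(f"  {tier}: {pass_rate(verdicts)}%")
    everything = [v for tiers in RESULTS.values() for v in flatten(tiers)]
    print(f"Graphical design: {pass_rate(everything)}%")
```

Because the category figure pools all evaluations, each tier counts in proportion to its number of prompts. Style Diversity (12 prompts) therefore carries more weight than Data Visualisation (3 prompts).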