ImageBench

ImageBench V1

192 evaluations across 6 categories

Pass: 146
Fail: 46
Pass Rate: 76%
Avg Latency: 4.1s
Min Latency: 3.4s
Max Latency: 94.2s
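
The headline figures are mutually consistent, which a few lines of Python can confirm. This is a minimal sketch, not ImageBench's own code: the function name `pass_rate` is hypothetical, and rounding to a whole percent is an assumption about how the page displays the value.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the pass rate as a percentage of all evaluations."""
    total = passed + failed
    if total == 0:
        raise ValueError("no evaluations to score")
    return 100.0 * passed / total


# Counts taken from the summary above.
passed, failed = 146, 46
rate = pass_rate(passed, failed)
print(f"{passed + failed} evaluations, pass rate {rate:.0f}%")
# prints "192 evaluations, pass rate 76%"
```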
Text Rendering 27%
Spatial Reasoning 72%
Human realism 95%
Truthfulness 85%
Professional Studio 96%
Graphical design 50%
Latency 59%
Text Rendering · Typography Style · Easy · gx10/bonsai-image-4b

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Text Rendering 27%

Typography Style 33%
Easy 33%
[FAIL] The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

The text mimics a vintage typewriter font with uneven ink and aged paper texture, matching the medium. However, “CHAYTER” is misspelled (should be “CHAPTER”). Spelling correctness is a core requirement. VERDICT: FAIL

[FAIL] The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

The text mimics chalk on a greenboard with visible dust and uneven strokes, matching the medium. However, “SPPES.CAL” is misspelled (should be “SPECIAL”). The core requirement is spelling correctness, which is not met. VERDICT: FAIL

[PASS] The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The text “OPEN” is spelled correctly and rendered in a convincing pixel-style LED format, with individual glowing dots forming each letter against a dark background. The red illumination and blocky, segmented lettering match the described medium precisely. VERDICT: PASS
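
Every evaluator response shown on this page ends with `VERDICT: PASS` or `VERDICT: FAIL`. How ImageBench itself extracts that verdict is not stated; the sketch below shows one way it could be done. The names `VERDICT_RE` and `parse_verdict` are hypothetical.

```python
import re

# Match a verdict only at the very end of the response, so that the words
# PASS or FAIL appearing earlier in the explanation are ignored.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str) -> bool:
    """Return True for PASS and False for FAIL; raise if no verdict is found."""
    match = VERDICT_RE.search(evaluator_text.strip())
    if match is None:
        raise ValueError("evaluator response has no trailing verdict")
    return match.group(1) == "PASS"


# Shortened excerpts of two responses from this section.
fail_text = "Spelling correctness is a core requirement. VERDICT: FAIL"
pass_text = "The red illumination matches the described medium. VERDICT: PASS"
print(parse_verdict(fail_text), parse_verdict(pass_text))
# prints "False True"
```

Raising on a missing verdict, rather than defaulting to FAIL, keeps a malformed evaluator response from being silently counted as a model failure.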

Writing accuracy 25%
Easy 100%
[PASS] A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is spelled correctly and fully readable. All letters are present and clearly rendered in a bold, sans-serif font against a white background. No misspellings, missing letters, or illegible parts are visible. VERDICT: PASS

[PASS] The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and is fully readable in bold, centered text on a cream background. No misspellings, missing words, or illegible parts are present. All elements match the intended specification. VERDICT: PASS

[PASS] The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and is fully readable, centered on a solid light blue background. No misspellings, missing words, or illegible parts are present. VERDICT: PASS

Medium 0%
[FAIL] The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

"quick" is misspelled as "grichck", "brown" as "breom", "jumps" as "Jrmst." (with garbled characters), and "high" as "migh-". All words are illegible or incorrect. The intended phrase is not accurately rendered. VERDICT: FAIL

[FAIL] 'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

Text reads: "Every sunset / heur day / aheerb." — misspellings: "heur" (should be "new"), "aheerb" (should be "ahead"). "ahead" is partially obscured and misrendered. All words are illegible or incorrect. VERDICT: FAIL

[FAIL] 'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The text is heavily distorted and illegible. "Stars shine brightest in the dark night" is corrupted: "Stars" → "Starstrim", "shine brightest" → "loiknkgristnt", "in" → missing, "the" → correct, "dark" → "drack", "night" → "nnighttt". All words are misspelled or unrecognizable. VERDICT: FAIL

Hard 0%
[FAIL] A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

Misspellings and errors: “SUMMER” → “SUIM EER”; “SALE” → “SOLKA ALE”; “5%” → “5%” (correct); “items” → “itemts”; “this weekend” → “tthiss wreenaod”; “Terms” → “Terns”; “conditions” → “connttions”. All text is illegible or incorrect. VERDICT: FAIL

[FAIL] A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

“GRAND” is misspelled as “GRANND”; subtitle reads “Join us α foot c, andcr” (missing “for”, “music”, and “fun”); disclaimer has “RSSTP” (should be “RSVP”), “eolpie al” (should be “RSVP required”), “Limiting avaitale” (should be “Limited seating available”), and “reqeuied” (should be “required”). All text is illegible or incorrect. VERDICT: FAIL

[FAIL] An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

Multiple words are misspelled or illegible: “ANNUAL GALA” has “G” instead of “L”; “G0025” uses “0” for “O”; “evening” → “eevering”; “dining” → “eofriing”; “live jazz” → “livVH las7”; “optional” → “opiinnal”; “Doors” → “Dørrs”; “6 PM” → “6Pm.”. Text is not fully readable or accurate. VERDICT: FAIL

Extreme 0%
[FAIL] A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Misspellings: “shipu” → “ship”; “chanone” → “chance”; “tnunning” → “turning”. Missing word: “The” before “Last Voyage” is partially obscured but present. Credits block has “SIRCEO” (should be “STARRING”), “MHAHROS” (should be “MARCUS”), and “A COMING” (should be “A COMING”). VERDICT: FAIL

[FAIL] A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

“NATIONAL” is misspelled as “NATIONANL”. “The Hidden Temples of Cambodia” is garbled: “The Htideen To. rmpdeas. D. Maircs on Clnaation Camoirpinc”. “Interview: Dr. Maria Santos on Climate Action” is unreadable. “Top 10 Trails You Must Hike This Year” is partially legible but distorted. Barcode area text is illegible. Multiple critical errors. VERDICT: FAIL

[FAIL] A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

Misspellings: "GOLDIEN HARIEST" (should be GOLDEN HARVEST), "OIRGANIC HENBY" (should be ORGANIC HONEY), "Unfffted" (should be Unfiltered), "suskennm" (should be sustainable), "aploetinns" (should be apiaries), "Míronerete" (should be Pacific Northwest), "apaluimerh" (should be additives), "presiadormes" (should be preservatives), "Best bfore" (should be Best before), "Store in cool, dry place" (partially illegible), "Prodictc of USA" (should be Product of USA). VERDICT: FAIL
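
The percentages in the Text Rendering headings can be reproduced from the 15 verdicts listed above. This is a minimal sketch, assuming each heading reports passed/total rounded to a whole percent (the page does not state its formula); `records`, `percent`, and the score dictionaries are illustrative names.

```python
from collections import defaultdict

# (subcategory, difficulty, passed) for the 15 Text Rendering tests above.
records = (
    [("Typography Style", "Easy", ok) for ok in (False, False, True)]
    + [("Writing accuracy", "Easy", True)] * 3
    + [("Writing accuracy", "Medium", False)] * 3
    + [("Writing accuracy", "Hard", False)] * 3
    + [("Writing accuracy", "Extreme", False)] * 3
)


def percent(flags) -> int:
    """Share of True values, rounded to a whole percent."""
    flags = list(flags)
    return round(100 * sum(flags) / len(flags))


by_tier = defaultdict(list)
by_subcategory = defaultdict(list)
for subcategory, tier, ok in records:
    by_tier[(subcategory, tier)].append(ok)
    by_subcategory[subcategory].append(ok)

tier_scores = {key: percent(flags) for key, flags in by_tier.items()}
subcategory_scores = {key: percent(flags) for key, flags in by_subcategory.items()}
category_score = percent(ok for _, _, ok in records)

print(subcategory_scores, category_score)
# prints "{'Typography Style': 33, 'Writing accuracy': 25} 27"
```

Under this assumption the only passing Writing accuracy tests are the three single-word Easy prompts, which is why the subcategory lands at 25% despite a 100% Easy tier.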

Spatial Reasoning 72%

Attributes Binding 78%
Easy 100%
[PASS] A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

Red ball: correct color, spherical shape. Blue cube: correct color, cubic shape. Both sit side by side on white surface. No attribute swaps or mismatches. All specified attributes are correctly bound. VERDICT: PASS

[PASS] A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

Green apple: correct color (green), round shape, medium size. Orange pumpkin: correct color (orange), pumpkin shape, medium size. Both objects are correctly attributed with no swaps. Attributes match prompt exactly. VERDICT: PASS

[PASS] A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

Yellow taxi: correct color, sedan shape, standard size. Black limousine: correct color, sedan shape, standard size. Both parked next to each other on street. No attribute swaps observed. All attributes correctly bound. VERDICT: PASS

Medium 67%
[PASS] A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

Big brown horse: correct. Small white rabbit: correct. Medium-sized black cat: correct. All attributes (color, size) are correctly bound to each object with no swaps. VERDICT: PASS

[PASS] A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

Tall thin white candle: white, tall, thin (cylindrical). Short fat red candle: red, short, fat (cylindrical). Medium spiral yellow candle: yellow, medium, spiral (coiled). All attributes correctly bound to respective objects. VERDICT: PASS

[FAIL] A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The red circle is correctly placed and sized. The green star is medium-sized and correctly shaped. However, the blue triangle is not a standalone object — it’s part of the star’s base, violating the prompt’s requirement for a separate small blue triangle. Attributes are swapped: the blue triangle is not separate, and the star’s green color is correct, but the blue triangle’s shape is not standalone. VERDICT: FAIL

Hard 67%
[PASS] A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

Pink fire truck: correct color, standard size, realistic shape. Blue banana: correct color, large size, curved shape. Green sun: correct color, large size, spiky circular shape. All attributes correctly bound to respective objects with no swaps. VERDICT: PASS

[PASS] An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

Orange elephant: correct color, proper shape. Yellow penguin: correct color, proper shape. Silver strawberry: correct color, proper shape. All objects match their specified attributes without swaps. VERDICT: PASS

[FAIL] A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

Purple carrot: correct color (purple), shape (conical root), green top. Red tree: correct color (red leaves), shape (tree). White watermelon: incorrect — it’s green-striped, not white; sliced open as requested. Color mismatch for watermelon. VERDICT: FAIL

Compositionality 89%
Easy 100%
[PASS] A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

Tabby cat: Present. Oversized red sunglasses: Present. Coherent combination: Yes, the sunglasses sit squarely on the cat’s face, appearing proportionally oversized. All elements are clearly visible and logically integrated. VERDICT: PASS

[PASS] A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

Penguin: Present. Colorful umbrella: Present. Coherent combination: Yes, penguin holds umbrella naturally. All elements are present and logically combined. VERDICT: PASS

[PASS] A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

Golden retriever: Present. Tiny cowboy hat: Present (though not “tiny” relative to dog, it’s a proper-sized hat). Coherent combination: Yes, the dog wears the hat naturally. All elements are present and coherently combined. VERDICT: PASS

Medium 100%
[PASS] A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

Medieval knight in full armor: Present. Sitting at a desk: Present. Typing on a modern laptop: Present. All elements are coherently combined in a plausible scene. VERDICT: PASS

[PASS] A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

Roman gladiator: Present. Selfie with smartphone: Present. Colosseum: Present. All elements are coherently combined in the image. VERDICT: PASS

[PASS] An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

Astronaut in spacesuit: Present. Bicycle: Present. Park setting: Present (trees, grass, paths, benches). All elements are coherently combined in a realistic, surreal scene. VERDICT: PASS

Hard 67%
[PASS] An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

Octopus: Present. Top hat: Present. Monocle: Present. Chessboard: Present. Moon surface: Present. Earth in background: Present. All elements are coherently combined in a surreal, imaginative scene. VERDICT: PASS

[PASS] A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

Dinosaur: Present (Tyrannosaurus-like). Business suit: Present (navy suit, white shirt, blue tie). PowerPoint presentation: Present (screen in background shows presentation slide). Drinking coffee: Present (holding a coffee cup). All elements coherently combined in modern office setting. VERDICT: PASS

[FAIL] A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

Robot: Present (skeleton with mechanical arm/hands). Pirate: Present (hat, vest, red scarf, pirate aesthetic). Paper boat: Not present (wooden boat). Lake of lava: Present. Juggling three kittens: Present (two held, one airborne). VERDICT: FAIL

Counting 44%
Easy 33%
[PASS] Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

I see exactly 3 red apples arranged on a wooden table. The count matches the request perfectly. All objects are clearly identifiable and in correct quantity. VERDICT: PASS

[FAIL] Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows exactly 3 yellow rubber ducks sitting in a row on a bathroom shelf. The prompt requested 5 ducks. Since 3 ≠ 5, the count does not match. VERDICT: FAIL

[FAIL] Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

Only 2 white candles are visible, not 4 as requested. Both are lit and stand on a dark surface. The count does not match the prompt. VERDICT: FAIL

Medium 33%
[PASS] 2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

I see 2 orange cats and 3 black dogs sitting together on a grassy lawn. The count exactly matches the requested prompt. VERDICT: PASS

[FAIL] 4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

I count 6 red roses and 2 white daisies in the vase. The prompt requested 4 red roses and 2 white daisies. Since there are more roses than requested, the count does not exactly match. VERDICT: FAIL

[FAIL] 1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

I see 2 blue butterflies and 1 yellow butterfly resting on the fence. The prompt requested 1 blue and 3 yellow butterflies. The count does not match — fewer blue butterflies and fewer yellow butterflies than requested. VERDICT: FAIL

Hard 67%
[PASS] Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

I count 9 distinct goldfish swimming in a large round fishbowl. All are clearly visible and match the description. The count exactly matches the requested number. VERDICT: PASS

[FAIL] A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

I see 2 bananas, 4 oranges, and 1 apple. The requested count was 3 bananas, 4 oranges, and 2 apples. The banana and apple counts are incorrect. VERDICT: FAIL

[PASS] Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

I count 11 balloons: 4 blue, 2 purple, 2 pink, 2 orange, 1 green, 1 yellow, 1 white, and 1 light green. All are colorful and floating against a clear blue sky with minimal clouds. The count exactly matches the request. VERDICT: PASS

Negation 44%
Easy 100%
[PASS] A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image shows a tropical beach with palm trees, turquoise water, and white sand. No people are visible anywhere in the scene. The prompt explicitly excluded people, and this exclusion is fully respected. VERDICT: PASS

[PASS] A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The cat is lying on a gray couch and shows no visible collar around its neck. The prompt explicitly excluded a collar, and none is present in the image. All details align with the exclusion requirement. VERDICT: PASS

[PASS] A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with desks and a chalkboard. No students are present, fulfilling the exclusion requirement. The scene is devoid of any people, confirming the prompt’s constraint. VERDICT: PASS

Medium 33%
[FAIL] A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with palm trees and seashells, and a beach umbrella is present. However, the ocean is clearly visible in the background behind the umbrella and palm trees, violating the prompt’s exclusion. VERDICT: FAIL

[FAIL] An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows a desk, chair, bookshelf, and lamp — all included elements. However, on the right desk, a black computer monitor is clearly visible, violating the exclusion rule. No screens or computers were allowed, but one is present. VERDICT: FAIL

[PASS] A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen with multiple pots, pans, stoves, and a cutting board — all as expected. However, no food is visible on or in any of the cookware or surfaces. The prompt’s exclusion of “food” is strictly upheld. All other elements are present, but the critical exclusion is confirmed absent. VERDICT: PASS

Hard 0%
[FAIL] A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a complete car with clearly visible wheels on both front and rear axles. The prompt explicitly excluded wheels, but they are present. VERDICT: FAIL

[FAIL] A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with visible melted cheese beneath the tomato sauce and pepperoni, along with fresh basil leaves. Cheese is explicitly excluded in the prompt. Its presence violates the constraint. VERDICT: FAIL

[FAIL] A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image clearly shows well-defined, dark eyebrows above both eyes. The prompt explicitly excluded eyebrows, but they are visibly present. This violates the exclusion requirement. VERDICT: FAIL

Relative Position 83%
Easy 100%
[PASS] A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The red ball is positioned directly underneath the wooden chair, fitting the prompt’s “underneath” spatial relationship. No other positional terms (left/right/above/below) are specified in the prompt, so no further verification is needed. The core requirement is satisfied. VERDICT: PASS

[PASS] A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The coffee mug and open book are both on the desk, side by side. The mug is to the left of the book. All specified spatial relationships (“next to,” “on a desk”) are accurately depicted. No objects are incorrectly positioned relative to each other. VERDICT: PASS

[PASS] A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The cat is clearly sitting on top of a cardboard box, fulfilling the core spatial relationship. The box is positioned on a surface, and another box is adjacent to it. All described positional relationships (on, under, left, right, above, below) are correctly implied: the cat is on the box, the box is under the cat, and the second box is beside it. No contradictory spatial relationships are present. VERDICT: PASS

Medium 67%
[PASS] A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The cat sits on the pillow, which rests on the wooden chair — correctly depicting “on” relationships. No other positional terms (left/right/above/below) are specified in the prompt, so no evaluation needed for those. All described spatial relationships are accurate. VERDICT: PASS

[PASS] A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The bird is perched on a branch extending from a tree. A park bench is visible in the foreground, partially under the tree’s canopy and to the left of the bird’s position. The spatial relationships “bird on branch,” “branch from tree,” and “tree next to bench” are accurately depicted. All positional descriptors are correct. VERDICT: PASS

[FAIL] A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The candle sits on top of the stack of books (correct: “on”). The books are next to a potted plant — but the prompt says “the books next to a potted plant,” implying one plant. There are two plants: one large white pot to the right of the books, and a smaller terracotta pot to the left. The prompt’s spatial relationship is partially inaccurate due to multiple plants. VERDICT: FAIL

Hard 67%
[PASS] Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five chess pieces arranged around a central king. The queen is to the left, the bishop to the right, the rook above, and the knight below — matching the prompt’s spatial description exactly. All positional relationships are correctly depicted. VERDICT: PASS

[PASS] A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

Laptop is centered. Coffee mug is to the left. Phone is to the right. Lamp is above (mounted on wall). Notebook is below (in front of laptop). All specified spatial relationships are correctly depicted. VERDICT: PASS

[FAIL] Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows only cows and sheep — no horse, goat, or dog. The prompt’s specified spatial relationships cannot be verified as the required animals are missing. The arrangement of visible animals (cows and sheep) does not match the described positions. VERDICT: FAIL

Extreme 100%
[PASS] A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The red barn is in the foreground, windmills are midground, and snow-capped mountains are background — all correctly layered with consistent depth perspective. Spatial relationships (left/right/foreground/background) are accurately depicted. No object is misplaced or inverted. VERDICT: PASS

[PASS] A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The sandcastle is in the foreground on the beach, the surfer is in the midground on the waves, and the cruise ship is in the background on the horizon — all correctly ordered by distance. Spatial relationships (foreground → midground → background) are accurately depicted. No positional errors (left/right/above/below) violate the prompt. VERDICT: PASS

[PASS] A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The bench and pigeons are in the foreground, the fountain is in the midground, and skyscrapers are in the background — all correctly layered and progressively smaller. Spatial relationships (foreground/midground/background) are accurately depicted. No incorrect positional terms (on/under/left/right/above/below) are violated. VERDICT: PASS

Scale & Proportions 89%
Easy 67%
[PASS] A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The person is standing next to the bus with realistic scale: the bus is visibly much larger, with the person appearing appropriately small in comparison. Proportions are consistent — the bus’s windows, wheels, and body are correctly sized relative to the human figure. Shadows and perspective align naturally. VERDICT: PASS

[PASS] A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug is realistically proportioned relative to the large green leaf it rests on. The leaf’s size and texture are consistent with natural scale, and the ladybug appears appropriately small in comparison — a common real-world size relationship. No distortion or implausible scaling is visible. VERDICT: PASS

[FAIL] A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The basketball is disproportionately large compared to the child’s body. A standard basketball is about 24 cm in diameter, while a young child’s torso is much smaller — the ball appears to be nearly as wide as the child’s chest, which is unrealistic. This violates the “correct relative sizes” requirement. VERDICT: FAIL

Medium 100%
[PASS] A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The tiny figurine is proportionally correct — it fits naturally on the rim of the coffee cup, appearing miniature in scale relative to the cup’s size. The perspective and lighting are consistent, and the figurine’s posture suggests it’s looking down into the cup. All elements align with the prompt’s size relationships. VERDICT: PASS

[PASS] A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The cat is rendered as a colossal kaiju, dwarfing the surrounding skyscrapers and street-level elements. Its scale is consistent with the prompt: it dominates the frame, its paws nearly touching the ground while towering over pedestrians and buildings. The perspective and proportions create a convincing sense of immense size relative to the urban environment. VERDICT: PASS

[PASS] A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The miniature house is proportionally scaled to fit comfortably within the palm, with realistic details like windows, roof tiles, and tiny trees. The hand’s size relative to the house is consistent with a real human palm holding a small-scale model. All elements maintain correct perspective and scale. VERDICT: PASS

Hard 100%
[PASS] A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The elephant is correctly the largest, followed by the horse and dog, then the cat, and finally the mouse — all matching real-world proportions. The animals are arranged in a line with consistent scaling. No distortions or implausible size relationships are present. VERDICT: PASS

[PASS] A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The objects maintain realistic relative proportions: the plate is appropriately sized for the food, the wine glasses are standard height and width, the cutlery matches typical dining utensil scale, and the salt/pepper shakers are correctly proportioned to the table setting. All items appear consistent with real-world dining table arrangements. VERDICT: PASS

A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The barn is appropriately large, the tractor is realistically scaled to it, the farmer is proportionally human-sized, and the chickens are correctly small relative to the other elements. The fence post is also appropriately sized for the scene. All objects maintain consistent, realistic proportions relative to each other and their environment. VERDICT: PASS

Human realism 95%

Faces & Expressions 83%
Easy 100%
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The warm, genuine smile is clearly readable, with crinkled eyes and relaxed mouth conveying happiness. No distortions are present; lighting and composition enhance the natural look. The portrait aligns well with the prompt. VERDICT: PASS

A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. Wrinkles and gray hair suit a middle-aged man. The expression is calm and serious, with neutral lips and steady gaze. No distortions or unnatural artifacts are visible. The emotion is clearly readable as composed and thoughtful. VERDICT: PASS

A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The teenager’s expression is relaxed and neutral, with no discernible distortion. Eyes, eyebrows, lips, and ears are well-aligned. Lighting and focus are even, enhancing clarity. The emotion is clearly readable as calm and composed. VERDICT: PASS

Medium 100%
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

Face is symmetric with correct anatomical features. Disgust is clearly conveyed: nose wrinkled, lips curled, brows furrowed. No significant distortions. Expression is natural and unambiguous. VERDICT: PASS

A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

The face is symmetric with correctly rendered features. The emotion of genuine surprise is clearly readable: wide, bulging eyes, raised eyebrows, and a slightly open mouth. No distortions are present. Facial structure and proportions are natural. VERDICT: PASS

A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly rendered features. The downcast gaze, slightly downturned mouth, and furrowed brow clearly convey quiet melancholy. No significant distortions are present. The emotion is highly readable and aligns with the prompt. VERDICT: PASS

Hard 100%
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

Face is symmetric with natural, aged features and deep wrinkles. White hair is present. Warm, genuine smile with kind eyes clearly conveys the intended emotion. No noticeable distortions. Facial structure and expression are realistic and well-executed. VERDICT: PASS

A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and clear tear streaks. The crying expression is vividly readable through furrowed brows, downturned mouth, and visible tears. No distortions or unnatural artifacts are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with natural wrinkles and age spots. Thinning gray hair and reading glasses are present. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align well with prompt. VERDICT: PASS

Extreme 33%
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

All three faces show clear, symmetrical features with natural expressions. Left: laughing with head tilted back, eyes closed. Center: shocked with wide eyes and open mouth. Right: smirk with rolled eyes (visible brow raise). No distortions. Emotions are unmistakable and match the prompt perfectly. VERDICT: PASS

A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces are symmetric with natural features. Emotions are clearly readable: father beams, mother smiles gently, both children grin excitedly. Teenager is not present — only four people total, but the rightmost person is an adult woman, not a teen. Prompt specifies a teenager, which is absent. VERDICT: FAIL

Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. Left woman shows frustration (furrowed brows, downturned mouth). Center man smiles, conveying pleased expression. Right man has slight smirk, but lacks clear confusion or tilted head. Emotions are readable but not fully aligned with prompt. Minor facial distortions in right man’s eyes and mouth. VERDICT: FAIL
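
The 83% shown for Faces & Expressions is consistent with pooling the tier results listed above: 3 of 3 Easy, 3 of 3 Medium, 3 of 3 Hard and 1 of 3 Extreme, which is 10 of 12. A minimal Python sketch of that arithmetic follows. The `tiers` dictionary and the `pass_rate` helper are illustrative names, not part of ImageBench, and rounding to the nearest whole percent is an assumption.

```python
# (passed, total) per difficulty tier, counted from the Faces & Expressions entries.
tiers = {
    "Easy": (3, 3),
    "Medium": (3, 3),
    "Hard": (3, 3),
    "Extreme": (1, 3),
}


def pass_rate(passed: int, total: int) -> int:
    """Percentage of passing evaluations, rounded to the nearest whole number (assumed rule)."""
    return round(100 * passed / total)


# Per-tier percentages, as displayed next to each tier label.
tier_rates = {name: pass_rate(p, t) for name, (p, t) in tiers.items()}

# Subcategory percentage, pooling all evaluations across tiers.
total_passed = sum(p for p, _ in tiers.values())
total_evals = sum(t for _, t in tiers.values())
overall = pass_rate(total_passed, total_evals)

print(tier_rates, overall)
```

Under these assumptions the Extreme tier comes out at 33% and the subcategory at 83%, matching the figures displayed.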

Full Body 100%
Easy 100%
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing a white t-shirt and jeans — perfectly matching the prompt. Limb count and proportions are anatomically accurate. No distortions are visible. The relaxed, neutral pose is clearly recognizable. All elements align with the requested description. VERDICT: PASS

A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing the camera, with hands in pockets — perfectly matching the prompt. Limb count is correct (two arms, two legs). Proportions are natural and anatomically accurate. No distortions are visible. The pose is clearly recognizable and realistic. All criteria are satisfied. VERDICT: PASS

A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible, standing in a neutral pose with natural limb count and proportions. Anatomical correctness is maintained: arms at sides, legs straight, head level. The park setting and backpack are consistent with the prompt. No significant distortions or anomalies are present. The pose is clearly recognizable as standing still. VERDICT: PASS

Medium 100%
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

Limb count and proportions are anatomically correct. Pose is a natural running stride with motion blur in hair and clothing, clearly conveying mid-stride action. No significant distortions. The scene matches the prompt perfectly. VERDICT: PASS

A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

Limb count and proportions are accurate. The pose is anatomically plausible for a mid-air jump with bent knees, arms out for balance. The action is clearly recognizable — a man leaping over a puddle, with water splashing realistically. No significant distortions. The body’s posture and motion are natural and dynamic. VERDICT: PASS

A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer has two arms, two legs, and correct proportions. The pose is a recognizable dynamic spin with one leg extended and arms outstretched, matching the prompt. The dress flows naturally with motion. No anatomical distortions are present. The action is clearly identifiable as a ballet spin. VERDICT: PASS

Hard 100%
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person has two legs, correct proportions, and performs the tree pose accurately: standing on one foot, other foot pressed to inner thigh, arms raised. No distortions; pose is clearly recognizable. Anatomical alignment is natural and plausible for the pose. VERDICT: PASS

A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has correct limb count and proportions. The pose is anatomically plausible for rock climbing, with arms and legs extended to different holds. The action is clearly recognizable as dynamic climbing. No significant distortions are present. The image accurately depicts the prompt. VERDICT: PASS

A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist has correct limb count and proportions. His posture is anatomically sound: seated, legs wrapped around the cello, bow arm extended. The pose is clearly recognizable as playing the cello. No distortions are visible. All elements of the prompt are accurately rendered. VERDICT: PASS

Extreme 100%
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

Limb count and proportions are accurate. The pose is a recognizable tango dip with intertwined legs — the woman’s legs are wrapped around the man’s waist, and his hands support her back. No anatomical distortions; musculature and joint alignment are natural. The action matches the prompt precisely. VERDICT: PASS

Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate, with legs straight, feet together, and bodies aligned vertically. The action is clearly recognizable as a handstand. No significant distortions are visible. VERDICT: PASS

Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Limb counts and proportions are anatomically correct. The sliding player’s body is low to the ground with realistic motion blur and dirt kicked up. The jumping player’s pose is dynamic, with one leg extended toward the ball, which is at their feet. Musculature and joint angles are natural. No distortions. The described action is clearly recognizable as a mid-tackle. VERDICT: PASS

Hands 100%
Easy 100%
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3 VL)

Both hands show five fingers each, anatomically correct with relaxed, naturally spread fingers. No anomalies detected. Fingers are proportionate, joints appear natural, and the gesture matches the prompt. VERDICT: PASS

A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3 VL)

Both hands show all ten fingers clearly — five per hand. Fingers are anatomically correct, with natural spacing and curvature. No anomalies like extra or missing fingers. Gesture is natural and consistent with the prompt. VERDICT: PASS

A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3 VL)

Both hands show all five fingers clearly — thumb, index, middle, ring, pinky — no anomalies. The gesture is anatomically plausible: hands rest on opposite upper arms, fingers slightly curled, palms facing inward. No extra or missing digits. The pose is natural and consistent with the prompt. VERDICT: PASS

Medium 100%
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3 VL)

Both visible hands show five fingers each. The right hand grips the mug handle naturally, with thumb and fingers wrapped around it — anatomically correct. No anomalies in finger count or gesture. The left hand supports the base, also natural. All details align with the prompt. VERDICT: PASS

A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3 VL)

Both hands show five fingers each. The right hand grips the pen in a natural, anatomically correct writing posture (thumb, index, and middle fingers). The left hand rests near the notebook, not interfering. No anomalies in finger count or anatomy. VERDICT: PASS

A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3 VL)

Left hand on fretboard shows 5 fingers (thumb, index, middle, ring, pinky) in natural playing position. Right hand strumming shows 5 fingers (thumb partially obscured, index, middle, ring, pinky visible). Grip and finger placement are anatomically correct for guitar playing. No anomalies detected. VERDICT: PASS

Hard 100%
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3 VL)

The hand shows five fingers: thumb extended upward, and four fingers curled into a fist. The gesture is anatomically correct — thumb opposes the palm, fingers curl naturally. No anomalies like extra or missing fingers. All five digits are clearly visible and properly formed. VERDICT: PASS

A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3 VL)

The right hand clearly shows two fingers (index and middle) extended upward, with the other three fingers (ring, pinky, and thumb) curled inward — totaling five fingers. The gesture is anatomically correct, with natural finger positioning and no anomalies. The left hand is not visible. All criteria met. VERDICT: PASS

A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3 VL)

Each hand shows exactly 5 fingers: index, middle, and ring extended as requested, with thumb and pinky curled inward. The gesture is anatomically correct with natural finger positioning and no anomalies. Fingers are not fused or distorted. VERDICT: PASS

Extreme 100%
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3 VL)

Both hands show five clearly defined fingers each. The handshake is anatomically correct — palms facing each other, fingers interlocked, thumbs visible. No anomalies like extra or missing fingers, malformed joints, or unnatural positioning. The gesture is natural for a professional setting. VERDICT: PASS

Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3 VL)

Both hands clearly show five fingers each — thumb, index, middle, ring, pinky — with all fingers extended as requested. The high-five gesture is anatomically correct: palms meet, fingers spread naturally, and wrists are aligned. No anomalies or missing fingers. The image accurately fulfills the prompt. VERDICT: PASS

One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3 VL)

Both hands show five fingers each — thumbs, index, middle, ring, and pinky — clearly visible and anatomically correct. Grips are natural: giver’s hand cradles the box, receiver’s hand supports it with fingers curled around. No anomalies or missing digits. Gesture is realistic and appropriate for handing a gift. VERDICT: PASS

Multi-Subject 100%
Easy 100%
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Person 1: Short blonde hair, wearing a red jacket — matches description. Person 2: Long black hair, wearing a blue dress — matches description. Both are visually distinct and correctly portrayed. VERDICT: PASS

Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Person 1: Bearded man with glasses, wearing a green sweater — matches description. Person 2: Clean-shaven man, wearing a black hoodie — matches description. Both are visually distinct and clearly identifiable. VERDICT: PASS

Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Person 1 (tall): Curly red hair, denim overalls — matches. Person 2 (short): Straight brown hair, yellow sundress — matches. Both visually distinct, correctly portrayed. VERDICT: PASS

Medium 100%
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

Bald man in suit: Present, visually distinct. Silver-haired woman in red blouse: Present, visually distinct. Young man with dreadlocks in denim jacket: Present, visually distinct. Woman with hijab in green dress: Present, visually distinct. All four match descriptions perfectly. VERDICT: PASS

Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Left: Tall woman with short pink hair, playing bass guitar — matches description. Center: Bearded man with bandana, seated at drums — matches description. Right: Slim man with glasses, playing keyboard — matches description. All three are visually distinct and accurately portrayed. VERDICT: PASS

Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat and backpack — matches. Woman: has braided hair, uses hiking poles, wears backpack — matches. Teenager: wears baseball cap, carries water bottle, uses hiking poles — matches. All three are visually distinct and correctly depicted. VERDICT: PASS
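
The Human realism figure of 95% can be reproduced by pooling every evaluation in the category (40 passes out of 42) rather than averaging the four subcategory percentages. The sketch below compares the two roll-ups. It is only one reading that happens to match the displayed number, not a confirmed description of how ImageBench aggregates; the `subcategories` dictionary is an illustrative name and the counts are tallied from the entries above.

```python
# (passed, total) per subcategory, counted from the Human realism entries.
subcategories = {
    "Faces & Expressions": (10, 12),
    "Full Body": (12, 12),
    "Hands": (12, 12),
    "Multi-Subject": (6, 6),
}

# Roll-up 1: pool all evaluations, so larger subcategories carry more weight.
passed = sum(p for p, _ in subcategories.values())
total = sum(t for _, t in subcategories.values())
pooled = round(100 * passed / total)

# Roll-up 2: unweighted mean of the subcategory pass rates.
mean_of_rates = round(
    100 * sum(p / t for p, t in subcategories.values()) / len(subcategories)
)

print(f"pooled: {pooled}%  mean of subcategory rates: {mean_of_rates}%")
```

The two roll-ups differ here because Multi-Subject has only 6 evaluations while the other subcategories have 12 each.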

Truthfulness 85%

Photorealism 100%
Medium 100%
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

Glass shows correct refraction and reflections; silver spoon reflects light realistically; wooden table displays grain and wear; linen napkin shows natural weave and folds; apple has realistic skin texture and highlights. All materials render accurate textures and light behavior as requested. VERDICT: PASS

A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

Brushed aluminum laptop shows correct matte sheen and reflections. Leather notebook displays realistic grain and stitching. Ceramic mugs have appropriate matte texture and subtle glaze. Glass paperweight shows clear translucency and reflections. All materials render with accurate visual properties. VERDICT: PASS

A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

Stainless steel pot shows correct metallic reflections and brushed texture. Glass bottle displays realistic translucency and refraction of olive oil, with accurate reflections. Wooden cutting board has believable grain and matte finish. Cotton towel renders soft, fibrous texture and diffuse light absorption. All materials reflect light appropriately for their properties. VERDICT: PASS

Physics & Reflections 92%
Easy 67%
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

Shadow direction (right) matches upper-left light source. Shadow shape is elliptical, consistent with a sphere. Highlights and shading on the sphere are smooth and physically accurate. No reflections or distortions present. Color is natural red with realistic shadow tones. All elements align with prompt. VERDICT: PASS

A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The mug casts a sharp, coherent shadow to its left, consistent with sunlight coming from the right (e.g., east-facing window in morning). Shadow edges are crisp, and the mug’s reflection in the dark coffee is plausible. No unnatural distortion or color shifts. Lighting and shadow direction align with a sunny windowsill scenario. VERDICT: PASS

A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast directly behind the person, not beneath them, contradicting midday sun expectations. It’s elongated and distorted, inconsistent with strong overhead lighting. Color is accurate (dark shadow), but direction/orientation fails physical plausibility. VERDICT: FAIL
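
Each evaluator response ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker. The Python sketch below shows one way such a marker could be parsed and turned into a tier pass rate. The `extract_verdict` helper and the regular expression are hypothetical and are not ImageBench's actual scoring code; the sample strings are abbreviated from the three Easy entries above.

```python
import re

# Assumed format: the response ends with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(evaluator_text: str) -> str:
    """Return 'PASS' or 'FAIL' from the trailing verdict marker."""
    match = VERDICT_RE.search(evaluator_text)
    if match is None:
        raise ValueError("no trailing VERDICT marker found")
    return match.group(1)


# Abbreviated evaluator outputs for the Easy tier of Physics & Reflections.
responses = [
    "All elements align with prompt. VERDICT: PASS",
    "Lighting and shadow direction align with a sunny windowsill scenario. VERDICT: PASS",
    "Direction/orientation fails physical plausibility. VERDICT: FAIL",
]

verdicts = [extract_verdict(text) for text in responses]
easy_rate = round(100 * verdicts.count("PASS") / len(verdicts))

print(verdicts, f"{easy_rate}%")
```

With two passes out of three, the computed rate agrees with the 67% shown for this tier.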

Medium 100%
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection shows the woman’s back correctly reversed, matching mirror physics. Shadows and lighting are consistent: light source from left casts shadows rightward on floor and mirror. No distortion or color anomalies. Reflections align with real-world optics. All elements are physically plausible. VERDICT: PASS

A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The cat’s reflection is inverted and slightly distorted, matching a polished marble surface. Direction and orientation are consistent: reflection mirrors the cat’s posture and eye gaze. Color fidelity is accurate, with natural tones preserved. Shadows are minimal but plausible under diffuse lighting. No obvious physical inconsistencies. VERDICT: PASS

A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

Reflections are clear, inverted, and proportionally accurate beneath the vase. Shadows cast by the vase and flowers align with light direction from the upper left, showing natural softness and gradient. Glass distortion is minimal and realistic. Colors in reflection match the source objects with slight tonal softening due to glass. All elements are physically consistent. VERDICT: PASS

Hard 100%
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

Reflections mirror tree shapes accurately, with gentle ripples distorting them naturally. Colors match sunset hues: warmer tones in reflection, darker silhouette. Shadows align with low sun angle, creating consistent silhouettes. No unnatural distortions or inverted elements. All elements are physically plausible. VERDICT: PASS

A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

Reflections show correct inverted orientation and direction, mirroring the skyline and streetlights. Mild distortion from ripples and water droplets is natural. Color shift is subtle, with cooler tones in puddle vs. surroundings — plausible due to water’s refractive properties and ambient lighting. No physical inconsistencies. VERDICT: PASS

A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is inverted and vertically aligned with the mountains, matching physical optics. Shadows on the mountains and their mirrored counterparts are consistent in direction and intensity. Color is subtly muted in the reflection, as expected from water absorption and atmospheric effects. No distortion or unnatural artifacts are present. VERDICT: PASS

Extreme 100%
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The sphere shows realistic refraction (inverted, distorted background), clear mirror reflection below, and water reflection in the background. Shadows align with light direction. Colors and distortion are physically consistent. All three effects coexist naturally. VERDICT: PASS

A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

Reflections and shadows are physically plausible: reflections on glass floor and walls match the sculpture’s form and orientation, with correct distortion from surface curvature. Shadows cast by overhead lights align with light source direction. Color fidelity is accurate—metallic sheen, glass transparency, and ambient lighting are consistent. No impossible distortions or mismatched reflections. VERDICT: PASS

A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The glass shows realistic refraction of the lake scene through its cut facets, with distorted but coherent shapes. The marble countertop reflects the glass and window frame accurately, matching the light direction. The lake’s reflection is clear and oriented correctly, mirroring the sky and trees. Shadows align with ambient lighting. All elements are physically plausible. VERDICT: PASS

World Knowledge 75%
Easy 100%
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The Eiffel Tower is clearly recognizable and depicted with accurate architectural details: its iron lattice structure, three levels (first, second, and top), and surrounding context are all correct. The image shows a clear day with no distortions. All prompt requirements are met. VERDICT: PASS

The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable and factually accurate: correct torch, crown, tablet, and green patina are all present. The pedestal and island setting match real-world architecture. No significant inaccuracies observed. VERDICT: PASS

The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The Taj Mahal is clearly recognizable, viewed from the reflecting pool as requested. The dome, minarets, and overall symmetry are accurately depicted. Architectural details like the main archway, decorative inlays, and four minarets flanking the structure are correct. The reflection in the pool is realistic. No major factual or structural inaccuracies are present. VERDICT: PASS

Medium 100%
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The instrument is a traditional Japanese shamisen, correctly featuring a wooden body, three strings, and a bachi plectrum (visible as a tied stick). The bridge, tuning pegs, and neck are accurately rendered. Minor wear and tear are authentic to an aged instrument. All key structural and functional elements match the prompt’s description. VERDICT: PASS

A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image shows a real medieval astrolabe with a brass body, clearly visible rete (star pointers), tympan (azimuthal disk), and engraved degree markings. The design, including zodiac signs and numerical scales, is historically accurate. The stand and decorative elements are period-appropriate. No major architectural or factual inaccuracies are present. VERDICT: PASS

PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The bowl is a recognizable kintsugi repair, with visible gold lacquer lines mending cracks in white ceramic. The technique is factually accurate: gold is used to highlight repairs, and the bowl’s form and cracks are typical. No architectural inaccuracies apply, as kintsugi is an art form, not architecture. The subject is authentic and well-executed. VERDICT: PASS

Hard 33%
FAIL
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts the water cycle with labeled stages: evaporation, condensation, precipitation, and collection. While visually clear and conceptually accurate, the labels are misspelled (e.g., “Condensation” → “Condadsborn,” “Precipitation” → “Ploua ehtam”). These errors undermine factual accuracy despite correct representation of processes. VERDICT: FAIL

PASS
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a convergent boundary with subduction, trench, volcanic arc, and layered mantle — all core tectonic features. Labels are mostly accurate (e.g., “Subducting plate,” “Volcanic arc,” “Mantle”). Minor label errors (e.g., “Nonne connoey” likely “Subducting plate”) are typos, not structural inaccuracies. The diagram correctly shows oceanic plate sinking, magma rising, and associated topography. VERDICT: PASS

FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a stylized anatomical heart with labeled structures, but the labels are garbled (e.g., “Carne,” “Piondric Inobnes,” “Veme”). While the overall shape and major vessels (aorta, pulmonary arteries/veins) are roughly correct, the mislabeling renders it factually inaccurate. The prompt requires correct labeling and architecture — which is not met. VERDICT: FAIL

Extreme 67%
PASS
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image shows a butterfly with iridescent blue dorsal wings and brown ventral wings featuring eyespots — matching Morpho menelaus. Body morphology, antennae, and wing shape are accurate. No significant anatomical or color inaccuracies are present. VERDICT: PASS

PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image shows a real bismuth crystal specimen with vivid iridescent oxide layers and distinct geometric hopper formation. The staircase-like growth pattern is clearly visible. The colors are authentic due to thin-film interference from the oxide coating. No significant factual or structural inaccuracies are present. VERDICT: PASS

FAIL
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image shows the interior of a grand domed church with extensive Byzantine-style mosaics and frescoes. However, it lacks the distinctive pendentives and semi-domes characteristic of Hagia Sophia’s architecture. The calligraphy medallions are absent; instead, the walls feature Christian iconography. The structure resembles St. Peter’s Basilica or another Renaissance church, not Hagia Sophia. VERDICT: FAIL
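
Every evaluator response on this page ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker. The sketch below shows one way such a marker could be extracted from the free-text response. The function name and the regular expression are assumptions made for illustration; the page does not describe how ImageBench actually parses verdicts.

```python
import re
from typing import Optional

# Hypothetical pattern, inferred only from the responses shown on this page:
# the verdict marker is the last thing in the evaluator's text.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL' when the text ends with a verdict marker, else None."""
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


# The tail of one evaluator response quoted above.
sample = "No significant anatomical or color inaccuracies are present. VERDICT: PASS"
print(parse_verdict(sample))
```

Returning `None` for a response without a marker keeps a malformed evaluator output from being silently counted as either a pass or a fail.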

Professional Studio 96%

Camera & Lighting 100%
Easy 100%
PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image shows a wide-angle perspective with noticeable barrel distortion at the edges, framed by a vignette. Lighting is even and modern, with track lights and recessed ceiling fixtures. Depth of field is shallow, with foreground elements sharp and background slightly softer — consistent with a wide-angle lens. All elements match the prompt. VERDICT: PASS

PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image shows a close-up macro shot of a single dewdrop on a textured, pinkish surface resembling a rose petal. Shallow depth of field is evident, with the drop sharply in focus and the background heavily blurred. Lighting is soft and diffused, creating natural reflections in the droplet without harsh shadows. All elements of the prompt are accurately rendered. VERDICT: PASS

PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image shows a true bird’s-eye aerial view looking straight down at a roundabout with cars and trees, matching the prompt. Lighting is consistent with bright daylight, casting clear shadows. Depth of field appears uniform, typical of a high-altitude drone shot. All key elements are accurately represented. VERDICT: PASS

Medium 100%
PASS
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image features a classic Rembrandt lighting setup: a distinct triangular highlight on the shadowed cheek, deep shadows, and a moody atmosphere. The shallow depth of field isolates the subject, and the perspective is a tight, centered portrait. All elements align with the prompt’s description. VERDICT: PASS

PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image captures a golden hour landscape with an orange sky, soft glow, and long warm shadows cast across rolling hills and grasses. Sun rays pierce through clouds, enhancing the atmospheric lighting. Depth of field is shallow, keeping foreground grasses sharp while softly blurring distant hills — matching the prompt’s intent. Perspective is wide, emphasizing the expansive scene. VERDICT: PASS

PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image matches the prompt: bright white background, minimal shadows, and airy, luminous lighting. The subject is centered with a shallow depth of field, blurring the background while keeping her face sharp. Lighting is even and high-key, with soft highlights and no harsh shadows. Perspective is direct and flattering. VERDICT: PASS

Hard 100%
PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image shows a portrait with a shallow depth of field and creamy bokeh background, consistent with an 85mm f/1.4 lens. Warm rim lighting separates the subject from the blurred garden backdrop. While lens specs can’t be verified visually, the lighting and composition match the prompt’s description closely. VERDICT: PASS

PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image shows a street scene with the subject sharply in focus while background city lights are beautifully blurred into bokeh — matching the shallow depth of field expected from a 50mm f/1.8 lens. The perspective is street-level, and lighting suggests urban night conditions. All key elements of the prompt are accurately rendered. VERDICT: PASS

PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image is shot from above, matching the requested perspective. Natural light from the left casts soft shadows and highlights, consistent with window lighting. A subtle vignette and warm color temperature are present. Depth of field appears shallow, though not perfectly sharp at f/2.8, likely due to lens characteristics or focus. Overall, it closely matches the prompt. VERDICT: PASS

Extreme 100%
PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt: it uses a tilt-shift effect with a narrow selective focus strip down the street, creating a miniature model illusion. Perspective is straight-on, depth of field is shallow and centered, and colors are vividly saturated. Lighting is bright and even, mimicking daylight. All elements align with the requested camera/lighting style. VERDICT: PASS

PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image features horizontal blue and orange lens flare streaks across a dark, moody background, matching the prompt’s description. While no explicit depth of field or perspective is visible (as there are no subjects), the flare’s cinematic composition and color contrast fulfill the lighting and mood requirements. VERDICT: PASS

PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt: dramatic split lighting with one half of the face brightly lit and the other half in deep shadow, creating high-contrast chiaroscuro. The portrait is tightly framed, with shallow depth of field blurring the background. Lighting and composition align precisely with the requested effect. VERDICT: PASS

Color Precision 92%
Easy 100%
PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The car is a vivid, unambiguous bright red, matching the prompt’s description perfectly. The asphalt road is gray as specified. No color deviations are visible. VERDICT: PASS

PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The dress is a rich, saturated royal blue, matching the prompt exactly. The background is a clean, neutral light gray, fulfilling the requirement. No color deviations are visible. VERDICT: PASS

PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image shows a field of sunflowers with bright, vivid yellow petals under a clear, light blue sky — matching “pure sunflower yellow” and “clear blue sky” as described. Colors are accurate and natural, with no unnatural tints or deviations. The prompt’s color intent is fully satisfied. VERDICT: PASS

Medium 67%
PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The wall is a warm burnt orange closely matching #FF6B35. Furniture is white as specified. Natural daylight is present. Minor variations in bedding/throw (beige tones) don’t contradict the prompt’s core color requirements. All key elements align with the specified colors and lighting. VERDICT: PASS

PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The walls match #4A90D9 (a medium blue), the floor is warm wood, and trim is white — all as specified. Colors are accurate and consistent with the prompt. VERDICT: PASS

FAIL
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage is not precisely #2D5F2D (dark muted green); it appears as varied shades of green with some blue undertones and natural variations, not a uniform hex color. Tree trunks are brown but not a single, exact shade. The prompt required exact colors, which are not achieved. VERDICT: FAIL
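
The hex-color verdicts on this page come from a vision-language evaluator judging by eye. For comparison, the sketch below shows how closeness to a target hex code could be measured numerically. It uses plain Euclidean distance in RGB, which is a simplifying assumption and not the benchmark's method. The sampled color and the tolerance are made-up illustration values, not measurements taken from the generated image.

```python
import math
from typing import Tuple


def hex_to_rgb(hex_code: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' to an (R, G, B) tuple of integers in 0..255."""
    digits = hex_code.lstrip("#")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def rgb_distance(hex_a: str, hex_b: str) -> float:
    """Euclidean distance between two colors in RGB space (0 means identical)."""
    return math.dist(hex_to_rgb(hex_a), hex_to_rgb(hex_b))


target = "#2D5F2D"   # dark muted green requested in the prompt above
sampled = "#3A6B4A"  # hypothetical pixel value, NOT measured from the image
tolerance = 20.0     # hypothetical threshold chosen only for this example

distance = rgb_distance(target, sampled)
print(f"distance={distance:.1f}, within tolerance: {distance <= tolerance}")
```

A perceptual color space would track human judgment more closely than raw RGB; the RGB version is used here only because it needs no third-party library.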

Hard 100%
PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The walls match #F5E6CC (warm cream), the sofa matches #2C3E50 (dark navy), and the pillows match #E74C3C (bright red). All specified hex colors are accurately represented in the image. VERDICT: PASS

PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The background is a deep dark blue matching #1A1A2E. The watch case is a soft gold tone matching #F0C27F. The earbuds are pure white (#FFFFFF). All specified colors are accurately represented. VERDICT: PASS

PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The background is near-white (#FAFAFA), the main shape is orange-red (#FF4500), and the accent lines are dark charcoal (#333333). All specified colors are accurately rendered. VERDICT: PASS

Extreme 100%
PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image displays a smooth gradient transitioning from red (#FF0000) on the left to blue (#0000FF) on the right, passing through purple in the center. The colors match the specified hex values and transition evenly as described. VERDICT: PASS
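
As a reference for what "transitioning evenly through purple" means numerically, the sketch below linearly interpolates each RGB channel between the two endpoint colors in the prompt. Per-channel linear interpolation is an assumption about what "evenly" means; the page does not say how the evaluator judged the transition, and the helper names are hypothetical.

```python
from typing import Tuple


def hex_to_rgb(hex_code: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' to an (R, G, B) tuple of integers in 0..255."""
    digits = hex_code.lstrip("#")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def lerp_color(start_hex: str, end_hex: str, t: float) -> Tuple[int, int, int]:
    """Blend two colors channel by channel; t=0 gives start, t=1 gives end."""
    start, end = hex_to_rgb(start_hex), hex_to_rgb(end_hex)
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return (r, g, b)


def rgb_to_hex(rgb: Tuple[int, int, int]) -> str:
    """Format an (R, G, B) tuple back into '#RRGGBB'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Five evenly spaced stops from left (red) to right (blue).
for step in range(5):
    t = step / 4
    print(f"t={t:.2f} -> {rgb_to_hex(lerp_color('#FF0000', '#0000FF', t))}")
```

At the midpoint the red and blue channels are equal and green stays at zero, which is the purple the prompt asks the gradient to pass through.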

PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The gradient transitions from a cyan (#00C9FF) on the left to a mint green (#92FE9D) on the right, matching the specified hex values precisely. The colors are accurately rendered with no visible deviation. VERDICT: PASS

PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image shows a gradient transitioning from orange-red at the bottom to deep blue-purple at the top. The bottom color is close to #FF6B35 (vibrant orange), and the top is near #1B1464 (deep indigo). While not perfectly exact due to atmospheric blending, the overall color scheme matches the specified hex values. VERDICT: PASS

Photorealism 100%
Easy 100%
PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The sneaker appears convincingly photographed: clean white background, soft lighting, natural shadows, and realistic fabric texture. No obvious AI artifacts like warped geometry, smeared details, or unnatural reflections. Stitching, laces, and sole look physically accurate. Slight noise and shadow softness suggest studio capture, not AI generation. VERDICT: PASS

PASS
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image appears convincingly real at first glance: sharp glass reflections, natural lighting, and a clean, distraction-free background. No obvious AI artifacts like warped geometry, smeared textures, or inconsistent reflections are visible. The composition and lighting match a professional studio product shot. VERDICT: PASS

PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The apple appears convincingly real: natural skin texture, realistic stem, soft side lighting, and a clean white background with a subtle shadow. No obvious AI artifacts like warped geometry, unnatural textures, or inconsistent lighting. The image matches the prompt’s “catalog style” description precisely. VERDICT: PASS

Graphical design 50%

Layout & Design 11%
Easy 0%
FAIL
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The large popcorn bucket is centered, with “MOVIE NIGHT” in bold at the top and “Every Friday at 8 1PM” at the bottom — matching the prompt. Text is readable and correctly placed. However, “NIGHT” is misaligned and “1PM” has a typo (“1PM” instead of “8PM”). Minor errors in text accuracy. VERDICT: FAIL

FAIL
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The guitar illustration is present and prominent. The headline “LIVE MUSIC FEST” is visible but has distorted/missing letters (“MU” and “ST” are broken). The tagline “Tickets available now” is present but misspelled (“avouiblbe”). Visual hierarchy is mostly correct, but critical text errors prevent proper communication. VERDICT: FAIL

FAIL
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The title “JOIN THE BOOK CLUB” is visually prominent but contains typos (“JON THE BOOK:UB”). The tagline “Meets every Wednesday” is misrendered as “Metty eery Wressklay.” The open book illustration is present and correctly placed below the title. Visual hierarchy is acceptable, but critical text errors undermine clarity and correctness. VERDICT: FAIL

Medium 0%
FAIL
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The hero header image, headline, subtitle, “Sign Up” button, and footer social icons are all present. However, the headline and subtitle text is garbled and unreadable, violating the requirement for readable and correctly placed text. Visual hierarchy is mostly intact, but illegible text undermines usability. VERDICT: FAIL

FAIL
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The layout includes a header banner, two-column body (text left, image right), a highlighted quote block, and a footer with an unsubscribe link — all present. However, all text is gibberish placeholder (e.g., “Nanopr Ares, condav enue”), making it unreadable and violating the requirement for correct text placement and readability. VERDICT: FAIL

FAIL
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The layout includes a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with “Hours” — all present. However, text is garbled and unreadable (e.g., “Appetizers” appears as “Appetūzes,” prices are gibberish), violating the requirement for readable, correctly placed text. Visual hierarchy is partially maintained but compromised by illegible content. VERDICT: FAIL

Hard 33%
PASS
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The layout matches the prompt: left page has a large hero photo; right page has two columns, a pull quote, a sidebar with author bio, and page numbers. Text is readable and correctly placed within its designated zones. Visual hierarchy is clear and consistent with magazine design conventions. VERDICT: PASS

FAIL
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The layout includes a top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar — all present. However, text is heavily distorted and unreadable, violating the requirement for correct placement and readability. Visual hierarchy is mostly intact, but illegible text undermines usability. VERDICT: FAIL

FAIL
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The layout includes a full-bleed left-page photo and a right page with three columns, headline, body copy, and inset images. Page numbers are present. However, all text is gibberish — unreadable and incorrectly placed — violating the requirement for readable, correctly positioned text. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL

Data Visualisation 0%
Easy 0%
FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The chart misrepresents the data: “Others” is labeled as 10% (not 20%), and “Company C” is mislabeled as “Company B” (25% instead of 20%). Labels are garbled and unreadable. The chart type is correct (pie chart), but the core data and labels are inaccurate and illegible. VERDICT: FAIL

FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart misrepresents the data: axes show incorrect values (e.g., 1200 on x-axis, 2000 on y-axis), labels are garbled (“Weehly Trafiic”), and the x-axis lacks day labels (Mon-Sun). The line trends don’t match the requested data points. Grid lines and title are present but mislabeled. Chart type is a line graph, but visual accuracy fails. VERDICT: FAIL

FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart shows only three bars, missing January, March, and May. Values shown (18K, 22K, 19K) don’t match the requested sequence (12K, 18K, 15K, 22K, 19K, 25K). Title is misspelled (“Monnhly Salcer s ‘225”) and axis labels are missing. Bars are color-coded but lack month labels on x-axis. Chart type is correct, but content is incomplete and inaccurate. VERDICT: FAIL

Style Diversity 92%
Easy 67%
PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered with visible brushstrokes, rich texture, and an oil-painting aesthetic. The fur, foliage, and flowers show painterly texture and blending consistent with traditional oil techniques. The style is unmistakably recognizable as requested. VERDICT: PASS

PASS
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image displays a golden retriever in a garden with a distinct pixelated aesthetic, consistent with a 32x32 grid and limited color palette. Blocky, low-resolution textures and simplified forms are evident, fulfilling the pixel art style request. The subject and setting are clearly recognizable. VERDICT: PASS

FAIL
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, but lacks the key Japanese anime style features: flat colors and large expressive eyes. The dog’s eyes are realistic and detailed, not exaggerated. The shading and fur texture are more realistic than flat. Background has depth and gradients, not flat color blocks. Style is more cartoonish or illustrative, not anime. VERDICT: FAIL

Medium 100%
PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies Art Deco: bold geometric shapes, symmetrical composition, and a gold-on-black palette. Buildings feature stylized ornamentation, and the street is framed by decorative borders. Visual elements like sunburst motifs and angular lines reinforce the style. All requested characteristics are present and unmistakable. VERDICT: PASS

PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly embodies Pop Art: bold black outlines, Ben-Day dot patterns for shading, and flat, vivid primary colors (red, blue, yellow). Facial features are stylized with high contrast and saturated hues. All requested elements are unmistakably present. VERDICT: PASS

PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image exhibits soft brushstrokes, dappled light, and a pastel palette—core Impressionist traits. The loose, blended application of color and focus on light effects align well with the requested style. Though the figure is somewhat idealized, the overall aesthetic matches Impressionism’s emphasis on atmosphere and light. VERDICT: PASS

Hard 100%
PASS
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image clearly depicts a forest with visible watercolor bleeds, soft gradients, and textured paper. It combines photorealistic detail in tree bark and foliage with the artistic qualities of watercolor. Lighting is atmospheric and natural, enhancing depth. All key requested characteristics are present. VERDICT: PASS

PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image successfully blends cyberpunk (neon lights, tech gear, futuristic city) with Art Nouveau (flowing floral borders, ornate frame, organic curves). Neon colors and tech elements coexist with decorative floral motifs. The style is clearly recognizable and meets all key visual criteria. VERDICT: PASS

PASS
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image successfully blends ukiyo-e style (flat planes, bold outlines, woodblock color palette, cloud motifs, and Japanese calligraphy/seals) with photorealistic lighting and shadows on the castle’s stone surfaces. The sky’s soft gradients and atmospheric depth mimic realism, while the castle’s architecture retains medieval European features. Key stylistic elements are clearly recognizable and harmoniously combined. VERDICT: PASS

Extreme 100%
PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly embodies “soft brutalism”: raw, textured concrete walls and ceiling contrast with rounded, pastel-colored furniture. Warm, diffused light from windows and a floor lamp enhances the cozy, muted palette. All key elements from the prompt are present and harmoniously integrated. VERDICT: PASS

PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image presents a sparse, minimalist composition with heavy negative space, muted gray and brown tones, and subtle texture suggesting visual tension. While it lacks traditional still-life objects, the implied emptiness and atmospheric weight align with “anxious minimalism.” The style is recognizable and its core characteristics are present. VERDICT: PASS

PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image strongly embodies retro-futurism with 1960s space-age aesthetics: stylized skyscrapers, vintage cars, glowing neon signs, and celestial bodies in the sky. The warm, melancholic patina is evident through the sepia-toned lighting, film grain, and aged border. Key visual cues like palm trees, retro signage, and cosmic elements are unmistakably present. Style is unmistakable and faithful to the prompt. VERDICT: PASS
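
The percentages shown for the Graphical design category can be cross-checked against the individual verdicts listed above. The sketch below tallies passes per subcategory and rounds to a whole percent. The rounding rule and the function names are assumptions, since the page does not state how its percentages are computed; the counts are taken from the PASS/FAIL verdicts in this section.

```python
from typing import Dict, Tuple

# (passed, total) per subcategory, counted from the verdicts listed above.
graphical_design: Dict[str, Tuple[int, int]] = {
    "Layout & Design": (1, 9),
    "Data Visualisation": (0, 3),
    "Style Diversity": (11, 12),
}


def pass_rate(passed: int, total: int) -> int:
    """Pass rate as a whole-number percentage (assumed rounding rule)."""
    return round(100 * passed / total)


def category_rate(subcategories: Dict[str, Tuple[int, int]]) -> int:
    """Pool all evaluations in a category, then compute one pass rate."""
    passed = sum(p for p, _ in subcategories.values())
    total = sum(t for _, t in subcategories.values())
    return pass_rate(passed, total)


for name, (passed, total) in graphical_design.items():
    print(f"{name}: {pass_rate(passed, total)}%")
print(f"Graphical design: {category_rate(graphical_design)}%")
```

Under these assumptions the pooled figure is 12 passes out of 24 evaluations, which agrees with the 50% shown for the category, and the subcategory figures agree with the 11%, 0%, and 92% shown in their headers.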