ImageBench

vs

192 evaluations across 6 categories

Pass Rate 68% vs 84%
Avg Latency 4.1s vs 14.1s
Text Rendering 27% vs 93%
Spatial Reasoning 68% vs 79%
Human realism 81% vs 91%
Truthfulness 63% vs 82%
Professional Studio 96% vs 78%
Graphical design 46% vs 92%
Latency 59% vs 22%

Text Rendering 27% vs 93%

Typography Style 33% vs 100%
Easy 33% vs 100%
local/bonsai-image-ternary-4b: FAIL
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

The text mimics a vintage typewriter font with uneven ink and aged paper texture, matching the medium. However, “CHAYTER” is misspelled (should be “CHAPTER”). Spelling correctness is a core requirement. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

The text mimics chalk on a greenboard with dust and uneven strokes, matching the medium. However, “SPPES.CAL” is misspelled — “SPECIAL” is incorrect. The core requirement (correct spelling) is not met. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The text "OPEN" is spelled correctly and rendered in a convincing pixel-style LED format, with individual glowing dots forming each letter against a dark background. The red illumination and blocky, lit-pixel aesthetic match the prompt precisely. VERDICT: PASS
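Each evaluator response on this page closes with a "VERDICT: PASS" or "VERDICT: FAIL" marker, and a few responses are cut off before the verdict value appears. A minimal sketch of how such a marker could be extracted is shown below. The function name `parse_verdict` and the regular expression are assumptions made for illustration; the page does not show how the benchmark itself reads verdicts.

```python
import re
from typing import Optional

# Hypothetical helper: the benchmark's real parsing code is not shown on this page.
_VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(evaluator_text: str) -> Optional[bool]:
    """Return True for PASS, False for FAIL, None when no verdict value is present."""
    matches = _VERDICT_RE.findall(evaluator_text)
    if not matches:
        # Covers truncated responses that end in a bare "VERDICT:".
        return None
    # Take the last marker so earlier mentions inside the reasoning do not win.
    return matches[-1] == "PASS"
```

Returning `None` rather than guessing keeps truncated responses separate from genuine failures, so they can be re-checked against the PASS/FAIL label shown next to the model name.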

Writing accuracy 25% vs 92%
Easy 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is spelled correctly and fully readable. All letters are present and clearly formed against the white background. No misspellings, missing words, or illegible parts are visible. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. No misspellings, missing words, or illegible parts are present. All elements match the intended specification. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and fully readable. All letters are intact, clear, and centered on a solid light blue background as intended. No misspellings, missing words, or illegible parts are present. VERDICT: PASS

Medium 0% vs 100%
local/bonsai-image-ternary-4b: FAIL
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

“quick” is misspelled as “grichck”; “brown” as “breom”; “jumps” as “Jrmst.” (with garbled letters); “high” as “migh-” (with hyphen and missing ‘t’). All words are illegible or incorrect. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

Text reads: "Every sunset / heur day / aheerb." — misspellings: "heur" (should be "new"), "aheerb" (should be "ahead"). "ahead" is partially obscured. Not fully readable or correct. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The text is heavily distorted and illegible. "Stars shine brightest in the dark night" is corrupted: "Stars" → "Starrstrim", "shine" → "loiknkg", "brightest" → "gristn't", "in" → missing, "the" → correct, "dark" → "drack", "night" → "nnighttt". All words are misspelled or missing. VERDICT: FAIL

Hard 0% vs 100%
local/bonsai-image-ternary-4b: FAIL
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

Misspellings and errors: “SUMMER SALE” is distorted (e.g., “SUIM EER”); body text reads “Up to 5%off all itemts tthiss wreenaod only” (incorrect words/spacing); fine print says “Terns and connttions apply” (misspelled). All text is illegible or incorrect per intended message. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

“GRAND” is misspelled as “GRANND”; subtitle reads “Join us α foot c, andcr” (missing “for”, “music”, and “fun”); disclaimer has “RSSTP” (should be “RSVP”), “eolpie al” (should be “required”), “Limiting” (should be “Limited”), “avaitale” (should be “available”), and “Limitd” (should be “Limited”). All text is illegible or incorrect. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

Multiple words are misspelled or illegible: “ANNUAL GALA” has distorted letters; “G0025” uses “0” instead of “O”; “evening” → “eevering”; “dining” → “eofriing”; “live jazz” → “livVH las7”; “optional” → “opiinnal”; “Doors” → “Dørrs”; “6 PM” → “6Pm.”. Text is not fully readable or accurate. VERDICT: FAIL

Extreme 0% vs 67%
local/bonsai-image-ternary-4b: FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Misspellings: “shipu” → “ship”; “chanone” → “chance”; “tiunning” → “turning”. Also, “Directed” is misspelled as “Dirced”. “Paramount Pictures” is partially obscured and illegible. “Coming Summer 2025” is partially obscured. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

“NATIONAL” is misspelled as “NATIONANL”. “The Hidden Temples of Cambodia” is garbled: “The Htideen To. rmpdeas. D. Maircs on Clnation Camoirpinc”. “Interview: Dr. Maria Santos on Climate Action” is unreadable. “Top 10 Trails You Must Hike This Year” is partially legible but distorted. Barcode area text is illegible. Multiple critical errors. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

Misspellings: "GOLDIEN HARIEST" (should be GOLDEN HARVEST), "OIRGANIC HENEY" (should be ORGANIC HONEY), "Unfffted" (should be Unfiltered), "suskennm apounts" (should be sustainable apiaries), "aploetinns" (should be apiaries), "Míronerete" (should be Pacific Northwest), "apaluimerh" (should be additives), "presiadormes" (should be preservatives), "Best bfore: 06/2026" (should be Best before), "Store in cool, dry place" (partially illegible), "Product of USA" (correct but misaligned). VERDICT:
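The headline percentages for this category can be reproduced from the per-prompt PASS/FAIL labels. The sketch below assumes every prompt carries equal weight, so tiers are pooled by count rather than averaged; the page does not state how the dashboard aggregates, and the names `pooled` and `percent` are hypothetical. Under that assumption the first model's results give 3 of 12 for Writing accuracy and 4 of 15 for Text Rendering.

```python
from typing import Dict, Tuple

# (passed, total) per difficulty tier for local/bonsai-image-ternary-4b,
# read off the PASS/FAIL labels listed above (three prompts per tier).
typography_style: Dict[str, Tuple[int, int]] = {"Easy": (1, 3)}
writing_accuracy: Dict[str, Tuple[int, int]] = {
    "Easy": (3, 3),
    "Medium": (0, 3),
    "Hard": (0, 3),
    "Extreme": (0, 3),
}


def pooled(*groups: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum passed and total counts across tiers (assumes equal weight per prompt)."""
    passed = sum(p for group in groups for p, _ in group.values())
    total = sum(t for group in groups for _, t in group.values())
    return passed, total


def percent(passed: int, total: int) -> int:
    """Whole-number percentage, rounding halves up, using integer arithmetic."""
    return (200 * passed + total) // (2 * total)


writing_rate = percent(*pooled(writing_accuracy))
text_rendering_rate = percent(*pooled(typography_style, writing_accuracy))
print(writing_rate, text_rendering_rate)  # prints: 25 27
```

Both values agree with the 25% and 27% shown for Writing accuracy and Text Rendering, which is consistent with pooling by prompt count, though it does not prove that the dashboard computes them this way.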

Spatial Reasoning 68% vs 79%

Attributes Binding 78% vs 100%
Easy 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface: - Left object: Red, spherical (ball), medium size. - Right object: Blue, cubic (cube), medium size. Both objects are correctly colored and shaped as specified. They are positioned side by side. No attribute swaps or mismatches observed. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and apple-shaped. The pumpkin is correctly orange, round, and pumpkin-shaped with a stem. No attributes are swapped — color, shape, and object identity are all correctly bound. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows two vehicles parked side by side on a street. - Left vehicle: Yellow sedan with “TAXI” sign on roof — matches “yellow taxi”. - Right vehicle: Black sedan with elongated body and roof-mounted light bar — consistent with “black limousine” (though technically a stretched sedan, commonly called a limo). Colors, sizes, and shapes are correctly bound to each object. No swaps. VERDICT: PASS

Medium 67% vs 100%
local/bonsai-image-ternary-4b: FAIL
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image contains: - A large red circle — correct. - A small blue triangle — correct (base of the composite shape). - A medium green star — correct (top of the composite shape). However, the star and triangle are physically merged into one object (star on top of triangle), not separate as implied by “arranged” — suggesting they should be distinct. Also, the triangle is not fully visible; it’s partially obscured. The prompt likely expects three separate objects. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a wooden shelf: 1. Left: Tall, thin, white candle — matches “tall thin white candle” 2. Middle: Short, fat, red candle — matches “short fat red candle” 3. Right: Medium height, spiral-shaped, yellow candle — matches “medium spiral yellow candle” All attributes (color, size, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image contains: - A big brown horse: Correct color and size. - A small white rabbit: Correct color and size. - A medium-sized black cat: Correct color and size. All attributes are correctly bound to the right objects with no swaps. VERDICT: PASS

Hard 67% vs 100%
local/bonsai-image-ternary-4b: PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains: - A pink fire truck (correct color, shape, and object) - Blue bananas (correct color and object; plural but acceptable as “a blue banana” can imply type) - A green sun (correct color and object; stylized with spikes, which fits “surreal”) All attributes are correctly bound to their respective objects. No swaps or mismatches. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image shows: - A purple root vegetable with green feathery tops — visually a purple carrot (color and shape match). - A tree with red leaves — matches “red tree with red leaves”. - A sliced watermelon with green rind and pale flesh — but the flesh is not white; it’s light orange or beige, and seeds are visible. The prompt specified “white watermelon”, which is inaccurate here. Attribute binding: Carrot and tree are correct. Watermelon’s color attribute fails — flesh is not white. VERDICT:

local/bonsai-image-ternary-4b: PASS
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows: - An orange elephant (correct color and shape) - Two penguins, both yellow and white (prompt asked for “a yellow penguin” — singular; also, penguins are not typically yellow, but the attribute is bound correctly to the penguin objects) - A silver strawberry (correct color and shape) However, there are two penguins instead of one, and the prompt specified “a yellow penguin” — implying one. Also, while the attributes are bound correctly per object, the quantity mismatch may viola

Compositionality 89% vs 89%
Easy 100% vs 67%
local/bonsai-image-ternary-4b: PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

- Tabby cat: Present — clearly visible with characteristic striped fur pattern. - Oversized red sunglasses: Present — large red-framed sunglasses are worn on the cat’s face, fitting the “oversized” description relative to its head. - Coherent combination: Yes — the sunglasses sit naturally on the cat’s face without distortion or logical inconsistency; reflections and fit appear realistic. All elements are present and coherently combined. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

- Golden retriever: Present — clearly visible, correct breed and coloring. - Tiny cowboy hat: Present — brown cowboy hat is on the dog’s head; size appears proportionate (not oversized or undersized). - Coherent combination: Yes — hat sits naturally on the dog’s head, no anatomical or logical inconsistencies. All elements are present and coherently combined. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

- Penguin: Present — clearly visible, standing upright with natural coloring and posture. - Colorful umbrella: Present — rainbow-colored panels (blue, purple, yellow, orange, etc.) fully open above the penguin. - Holding: Present — penguin’s flipper is gripping the umbrella handle in a plausible, anthropomorphic pose. - Coherent combination: Yes — elements are integrated naturally; lighting, shadows, and perspective align realistically. VERDICT: PASS

Medium 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

- Medieval knight in full armor: Present — detailed plate and chainmail armor with helmet. - Sitting at a desk: Present — wooden desk with books, laptop, and knight seated on chair. - Typing on a modern laptop: Present — silver laptop open, knight’s gauntleted hands positioned over keyboard. All elements are visually present and coherently combined despite the anachronism. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

- Roman gladiator: Present — man in armor with helmet/headband, beard, muscular build. - Selfie: Present — holding phone up, arm extended, looking at screen. - Smartphone: Present — modern device with dual cameras visible. - Colosseum: Present — iconic arches and ruins clearly identifiable as Rome’s Colosseum. All elements are coherently combined; anachronism is intentional per prompt. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

- Astronaut: Present — wearing a full spacesuit with helmet and life-support pack. - Spacesuit: Present — detailed, realistic, with patches and visor. - Bicycle: Present — standard road bike, being ridden. - Park setting: Present — trees, grass, benches, lampposts, people in background. - Coherent combination: Yes — astronaut is actively riding the bike on a park path; lighting and shadows are consistent. All elements are present and logically integrated. VERDICT: PASS

Hard 67% vs 100%
local/bonsai-image-ternary-4b: PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

Let’s examine each element from the prompt: 1. **Octopus** – Present. A large, pink octopus is central to the image. 2. **Wearing a top hat** – Present. The octopus wears a black top hat. 3. **Wearing a monocle** – Partially present. It wears round spectacles (two lenses), not a single monocle — but visually similar and likely acceptable as “monocle” in artistic interpretation. 4. **Playing chess** – Present. Chessboard with pieces is between the octopus and a human figure. 5. **On the surface

local/bonsai-image-ternary-4b: PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

Let’s examine each element from the prompt: 1. **Dinosaur** – ✅ Present (T-Rex-like dinosaur) 2. **Business suit** – ✅ Present (navy blue suit, white shirt, tie) 3. **Giving a PowerPoint presentation** – ✅ Partially present (standing in front of a large screen displaying a slide-like interface; gesture suggests presenting) 4. **Modern office** – ✅ Present (desks, chairs, windows, lighting, tech setup) 5. **Drinking coffee** – ✅ Present (holding a clear cup with brown liquid and straw) All core

local/bonsai-image-ternary-4b: FAIL
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

Let’s examine each element from the prompt: 1. **Robot** – The figure has a skeletal, mechanical face and visible robotic arm joints → ✅ Present (though more skeleton-robot hybrid). 2. **Dressed as a pirate** – Wears pirate hat, coat, belt, boots, and holds a mast like a ship’s tiller → ✅ Present. 3. **Sailing a paper boat** – The boat is wooden, not paper → ❌ Not present. 4. **Lake of lava** – Glowing orange-red liquid with cracks and fire → ✅ Present. 5. **Juggling three kittens** – Three kit

Counting 22% vs 56%
Easy 33% vs 100%
local/bonsai-image-ternary-4b: PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image shows three red apples placed on a wooden surface. Each apple is distinct, with visible stems and natural color variations. The arrangement matches the prompt’s request for “exactly 3 red apples arranged on a wooden table.” No extra or missing objects are present. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows three yellow rubber ducks sitting on a bathroom shelf. The prompt requested exactly five ducks. Since only three are visible, the count does not match the request. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image shows exactly two white candles standing on a dark surface. The prompt requested four candles. Since only two are present, the count does not match the request. VERDICT: FAIL

Medium 33% vs 67%
local/bonsai-image-ternary-4b: PASS
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows two orange cats and three black dogs sitting together on a grassy lawn. The count exactly matches the prompt’s request: 2 cats, 3 dogs. All are present and correctly described. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows 5 red roses, 1 white daisy, and 1 small white bud (not a full daisy). The prompt requested 4 red roses and 2 white daisies. The counts do not match: there is one extra rose and one fewer daisy than specified. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows three butterflies on a wooden garden fence: two are blue with black-and-white wing edges, and one is solid yellow. There are also yellow flowers nearby, but only one yellow butterfly. The prompt requested “1 blue butterfly and 3 yellow butterflies” — this does not match the actual count (2 blue, 1 yellow). Therefore, the description is inaccurate. VERDICT: FAIL

Hard 0% vs 0%
local/bonsai-image-ternary-4b: FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round fishbowl with multiple goldfish. Counting carefully: there are 8 clearly visible goldfish swimming inside the bowl. One fish is partially visible at the top left (reflected or behind glass), but not fully in view — so it doesn’t count as “swimming in” the bowl per standard interpretation. The prompt asks for exactly 9 goldfish swimming in the bowl. Since only 8 are clearly and fully present, the count does not match. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with 2 bananas, 3 oranges, and 1 apple. The requested count was 3 bananas, 4 oranges, and 2 apples. None of the counts match exactly. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image shows a cluster of colorful balloons against a blue sky. Counting them reveals approximately 20 balloons — far more than the requested 11. The sky is clear with some faint clouds, and all balloons are colorful and floating. However, the count does not match the prompt’s specification of “exactly 11.” VERDICT: FAIL
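The counting verdicts above all apply the same rule: every requested object count must match the observed count exactly. The sketch below expresses that rule over two dictionaries. It is only an illustration of the criterion; the evaluator here is a vision-language model that counts objects from the image, and the function name `count_mismatches` is hypothetical.

```python
from typing import Dict, Tuple


def count_mismatches(
    requested: Dict[str, int], observed: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Map each object whose counts differ to (requested, observed)."""
    mismatches: Dict[str, Tuple[int, int]] = {}
    for name in sorted(set(requested) | set(observed)):
        want = requested.get(name, 0)  # objects absent from one side count as 0
        got = observed.get(name, 0)
        if want != got:
            mismatches[name] = (want, got)
    return mismatches


# Counts taken from the fruit bowl and cats-and-dogs evaluations above.
fruit = count_mismatches(
    {"banana": 3, "orange": 4, "apple": 2},
    {"banana": 2, "orange": 3, "apple": 1},
)
pets = count_mismatches(
    {"orange cat": 2, "black dog": 3},
    {"orange cat": 2, "black dog": 3},
)
```

An empty result corresponds to PASS (the cats-and-dogs prompt), while any entry corresponds to FAIL (all three fruit counts are off by one).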

Negation 44% vs 56%
Easy 100% vs 67%
local/bonsai-image-ternary-4b: PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image shows a serene tropical beach with white sand, turquoise water, gentle waves, palm trees on the left, and a clear blue sky with scattered clouds. There are no people visible anywhere in the scene — not on the sand, in the water, or in the distance. The horizon is clear except for a few distant boats or ships, which are not people. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows an orange tabby cat lying on a gray couch. The cat is curled up with its tail wrapped around its body. There is no collar visible around the cat’s neck — the fur and anatomy are clear, and no accessory or restraint is present. The excluded element (collar) is truly absent. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows a classroom with wooden desks and chairs arranged in rows, a green chalkboard on the wall, windows letting in daylight, and some papers on the desks. There are no students visible anywhere in the room — no people seated, standing, or partially in frame. The exclusion criterion (“no students”) is fully satisfied. VERDICT: PASS

Medium 33% vs 67%
local/bonsai-image-ternary-4b: FAIL
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with palm trees, seashells, and a beach umbrella — all as described. However, in the background, beyond the sand and vegetation, there is clearly visible blue water extending to the horizon — this is the ocean. Since the prompt explicitly excluded “no water or ocean visible,” and the ocean is present, the exclusion criterion is violated. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows an office workspace with a desk, chair, bookshelf, and lamp as described. However, on the right-hand desk, there is a black object that appears to be the side/back of a computer monitor or screen. Since the prompt explicitly excluded “computers or screens,” and this element is present, the exclusion criterion is violated. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen with two stoves, multiple pots and pans, a wooden cutting board, and various utensils. There is no visible food on the counters, stoves, or in any open containers. The shelves above hold jars and pots, but none appear to contain or display food items. All elements align with the prompt’s exclusion of food. VERDICT: PASS

Hard 0% vs 33%
local/bonsai-image-ternary-4b: FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil, and crust — as specified. However, there is clearly melted white cheese visible beneath the pepperoni and around the basil leaves. The prompt explicitly excluded cheese (“absolutely no cheese”), yet cheese is present in the image. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a complete gray sedan parked in a lot. It has windows, doors, and headlights — as expected. However, it also clearly has four wheels with tires and rims. The prompt explicitly excluded “no wheels,” meaning the presence of wheels violates the exclusion criterion. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image shows a human face with clearly visible eyes, nose, and mouth. However, the face also has prominent, well-defined eyebrows — dark, arched, and symmetrically placed above each eye. Since the prompt explicitly excluded eyebrows, and they are present, the core criterion is violated. VERDICT: FAIL

Relative Position 83% vs 92%
Easy 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The cat is clearly sitting on top of a cardboard box. The box is beneath the cat, and the cat is above the box — matching the prompt’s spatial description. No conflicting or incorrect positional relationships are present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on the floor between the chair’s legs. The spatial relationship “underneath” is accurately depicted — the ball is below the chair structure and contained within its footprint. No other positional claims (left, right, etc.) are made in the prompt, so only the “under” relationship needs verification. It is correct. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a white coffee mug and an open book on a wooden desk. The mug is positioned to the left of the book, and both are resting on the desk surface. This matches the prompt’s description: “A coffee mug next to an open book on a desk.” All spatial relationships (on, left/right adjacency) are correctly represented. VERDICT: PASS

Medium 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a thin branch that extends horizontally from the trunk of a large tree. The tree is positioned to the left of a park bench in the foreground, and the branch with the bird is above the bench. All spatial relationships described — “on” (bird on branch), “extending from” (branch from tree), “next to” (tree next to bench), and implied “above” (branch above bench) — are accurately represented. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows an orange and white cat sitting on a floral-patterned pillow. The pillow is placed directly on the seat of a wooden chair. All spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. No positional errors are present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a lit candle placed directly on top of a stack of three books. To the right of the book stack is a large potted plant in a white pot. To the left is a smaller potted plant. The spatial relationships described — “candle on top of books” and “books next to a potted plant” — are accurately represented. All positional descriptors (on, next to) are correct. VERDICT: PASS

Hard 33%vs67%
local/bonsai-image-ternary-4b: FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows eight chess pieces arranged in a loose grid, not five on a board. There is no actual chessboard visible — just a plain background with a border. The pieces are not positioned as described: no clear “center king,” no queen to its left, no bishop to its right, no rook above, and no knight below. In fact, there are two knights, two rooks, two bishops, one queen, and one king — totaling eight pieces. Spatial relationships specified in the prompt are not met. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a desk scene with a laptop centered. To its left are two coffee mugs (one stacked on the other). To its right is a smartphone. Above the laptop is a desk lamp casting light downward. Below the laptop, in the foreground, lies an open spiral notebook. All specified spatial relationships — left, right, above, below — are accurately represented. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field, but the species and positions do not match the prompt. From left to right: a brown horse (correct), a black-and-white cow (correct position), two sheep in the center (prompt says one sheep), a brown-and-white goat (correct position), and no dog — instead, another cow or calf is on the far right. The prompt specifies “a dog on the far right,” which is absent. Also, “a sheep in the center” is inaccurate — there are two sheep side-by-side. Spatial relationsh

Extreme 100%vs100%
local/bonsai-image-ternary-4b: PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn in the foreground, large and prominent. Behind it, in the midground, are two windmills — one to the left (partially obscured) and one to the right (fully visible), both medium-sized relative to the barn. Snow-capped mountains stretch across the background, appearing smaller due to perspective, consistent with depth. All positional relationships (barn in front, windmills behind, mountains farthest) align with the prompt’s spatial description. Depth cues (size, overlap,

local/bonsai-image-ternary-4b: PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image shows a beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease with distance as described. All positional relationships are correct. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image correctly shows a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial layering is accurate: foreground objects are largest, midground medium, background smallest — consistent with perspective. No positional errors (e.g., “on,” “under,” “left,” “right”) are present; all elements align as described. VERDICT: PASS

Scale & Proportions 89%vs78%
Easy 100%vs33%
local/bonsai-image-ternary-4b: PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The person standing next to the yellow school bus appears proportionally correct — her height relative to the bus’s door, windows, and overall structure matches real-world expectations. The bus is large and imposing, while she is appropriately scaled as an adult human beside it. No distortions or unnatural sizing are evident. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug is appropriately sized relative to the leaf — typical for a real-world observation. The leaf’s veins, texture, and curvature are consistent with natural proportions, and the insect’s anatomy (legs, antennae, elytra) scales correctly to its body. No exaggerated or distorted size relationships are present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child’s hands are gripping the basketball, and the ball appears appropriately sized for a young child — roughly the width of his torso, which is realistic. The proportions between the child’s body, head, arms, and the basketball are anatomically consistent with a toddler or preschooler holding a standard youth-sized basketball. No distortions or scale errors are visible. VERDICT: PASS

Medium 100%vs100%
local/bonsai-image-ternary-4b: PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a miniature figurine of a person standing on the rim of a coffee cup, looking down into the liquid. The scale is consistent with the prompt — the person is tiny relative to the cup, and their posture and position match the description. Lighting, shadows, and reflections are coherent with the scene. No proportional inconsistencies are visible. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a tabby cat of enormous scale, towering over city buildings and pedestrians — consistent with the “kaiju” description. The cat’s paws are as large as storefronts, and its head reaches mid-rise building levels. Skyscrapers in the background appear appropriately scaled relative to the cat’s massive form. Proportions are internally consistent within the surreal premise. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a detailed miniature house resting in the palm of a human hand. The scale is consistent: the house fits comfortably within the hand, with fingers curling around it naturally. The trees and chimney are proportionally small relative to the house, and the hand’s skin texture and creases appear realistic at this scale. No anatomical or proportional distortions are evident. VERDICT: PASS

Hard 67%vs100%
local/bonsai-image-ternary-4b: PASS
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, two horses, a dog, a cat, and a mouse — all standing in a line. The elephant is largest, followed by the horses, then the dog, then the cat, and finally the tiny mouse — which matches real-world proportional sizes. All animals are scaled appropriately relative to each other. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a dinner table setting with a plate, fork, knife, wine glass, salt shaker, and what appears to be peppercorns. The relative sizes of these objects are consistent with real-world expectations: the wine glass is appropriately taller than the salt shaker, the plate is larger than the shakers, and the cutlery is proportionate to the plate. No obvious distortions or scale inconsistencies are present. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image shows a farm scene with a barn, tractor, farmer, chickens, and fence. However, there is no cow visible — the prompt explicitly requires one. Additionally, while most objects are roughly to scale, the chickens in the foreground appear disproportionately large compared to the fence and tractor, breaking proportional consistency. Since a required element (cow) is missing and proportions are inconsistent, the core criteria are not met. VERDICT: FAIL
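
The subcategory and difficulty-tier percentages can be cross-checked against the individual verdicts. The sketch below is an illustration only, not the benchmark's own scoring code: it assumes the first figure in each pair belongs to local/bonsai-image-ternary-4b (which the verdicts listed here are consistent with) and that rates are rounded to whole percentages. The names `scale_verdicts` and `pass_rate` are hypothetical.

```python
# Verdicts for local/bonsai-image-ternary-4b in Scale & Proportions,
# transcribed from the listing above.
scale_verdicts = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "FAIL"],
}


def pass_rate(verdicts):
    """Return the share of PASS verdicts as a whole-number percentage."""
    return round(100 * sum(v == "PASS" for v in verdicts) / len(verdicts))


# Per-tier rates, then the rate over all nine evaluations in the subcategory.
per_tier = {tier: pass_rate(v) for tier, v in scale_verdicts.items()}
overall = pass_rate([v for tier in scale_verdicts.values() for v in tier])

print(per_tier)
print(overall)
```

Under these assumptions the per-tier figures come out as 100, 100 and 67, and the subcategory figure as 89 (8 of 9), matching the left-hand numbers shown for this subcategory.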

Human realism 81%vs91%

Faces & Expressions 92%vs100%
Easy 100%vs100%
local/bonsai-image-ternary-4b: PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The warm, genuine smile is clearly readable, with crinkled eyes and relaxed mouth. No distortions or unnatural artifacts are present. Lighting and composition enhance the natural look. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. Wrinkles and gray hair suit a middle-aged man. The expression is calm and serious, with neutral lips and steady gaze. No distortions or unnatural artifacts are visible. The emotion is clearly readable as intended. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The teenager’s expression is relaxed and neutral, with no exaggerated emotion. No distortions are visible; eyes, nose, and mouth are well-aligned. Lighting and focus are professional, enhancing clarity. The prompt is fully met. VERDICT: PASS

Medium 100%vs100%
local/bonsai-image-ternary-4b: PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

Face is symmetric with correct anatomical features. Disgust is clearly conveyed: nose wrinkled, lips curled, brows furrowed. No distortions; expression is natural and intense. All requested elements are present and accurately rendered. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

The face is symmetric with natural features. The expression clearly conveys genuine surprise: wide eyes, raised eyebrows, and a slightly open mouth. No distortions are present. All requested elements are accurately rendered. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly rendered features. The downcast gaze, slightly drawn mouth, and furrowed brow clearly convey quiet melancholy. No significant distortions are present. The emotion is strongly readable and aligns with the prompt. VERDICT: PASS

Hard 100%vs100%
local/bonsai-image-ternary-4b: PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

Face is symmetric with natural, aged features and deep wrinkles. White hair is present. Warm, genuine smile with kind eyes clearly conveys the intended emotion. No noticeable distortions. Facial structure and expression are realistic and appropriate for an elderly woman. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and clear tear streaks. The crying expression is vividly readable through furrowed brows, downturned mouth, and visible tears. No distortions or unnatural artifacts are present. The prompt’s key elements are accurately rendered. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with natural aging features: age spots, thinning gray hair, and pronounced wrinkles. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions; glasses sit naturally. All prompt elements are accurately rendered. VERDICT: PASS

Extreme 67%vs100%
local/bonsai-image-ternary-4b: PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

All three faces show clear, symmetrical features with natural expressions. Left: laughing with head tilted back, eyes closed. Center: shocked with wide eyes and open mouth. Right: smirking with rolled eyes. No distortions; emotions are vividly readable. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces are symmetric with natural features. Emotions are clearly readable: father beams, mother smiles gently, both children grin excitedly. Teenager is not present — only four people total, but the prompt specified four including a teenager. Since the teen is missing, the image fails to meet the prompt’s requirement. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. Left woman shows frustration (furrowed brows, downturned mouth). Center man shows pleased expression (smile). Right man shows confusion (tilted head, furrowed brow). Emotions are clearly readable. No significant distortions. VERDICT: PASS

Full Body 100%vs100%
Easy 100%vs100%
local/bonsai-image-ternary-4b: PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing a white t-shirt and jeans — perfectly matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The relaxed, neutral pose is clearly recognizable. All criteria are satisfied. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing the camera, with hands in pockets — perfectly matching the prompt. Limb count is correct (two arms, two legs). Proportions are natural and anatomically accurate. No distortions are visible. The pose is clearly recognizable and realistic. All criteria are met. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible, standing neutrally in a park. Limb count and proportions are anatomically correct. Pose is recognizable as standing still, hands relaxed at sides. No significant distortions. Clothing and background are consistent with the prompt. VERDICT: PASS

Medium 100%vs100%
local/bonsai-image-ternary-4b: PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

Limb count and proportions are anatomically correct. Pose is a natural running stride with motion blur in hair and clothing, clearly conveying mid-stride action. No distortions or unnatural anatomy. Motion is well-rendered. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

Limb count and proportions are accurate. The pose is anatomically plausible for a mid-air jump with bent knees, arms out for balance. The action is clearly recognizable — a man leaping over a puddle, with water splashing realistically. No significant distortions. The body’s posture and motion are natural and dynamic. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer has two arms, two legs, and correct proportions. The pose is a recognizable dynamic spin with one leg extended and arms outstretched, capturing motion. The dress flows outward as described. No anatomical distortions are visible. The action matches the prompt precisely. VERDICT: PASS

Hard 100%vs100%
local/bonsai-image-ternary-4b: PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person has two legs, correct proportions, and performs the tree pose accurately: standing on one foot, other foot pressed to inner thigh, arms raised. No anatomical distortions; pose is clearly recognizable. Limbs are intact and positioned naturally. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has correct limb count and proportions. The pose is anatomically plausible for rock climbing — arms and legs extended to different holds, with body tension and gear visible. No significant distortions; the action is clearly recognizable as dynamic climbing. Gear placement and rope interaction are realistic. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist has correct limb count and proportions. His posture is anatomically sound: seated, legs encircling the cello, bow arm extended. The pose is clearly recognizable as playing the cello. No distortions are visible. All elements of the prompt are accurately rendered. VERDICT: PASS

Extreme 100%vs100%
local/bonsai-image-ternary-4b: PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

Limb count and proportions are accurate. The pose is a recognizable tango dip with intertwined legs — the woman’s legs are wrapped around the man’s waist, and her body is arched backward. No anatomical distortions; musculature and joint alignment are natural. The action matches the prompt precisely. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Limb counts and proportions are anatomically correct. The sliding player’s body is low to the ground, and the jumping player’s posture is dynamic, with the ball near their feet — matching the prompt. Musculature and joint angles appear natural, with no obvious distortions. The action is clearly recognizable as a mid-tackle scenario. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts have correct limb count and proportions. Their synchronized handstand pose is anatomically accurate and recognizable — legs straight, feet together, hands supporting weight. Slight asymmetry in head/neck angle and minor leg separation is natural. No major distortions. VERDICT: PASS

Hands 42%vs67%
Easy 33%vs33%
local/bonsai-image-ternary-4b: FAIL
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

Upon close examination of the hands in the image: - Left hand (viewer’s right): All 5 fingers are visible — thumb, index, middle, ring, and pinky. They appear relaxed and naturally spread. No anomalies. - Right hand (viewer’s left): Only 4 fingers are clearly visible — thumb is partially obscured or missing from view. The visible fingers are relaxed, but the absence of a clearly rendered thumb breaks anatomical completeness. The prompt requires “both hands visible” with “fingers relaxed and na

local/bonsai-image-ternary-4b: PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The woman’s hands are held palms-up with fingers spread. Left hand: 5 fingers clearly visible. Right hand: 5 fingers clearly visible. All fingers appear anatomically correct — no extra, missing, or fused digits. Gesture is natural and matches the prompt. No anomalies detected. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The image shows a man with arms crossed. On the left hand (right side of image), all 5 fingers are visible and appear anatomically correct. On the right hand (left side of image), only 4 fingers are clearly visible — the thumb is obscured or missing from view. The grip/gesture is mostly natural, but the missing thumb on one hand constitutes an anomaly. Since the prompt requires “both hands... fingers visible” and “exactly 5 per hand,” this fails that criterion. VERDICT: FAIL

Medium 33%vs33%
local/bonsai-image-ternary-4b: FAIL
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The image shows two hands holding a white mug. The left hand (foreground) has 5 fingers visible and grips the mug’s base — anatomically plausible. The right hand (background, near handle) appears to have only 3–4 visible fingers; the thumb and index may be obscured or merged, and the grip on the handle is ambiguous. No clear anomaly like extra digits, but finger count isn’t clearly 5 per hand due to occlusion and blur. Grip is partially correct but not fully verifiable. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The woman’s right hand gripping the pen shows 5 fingers, and the grip appears anatomically correct — thumb and index finger holding the pen, middle finger supporting underneath. The left hand, resting on the notebook, also displays 5 fingers with natural positioning. No visible anomalies or extra/missing digits. Both hands are rendered realistically. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

Upon close examination of the hands: - Right hand (strumming): Only 4 fingers are clearly visible; the thumb is obscured or missing. The index finger appears to be strumming, but the hand structure looks slightly distorted — fingers seem fused or misaligned near the knuckles. - Left hand (fretboard): All 5 fingers are present, but the thumb is wrapped unnaturally around the neck, and the index finger appears to have an extra joint or bend. The grip is plausible for guitar playing, though slight

Hard 67%vs100%
local/bonsai-image-ternary-4b: PASS
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The image shows a person making a peace sign with their right hand. The index and middle fingers are extended upward, while the thumb, ring, and pinky fingers are curled inward — consistent with a standard peace gesture. All five fingers are visible and anatomically correct. No anomalies detected. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — anatomically correct for this gesture. No anomalies or extra/missing fingers are visible. The hand appears natural and properly formed. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The left hand shows only 4 fingers — the pinky is missing or fused. The right hand appears to have 5 fingers, but the gesture is inconsistent with “counting to three” as described (index, middle, ring extended). The left hand’s anatomy is anomalous: no visible pinky, and fingers appear malformed or merged. This violates anatomical correctness and the prompt’s requirement for two hands counting to three. VERDICT: FAIL

Extreme 33%vs100%
local/bonsai-image-ternary-4b: PASS
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

Upon close examination of the handshake: - The left hand (from viewer’s perspective, belonging to the man on the left) shows 5 fingers clearly visible and correctly positioned around the other hand. - The right hand (man on the right) also displays 5 fingers, with natural curvature and alignment consistent with a firm, professional handshake. - No extra, missing, or distorted fingers are present. - The grip is anatomically plausible — thumbs interlocked appropriately, knuckles aligned, no unnat

local/bonsai-image-ternary-4b: FAIL
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

Upon close inspection: The left person’s right hand (meeting palm-to-palm) shows 5 fingers — all extended and anatomically plausible. The right person’s left hand also shows 5 fingers, properly aligned. However, the right person’s *right* hand (lower, open palm) has only 4 visible fingers — the thumb is obscured or missing. Additionally, the meeting hands show slight finger overlap that may suggest minor distortion, but not critical. Since one hand clearly lacks a visible fifth finger, the core

local/bonsai-image-ternary-4b: FAIL
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

Upon close examination: The left hand (giver) shows 4 visible fingers + thumb — anatomically plausible. The right hand (receiver) also shows 4 visible fingers + thumb — again, plausible. However, the receiver’s lower hand (bottom right) appears to have an extra or malformed finger near the wrist area — possibly a sixth digit or artifact. Also, the grip is slightly unnatural: fingers overlap unnaturally on the box’s side. Anomaly present. VERDICT: FAIL
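
Each evaluator rationale that is shown in full ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker, while several rationales in this listing are cut off before that marker appears. The following sketch shows one way such text could be parsed. It assumes the marker always sits at the very end of a complete rationale; `VERDICT_RE` and `parse_verdict` are hypothetical names, not part of any benchmark tooling.

```python
import re

# Matches the trailing verdict marker used by the evaluator outputs above.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text):
    """Return 'PASS' or 'FAIL', or None when no trailing verdict is found
    (for example because the rationale was cut off in the listing)."""
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


# One complete rationale ending and one truncated ending from this section.
complete = "Anomaly present. VERDICT: FAIL"
truncated = "Since one hand clearly lacks a visible fifth finger, the core"

print(parse_verdict(complete))
print(parse_verdict(truncated))
```

Returning `None` for a truncated rationale keeps a missing marker distinct from a failure, so the verdict shown beside the model name would still be needed for those entries.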

Multi-Subject 100%vs100%
Easy 100%vs100%
local/bonsai-image-ternary-4b: PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Person 1: Short blonde hair, wearing a red jacket — matches description. Person 2: Long black hair, wearing a blue dress — matches description. Both are visually distinct and correctly portrayed. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Person 1: Bearded man with glasses, wearing a green sweater — matches description. Person 2: Clean-shaven man in a black hoodie — matches description. Both are visually distinct and clearly identifiable. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Person 1 (tall): Curly red hair, denim overalls — matches description. Person 2 (short): Straight brown hair, yellow sundress — matches description. Both are visually distinct and correctly portrayed. VERDICT: PASS

Medium 100%vs100%
local/bonsai-image-ternary-4b: PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

Bald man in suit: ✅ Present, bald, wearing suit. Silver-haired woman in red blouse: ✅ Present, silver hair, red blouse. Young man with dreadlocks in denim jacket: ✅ Present, dreadlocks, denim jacket. Woman with hijab in green dress: ✅ Present, beige hijab, green dress. All four are visually distinct and match descriptions. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Left: Tall woman with short pink hair, playing bass guitar — matches. Center: Man with beard and bandana, on drums — matches. Right: Slim man with glasses, playing keyboard — matches. All three are visually distinct and correctly identified. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: Wearing sunhat and backpack — matches. Woman: Braided hair, hiking poles, backpack — matches. Teenager: Baseball cap, water bottle, backpack — matches. All three are visually distinct and accurately depicted per prompt. VERDICT: PASS

Truthfulness 63% vs 82%

Photorealism 100% vs 100%
Medium 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image accurately renders the requested materials. The glass shows correct refraction and transparency, the silver spoon exhibits realistic metallic reflections, the wooden table has appropriate grain and weathering, the linen napkin displays natural fabric folds and texture, and the apple has a convincing glossy skin with subsurface scattering effects. All elements demonstrate accurate light behavior and material properties. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The brushed aluminum laptop exhibits appropriate metallic sheen and reflections. The leather notebook shows realistic grain and stitching texture. The ceramic mugs have a matte, slightly porous appearance consistent with stoneware. The glass paperweight demonstrates correct translucency, refraction, and surface reflections. The wooden desk displays natural grain and subtle highlights. All materials are visually distinct a

local/bonsai-image-ternary-4b: PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image accurately renders the requested materials. The stainless steel pot exhibits appropriate metallic reflections and brushed texture. The glass bottle shows correct translucency, refraction, and surface highlights. The wooden board displays natural grain and wear, while the cotton towel has a soft, fibrous texture with realistic light absorption. All surfaces interact with light in a physically plausible manner. VERDICT: PASS

Physics & Reflections 83% vs 83%
Easy 67% vs 67%
local/bonsai-image-ternary-4b: PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white surface. The primary shadow is cast to the right and slightly downward, which is consistent with a light source positioned to the upper left. The shadow's shape is an ellipse, correctly representing the perspective of a sphere's shadow on a flat plane. The lighting on the sphere itself shows a highlight on the upper-left side and a gradient of darkness towards the bottom-right, matching the shadow's direction. There are no distortions or color inconsist

local/bonsai-image-ternary-4b: PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The shadow is cast to the left, which is consistent with sunlight coming from the right side of the window. The shadow’s shape matches the mug’s form, and its sharpness aligns with direct sunlight. No distortions or color anomalies are present. The reflection on the glass also appears natural. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast to the right and slightly behind the subject, which contradicts the prompt’s requirement of a “short shadow directly below them” at midday. At true midday, shadows should be minimal and nearly directly under the person. Here, the shadow is elongated and offset, suggesting late morning or afternoon lighting. Additionally, the shadow’s shape and intensity are consistent with the sun’s position, but it does not match the specified condition. VERDICT: FAIL

Medium 67% vs 67%
local/bonsai-image-ternary-4b: FAIL
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically impossible. The woman is facing the mirror, so her reflection should show her front. Instead, the reflection shows her back, which is illogical. Additionally, there are three figures in the mirror (two reflections plus the real person), creating a nonsensical duplication that violates basic optics. The orientation and content of the reflection do not match reality. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly oriented (inverted vertically), positioned directly beneath the cat, and exhibits appropriate distortion and blurring consistent with a polished but not perfectly mirror-like marble surface. The color and lighting match the subject, and the reflection fades naturally with distance, aligning with real-world optics. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection on the glass table is physically plausible. It correctly mirrors the orientation and position of the vase and flowers, with appropriate distortion due to the glass thickness and viewing angle. The colors in the reflection match the original objects, and the lighting direction is consistent with the scene. No obvious errors in shadow or reflection behavior are present. VERDICT: PASS

Hard 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. They are vertically inverted, matching the trees' positions. The gentle ripples distort the reflection appropriately, and the color matches the sunset sky. The reflection is slightly darker, as expected. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflections in the puddle are physically plausible. The orientation is correct (inverted), and the distortion matches the expected behavior of water ripples and surface tension. The color shift is consistent with wet surfaces and ambient lighting. Shadows and reflections align with the scene’s light sources and geometry. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible: it is inverted, correctly oriented, and shows subtle color muting and slight distortion consistent with a still alpine lake. Shadows on the mountains align with the lighting direction, and the reflection mirrors these accurately. No obvious artifacts or implausibilities are present. VERDICT: PASS

Extreme 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The reflections and refractions are physically plausible. The mirror surface correctly reflects the sphere and the inverted image within it. The refraction through the glass sphere shows an inverted, distorted view of the background (trees/sky), which is accurate for a convex lens effect. The lighting and highlights on the sphere align with the environment, and the water in the background adds to the scene’s realism without conflicting with the reflections. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The chrome sculpture accurately mirrors the surrounding environment, including the windows and the glass floor structure. The reflections on the glass floor align correctly with the sculpture’s position and orientation. Distortions in the reflections are consistent with the curvature of the chrome surface and the transparency of the glass. The lighting and shadows also appear coherent with the scene's light sources. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image displays physically plausible reflections and refractions. The crystal glass exhibits realistic light distortion and refraction of the background scenery. The marble countertop shows a clear, slightly blurred reflection of the glass base, consistent with a polished surface. The lake outside accurately reflects the sky and trees with appropriate water distortion. Lighting direction is consistent across all elements. VERDICT: PASS

World Knowledge 33% vs 75%
Easy 100% vs 100%
local/bonsai-image-ternary-4b: PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy. The iron lattice structure, the three distinct levels (including the glass-enclosed first floor and open second floor), and the spire are all correctly rendered. The perspective and surrounding environment (trees, walkways, people) are realistic and consistent with a clear day in Paris. No significant inaccuracies are present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high architectural accuracy. The central dome, four minarets, and the reflecting pool are correctly positioned and proportioned. The symmetry is preserved, and the intricate details of the marble structure are visible. The reflection in the water is realistic. There are no significant factual or architectural inaccuracies. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The image depicts the Statue of Liberty with high accuracy. The green patina, crown spikes, torch, and tablet are all correctly rendered. The pedestal and island surroundings match the real-world location. No significant architectural or factual inaccuracies are present. VERDICT: PASS

Medium 33% vs 67%
local/bonsai-image-ternary-4b: FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a stringed instrument that resembles a shamisen in body shape and string count, but it lacks key authentic features: the bachi plectrum is missing, the tuning pegs are incorrectly shaped and placed (modern-style instead of traditional Japanese), and the bridge and tailpiece appear non-traditional. The wood texture and wear look plausible, but structural inaccuracies undermine factual correctness. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass instrument that resembles an astrolabe, but it is not factually accurate. The markings are gibberish and nonsensical rather than accurate astronomical or degree markings. The structure lacks the distinct layered components (mater, tympan, rete) of a real astrolabe, appearing instead as a flat, generic dial with random symbols. Therefore, it fails the accuracy criteria. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image accurately depicts a Japanese kintsugi bowl. The subject is clearly recognizable as a ceramic vessel with cracks repaired using gold lacquer, which aligns perfectly with the definition of kintsugi. The texture of the ceramic, the irregularity of the cracks, and the metallic sheen of the gold lines are rendered realistically. There are no architectural or factual inaccuracies; the object looks like a genuine photograph of such an item. VERDICT: PASS

Hard 0% vs 33%
local/bonsai-image-ternary-4b: FAIL
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image attempts to depict the water cycle but fails significantly in accuracy and clarity. The text labels are garbled and nonsensical (e.g., “Condensation” is misspelled as “Comdaisbon,” “Collection” as “Cndeanigio For tos ta Soree”). The diagrams lack proper scientific labeling and logical flow. Rain is shown falling from clouds in unrealistic, stylized beams rather than natural precipitation. The river’s path and connection to the ocean are unclear or misdrawn. These errors make the diagra

local/bonsai-image-ternary-4b: FAIL
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a geological cross-section, but the text labels are completely garbled and illegible (e.g., "Nonnie connoor," "Echnot," "Volotunia"). This makes the subject unrecognizable as a factual diagram. Additionally, the geological structures are inaccurate; for instance, the subducting plate is shown bending upward into the mantle rather than descending, and the layering does not correspond to standard tectonic models. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a human heart, but the anatomical accuracy is severely compromised. The major vessels are incorrectly arranged; for instance, the aorta and pulmonary artery positions are swapped or distorted, and the superior/inferior vena cava connections are nonsensical. The text labels are garbled gibberish (e.g., "Cane," "Honesldnges"), making it impossible to verify correct placement. The internal structures do not match standard anatomical diagrams. VERDICT: FAIL

Extreme 0% vs 100%
local/bonsai-image-ternary-4b: FAIL
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly that is a hybrid of two different species rather than a single accurate subject. The dorsal wings (top) resemble a Morpho butterfly with iridescent blue scales, but the ventral wings (bottom) display the distinct brown pattern and eyespots of a Swallowtail (Papilio), not a Morpho. A real Morpho menelaus has dark brown/black undersides, not the tan/brown swallowtail pattern shown here. The body morphology is also inconsistent, appearing too thick and furry for a Morp

local/bonsai-image-ternary-4b: FAIL
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image depicts a grand interior with Byzantine-style mosaics and domes, but it is not the Hagia Sophia. The Hagia Sophia’s interior features a specific layout with a massive central dome, semi-domes, and distinct Islamic calligraphy medallions (which are absent here). This image appears to be an AI-generated or composite scene resembling a generic Byzantine church (possibly inspired by San Vitale or similar structures), lacking the unique architectural identity and specific decorative element

local/bonsai-image-ternary-4b: FAIL
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a cluster of crystals with vibrant iridescence, but the geometry is incorrect for bismuth. Bismuth forms distinct hopper crystals with stepped, staircase-like faces. The subject shown consists of smooth, rounded, gem-like polyhedra that resemble fluorite or synthetic glass rather than the angular, metallic structure of bismuth. The "staircase structure" requested is absent. VERDICT: FAIL
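
The Truthfulness percentages can be reproduced from the individual verdicts listed above. The following is a minimal sketch, assuming every prompt carries equal weight and percentages are rounded to the nearest whole number; the page does not state its aggregation rule, so both are assumptions, and the names `truthfulness`, `pass_rate`, and `percent` are illustrative. The verdicts are transcribed from the local/bonsai-image-ternary-4b cards in this category.

```python
from fractions import Fraction

# Verdicts for local/bonsai-image-ternary-4b, transcribed from the
# Truthfulness cards (True = PASS, False = FAIL).
truthfulness = {
    "Photorealism": {"Medium": [True, True, True]},
    "Physics & Reflections": {
        "Easy": [True, True, False],
        "Medium": [False, True, True],
        "Hard": [True, True, True],
        "Extreme": [True, True, True],
    },
    "World Knowledge": {
        "Easy": [True, True, True],
        "Medium": [False, False, True],
        "Hard": [False, False, False],
        "Extreme": [False, False, False],
    },
}


def pass_rate(verdicts):
    """Fraction of PASS verdicts in a flat list (assumed: equal weight per prompt)."""
    return Fraction(sum(verdicts), len(verdicts))


def percent(rate):
    """Whole-number percentage, rounding half up (assumed rounding rule)."""
    return int(rate * 100 + Fraction(1, 2))


# Flatten the difficulty tiers to get one verdict list per subcategory.
by_subcategory = {
    name: [v for tier in tiers.values() for v in tier]
    for name, tiers in truthfulness.items()
}
subcategory_percent = {name: percent(pass_rate(v)) for name, v in by_subcategory.items()}

# Category score: pool every verdict in the category.
all_verdicts = [v for verdicts in by_subcategory.values() for v in verdicts]
category_percent = percent(pass_rate(all_verdicts))

print(subcategory_percent)
print(category_percent)
```

Under these assumptions the pooled count is 17 passes out of 27 prompts, which rounds to the 63% shown on the left of the Truthfulness line, and the subcategory values match the left-hand figures of 100%, 83%, and 33%.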

Professional Studio 96% vs 78%

Camera & Lighting 100% vs 75%
Easy 100% vs 67%
local/bonsai-image-ternary-4b: PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image features a wide-angle perspective with significant barrel distortion, visible through the circular vignetting/distortion effect at the edges of the frame. The lighting is even and soft, consistent with a modern interior. The depth of field is wide, keeping both the foreground furniture and the background windows in relatively sharp focus, which matches a standard wide-angle shot. The spatial expansion and distortion criteria are fully met. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image is a close-up macro shot of a single dewdrop resting on a surface. The depth of field is extremely shallow, with the background and foreground heavily blurred, focusing entirely on the droplet. The lighting is soft and natural, highlighting the transparency and reflections within the water drop. The perspective and technical execution perfectly match the prompt's requirements for a macro shot with shallow depth of field. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image shows an aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is a top-down view, and the lighting is consistent with a bright, sunny day with clear shadows cast by the trees and vehicles. The depth of field is deep, keeping both the center and the edges of the roundabout in focus, which is typical for high-altitude aerial photography. VERDICT: PASS

Medium 100% vs 67%
local/bonsai-image-ternary-4b: PASS
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image successfully demonstrates classic Rembrandt lighting. There is a distinct, small triangle of light visible on the subject's shadowed cheek (the side opposite the main light source). The lighting creates a moody, dramatic atmosphere with high contrast between light and shadow. The perspective is a standard portrait shot, and the depth of field is shallow, keeping the focus on the facial features. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It depicts a landscape during the golden hour, featuring a vibrant orange sky and a soft, warm glow. The low angle of the sun creates long, warm shadows across the undulating grassy hills and the silhouetted trees in the background. The lighting pattern is consistent with a sunset/sunrise, providing a sense of depth and a soft, atmospheric quality throughout the scene. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright, white, and overexposed background that eliminates most shadows. The lighting is airy, luminous, and soft, creating a seamless transition between the subject and the background. The perspective is a standard eye-level portrait, and the depth of field is shallow, keeping the focus entirely on the subject's face. VERDICT: PASS

Hard 100% vs 67%
local/bonsai-image-ternary-4b: PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The perspective is a classic portrait shot. The depth of field is shallow, creating a very creamy, blurred bokeh background typical of an 85mm f/1.4 lens. Most importantly, the lighting features a strong, warm rim light (backlighting) that creates a glowing outline around the subject's hair and shoulders, effectively separating her from the garden background. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The subject is sharp and in focus, while the background features a shallow depth of field with soft, circular bokeh (out-of-focus light orbs) from the city lights. The perspective is a standard eye-level street photograph, and the lighting pattern of the background lights creates the intended "soft city lights" effect. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches the prompt's requirements. The perspective is a top-down view (flat lay style). The lighting is clearly natural light coming from the left (the window), creating soft shadows on the right. The color temperature is warm, and there is a subtle vignette effect around the edges of the frame. The depth of field is shallow, consistent with an f/2.8 aperture, as the background elements are softly blurred. VERDICT: PASS

Extreme 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a strong selective focus strip that creates a shallow depth of field, blurring the foreground and background to make the city street look like a miniature model. The colors are highly saturated and vibrant, and the lighting is bright and even, consistent with a daylight scene. The perspective and blur patterns perfectly match the requested aesthetic. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a dark, moody scene with prominent horizontal blue and orange anamorphic lens flares streaking across the frame. The lighting pattern is highly cinematic, and the horizontal orientation of the flares is a direct realization of the "anamorphic" requirement. The composition is dark and atmospheric, adhering to the requested aesthetic. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's requirements. The lighting is a classic dramatic split lighting pattern, where one side of the man's face is brightly illuminated while the other side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect. The perspective is a close-up portrait, and the depth of field is shallow, keeping the focus entirely on the facial features. VERDICT: PASS

Color Precision 100% vs 83%
Easy 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a sports car that is a bright, vivid, and unambiguous red, exactly as specified in the prompt. The car is parked on a gray asphalt road, which also matches the description. The colors are consistent, saturated, and clearly defined without any deviation from the requested color scheme. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image shows a woman wearing a deep, vibrant blue evening dress. The color of the dress is a true royal blue, which matches the color specified in the prompt. The background is a plain, light-colored neutral shade, which also aligns with the description. All color requirements are met accurately. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a field of bright, sunflower-yellow flowers under a clear, solid blue sky. The colors are vibrant and accurately represent the "pure sunflower yellow" and "clear blue" specified in the prompt. There are no deviations in color hue or saturation that contradict the description. VERDICT: PASS

Medium 100% vs 33%
local/bonsai-image-ternary-4b: PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The image features a bedroom with a wall painted in a distinct burnt orange color, which aligns with the hex code #FF6B35 (a vibrant, saturated orange). The furniture, including the bed frame and nightstand, is white as requested. The lighting appears to be natural daylight, consistent with the prompt. The colors match the specified requirements accurately. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The image displays a forest scene with dark green foliage and brown tree trunks. The foliage color is a deep, muted green that aligns well with the description of #2D5F2D. The tree trunks are a consistent brown. The colors in the image accurately reflect the specific hex value and the descriptive text provided in the prompt. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image shows a room with walls in a medium blue shade that closely aligns with the hex code #4A90D9. The floor is a warm wooden color, and the trim (baseboards and window frame) is white, exactly as specified in the prompt. All color requirements are met accurately. VERDICT: PASS

Hard 100% vs 100%
local/bonsai-image-ternary-4b: PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the specified colors. The walls are a warm cream color consistent with #F5E6CC. The sofa is a dark navy blue, matching #2C3E50. The accent pillows are a bright, vibrant red, consistent with #E74C3C. All color requirements from the prompt are met precisely. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image matches the color requirements. The background is a deep, dark blue consistent with #1A1A2E. The watch features a soft gold/tan color consistent with #F0C27F. The earbuds are a pure white, matching the #FFFFFF specification. All three specified colors are accurately represented in the flat-lay composition. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image contains a background that is white/near-white, a large central shape in a vibrant orange-red color, and thin accent lines in a dark charcoal/black shade. These colors align perfectly with the requested hex codes: #FAFAFA (near-white background), #FF4500 (orange-red main shape), and #333333 (dark charcoal lines). The color accuracy is high. VERDICT: PASS

Extreme 100%vs100%
local/bonsai-image-ternary-4b: PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image shows a smooth gradient transitioning from a bright red on the left to a bright blue on the right. The transition passes through a purple/magenta middle section, which aligns perfectly with the prompt's requirement of a gradient from #FF0000 (red) to #0000FF (blue) through purple. The colors and the direction of the gradient are accurate to the description. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image displays a gradient transition from a bright orange at the bottom to a deep indigo/dark blue at the top. The bottom color closely aligns with the specified #FF6B35 (orange), and the top color matches the deep indigo/dark blue (#1B1464) described in the prompt. The transition is smooth and follows the requested color scheme accurately. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image displays a horizontal gradient. On the left side, the color is a bright cyan, which aligns with the specified #00C9FF. Moving toward the right, the color transitions into a light mint green, matching the specified #92FE9D. The gradient transition is smooth and follows the color instructions provided in the prompt accurately. VERDICT: PASS
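
The gradient prompts above name two endpoint colors and, in the red-to-blue case, a transition through purple. A minimal sketch of that arithmetic, assuming plain linear interpolation in RGB space (the benchmark does not state how a reference gradient would be defined, and `lerp_hex` is a name made up for this example):

```python
def lerp_hex(start: str, end: str, t: float) -> str:
    """Linearly interpolate between two hex colors; t=0 is start, t=1 is end."""
    s = [int(start.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    e = [int(end.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]
    # Round half up so the result does not depend on banker's rounding.
    rgb = [int(a + (b - a) * t + 0.5) for a, b in zip(s, e)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Midpoint of the red-to-blue gradient from the prompt.
print(lerp_hex("#FF0000", "#0000FF", 0.5))  # #800080
```

Under this assumption the midpoint of #FF0000 and #0000FF is #800080, a purple, which agrees with the evaluator's description of the middle section.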

Photorealism 67%vs67%
Easy 67%vs67%
local/bonsai-image-ternary-4b: PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully captures the requested aesthetic. The lighting is soft, the shadow is natural, and the texture of the red material appears consistent with a real product shot. There are no obvious AI artifacts, such as warped laces or nonsensical geometry, that break the illusion of a real photograph. The composition is clean and professional, meeting the criteria for a studio-quality product shot. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the standard of a studio-quality product photograph. There are significant AI artifacts, most notably the nonsensical, floating white shapes inside the liquid that look like distorted tubes or wires. Additionally, the glass geometry is inconsistent, and the reflection on the surface is poorly defined and lacks realistic light refraction. The overall structure of the bottle cap and the liquid's behavior look unnatural and physically impossible. VERDICT: FAIL

local/bonsai-image-ternary-4b: PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the criteria for a studio-quality product shot. The lighting is soft and natural, creating a realistic subtle shadow on the white background. The texture of the green apple skin, including the lenticels (small white spots), looks authentic and consistent. There are no obvious AI artifacts, such as warping or unnatural-looking textures, and the stem appears structurally sound and realistic. It passes as a high-quality catalog photograph. VERDICT: PASS

Graphical design 46%vs92%

Data Visualisation 0%vs67%
Easy 0%vs67%
local/bonsai-image-ternary-4b: FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart shows only three bars, missing January, March, and May. Values shown (18K, 22K, 19K) don’t match the requested sequence (12K, 18K, 15K, 22K, 19K, 25K). Title is misspelled (“Monnhly Salcer s ‘225”) and axis labels are missing. Bars are color-coded but lack proper x-axis labels for months. Chart type is correct, but content is incomplete and inaccurate. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The chart misrepresents the data: “Others” is labeled as 10% (not 20%), and “Company C” is mislabeled as “Company B” (25% instead of 20%). Labels are garbled and unreadable. The chart type is correct (pie chart), but the core data and labels are inaccurate and illegible. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart misrepresents the data: axes are inverted, labels are garbled (e.g., “1100” on x-axis), and values don’t match the requested sequence. The title and grid lines are present but the visual elements are unreadable and incorrect. Chart type is a line graph, but the data is misrepresented. VERDICT: FAIL

Layout & Design 11%vs100%
Easy 0%vs100%
local/bonsai-image-ternary-4b: FAIL
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The large popcorn bucket is centered, with “MOVIE NIGHT” in bold at the top and “Every Friday at 8 1PM” at the bottom — matching the prompt. Text is readable and correctly placed. However, “NIGHT” is misaligned and “1PM” has a typo (“1PM” instead of “8PM”). Minor errors in text accuracy. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The guitar illustration is present and prominent. The headline “LIVE MUSIC FEST” is visible but has visual artifacts (e.g., “MUSIC” partially obscured, “FST” fragmented), making it unreadable in full. The tagline “Tickets available now” is present but misspelled (“avouiblbe”). Visual hierarchy is mostly correct, but critical text elements are flawed. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The title "JOIN THE BOOK CLUB" is misrendered as "JON THE BOOK:UB" with incorrect spacing and a colon. The tagline "Meets every Wednesday" is garbled as "Metty eery Wressklay." The open book illustration is present but the text elements are not readable or correctly placed. Visual hierarchy is partially maintained but text errors undermine clarity. VERDICT: FAIL

Medium 0%vs100%
local/bonsai-image-ternary-4b: FAIL
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The hero header image is present with a headline and subtitle, though both are garbled and unreadable. The “Sign Up” button is correctly placed and readable. Social media icons appear at the bottom, but the footer is cut off. Visual hierarchy is mostly intact, but critical text elements are illegible, violating core usability requirements. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The layout includes a header banner, two-column body (text left, image right), a highlighted quote block, and a footer with an unsubscribe link. However, all text is gibberish placeholder (e.g., “Nanopr Ares, condav enue”), making it unreadable and violating the requirement for correct text placement and readability. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The layout includes a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with “Hours” — all present. However, text is garbled and unreadable (e.g., “Appetizers” appears as “Appetūzes,” prices are gibberish), violating the requirement for readable, correctly placed text. Visual hierarchy is partially maintained but compromised by illegible content. VERDICT: FAIL

Hard 33%vs100%
local/bonsai-image-ternary-4b: PASS
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The layout matches the prompt: left page has a large hero photo; right page has two columns, a pull quote, a sidebar with author bio, and page numbers. Text is readable and correctly placed with proper visual hierarchy. All described elements are present and functional. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The layout includes a top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar — all present. However, text is heavily distorted and unreadable, violating the requirement for correct placement and readability. Visual hierarchy is mostly intact, but illegible text undermines usability. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The layout includes a full-bleed left-page photo and a right page with three columns, headline, body copy, and inset images. Page numbers are present. However, all text is gibberish — unreadable and incorrectly placed — violating core requirements for readability and proper typography. The visual hierarchy is present but meaningless due to illegible text. VERDICT: FAIL

Style Diversity 83%vs92%
Easy 33%vs67%
local/bonsai-image-ternary-4b: PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered with visible brushstrokes, rich texture, and an oil-painting aesthetic. The fur, foliage, and flowers show painterly texture and impasto-like qualities. The style matches the requested oil-painting look with strong visual evidence of brushwork and depth. VERDICT: PASS

local/bonsai-image-ternary-4b: FAIL
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, but lacks the key Japanese anime style features: flat colors and large expressive eyes. The dog’s eyes are realistic and detailed, not exaggeratedly large. The shading and fur texture are also more realistic than flat. The background has some stylization but doesn’t fully match anime aesthetics. VERDICT: FAIL

local/bonsai-image-ternary-4b: FAIL
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image displays a golden retriever in a garden with a distinct pixelated aesthetic. However, it lacks the strict 32x32 grid structure and limited color palette expected in true pixel art. Colors are vibrant and varied, and details are smoothed, deviating from the crisp, blocky style. The pixelation is visible but not technically adherent to the specified constraints. VERDICT: FAIL

Medium 100%vs100%
local/bonsai-image-ternary-4b: PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies Art Deco: bold geometric shapes, symmetrical composition, and a gold/black palette dominate. Ornamental details frame the scene, and stylized buildings reflect the era’s aesthetic. All requested elements are unmistakably present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image features soft brushstrokes, dappled light, and a pastel palette—core Impressionist traits. The subject holds a parasol, and the scene’s luminous, atmospheric quality aligns with the style. Though slightly polished, the visual language unmistakably evokes Impressionism. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly embodies Pop Art: bold black outlines, Ben-Day dot patterns for shading, and flat, vivid primary colors (red, blue, yellow). Facial features are stylized with high contrast and saturated hues. All requested characteristics are unmistakably present. VERDICT: PASS

Hard 100%vs100%
local/bonsai-image-ternary-4b: PASS
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image clearly depicts a forest with visible watercolor bleeds, soft gradients, and textured paper. It combines photorealistic detail in tree bark and foliage with the artistic qualities of watercolor. Lighting is atmospheric and natural, enhancing depth. All key characteristics of the requested style are present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image clearly blends cyberpunk (neon-lit city, tech gear) with Art Nouveau (flowing floral borders, ornate frame). Neon colors and organic lines are prominent. The fusion is intentional and recognizable. All key elements from the prompt are present and well-executed. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image successfully blends ukiyo-e style (flat planes, bold outlines, woodblock color, cloud motifs, and Japanese calligraphy) with photorealistic lighting and depth in the sky and stone textures. The castle’s architecture is medieval but rendered with ukiyo-e aesthetics. Key characteristics are clearly present. VERDICT: PASS

Extreme 100%vs100%
local/bonsai-image-ternary-4b: PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly showcases soft brutalism: raw, textured concrete walls and ceiling contrast with rounded, pastel-colored furniture. Warm, diffused light from windows and a floor lamp enhances the cozy, minimalist ambiance. All key elements from the prompt are present and well-executed. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image presents a sparse, minimalist composition with vast negative space and muted, desaturated tones. Subtle texture and faint brushwork suggest painterly intent, and the stark division between wall and floor creates quiet visual tension. While no explicit subject anchors the scene, the emptiness evokes unease, aligning with “anxious minimalism.” Key elements are present. VERDICT: PASS

local/bonsai-image-ternary-4b: PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image clearly embodies retro-futurism: warm, melancholic tones; 1960s-inspired architecture and vehicles; celestial bodies in the sky; and a nostalgic, slightly grainy patina. Key visual traits like stylized neon signage, space-age skyscrapers, and vintage cars are unmistakably present. The aesthetic aligns perfectly with the prompt’s request. VERDICT: PASS
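
The left-hand percentages in the Graphical design section can be reproduced from the verdicts listed for local/bonsai-image-ternary-4b. The sketch below assumes each rate is simply passes divided by evaluations, rounded to the nearest whole percent. That rule is inferred from the displayed figures rather than documented by the benchmark, and the function names are made up for this example.

```python
def pass_rate(passed: int, total: int) -> int:
    """Whole-number pass percentage, assuming simple passes / evaluations."""
    return round(100 * passed / total)


# (passes, evaluations) counted from the verdicts listed in this section.
graphical_design = {
    "Data Visualisation": (0, 3),
    "Layout & Design": (1, 9),
    "Style Diversity": (10, 12),
}


def category_rate(subcategories: dict) -> int:
    """Pool all evaluations in a category, then take one pass rate."""
    passed = sum(p for p, _ in subcategories.values())
    total = sum(t for _, t in subcategories.values())
    return pass_rate(passed, total)


for name, (p, t) in graphical_design.items():
    print(name, pass_rate(p, t))                           # 0, 11, 83
print("Graphical design", category_rate(graphical_design))  # 46
```

Pooling gives 11 passes out of 24 evaluations, or 46%, which matches the category heading. Averaging the three subcategory percentages instead would give about 31%, so the displayed figure appears to be pooled rather than averaged.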