ImageBench V1
192 evaluations across 6 categories

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 73%
Typography Style 100%
PASS | Evaluator (Qwen 3 VL)
Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with uneven, slightly textured ink strokes and distressed edges. Background shows aged paper texture with vignette. The medium and style convincingly match the prompt’s description of typewriter output on aged paper. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Text "TODAY'S SPECIAL" is rendered in white chalk on a dark green chalkboard, with visible chalk dust texture and smudges. Spelling is correct. The style convincingly mimics chalk writing, including uneven strokes and a pink chalk checkmark. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The text "OPEN" is rendered in glowing red LED pixels, matching the described medium. Letters are clearly formed with individual light elements, and spelling is correct. The pixelated, illuminated style convincingly simulates an LED sign. VERDICT: PASS
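Each evaluator response in this listing ends with a line of the form "VERDICT: PASS" or "VERDICT: FAIL". As a minimal sketch (not the benchmark's actual tooling), the verdict could be pulled out of a response with a regular expression. The function name `parse_verdict` is hypothetical, and returning `None` for a response whose verdict is cut off is an assumption about how truncated output might be handled.

```python
import re
from typing import Optional

# Matches the closing verdict marker used in the evaluator responses.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(response: str) -> Optional[str]:
    """Return "PASS" or "FAIL" from an evaluator response, or None if absent.

    Hypothetical helper: if the response contains several markers, the last
    one is used, since the verdict is expected at the end of the text.
    """
    matches = VERDICT_RE.findall(response)
    return matches[-1] if matches else None


complete = 'Text reads "CHAPTER ONE" correctly. VERDICT: PASS'
truncated = "No attribute swaps observed. VERDICT: P"

print(parse_verdict(complete))   # PASS
print(parse_verdict(truncated))  # None
```

A caller that gets `None` could fall back to the PASS/FAIL label shown on the line above the response.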
Writing Accuracy 67%
PASS | Evaluator (Qwen 3 VL)
The word "HELLO" is spelled correctly and fully readable in bold navy font. It is centered on a white background, though framed by a decorative border. No misspellings or illegible parts exist. The text matches the intended specification exactly. VERDICT: PASS
FAIL | Evaluator (Qwen 3 VL)
The word "ADVENTURE" is split into two lines: "ADVENT" and "TURE". The letter 'U' is missing between "ADVENT" and "TURE", making it illegible as a single word. The text is readable but incorrectly formed. VERDICT: FAIL
PASS | Evaluator (Qwen 3 VL)
The word "SERENITY" is spelled correctly and is fully readable, centered on a solid light blue background. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All words are spelled correctly and fully readable: "The", "quick", "brown", "fox", "jumps", "high". Text is clean black on white background as intended. No misspellings or illegible parts. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Text reads: "Every sunset brings a new dawn ahead" — all words are spelled correctly and fully legible against the pale yellow background. No misspellings or missing words detected. Font is clear and consistent. VERDICT: PASS
FAIL | Evaluator (Qwen 3 VL)
The main text “Stars shine brightest in the dark night” is fully readable and correctly spelled. However, the small cursive text above contains misspellings and is illegible. Since the core evaluation requires all text to match exactly, the presence of errors in the secondary text fails the criteria. VERDICT: FAIL
PASS | Evaluator (Qwen 3 VL)
All text matches the intended message exactly. "SUMMER SALE", "Up to 50% off all items this weekend only", and "Terms and conditions apply. See store for details." are fully legible and correctly spelled. No misspellings or missing words detected. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "GRAND OPENING", "Join us for food, music, and fun", and "RSVP required. Limited seating available." No misspellings, missing words, or illegible parts. Text is fully readable and correctly rendered. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM." Every word is spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS
FAIL | Evaluator (Qwen 3 VL)
Text is mostly correct but contains errors: "SOFIA REYES" is misspelled as "SOFIA REYED NATON" (extra "NATON"), and "MUSIC BY THOMAS GRANT" is incorrectly rendered as "MUSIC BIY THOMAS GRANT by THOMAS GRANT.(ATLUCNE4)" (typo "BIY", extra text "(ATLUCNE4)"). Key names and phrases are illegible or corrupted. VERDICT: FAIL
FAIL | Evaluator (Qwen 3 VL)
Text “Top 10 Trails You Must Hike This Year” is garbled; “You” and “Hike” are misrendered. “Cambodia” appears correctly but is misaligned. All other text matches intended content. Misspellings and illegibility prevent full accuracy. VERDICT: FAIL
PASS | Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "GOLDEN HARVEST ORGANIC HONEY", "Raw & Unfiltered", "Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.", "12 oz (340g)", and "Best before: 06/2026. Store in cool, dry place. Product of USA." No misspellings or illegible parts. VERDICT: PASS
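The percentages shown for Text Rendering are consistent with a simple pass rate, rounded to a whole number. The sketch below reproduces them from the PASS/FAIL labels listed above. Pooling all samples for the category score is an assumption that happens to match the displayed 73%; the page does not state its aggregation rule, and `pass_rate` is a hypothetical helper name.

```python
def pass_rate(verdicts):
    """Return the share of PASS verdicts as a whole-number percentage."""
    passes = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passes / len(verdicts))


# Labels copied in order from the two subcategories listed above.
typography_style = ["PASS", "PASS", "PASS"]
writing_accuracy = [
    "PASS", "FAIL", "PASS", "PASS", "PASS", "FAIL",
    "PASS", "PASS", "PASS", "FAIL", "FAIL", "PASS",
]

print(pass_rate(typography_style))                     # 100
print(pass_rate(writing_accuracy))                     # 67
# Assumed category rule: pool every sample (11 of 15 passed).
print(pass_rate(typography_style + writing_accuracy))  # 73
```

Averaging the two subcategory percentages instead would give roughly 83%, which does not match the displayed category score, so pooling looks like the more likely rule here.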
Spatial Reasoning 75%
Attributes Binding 100%
PASS | Evaluator (Qwen 3.5 122B)
The image shows two objects on a white surface: 1. A red sphere (ball) — correct color and shape. 2. A blue cube — correct color and shape. Both objects are side by side, as requested. No attributes are swapped; colors and shapes match the prompt exactly. Size is not specified in the prompt, so relative sizing is acceptable. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows: - One orange pumpkin (correct color, round shape, appropriate size relative to apples) - Two green apples (correct color, round shape, appropriate size) - All objects are on a wooden cutting board (correct surface) All attributes are correctly bound: the pumpkin is orange and large; the apples are green and smaller. No attribute swaps observed. The prompt asked for "a green apple" (singular) but two are present — however, this doesn’t violate core attribute binding. VERDICT: P
PASS | Evaluator (Qwen 3.5 122B)
The image shows a yellow taxi in the foreground and a black limousine parked behind it on a street. The taxi is yellow with a "T" logo, and the limousine is black, elongated, and has multiple windows. Both vehicles are correctly identified with their respective attributes (color, size, shape) as described in the prompt. There are no swaps or incorrect bindings of attributes to objects. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a white canvas with three shapes: a large red circle (top left), a medium blue triangle (top right), and a medium green star (bottom center). The prompt requested a “small” blue triangle, but the triangle appears medium-sized relative to the other shapes — not clearly small. However, since “small” is subjective without scale reference, and all other attributes (color, shape, arrangement) are correct, we accept it as passable. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows three candles on a shelf: 1. Left: Short, fat, red candle — matches description. 2. Center: Tall, thin, white candle — matches description. 3. Right: Medium-height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. The spatial arrangement and visual properties align precisely with the prompt. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a large brown horse with a lighter mane and a white marking on its forehead, standing in a grassy meadow. A small white rabbit is positioned near the horse’s front legs. A medium-sized black cat stands nearby, facing forward. All attributes (color, size, shape) are correctly bound to each object — no swaps or mismatches observed. The scene matches the prompt precisely. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains a pink fire truck (correct color and object), a blue banana (correct color and object), and a green sun (correct color and object). All attributes are correctly bound to their respective objects with no swaps. The surreal landscape is represented by the grassy field and sky. All requested elements are present and accurately depicted. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains: - A purple carrot (correct color and shape) - A red tree with red leaves (correct color and foliage) - A white watermelon, whole and sliced open (correct color and state) All attributes are correctly bound to their respective objects with no swaps. The objects match the prompt’s specifications in color, form, and presentation. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows three objects on a wooden surface: 1. An orange elephant — correct color, shape, and size relative to others. 2. A yellow penguin — correct color, shape, and size; white belly and black feet match typical penguin depiction. 3. A silver strawberry — correct metallic color and textured shape resembling a strawberry. All attributes (color, shape, object identity) are correctly bound with no swaps. VERDICT: PASS
Compositionality 78%
PASS | Evaluator (Qwen 3.5 122B)
The image contains all required elements: - Tabby cat: Present (striped fur pattern visible) - Oversized red sunglasses: Present (large red frames with dark lenses) - Coherent combination: The sunglasses sit naturally on the cat’s face, with reflections in lenses showing the cat’s eyes All prompt elements are accurately represented and logically combined. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains: - Golden retriever: Present and clearly depicted - Cowboy hat: Present on the dog's head - "Tiny" descriptor: The hat appears proportionate to the dog's head, not notably tiny The core elements are present and coherently combined. While the hat isn't dramatically small, it fits the concept appropriately. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a penguin standing upright on a paved surface, holding a rainbow-colored umbrella in its flipper. The umbrella has distinct panels of red, orange, yellow, green, blue, and purple. The penguin is rendered with realistic texture and anatomy, and the umbrella is held naturally as if by a human. All described elements — penguin, colorful umbrella, and their coherent combination — are present and visually consistent. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all described elements: 1. Medieval knight in full armor - PRESENT (full plate armor with helmet and chainmail) 2. Sitting at a desk - PRESENT (wooden desk/table) 3. Typing on a modern laptop - PRESENT (Apple MacBook visible with hands positioned as if typing) The concepts are coherently combined in a single scene with appropriate lighting and perspective. The juxtaposition of medieval and modern elements is clear and intentional. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image contains: - Roman gladiator: Present (helmet, armor, muscular build) - Selfie pose: Present (arm extended toward camera) - Smartphone: Present (floating above helmet, not held in hand) - Colosseum: Present (background structure) The smartphone is not being held by the gladiator — it’s floating unnaturally above his helmet, breaking coherence with the “taking a selfie” action. While all elements are visually present, their combination lacks logical interaction. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image contains all required elements: 1. Astronaut in a spacesuit - PRESENT (full white suit with helmet, American flag patch, life support pack) 2. Riding a bicycle - PRESENT (astronaut seated on bike, hands on handlebars, feet on pedals) 3. Through a park - PRESENT (paved path, green grass, trees, sunny outdoor setting) All elements are coherently combined in a single, logical scene. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all described elements: - Octopus: Present (orange, with tentacles) - Top hat: Present (black, on octopus head) - Monocle/Goggles: Present (worn on hat) - Chess: Present (board with pieces) - Moon surface: Present (cratered gray terrain) - Earth in background: Present (visible in upper right) All elements are coherently combined in a single scene. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all required elements: 1. Dinosaur (T-Rex head/body) - Present 2. Business suit - Present (worn by dinosaur) 3. PowerPoint presentation - Present (on projector screen) 4. Modern office - Present (conference room setting) 5. Drinking coffee - Present (holding coffee cup) All elements are coherently combined in a single scene. The dinosaur is dressed professionally, presenting to colleagues in a corporate environment while holding a coffee cup. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
Let’s examine each element from the prompt: 1. **Robot dressed as a pirate** – ✅ Present: Robot wears a tricorn hat, red bandana, and has glowing eyes; armor-like body with straps. 2. **Sailing a paper boat** – ✅ Present: The robot sits in a brown, folded-paper-style boat. 3. **On a lake of lava** – ✅ Present: Glowing orange molten liquid surrounds the boat, with rocky terrain and fire effects. 4. **Juggling three kittens** – ❌ Not juggling: Three kittens are present (two held in hands, one in
Counting 56%
PASS | Evaluator (Qwen 3.5 122B)
The image shows three red apples on a wooden table: two whole apples and one apple cut in half, revealing its interior. The count of apples is exactly three, matching the prompt’s request. All are red and placed on a wooden surface. The arrangement and object types align with the description. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image displays exactly five yellow rubber ducks arranged in a horizontal row on a white bathroom shelf or sink edge. Each duck is identical in appearance, with orange beaks and black eyes with lashes. The count matches the prompt’s request precisely: five ducks, yellow, rubber-like, in a row, on a bathroom surface. No extra or missing objects. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image displays four white pillar candles arranged in a row on a dark surface. All candles are upright and unlit, with visible wicks. The count of candles is exactly four, matching the prompt’s request. No other objects are present. The description “white candles standing on a dark surface” accurately reflects the visual content. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows three black dogs and two orange cats on a grassy lawn. The dogs are sitting, and one cat is lying down while the other sits. The count of animals matches the prompt exactly: 3 black dogs and 2 orange cats. All are together on grass. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows 4 red roses and 3 white daisies in a glass vase on a table. The prompt requested 4 red roses and 2 white daisies. While the rose count is correct, the daisy count exceeds the request by one. Since the core criteria (exact match) is not met, this is a failure. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image displays four butterflies resting on a wire fence. There is one large blue butterfly at the top center. Below it, there are three yellow butterflies: one partially visible on the left, one on the right, and one at the bottom center. The count matches the prompt exactly: 1 blue and 3 yellow butterflies. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a round fishbowl containing goldfish. Counting them: one on the far left, one top-left, one bottom-center, one center-right (large), one behind it, one top-right, one far right, and two more near the center — totaling 8 clearly visible goldfish. There may be a partial ninth obscured, but only 8 are distinctly countable. The prompt requested exactly 9. Since the count does not match precisely, this fails the criterion. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a fruit bowl with 3 bananas, 2 apples, and 3 oranges. The prompt requested 3 bananas, 4 oranges, and 2 apples. While the banana and apple counts are correct, the orange count is incorrect (3 instead of 4). Therefore, the total object count does not exactly match the request. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image displays colorful balloons against a clear blue sky. Counting them: there are 9 balloons total — 2 red, 2 yellow, 2 pink, 1 green, 1 blue, and 1 patterned (confetti-style). The prompt requested exactly 11 balloons. Since only 9 are present, the count does not match. VERDICT: FAIL
Negation 67%
PASS | Evaluator (Qwen 3.5 122B)
The image shows a tropical beach scene with palm trees, turquoise water, white sand, and distant mountains under a blue sky. There are no people visible anywhere in the frame — not on the beach, in the water, or in the background. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a cat lying on a beige couch. The cat has white and tabby markings. Upon close inspection, the cat is wearing a dark-colored collar around its neck. Since the prompt explicitly excluded “a collar,” and one is clearly present, the excluded element is not absent. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows a classroom with desks, chairs, a chalkboard, and a bulletin board. There are no students visible in the scene. The excluded element (students) is truly absent. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image displays a sandy beach foreground scattered with various seashells. In the mid-ground, there are several palm trees and a colorful beach umbrella. The background shows more greenery and a blue sky with clouds. Crucially, there is no water, ocean, or sea visible anywhere in the frame; the view is strictly of the sand and the vegetation behind it. The excluded element is absent. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows an office workspace with a wooden bookshelf filled with books and binders, a light-colored desk, a black mesh office chair, and a black desk lamp. There are no computers, monitors, tablets, or any type of screen visible in the scene. The excluded elements are truly absent. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a kitchen scene with a stainless steel pot with a lid, a smaller stainless steel saucepan, a black frying pan, and a wooden cutting board on a countertop next to a gas stove. The stove has a visible blue flame under the saucepan. There is no food visible in any of the cookware or on the cutting board. The excluded element (food) is absent. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a pepperoni pizza with visible melted cheese covering the surface, along with tomato sauce, basil leaves, and a golden crust. Since the prompt explicitly excluded cheese and the image clearly contains melted cheese, the excluded element is present. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows a white car body shell viewed from above. It includes the roof, door frames, and window openings. The prompt explicitly excluded wheels, and indeed, no wheels are visible in this top-down view of the bare chassis. The excluded element (wheels) is absent. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a human face with clearly visible eyes, nose, and mouth. However, the face also has well-defined eyebrows above each eye — they are natural-looking, arched, and filled in with hair-like texture. Since the prompt explicitly excluded eyebrows, and they are present, this violates the constraint. VERDICT: FAIL
Relative Position 83%
PASS | Evaluator (Qwen 3.5 122B)
The image shows a tabby cat with white paws and chest sitting directly on the top surface of a closed cardboard box. The box is positioned on a wooden floor. The spatial relationship described in the prompt — “a cat sitting on top of a cardboard box” — is accurately represented in the image. All positional relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on a wooden floor. The spatial relationship described in the prompt — “a red ball underneath a wooden chair” — is accurately represented. The ball is clearly under the chair’s structure, and no conflicting positional claims are present. All described relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a white coffee mug and an open book on a wooden surface, likely a desk. The mug is positioned to the left of the book. The book is open and lies flat on the desk. The spatial relationships described in the prompt — "a coffee mug next to an open book on a desk" — are accurately represented: both objects are on the desk, and the mug is beside (next to) the book. All positional relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a tree trunk on the left side of the frame. Below and to the right of the branch, a park bench is visible in the foreground. The spatial relationships described — bird on branch, branch extending from tree, tree next to bench — are all accurately represented. All positional cues (on, next to, below) are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a tabby cat sitting directly on a patterned pillow. The pillow is placed on the seat of a wooden chair. All spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. No positional errors are present. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a candle placed on top of a stack of two books. To the left of the books is a potted plant in a terracotta pot. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors (on, next to) match the visual layout. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows five chess pieces on a board, but the spatial relationships do not match the prompt. The king is centered, but the queen is not to its left — instead, a bishop is there. A knight is to the left (not below), and a pawn is below the king (not a rook). The piece to the right is a pawn, not a bishop. No rook is visible above the king. Multiple positional errors exist. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows a desk scene with a laptop centered, a coffee mug to its left, a phone to its right, a lamp above casting light, and an open notebook with a pen below the laptop. All specified spatial relationships — left, right, above, below — are accurately represented as described in the prompt. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. The horse is on the far left. The cow is to the right of the horse. The sheep is in the center. The goat is to the right of the sheep. The dog is on the far right. All specified spatial relationships are accurately represented. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a red barn in the foreground at large scale, consistent with the prompt. Two windmills appear in the midground — one on a hill to the left and another closer to the right — both at medium scale. Snow-capped mountains are in the background at small scale, maintaining depth perspective. All spatial relationships (foreground/midground/background) and relative positions (left/right hills, above/below elevation) are correctly represented. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a beach scene with a sandcastle in the foreground, a surfer riding a wave in the midground, and a cruise ship on the horizon in the background. The spatial arrangement matches the prompt: sandcastle closest (foreground), surfer in middle distance (midground), ship farthest (background). Sizes decrease appropriately with distance. All positional relationships are correctly represented. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background — matching the prompt’s layering. However, the pigeons are not “on” or “under” the bench as implied by typical spatial phrasing; some are on the ground near it, one is perched on the bench arm. The phrase “each layer progressively smaller” is visually accurate due to perspective. But since the prompt implies specific positional relationships (e.g., pigeons “on” bench) that a
Scale & Proportions 67%
PASS | Evaluator (Qwen 3.5 122B)
The image shows a person standing next to a yellow school bus. The person’s height relative to the bus appears realistic — approximately one-third to one-quarter of the bus’s height, which is consistent with real-world proportions. The bus’s length, wheel size, and window spacing also align with typical school bus dimensions. No obvious scaling distortions are present. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The ladybug appears appropriately sized relative to the leaf, consistent with real-world proportions. The leaf’s veins and texture are detailed and scale correctly with the insect. No obvious distortions or size mismatches are present. The composition reflects a natural, realistic relationship between the two objects. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a child holding a basketball. The ball appears disproportionately large relative to the child’s torso and hands — it nearly spans the width of their chest and is held with both hands in a way that suggests it’s too big for a typical child. In reality, a standard basketball (size 7) would be smaller relative to a child’s body; even a youth-sized ball (size 5 or 6) wouldn’t appear this oversized. This violates the “correct relative sizes” criterion. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a tiny figurine of a person sitting on the rim of a coffee cup, not standing. The person’s size relative to the cup is plausible for a miniature figure, but the pose contradicts the prompt’s description of “standing.” Since the core requirement includes both size relationship and posture, and the posture is incorrect, the criteria are not fully met. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image depicts a giant tabby cat striding between skyscrapers, with its paws resting on building rooftops and its body towering over the urban landscape. The scale is consistent with the “kaiju” description — the cat’s head reaches mid-building height, and its spine spikes rise above surrounding structures. Proportions are internally consistent: limbs, tail, and facial features match a tabby cat, just massively scaled. No anatomical or spatial inconsistencies violate the prompt’s intent. VER
PASS | Evaluator (Qwen 3.5 122B)
The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, appearing small enough to be held yet detailed enough to show windows, doors, and roof shingles. The proportions between the hand and the house are realistic and consistent with the prompt’s description of a “miniature house sitting on the palm.” Lighting and shadows also support spatial coherence. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows an elephant, horse, dog, cat, and mouse arranged in a line. The elephant is largest, followed by the horse, then the dog, then the cat, and finally the mouse — which matches real-world size proportions. All animals are standing upright and appear correctly scaled relative to each other. No obvious distortions or inversions in scale are present. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a table setting with a plate, two forks, a knife, a wine glass, and salt and pepper shakers. The relative sizes appear realistic: the plate is appropriately large compared to the cutlery, the wine glass is proportionate to the plate, and the shakers are correctly sized relative to the other items. All objects maintain consistent and believable proportions for a dining scene. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image displays a farm scene with a barn, tractor, farmer, cow, chicken, and fence post. However, the proportions are inconsistent: the chicken in the foreground is unnaturally large relative to the cow and tractor, appearing almost as tall as the cow’s legs. The farmer also appears disproportionately small compared to the tractor and barn. These scale errors violate the requirement for correct proportional relationships. VERDICT: FAIL
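The six Spatial Reasoning subcategory scores and the 75% category score can be cross-checked from the PASS/FAIL labels listed above. The sketch below is only a consistency check under an assumed rule (pool all 57 samples, round to a whole percent). The page does not document how it aggregates, and the names `spatial_reasoning` and `percent` are hypothetical.

```python
# (passed, total) per subcategory, counted from the PASS/FAIL labels above.
spatial_reasoning = {
    "Attributes Binding": (9, 9),
    "Compositionality": (7, 9),
    "Counting": (5, 9),
    "Negation": (6, 9),
    "Relative Position": (10, 12),
    "Scale & Proportions": (6, 9),
}


def percent(passed, total):
    """Pass rate as a whole-number percentage."""
    return round(100 * passed / total)


sub_scores = {name: percent(p, t) for name, (p, t) in spatial_reasoning.items()}

# Assumed category rule: pool samples across subcategories (43 of 57).
pooled_passed = sum(p for p, _ in spatial_reasoning.values())
pooled_total = sum(t for _, t in spatial_reasoning.values())
category_score = percent(pooled_passed, pooled_total)

for name, score in sub_scores.items():
    print(f"{name}: {score}%")
print(f"Spatial Reasoning: {category_score}%")  # Spatial Reasoning: 75%
```

For this category the unweighted mean of the six subcategory percentages also rounds to 75%, so these numbers alone cannot distinguish the two aggregation rules.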
Human Realism 81%
Faces & Expressions 100%
PASS | Evaluator (Qwen 3 VL)
Face is nearly symmetrical with natural features. Smile appears warm and genuine, with clear crinkles around eyes and even teeth. Slight AI artifacts: overly smooth skin texture and unnaturally perfect teeth. Emotion is clearly readable as joyful. Minor distortions don’t detract from core intent. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The face is symmetric with natural, correctly proportioned features. The expression is calm and serious, with neutral lips and steady gaze. No distortions are visible. Lighting and texture appear realistic. The portrait accurately fulfills the prompt’s requirements. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The face is symmetric with correctly proportioned features. Eyes, nose, and mouth are aligned naturally. The expression is relaxed and neutral, conveying calmness without distortion. Skin texture and lighting are realistic. No visible artifacts or anatomical errors. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The face shows clear disgust: nose wrinkled, lips curled, and brows furrowed. Features are symmetric and anatomically plausible. No significant distortions. The emotion is unmistakably readable. All requested elements are accurately rendered. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with correctly rendered features. Eyes are wide, eyebrows raised, mouth slightly open — all clearly conveying genuine surprise. No significant distortions. Skin texture and lighting are natural. The expression is unmistakably the intended emotion. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with natural, correctly rendered features. Downcast eyes and slightly drawn mouth convey quiet melancholy clearly. Distant gaze is evident through lowered eyelids and gaze direction. No distortions or unnatural artifacts present. Emotion is authentic and readable. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with natural, aged features and deep wrinkles. White hair is visible. Warm, genuine smile with kind eyes clearly conveys emotion. No significant distortions. Facial structure and expression align well with the prompt. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows and a downturned mouth. No significant distortions are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with realistic, aged features: visible age spots, thinning gray hair, and reading glasses. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No distortions; details are natural and well-defined. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Faces show clear, symmetrical features with no distortions. Emotions are vividly readable: left man laughs with head back, center man looks shocked with wide eyes, right man rolls eyes with smirk. All three expressions match the prompt precisely. Lighting and anatomy are natural and consistent. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Faces are symmetrical with correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored, child grins excitedly. No distortions or unnatural artifacts. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Faces are symmetric with correct features. Emotions are clearly readable: left man shows frustration (furrowed brows), center woman shows pleased nodding (smile), right man shows confusion (tilted head, squinted eyes). No distortions. All three expressions match the prompt accurately. VERDICT: PASS
Full Body 92%
FAILEvaluator (Qwen 3 VL)
The image shows a torso and upper legs, but the full body is cut off. Limbs visible (arms, hands) are anatomically correct and at sides, consistent with a relaxed upright pose. However, the prompt specified a full-body view, which is not present. The pose is recognizable from the visible portion. VERDICT: FAIL
PASSEvaluator (Qwen 3 VL)
The man stands straight facing forward with hands in pockets, matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The pose is clearly recognizable as described. All elements align with the requested description. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image shows a full-body view of a person standing in a park, from head to feet. Limb count and proportions are anatomically correct. The pose is neutral, with arms at sides and feet flat on grass. No significant distortions are visible. The scene matches the prompt description accurately. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The woman’s limb count and proportions are correct. Her running pose is anatomically plausible, with one leg forward, the other back, arms bent for momentum, and hair flowing backward, all consistent with mid-stride motion. No significant distortions are visible. The action described in the prompt is clearly recognizable and well executed. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms extended for balance—is anatomically plausible for a jump. The action is clearly recognizable: mid-air leap over a puddle. No significant distortions are visible. The image accurately depicts the requested scene with natural motion and detail. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The dancer’s pose is anatomically plausible: one leg extended upward, the other supporting weight on the ball of the foot. Limb count and proportions are correct. The flowing dress captures motion realistically. The dynamic spin is recognizable, with fabric swirling naturally around the body. No significant distortions or anatomical errors are present. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The person stands on one leg, with the other foot pressed against the inner thigh, arms raised overhead in prayer position. Limb count and proportions are correct. The pose is clearly recognizable as Tree Pose (Vrksasana). No anatomical distortions are visible. The body alignment appears natural and balanced. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The climber has two arms and two legs, with proportions and joint angles consistent with human anatomy. The pose is recognizable as dynamic rock climbing, with limbs extended to grips. No distortions are visible. The action matches the prompt precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, the bow arm is extended, and the hand position is accurate for playing. Proportions are natural, with no visible distortions. The pose clearly matches the prompt’s description. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The dancers’ limbs are correctly counted and proportioned. The woman’s legs are intertwined with the man’s, and her body is arched backward in a classic tango dip. The pose is anatomically plausible and matches the prompt. No significant distortions are visible. The action is clearly recognizable as a tango embrace with a dip. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s airborne pose are anatomically plausible and match the described action. The ball is near the jumping player’s feet. No significant distortions are visible. The dynamic interaction is recognizable as a mid-tackle scenario. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both gymnasts have correct limb count and proportions. Their handstand poses are anatomically plausible, though not perfectly symmetrical — one has legs straight, the other slightly bent. The action is recognizable as a synchronized handstand. Minor distortions: one gymnast’s head/neck angle appears slightly unnatural. Overall, the core anatomical correctness and pose recognition are met. VERDICT: PASS
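The 92% shown for Full Body is consistent with the twelve verdict badges listed in that group (one FAIL, eleven PASS). A minimal sketch of that arithmetic, assuming the score is a plain pass rate rounded to the nearest whole percent; the function name `pass_rate` is illustrative and not something the benchmark defines:

```python
def pass_rate(verdicts):
    """Return the share of PASS verdicts as a whole-number percentage."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passed / len(verdicts))


# Badges for the Full Body group, in the order listed: one FAIL, then eleven PASS.
full_body = ["FAIL"] + ["PASS"] * 11
print(pass_rate(full_body))  # 92
```

The same calculation applied to a group with 5 PASS badges out of 12 gives 42.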
Hands 42%
FAILEvaluator (Qwen 3.5 122B)
The left hand (viewer’s left) has 5 fingers, all appearing anatomically normal. The right hand (viewer’s right) also has 5 fingers, but the thumb is positioned unusually — it appears to emerge from the side of the palm rather than the base, and its orientation is awkward. Additionally, the pinky finger on the right hand seems slightly fused or misshapen at the tip. These anomalies suggest AI generation artifacts. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
Upon close examination, the left hand (viewer’s left) has six fingers: thumb, index, middle, ring, pinky, and an extra digit between index and middle. The right hand appears to have five fingers normally. This violates anatomical correctness and the prompt’s requirement of “exactly 5 per hand.” Anomaly: polydactyly on left hand. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The image shows a man with arms crossed, hands resting on opposite upper arms. Left hand (on right shoulder): 5 fingers visible, anatomically correct. Right hand (on left shoulder): 5 fingers visible, also anatomically correct. No anomalies in finger count or grip. Gesture is natural and consistent with the prompt. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The image shows a right hand gripping a white mug by the handle. The thumb is positioned on top of the handle, and four fingers are wrapped around it — but upon close inspection, there are only three visible fingers (index, middle, ring) plus the thumb; the pinky appears fused or missing. Additionally, the fingers show unnatural curvature and slight merging at the joints. Anatomical correctness fails due to incorrect finger count and structure. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The right hand holding the pen shows 5 fingers with a natural, anatomically correct grip. The left hand resting on the notebook also displays 5 fingers in a relaxed, realistic position. No anomalies or distortions are visible in either hand. The anatomy and gesture align with the prompt’s requirements. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The left hand on the fretboard shows 4 fingers clearly visible, with the thumb likely behind the neck — anatomically plausible. The right hand is blurred due to motion (strumming), but appears to have 5 fingers in a natural strumming position. No obvious anatomical anomalies or extra/missing fingers are evident despite motion blur. Grip and gesture are consistent with playing guitar. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows a single right hand making a peace sign. The index and middle fingers are extended upward, while the ring finger, pinky, and thumb are curled inward — forming a natural, anatomically correct gesture. All five digits are present and positioned appropriately for this pose. No anomalies or extra/missing fingers are visible. The hand structure, knuckles, and nail details appear realistic and consistent with human anatomy. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and four fingers are curled into the palm — anatomically correct for this gesture. All five digits (thumb + 4 fingers) are present and properly positioned. No anomalies or extra/missing fingers. The anatomy is accurate. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The image shows two hands. The right hand (in foreground) has index, middle, and ring fingers extended — matching the prompt’s gesture. However, the left hand is gripping the right index finger unnaturally, with thumb and index finger pinching it mid-shaft — not a natural counting pose. Also, only 4 fingers are clearly visible on the left hand; the pinky is obscured or missing. Anatomical grip is incorrect for “counting to three.” Anomaly: unnatural finger pinch and incomplete finger visibility.
FAILEvaluator (Qwen 3.5 122B)
Upon close examination: The left hand (in suit sleeve) shows 4 fingers clearly visible — thumb, index, middle, and ring; pinky is obscured or missing. The right hand (bare arm) also shows only 4 fingers — thumb, index, middle, and ring; pinky is not visible. While this may be due to occlusion from the grip, the prompt requires “all fingers of both hands anatomically correct” — implying full visibility or at least plausible anatomy. The grip itself appears natural for a handshake. However, since
FAILEvaluator (Qwen 3.5 122B)
The image shows two hands pressed together in a prayer-like or namaste gesture, not a high-five. The left hand (viewer’s left) has 5 fingers visible; the right hand also has 5 fingers. However, the gesture is anatomically incorrect for a high-five — palms are flat against each other vertically, not slapping horizontally. Fingers are extended but aligned in a vertical plane, inconsistent with a high-five. Anomaly: wrong gesture entirely. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
The image shows four hands (two pairs) interacting with a gift box. The left pair appears to be one person’s hands, and the right pair another’s — but anatomically, it’s implausible for two people to each use both hands to grip the same small box from opposite sides without overlapping or awkward positioning. More critically: the left “hand” has only 4 visible fingers (thumb + 3), and the right hand has 5, but the left hand’s thumb is positioned unnaturally high, and the right hand’s pinky is pa
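Several of the Hands entries above are cut off before their closing "VERDICT:" marker, so for those entries the PASS/FAIL badge is the only verdict on record. A minimal sketch of how a reader might recover verdicts from such a listing, assuming the marker always takes the form `VERDICT: PASS` or `VERDICT: FAIL`; the helper names are hypothetical and the fallback to the badge is a design choice for this sketch, not a documented rule of the benchmark:

```python
import re

# Matches the closing marker used in the evaluator write-ups.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(text):
    """Return 'PASS' or 'FAIL' from the write-up, or None if the marker is absent."""
    matches = VERDICT_RE.findall(text)
    # Use the last match in case the write-up mentions the marker more than once.
    return matches[-1] if matches else None


def resolve_verdict(badge, text):
    """Prefer the verdict stated in the text; fall back to the badge when truncated."""
    parsed = parse_verdict(text)
    return parsed if parsed is not None else badge


print(resolve_verdict("PASS", "No anomalies. The anatomy is accurate. VERDICT: PASS"))
print(resolve_verdict("FAIL", "The grip itself appears natural for a handshake. However, since"))
```

The second call has no marker in its text, so it returns the badge value.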
Multi-Subject 100%
PASSEvaluator (Qwen 3 VL)
Left woman: short blonde hair, red blazer over white shirt. Right woman: long black hair, blue sleeveless dress. Both visually distinct and match prompt descriptions precisely. No missing or mismatched elements. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Man on right: beard, glasses, green sweater — matches description. Man on left: clean-shaven, black hoodie — matches description. Both individuals are visually distinct and present. All features align with prompt. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Tall person: curly red hair, denim overalls over green top. Short person: straight brown bob, yellow sundress. Both visually distinct, matching descriptions. No extra or missing individuals. All features accurately rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
All four individuals are present and visually distinct. Left: bald man in suit. Next: woman with silver hair, red blouse. Center: young man with dreadlocks, denim jacket. Right: woman in green hijab and dress. Features match prompt exactly. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Tall woman: pink short hair, playing bass, wearing red top and black pants. Man on drums: beard, bandana, playing drums. Slim man: glasses, playing keyboard, wearing black shirt and jeans. All three are visually distinct and match descriptions. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Man: wears sunhat, large backpack, and hiking attire. Woman: has braided hair, uses hiking poles, wears shorts. Teenager: wears baseball cap, holds water bottle, carries backpack. All three are visually distinct and match described features precisely. VERDICT: PASS
Truthfulness 74%
Photorealism 100%
PASSEvaluator (Qwen 3.6 27B)
The image demonstrates high-fidelity material rendering. The glass exhibits correct refraction and transparency, distorting the background as expected. The metal spoon shows sharp, accurate specular highlights and reflections. The wooden table has a natural grain and warm tone, while the linen napkin displays a convincing woven texture with soft light interaction. The apple’s skin reflects light realistically with subtle subsurface scattering effects. All materials behave physically correctly un
PASSEvaluator (Qwen 3.6 27B)
The image successfully renders distinct materials with high fidelity. The brushed aluminum laptop shows appropriate metallic sheen and reflections. The leather notebook exhibits realistic texture and grain. The ceramic mug displays a matte, speckled glaze with correct light interaction. The glass elements (paperweight and sheets) demonstrate accurate translucency, refraction, and edge reflections. The wood desk surface has a natural grain and subtle gloss. All materials are visually distinct and
PASSEvaluator (Qwen 3.6 27B)
The image successfully renders the distinct visual properties of each requested material. The stainless steel pot exhibits sharp, accurate reflections and metallic sheen. The wooden cutting board shows natural grain and matte texture. The glass bottle demonstrates translucency and refraction with the oil inside, while the cotton towel displays a soft, fibrous texture. All surfaces interact with light realistically, meeting the prompt's criteria for correct material rendering. VERDICT: PASS
Physics & Reflections 92%
PASSEvaluator (Qwen 3.6 27B)
The image displays a red sphere on a white surface. The shadow is cast to the right and slightly downward, which is consistent with a light source positioned to the upper left. The shadow's shape is an elongated oval, correctly distorted by perspective and the angle of the light. The highlight on the sphere also aligns with the upper-left light source. The physics of the shadow and reflection are accurate. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The shadows are physically plausible. The light source is clearly coming from the right (the window), casting sharp, distinct shadows to the left of both the mug and the pitcher. The shadow shapes accurately correspond to the objects' forms, and the intensity suggests strong, direct sunlight. The reflections on the glossy surfaces also align with the lighting direction. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The shadow is cast directly beneath the person, consistent with midday lighting. The direction and orientation are physically plausible, and there’s no distortion or unnatural coloration in the shadow. The length of the shadow is short, as expected at midday. Overall, the shadow aligns well with the lighting conditions described in the prompt. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is physically plausible. The mirror accurately captures the woman’s back, with correct reversed orientation (e.g., pockets and seams align symmetrically). Lighting and shadows match the room’s ambient illumination — no directional mismatches or unnatural distortions. Colors are consistent between subject and reflection. The mirror’s frame and edge reflections also behave realistically. No artifacts or physics violations detected. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is vertically inverted and positioned directly beneath the cat, consistent with a polished floor. The distortion increases toward the foreground due to perspective, which is physically accurate. Colors and lighting in the reflection match the subject. The shadow under the cat aligns with the light source direction. No anomalies in orientation or physics are present. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is vertically inverted and correctly positioned directly beneath the vase, consistent with a glass surface. The distortion and slight blurring in the reflection are physically plausible for a glossy table. The lighting and shadows align with the scene’s illumination, and the colors in the reflection match the original objects. No anomalies in direction or orientation are observed. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are physically plausible. The orientation is correct (inverted vertically), and the direction aligns with the light source. The gentle ripples distort the reflection naturally, creating a wavy, slightly darker appearance consistent with water physics. The colors match the sky and trees, with appropriate attenuation in the water. No anomalies detected. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The reflections are physically plausible in orientation and direction, correctly mirroring the buildings and sky. However, the image exhibits severe chromatic aberration (RGB splitting) across the entire frame, not just as a subtle "color shift" in the reflection. This heavy artifacting creates a glitch-art aesthetic that overrides natural physics, making the visual result unrealistic rather than a faithful depiction of a puddle reflection. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly inverted, mirroring the mountain range and shoreline with appropriate orientation. The water surface shows minimal distortion, consistent with a still lake, and the reflection is subtly muted in color and contrast compared to the actual landscape, which aligns with real-world optical behavior. No directional or structural inconsistencies are present. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image demonstrates high physical plausibility. The glass sphere correctly refracts the background scene, inverting it as expected. The mirror surface provides a sharp, accurate reflection of the sphere and the sky. The water ripples distort the surrounding reflections naturally, and the lighting is consistent across all elements. The interaction between the sphere, the mirror, and the water is coherent and realistic. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are physically plausible. The chrome sculptures accurately mirror the environment, including the ceiling lights and window frames, with appropriate distortion based on their curved surfaces. The glass floor creates a clear, inverted reflection of the sculptures and the room above, consistent with a highly reflective surface. The lighting and shadows align with the natural light coming from the large windows, enhancing the realism of the scene. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image displays physically plausible optics. The reflection on the marble is correctly oriented and positioned directly beneath the glass, with appropriate distortion. The refraction through the crystal bowl accurately distorts the background scenery, creating realistic caustic patterns and color dispersion. The lighting direction is consistent across the scene, casting soft shadows and highlights that align with the window's position. The lake’s surface in the background also shows natural r
World Knowledge 50%
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy, showcasing its iconic iron lattice structure, three distinct levels (including the top platform and antenna), and characteristic arched base. The setting—a clear blue sky with Parisian buildings and greenery at the base—matches the prompt’s description. No significant inaccuracies are visible; proportions, details, and context align with reality. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high architectural accuracy, including the central dome, four minarets, and symmetrical layout. The reflection in the pool is rendered realistically. However, the people in the foreground appear slightly distorted or "melty," a common artifact in AI generation. Despite these minor human figure anomalies, the core subject—the monument itself—is recognizable and factually accurate. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The pedestal’s architectural details — including the colonnade, stone base, and surrounding structures — align with real-world references. Minor stylization or smoothing may be present, but no factual or structural inaccuracies are evident. Core criteria are fully met. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image depicts a shamisen, but it contains significant factual inaccuracies. The tuning pegs are incorrectly shaped like shuttlecocks (badminton birdies), which is not a real-world design. The body’s edge inlay appears garbled and nonsensical rather than traditional wood or lacquer patterns. While the three strings and bachi plectrum are present, the overall construction lacks authenticity and realism, failing to meet the “accurate” requirement. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image depicts a brass astrolabe, but the text and symbols are nonsensical gibberish rather than accurate Latin or Arabic inscriptions typical of medieval instruments. The rete (star map) lacks recognizable constellations, and the degree markings appear inconsistent or decorative rather than functional. While the overall form resembles an astrolabe, the lack of factual accuracy in engravings and astronomical details fails the prompt’s requirement for an “accurate” medieval astrolabe. VERDICT
PASSEvaluator (Qwen 3.6 27B)
The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The texture of the glaze, the specific pattern of the cracks (crazing), and the application of the gold repair are visually consistent with real-world examples of this art form. The lighting and depth of field suggest a high-quality photograph of a physical object. There are no architectural or factual inaccuracies regarding the subject matter. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image depicts the water cycle with recognizable elements like evaporation, condensation, and precipitation. However, it contains significant factual errors in its text labels. The title is gibberish, "EVAPORATION" is misspelled as "EVAPORATIO," and "PRECIPITATION" is incorrectly labeled as "RAINI." These inaccuracies make the diagram factually incorrect. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image depicts the correct geological concept of a convergent boundary with subduction, a trench, and a volcanic arc. However, it fails significantly on factual accuracy due to severe text hallucinations. Labels are garbled nonsense (e.g., "SUDULTION" for Subduction, "RENCH" for Trench, "VOCANIC ACR" for Volcanic Arc, "ACOOI" for Oceanic Crust). These errors make the diagram scientifically inaccurate and confusing. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image is a stylized diagram of a heart, but it fails the accuracy criteria. The labels are nonsensical gibberish (e.g., "Neltuill," "Aigtim," "Priruptus Valve") instead of correct anatomical terms. Furthermore, the anatomical placement is incorrect; for instance, the vessel labeled "Arta" (presumably Aorta) is depicted emerging from the right side of the heart (blue side), whereas the aorta connects to the left ventricle. The pulmonary artery is also misplaced. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image depicts a Morpho butterfly with the characteristic iridescent blue dorsal wings and brown ventral patterns with eyespots. The body morphology, including the antennae and thorax, appears correct. The coloration and pattern are consistent with the species description. The presentation in a display case is realistic. There are no obvious factual or architectural inaccuracies regarding the subject matter. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image depicts the Hagia Sophia interior with recognizable elements: the massive central dome, pendentives with angelic figures, semi-domes, and large Islamic calligraphy medallions. However, the Byzantine mosaics are not clearly visible or accurately rendered — instead, generic decorative patterns dominate. The architectural structure is mostly correct but lacks fine historical detail. Some lighting and texture appear AI-enhanced or stylized. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image depicts a mineral specimen that closely resembles bismuth, featuring the characteristic geometric hopper crystal formation and staircase structure. The iridescent oxide layers are vividly rendered with accurate rainbow hues. However, the surrounding matrix appears to be composed of quartz-like crystals, which is not typical for pure bismuth specimens but could represent a composite or artistic interpretation. Despite this minor deviation, the core features of the bismuth crystal are re
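The Truthfulness figure of 74% can be checked against the badges listed under its three sub-categories: 3 of 3 for Photorealism, 11 of 12 for Physics & Reflections, and 6 of 12 for World Knowledge. The sketch below compares two possible aggregations. That the page pools all evaluations, rather than averaging the sub-category percentages, is an inference from these counts and is not stated by the page; the function names are illustrative.

```python
# (passes, total) per sub-category, counted from the badges above.
truthfulness = {
    "Photorealism": (3, 3),
    "Physics & Reflections": (11, 12),
    "World Knowledge": (6, 12),
}


def pooled_score(groups):
    """Pass rate over all evaluations, ignoring sub-category boundaries."""
    passed = sum(p for p, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return round(100 * passed / total)


def macro_score(groups):
    """Unweighted mean of the per-sub-category pass rates."""
    rates = [100 * p / t for p, t in groups.values()]
    return round(sum(rates) / len(rates))


print(pooled_score(truthfulness))  # 74
print(macro_score(truthfulness))   # 81
```

Only the pooled figure reproduces the 74% shown, which suggests that small sub-categories such as Photorealism carry proportionally less weight in the category score.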
Professional Studio 89%
Camera & Lighting 83%
PASSEvaluator (Gemma 4 26B)
The image successfully implements a wide-angle perspective with a distinct circular vignette and heavy barrel distortion at the edges, creating a fisheye-like effect. The lighting shows a modern interior with recessed ceiling lights and natural light from the large glass doors, matching the "modern living room" description. The depth of field is deep, keeping the entire room in focus as expected for a wide-angle architectural shot. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up macro shot of a single, prominent dewdrop on a pink rose petal. The depth of field is shallow, with the central dewdrop and the immediate petal in sharp focus while the background and surrounding petals are softly blurred. The lighting is natural and even, highlighting the texture of the petal and the clarity of the water droplet. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is top-down, showing the circular road, the central island with trees, and various cars moving through the intersection. The lighting is even and natural, consistent with a high-angle daylight shot. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image features a portrait with a moody atmosphere and a shallow depth of field. However, it fails the specific lighting requirement. Rembrandt lighting is characterized by a small triangle of light on the shadowed cheek. In this image, the light is a broad, bright patch on the lit side of the face, and there is no distinct light triangle on the shadowed side. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a landscape with a deep sense of depth through layered hills. The lighting is a textbook example of golden hour: the sky is a vibrant orange, there is a soft, hazy glow throughout the atmosphere, and long, dramatic shadows are cast across the foreground by the low-angle sunlight. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a high-key portrait with a bright, white, and overexposed background. The lighting is soft and airy, creating a luminous effect with minimal shadows on the subject's face. The depth of field is shallow, focusing on the eyes while the rest of the image remains soft and bright, adhering to the requested aesthetic. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the technical specifications. The portrait features a shallow depth of field with a very creamy, blurred bokeh background, consistent with an 85mm f/1.4 lens. The lighting is clearly backlit, evidenced by the bright, warm rim light highlighting the subject's hair and shoulders, effectively separating her from the blurred garden background. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image fails to meet the core requirement of the prompt. While it features a shallow depth of field and bokeh, the prompt specifically requested a "subject sharp" with a blurred background. In this image, the entire scene, including the subjects in the foreground and midground, is completely out of focus and blurry. There is no sharp subject to anchor the composition. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image matches the prompt's requirements. The perspective is a top-down shot (flat lay). The lighting is directional, coming from the upper-left, creating soft shadows on the right side of the food. The color temperature is warm, and there is a subtle vignette around the edges of the frame. The depth of field is relatively shallow, consistent with an f/2.8 aperture. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image successfully implements the tilt-shift effect. It features a distinct selective focus strip in the center of the street, with heavy blurring at the top and bottom to create a miniature model appearance. The colors are highly saturated, and the high-angle perspective enhances the toy-like aesthetic described in the prompt. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features prominent blue and orange light streaks that run horizontally across a dark background, accurately capturing the "blue and orange anamorphic lens flare" described. The streaks create a sense of depth and cinematic atmosphere within the dark, moody scene. The lighting pattern and horizontal orientation perfectly match the prompt's requirements. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up portrait with dramatic split lighting. One side of the face is brightly lit, while the other side is cast in deep, heavy shadow, creating a high-contrast chiaroscuro effect. The lighting pattern is sharp and distinct, dividing the face vertically as requested. VERDICT: PASS
Color Precision 100%
PASSEvaluator (Gemma 4 26B)
The image features a bright, vivid red sports car parked on a gray asphalt road. The red color is highly saturated, intense, and unambiguous, perfectly matching the prompt's description. The road is a clear, textured gray. All color requirements specified in the prompt are met accurately. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The woman is wearing a vibrant, deep blue dress that perfectly matches the description of "royal blue." The background is a solid, light beige/off-white color, which qualifies as a "neutral background." All color requirements from the prompt are accurately represented in the image. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a field of bright yellow sunflowers and a clear, vibrant blue sky. The colors perfectly match the prompt's description of "pure sunflower yellow flowers" and a "clear blue sky." There are no deviations in the color-coding or the requested hues. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the prompt's requirements. The wall is a vibrant burnt orange that matches the hex code #FF6B35. The furniture, including the bed frame, nightstands, and lamp, is white. The scene is illuminated by natural daylight coming from the window. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The foliage in the image consists of a muted, dark green color that aligns well with the description of #2D5F2D. The tree trunks are a consistent reddish-brown. The image also explicitly includes the text "#2D5F2D" in the bottom left corner, confirming the intended color match. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the prompt's requirements. The walls are a medium blue that matches the hex code #4A90D9 (a bright, sky-blue shade). The floor is a warm-toned wood, and the trim (baseboard and crown molding) is white. All color specifications are met perfectly. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the specified colors. The walls are a warm cream color matching #F5E6CC, the sofa is a dark navy color matching #2C3E50, and the accent pillows are a bright red matching #E74C3C. The hex codes are even explicitly labeled on the objects within the image to confirm the match. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image follows the color requirements well. The background is a deep dark blue, consistent with #1A1A2E. The watch features a soft gold/beige color, matching the #F0C27F description. The earbuds and their charging case are pure white (#FFFFFF). While the watch face contains a text artifact ("#00C27F"), the actual visual colors of the objects accurately reflect the requested hex values and color names. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the requested color palette. The background is a near-white color matching #FAFAFA. The main central shape is a vibrant orange-red, consistent with #FF4500. The accent lines are a dark charcoal/dark grey, matching the #333333 specification. The text overlays within the image also confirm these specific hex codes, and the visual colors align perfectly with the prompt's requirements. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image shows a smooth gradient transitioning from a bright red on the left to a bright blue on the right, passing through a purple middle section. The left side matches the red color (#FF0000) and the right side matches the blue color (#0000FF) specified in the prompt. The transition is even and follows the requested color scheme perfectly. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a vertical gradient. At the bottom, there is a vibrant orange color, which aligns well with the requested #FF6B35. Moving upwards, the colors transition through pink and purple tones before reaching a bright, saturated blue at the top. While the top color is a bright blue rather than the much darker, deep indigo (#1B1464) requested, the overall color scheme and the bottom color match the prompt's intent. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image displays a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the hex code #00C9FF. On the right, the color is a bright lime/mint green, which matches the hex code #92FE9D. The transition between these two colors is a smooth gradient as requested. VERDICT: PASS
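The hex-code judgments above can be cross-checked numerically by decoding each code to RGB. The following is a minimal sketch and not part of ImageBench: the helper names `hex_to_rgb`, `lerp_color`, and `rgb_distance` are hypothetical, and plain Euclidean distance in RGB is only a rough proxy for perceived color difference. It shows that a linear blend from #FF0000 to #0000FF does pass through purple at its midpoint, that #1B1464 decodes to a dark indigo, and that the "#00C27F" text artifact on the watch face names a color far from the requested #F0C27F.

```python
import math


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Decode a '#RRGGBB' string into an (R, G, B) tuple of 0-255 ints."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def lerp_color(start, end, t):
    """Linearly blend two RGB tuples; t=0 gives start, t=1 gives end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def rgb_distance(a, b):
    """Euclidean distance between two RGB tuples (0 means identical)."""
    return math.dist(a, b)


red, blue = hex_to_rgb("#FF0000"), hex_to_rgb("#0000FF")
midpoint = lerp_color(red, blue, 0.5)
print(midpoint)                # (128, 0, 128), i.e. purple
print(hex_to_rgb("#1B1464"))   # (27, 20, 100), a dark indigo
print(rgb_distance(hex_to_rgb("#F0C27F"), hex_to_rgb("#00C27F")))  # 240.0
```

Checking a generated image this way would also require sampling its pixels, which this sketch does not attempt.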
Photorealism 67%
PASSEvaluator (Gemma 4 26B)
The image successfully meets the prompt's requirements. The red sneaker is presented on a clean white background with soft, professional studio lighting and a natural, gentle shadow underneath. The texture of the fabric and the details of the laces and sole appear realistic and consistent. There are no obvious AI artifacts, such as warped geometry or nonsensical textures, and it successfully passes as a high-quality product photograph. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image contains significant AI artifacts that prevent it from passing as a real photograph. The most obvious issue is the illogical geometry of the glass bottle and the cap; the edges are warped, inconsistent, and physically impossible. The internal straw (dip tube) is strangely shaped and does not align naturally with the bottle's structure. Additionally, the reflections and light refractions through the glass are nonsensical and lack the precision of a real studio shot. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image successfully captures the requested studio aesthetic. The lighting is soft and directional, creating a natural-looking shadow beneath the apple. The texture of the skin, including the lenticels (small white spots), appears realistic and consistent. There are no obvious AI artifacts, such as unnatural warping or nonsensical textures. It looks like a high-quality, professional catalog photograph. VERDICT: PASS
Graphical design 54%
Data Visualisation 0%
FAILEvaluator (Qwen 3 VL)
The chart misrepresents the requested data: it shows multiple overlapping bars per month (not a simple bar chart), incorrect month labels (e.g., “Orodg” for March), and distorted values (e.g., 18K for January, not 12K). The legend and title are present but misleading. The chart type does not match the requested bar chart format. VERDICT: FAIL
FAILEvaluator (Qwen 3 VL)
The pie chart misrepresents data: "Company B" is incorrectly labeled on two slices (25% and 20%), while "Company A" appears only on a blue slice (35%) but is mislabeled as "Company B" on the legend. "Others" is 20% but mislabeled as "Company B" in one callout. The chart type is correct, but critical labeling errors render it inaccurate. VERDICT: FAIL
FAILEvaluator (Qwen 3 VL)
The chart misrepresents data: y-axis labels are reversed and illegible, x-axis labels are garbled (e.g., “Weny” for Mon, “Fur” for Wed), and the title is corrupted. Values shown (1200, 1500, 1100, 1800, 2000, 1700, 1400) are incorrect — the prompt specified 1200, 1500, 1100, 1800, 2000, 1700, 1400, but the y-axis scale is inverted. The chart type is correct, but critical elements are unreadable and inaccurate. VERDICT: FAIL
Layout & Design 33%
PASSEvaluator (Qwen 3 VL)
The image displays "MOVIE NIGHT" in bold at the top, a large popcorn graphic centered below it, and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with title dominant, popcorn central, and tagline at base. All required layout elements are present and correctly positioned. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The poster includes a guitar illustration, the headline "LIVE MUSIC FEST" in large, bold text, and the tagline "Tickets available now" below. Text is readable and correctly placed. Visual hierarchy is clear with headline dominating, tagline secondary, and guitar as a supporting graphic. All required elements are present and correctly formatted. VERDICT: PASS
FAILEvaluator (Qwen 3 VL)
The poster displays "JOIN THE BOOK CLUB" prominently at the top with correct hierarchy. An open book illustration is centered below. The tagline "Meets every Wednesday" is present but below it, placeholder text is garbled and unreadable. The core layout elements are present, but critical text is corrupted. VERDICT: FAIL
FAILEvaluator (Qwen 3 VL)
All layout zones are present: hero header, headline/subtitle, “Sign Up” button, and footer with three social icons. Visual hierarchy is correct. However, all text is gibberish and unreadable, violating the requirement for readable text. The core requirement (readable text) is not met. VERDICT: FAIL
FAILEvaluator (Qwen 3 VL)
The image shows a newsletter layout with a header banner, two-column body (text left, image right), a highlighted quote block, and a footer with an unsubscribe link. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL
FAILEvaluator (Qwen 3 VL)
The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with hours and address. Text is readable and correctly placed per layout. However, all item names and prices are gibberish, indicating AI-generated placeholder text. The core layout structure is intact, but content is invalid. VERDICT: FAIL
FAILEvaluator (Qwen 3 VL)
The layout matches the prompt: left page has a large hero photo, right page has two columns, a pull quote, sidebar with author bio, and page number. Text is present but illegible due to AI artifacts—no readable content. Visual hierarchy is structurally correct, but textual elements fail functional readability. VERDICT: FAIL
PASSEvaluator (Qwen 3 VL)
The image shows a mobile app UI with a top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar — all present. Visual hierarchy is clear. Text is readable and correctly placed, though some Chinese characters appear garbled or nonsensical. Core layout elements are intact and correctly structured. VERDICT: PASS
FAILEvaluator (Qwen 3 VL)
Left page: full-bleed photo of model — present. Right page: three-column layout — present. Headline, body copy, inset image, and page number (22) — all visible. However, all text is gibberish, unreadable placeholder. Visual hierarchy is structurally correct but text fails functional requirement. VERDICT: FAIL
Style Diversity 83%
PASSEvaluator (Qwen 3 VL)
The image clearly depicts a golden retriever in a garden setting. Visible brushstrokes and textured application of paint are evident across the dog’s fur, grass, and foliage, mimicking traditional oil painting techniques. Color blending and impasto-like effects enhance the rich texture. The style aligns precisely with the requested oil painting aesthetic. VERDICT: PASS
FAILEvaluator (Qwen 3 VL)
The image shows a golden retriever in a garden, rendered in a Japanese anime style with large, expressive eyes and flat color areas. However, the fur has subtle shading and texture, deviating from the requested "flat colors." The style is recognizable but not fully aligned with the prompt’s specificity. VERDICT: FAIL
PASSEvaluator (Qwen 3 VL)
The image clearly depicts a golden retriever in a garden, rendered in a distinct pixel art style with visible 32x32 grid blocks and a limited, blocky color palette. Key characteristics—blocky forms, simplified textures, and low-resolution detail—are present. The subject and setting match the prompt’s description. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image clearly embodies Art Deco: bold geometric shapes, symmetrical composition, and a dominant gold-and-black palette. Buildings feature stepped forms, sunburst motifs, and stylized ornamentation. The street scene is framed by ornamental arches, and the foreground displays intricate Art Deco patterns. All requested stylistic elements are unmistakably present. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image clearly depicts a woman with a parasol rendered in Impressionism. Soft, visible brushstrokes, dappled light effects in sky and dress, and a pastel palette (blues, yellows, greens) are all present. The style is unmistakably recognizable as Impressionist, matching the prompt’s key visual requirements. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns, and flat, vivid primary colors (pink, blue, yellow). Facial features are stylized with high contrast and saturated hues. All requested visual characteristics are unmistakably present. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image shows a forest with visible watercolor bleeds, soft gradients, and textured paper grain. Tree details are rendered with painterly softness, yet retain photographic depth and lighting—sunlight filtering through canopy with realistic shadows. The style blends photorealistic composition with watercolor’s fluid, translucent qualities. All requested characteristics are clearly present. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image clearly blends cyberpunk (neon lights, tech gear, futuristic city) with Art Nouveau (flowing organic lines, floral borders, golden decorative frame). The woman’s flowing hair and ornate frame match the requested style. Neon colors and tech elements are prominent. All key visual characteristics are present and well-integrated. VERDICT: PASS
FAILEvaluator (Qwen 3 VL)
The image blends ukiyo-e style with photorealistic sky: bold outlines, flat colors, and dramatic gradient sky. However, the castle’s architecture is distinctly Japanese (castle-style), not medieval. The prompt requested a medieval castle, which is absent. The style fusion is visually present but misaligned with the core subject. VERDICT: FAIL
PASSEvaluator (Qwen 3 VL)
The image clearly shows raw concrete structures with exposed aggregate and organic forms, matching soft brutalism. Furniture is rounded, pastel-colored, and minimalist. Warm, diffused lighting enhances the cozy, sculptural feel. All requested elements are distinctly present and harmoniously integrated. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image displays a sparse still life with a single object in the lower-left corner, surrounded by vast, muted beige negative space. Colors are desaturated and restrained, creating visual tension through isolation and implied imbalance. The composition is minimalist, and the object’s stark placement evokes unease. The style aligns with “anxious minimalism” as requested. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The image displays unmistakable retro-futurism: sleek, curved, and spire-topped architecture evoking 1960s space-age design. Warm, faded sepia tones and soft lighting create a melancholic patina. Key visual elements — organic forms, verticality, and nostalgic color grading — are clearly present and align with the prompt’s aesthetic. VERDICT: PASS
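The percentages shown for each sub-category appear consistent with a simple pass rate over the verdicts listed. The following is a minimal sketch under two assumptions that the page does not state: that a score is passes divided by total evaluations, rounded to the nearest percent, and that Style Diversity is counted under Graphical design (the arithmetic suggests this). The names `parse_verdict` and `pass_rate` are hypothetical, and the counts are transcribed from the entries above.

```python
import re

# Each evaluator note ends with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(note: str) -> bool:
    """Return True for a PASS verdict, False for FAIL."""
    match = VERDICT_RE.search(note.strip())
    if match is None:
        raise ValueError("no trailing VERDICT found")
    return match.group(1) == "PASS"


def pass_rate(passes: int, total: int) -> int:
    """Pass rate as a whole-number percentage (assumed scoring rule)."""
    return round(100 * passes / total)


# (passes, total) per sub-category, counted from the verdicts above.
counts = {
    "Photorealism": (2, 3),
    "Data Visualisation": (0, 3),
    "Layout & Design": (3, 9),
    "Style Diversity": (10, 12),
}

for name, (passes, total) in counts.items():
    print(f"{name}: {pass_rate(passes, total)}%")

# Pool the three sub-categories assumed to make up Graphical design.
graphical = ["Data Visualisation", "Layout & Design", "Style Diversity"]
g_pass = sum(counts[name][0] for name in graphical)
g_total = sum(counts[name][1] for name in graphical)
print(f"Graphical design: {pass_rate(g_pass, g_total)}%")  # 13 of 24 -> 54%
```

Under these assumptions the sketch reproduces the 67%, 0%, 33%, 83%, and 54% figures shown on the page.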