ImageBench V1
192 evaluations across 6 categories

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 93%
Typography Style 100%
PASS Evaluator (Qwen 3 VL)
Text reads "CHAPTER ONE" correctly. Font mimics vintage typewriter style with uneven, slightly smudged ink and distressed paper texture. Visible typewriter keys and carriage reinforce the medium. All prompt elements are convincingly rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Text reads "TODAY'S SPECIAL" correctly. Letters show chalk texture, smudges, and dust particles on dark green board. Chalk dust is visible around letters and on ledge. Style convincingly mimics hand-written chalk. All elements match prompt. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The word "OPEN" is rendered in glowing red pixel-style letters on a matrix display, matching the LED sign description. The spelling is correct. The pixelated, blocky font convincingly simulates an LED screen. VERDICT: PASS
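Each evaluator response in this listing ends with "VERDICT: PASS" or "VERDICT: FAIL". The following is a minimal sketch, assuming that convention holds for every complete response, of how a subcategory percentage such as the 100% for Typography Style could be recomputed from the raw responses. The names parse_verdict and pass_rate are hypothetical and not part of ImageBench.

```python
import re

# Assumption: the verdict is the last token of a complete evaluator response.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")

def parse_verdict(response: str) -> bool:
    """Return True for PASS, False for FAIL; raise if no trailing verdict exists."""
    match = VERDICT_RE.search(response.strip())
    if match is None:
        raise ValueError("no trailing verdict found")
    return match.group(1) == "PASS"

def pass_rate(responses) -> int:
    """Percentage of PASS verdicts, rounded to the nearest integer."""
    verdicts = [parse_verdict(r) for r in responses]
    return round(100 * sum(verdicts) / len(verdicts))

# Closing sentences of the three Typography Style responses listed above.
typography = [
    "All prompt elements are convincingly rendered. VERDICT: PASS",
    "All elements match prompt. VERDICT: PASS",
    "The pixelated, blocky font convincingly simulates an LED screen. VERDICT: PASS",
]
print(pass_rate(typography))
```

A response that is cut off before its verdict raises ValueError instead of being silently counted.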
Writing accuracy 92%
PASS Evaluator (Qwen 3 VL)
The word "HELLO" is fully spelled correctly and clearly readable in bold black letters on a white background. All five letters are intact and legible. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. All letters are intact and legible. No misspellings or missing parts. The text matches the intended specification exactly. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The word "SERENITY" is fully spelled correctly and clearly legible in white, centered on a solid light blue background. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All words are spelled correctly and fully readable: "The", "quick", "brown", "fox", "jumps", "high". No misspellings or illegible parts. Text matches intended phrase exactly. Contrast is clean, font is legible. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully readable against the pale yellow background. No misspellings or illegible parts detected. The phrase matches the intended text exactly. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Text reads: "Stars shine brightest in the dark night" — matches intended phrase exactly. All words are spelled correctly, fully visible, and legible. Background is clean off-white as specified. No errors detected. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All text matches the intended content exactly. "SUMMER SALE", "Up to 50% off all items this weekend only", and "Terms and conditions apply. See store for details." are all spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All text matches the intended message exactly: "GRAND OPENING", "Join us for food, music, and fun", and "RSVP required. Limited seating available." Every word is spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM.". No misspellings, missing words, or illegible parts detected. Text is fully readable and accurate. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
Text is mostly correct but contains errors: “Music by Thomas Grant. A Paramount Grant.” incorrectly repeats “Grant” instead of “Pictures.” All other text matches the intended script. The misspelling “Voss” (correctly spelled) is present, but “Grant” is duplicated. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "NATIONAL EXPLORER", "The Hidden Temples of Cambodia — page 34", "Interview: Dr. Maria Santos on Climate Action", "Top 10 Trails You Must Hike This Year", and "Issue 247 | March 2025 | $6.99". No misspellings, missing words, or illegible parts detected. Text is fully readable and accurate. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All text matches the intended content exactly. Spelling is correct, all words are present, and every line is fully readable. No misspellings, missing words, or illegible parts detected. The label accurately reproduces the specified text with perfect fidelity. VERDICT: PASS
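The 93% headline for Text Rendering is consistent with pooling all 15 verdicts in this category (3 of 3 for Typography Style, 11 of 12 for Writing accuracy) rather than averaging the two subcategory percentages. This is an inference from the numbers shown here, not a documented scoring rule. A small sketch of the two aggregation options, with hypothetical function names:

```python
# (passed, total) per subcategory, counted from the verdicts listed above.
text_rendering = {
    "Typography Style": (3, 3),
    "Writing accuracy": (11, 12),
}

def pooled_rate(counts) -> float:
    """Pass rate over all evaluations in the category, ignoring subcategory boundaries."""
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return 100 * passed / total

def macro_rate(counts) -> float:
    """Unweighted mean of the per-subcategory pass rates."""
    return sum(100 * p / t for p, t in counts.values()) / len(counts)

print(round(pooled_rate(text_rendering)))  # 93, matches the reported headline
print(round(macro_rate(text_rendering)))   # 96, does not match
```

Pooling weights each evaluation equally, so a subcategory with more prompts has more influence on the category score.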
Spatial Reasoning 79%
Attributes Binding 100%
PASS Evaluator (Qwen 3.5 122B)
The image shows two objects on a white surface. On the left is a red sphere (ball). On the right is a blue cube. Both objects are correctly colored and shaped as specified. They are positioned side by side. All attributes (color, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a green apple and an orange pumpkin placed on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is orange, ribbed, and has a short stem. Both objects are appropriately sized relative to each other and the board. No attribute swaps or misidentifications are present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows two vehicles parked on a street. The vehicle on the left is a yellow taxi, matching the description in color and type. The vehicle on the right is a long black limousine, also matching the description in color and type. Both are parked next to each other as requested. No attributes are swapped or misassigned. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image contains three distinct shapes on a white background. The large red circle is correctly rendered. The small blue triangle is correctly rendered. The medium green star is correctly rendered. All attributes (color, size, shape) are correctly bound to their respective objects with no swaps or errors. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows three candles on a wooden shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium-height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a large brown horse standing in a meadow, a small white rabbit sitting on the grass, and a medium-sized black cat standing nearby. All attributes (color, size, shape) are correctly bound to each object with no swaps. The horse is big and brown, the rabbit is small and white, and the cat is medium-sized and black. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image contains a pink fire truck with correct shape and features (ladder, wheels, cab). A large blue banana is present in the foreground with correct curved shape and stem. A green spiky sun appears in the sky with correct radial shape. All three requested objects are present with their specified colors and forms correctly bound — no attribute swaps observed. The surreal landscape context is also fulfilled. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows: 1. A purple carrot with green leaves — correct color and object. 2. A red tree with red leaves — correct color and object. 3. A sliced watermelon with white flesh and green rind — matches “white watermelon sliced open” (though typically watermelons are red inside, the prompt specified white, so this is compliant). All attributes are correctly bound to their respective objects with no swaps. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows three objects on a wooden table: 1. An orange elephant figurine with white tusks and feet — matches “orange elephant” 2. A fluffy yellow penguin chick with black eyes, beak, and wingtips — matches “yellow penguin” 3. A metallic silver strawberry with textured surface and green calyx — matches “silver strawberry” All attributes (color, shape, object type) are correctly bound. No swaps or mismatches observed. VERDICT: PASS
Compositionality 89%
PASS Evaluator (Qwen 3.5 122B)
The image contains all described elements: 1. Tabby cat - Present (brown/black striped fur pattern) 2. Oversized red sunglasses - Present (large red frames with dark lenses) 3. Coherent combination - Present (glasses fit naturally on cat's face) All prompt requirements are met with proper integration of concepts. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows a golden retriever sitting in a grassy field. The dog is wearing a cowboy hat. However, the hat is not "tiny" — it fits proportionally on the dog’s head, appearing normal-sized for a canine, not miniature or comically small as “tiny” would imply. While the core concept is present, the specific descriptor “tiny” is inaccurate. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows a penguin standing on snow with a blue sky and clouds in the background. The penguin has colorful wings (yellow, blue, red) and is holding a large, colorful umbrella (red, blue, yellow panels). All described elements — penguin, colorful umbrella, and their combination — are clearly present and coherently integrated. The unusual concept is fully realized. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a medieval knight in full armor sitting at a wooden desk. The knight is typing on a modern silver laptop (with an Apple logo). Scrolls and parchment are also present on the desk, enhancing the medieval setting. All described elements — knight, armor, desk, laptop — are clearly visible and coherently combined in a single scene. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image contains all described elements: 1. Roman gladiator - Present (wearing helmet, armor, and leather skirt) 2. Taking a selfie - Present (holding phone up in selfie pose) 3. Smartphone - Present (modern smartphone in hand) 4. Colosseum - Present (ancient Roman amphitheater architecture) All elements are coherently combined in a single scene with proper lighting and perspective. The anachronistic combination is executed logically. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows an astronaut in a full white spacesuit with blue stripes, complete with helmet and backpack, riding a silver bicycle. The setting is a sunny park with green grass, trees, and colorful flowerbeds. All described elements — astronaut, spacesuit, bicycle, and park — are clearly present and coherently combined. The scene is visually consistent and logically assembled. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image contains all described elements: - Octopus: Present, large and detailed - Top hat: Present, black top hat on octopus - Monocle: Present, gold-rimmed eyewear on octopus - Chess: Present, wooden chessboard with one piece - Moon surface: Present, cratered lunar terrain - Earth in background: Present, blue planet visible in space All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image contains all specified elements: 1. Dinosaur - Present (T-Rex head/body) 2. Business suit - Present (dark pinstripe suit with tie) 3. PowerPoint presentation - Present (projector screen with charts/graphs) 4. Modern office - Present (glass walls, conference table, laptop) 5. Drinking coffee - Present (holding steaming coffee mug) All elements are coherently combined in a single scene. The dinosaur is anthropomorphized appropriately for the business context. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image contains all described elements: - Robot dressed as a pirate: Present (wearing pirate hat, eye patch, striped shirt) - Paper boat: Present (white origami-style boat) - Lake of lava: Present (glowing orange/red molten surface with bubbles) - Juggling three kittens: Present (three fluffy kittens floating above robot’s hands in juggling motion) All components are coherently combined in a surreal, visually consistent scene. VERDICT: PASS
Counting 56%
PASS Evaluator (Qwen 3.5 122B)
The image displays exactly three red apples placed on a wooden table surface. The apples are arranged in a triangular formation with one in the foreground and two slightly behind. The count matches the requested quantity precisely. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows exactly five yellow rubber ducks arranged in a row on a white bathroom shelf. Each duck is identical in appearance, with orange beaks and black eyes. The count matches the prompt’s request precisely. No extra or missing objects are present. The scene is clean, well-lit, and accurately reflects the described composition. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image displays exactly four white candles arranged in a row on a dark, reflective surface. The count matches the requested quantity precisely. All candles are unlit and appear identical in shape and color. The dark background and surface provide strong contrast, highlighting the candles as intended. No extra or missing objects are present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows three black dogs sitting in a row in the background and two orange cats sitting in front of them on a grassy lawn. The count of 3 black dogs and 2 orange cats exactly matches the requested prompt. All animals are clearly visible and correctly identified. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows exactly 4 red roses and 2 white daisies arranged in a clear glass vase, placed on a wooden table. All requested objects are present in the correct quantities and context. The composition matches the prompt precisely. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows a garden fence with butterflies. I can clearly count 1 blue butterfly perched on the leftmost visible fence post. There are 4 yellow butterflies visible along the fence line, not 3 as requested. The prompt specified "1 blue butterfly and 3 yellow butterflies", but the image contains 1 blue and 4 yellow butterflies. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The image shows a round glass fishbowl containing goldfish. Counting the fish: there are 8 clearly visible goldfish swimming inside the bowl. The prompt requested exactly 9 goldfish. Since only 8 are present, the count does not match the requirement. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The image shows a fruit bowl with: - 3 bananas (visible, clustered together) - 3 oranges (clearly visible; one is partially hidden behind another) - 2 apples (both red, clearly visible) The prompt requested exactly 4 oranges, but only 3 are present. Therefore, the count does not match. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The image shows a cluster of colorful balloons against a blue sky. Counting them: 1 yellow (top left), 1 blue, 1 pink, 1 orange, 1 red (center), 1 light pink, 1 green, 1 purple, 1 teal, 1 red (bottom left), 1 white, and 1 yellow (bottom right). That’s 12 balloons total. The prompt requested exactly 11. Therefore, the count does not match. VERDICT: FAIL
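In all four Counting failures above, the evaluator's count differs from the requested count by exactly one. The sketch below tabulates those signed errors. The tuples are transcribed from the evaluator responses, so they reflect the evaluator's counts rather than verified ground truth, and the variable names are hypothetical.

```python
# (object, requested count, count reported by the evaluator)
counting_failures = [
    ("yellow butterflies", 3, 4),
    ("goldfish", 9, 8),
    ("oranges", 4, 3),
    ("balloons", 11, 12),
]

# Signed error: positive means too many objects were generated, negative too few.
errors = {name: observed - requested
          for name, requested, observed in counting_failures}

for name, err in errors.items():
    print(f"{name}: {err:+d}")

# True when every failure is off by exactly one object.
all_off_by_one = all(abs(err) == 1 for err in errors.values())
print(all_off_by_one)
```

Two failures overshoot and two undershoot, so this sample shows no consistent direction of error.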
Negation 56%
PASS Evaluator (Qwen 3.5 122B)
The image depicts a serene tropical beach scene with golden sand, turquoise water, and palm trees leaning over the shore. Footprints are visible in the sand, but there are no people present anywhere in the frame — no figures on the beach, in the water, or in the background foliage. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows a fluffy orange cat lying on a couch in warm sunlight. Upon close inspection of the cat’s neck, a thin dark line with a small metallic bell is visible — indicating the presence of a collar. Since the prompt explicitly excluded “a cat lying on a couch without a collar,” and the cat here is wearing one, the excluded element is present. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows an empty classroom with rows of wooden desks and chairs facing a green chalkboard mounted on a white wall. A wooden teacher’s podium sits centered beneath the board. No students, people, or personal items are visible. The scene is clean, orderly, and devoid of human presence — fully satisfying the exclusion criterion. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows a sandy beach with palm trees in the background, numerous seashells scattered in the foreground, and a colorful beach umbrella. However, on the left side of the image, beyond the sand, there is clearly visible water — likely the ocean or sea — with gentle waves and a horizon line where water meets sky. Since the prompt explicitly excluded “water or ocean visible,” and it is present, the exclusion criterion is violated. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows a wooden desk with a leather chair, a bookshelf filled with books, and a brass desk lamp. There is a notebook and pen on the desk. No computers, monitors, tablets, or any electronic screens are visible in the scene. The excluded elements are completely absent. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a kitchen scene with a stainless steel gas stove, a wooden cutting board on the countertop, and several pots and pans stacked to the left. One large pot sits on the back burner. All cookware appears empty or covered; no food items are visible inside or around them. The excluded element (food) is truly absent. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows a pepperoni pizza with visible tomato sauce, fresh basil leaves, and a golden-brown crust. However, there is clearly melted cheese covering the surface beneath the toppings — it’s bubbly, golden in areas, and blends with the sauce. Since the prompt explicitly excluded cheese (“absolutely no cheese”), and cheese is visibly present, the core criterion is violated. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows the front view of a silver car body. Visible elements include the hood, windshield with side windows, and two round headlights. The frame cuts off at the bumper level. There are no wheels visible in the image, which aligns with the exclusion criteria in the prompt. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image clearly shows a human face with eyes, nose, and mouth. However, the excluded element — eyebrows — are visibly present above both eyes. They are well-defined, natural-looking, and fully formed. Since the prompt explicitly required “no eyebrows,” their presence violates the constraint. VERDICT: FAIL
Relative Position 92%
PASS Evaluator (Qwen 3.5 122B)
The image shows a cat positioned directly on the upper surface of a cardboard box. The cat’s paws and tail rest on the box, confirming the “sitting on top of” relationship. No conflicting spatial descriptors are present. All positional relationships described in the prompt are accurately reflected in the image. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a wooden chair with a red ball positioned directly underneath it, resting on the floor between the front legs. The ball is clearly below the seat and under the horizontal support bar of the chair. All spatial relationships described in the prompt — specifically “underneath” — are accurately represented. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a white coffee mug positioned to the left of an open book, both resting on a wooden desk surface. The spatial arrangement matches the prompt: the mug is “next to” the book, and both are “on” the desk. No incorrect positional relationships (e.g., under, above, below) are present. All described spatial relations are accurate. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a tree trunk on the left. The branch is positioned above and slightly behind a wooden park bench, which sits in the foreground. The spatial relationships described — bird on branch, branch extending from tree, tree next to bench — are all accurately represented. All positional cues (above, next to) are correct. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a cat sitting on a fluffy beige pillow, which is placed on the seat of a wooden chair. The spatial relationships described in the prompt — “cat on pillow,” “pillow on chair” — are accurately represented. All positional descriptors (on, under) align with the visual content. No contradictions or misplacements are evident. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a lit candle placed directly on top of a stack of four books. To the right of the book stack is a potted plant in a terracotta pot. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors (on, next to) match the visual layout. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows five chess pieces on a board: a king in the center, two pawns to its left (not a queen), a knight to its right (not a bishop), and a rook above the king. The prompt incorrectly identifies the queen and bishop, and misplaces the knight as “below” when it’s to the right. Spatial relationships are not fully accurate per the prompt’s description. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows a desk scene with a laptop centered, a coffee mug to its left, and a phone to its right — all matching the prompt. A lamp is positioned above the laptop, casting shadows on the wall behind. A spiral notebook lies below the laptop, closer to the viewer. All specified spatial relationships (left, right, above, below) are accurately represented. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. The horse is on the far left. The cow is to the right of the horse. The sheep is in the center. The goat is to the right of the sheep. The dog is on the far right. All specified spatial relationships are accurately represented. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a large red barn dominating the left foreground, with a windmill positioned to its right in the midground. Snow-capped mountains appear in the distant background on the right side. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). Depth perspective is consistent. All positional relationships are correctly represented. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image correctly depicts a beach scene with a detailed sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease appropriately with distance, maintaining the specified spatial relationships. All positional relationships described in the prompt are accurately represented. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image correctly depicts a city park scene with spatial layering as described. In the foreground, a wooden park bench is positioned on the left, with several pigeons scattered on the grass and pavement around it. The midground features a large, active fountain centered behind the bench. The background consists of tall skyscrapers rising above the tree line, progressively smaller due to perspective. All positional relationships align with the prompt. VERDICT: PASS
Scale & Proportions 78%
PASS Evaluator (Qwen 3.5 122B)
The person’s height relative to the school bus appears realistic — approximately 1/3 to 1/4 the height of the bus’s side panel, which is consistent with real-world proportions. The bus’s windows, wheels, and lettering also scale appropriately to the human figure. No obvious distortions or size inconsistencies are visible. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The ladybug appears disproportionately large relative to the leaf it is sitting on. In reality, a ladybug is small enough that many could fit across the width of such a leaf. Here, the beetle occupies a significant portion of the visible leaf surface, suggesting an unrealistic scale relationship between the insect and its environment. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The image shows a child holding a basketball. The ball appears disproportionately large relative to the child’s hands and torso — it nearly spans the width of his chest and is held with both hands in a way that suggests it’s too big for his grip. In reality, a standard basketball would be smaller relative to a young child’s body. This violates the prompt’s requirement for “correct relative sizes.” VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows a miniature figurine of a person standing on the rim of a coffee cup, looking down into the liquid. The scale is consistent with the prompt: the person is tiny relative to the cup, and the cup’s proportions appear realistic for a standard coffee mug. The figurine’s posture and placement align with the description. No obvious distortions or inconsistencies in size relationships are present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image depicts a giant tabby cat walking down a city street, towering over cars and pedestrians, with skyscrapers flanking it on both sides. The cat’s size is exaggerated to kaiju-like proportions relative to the urban environment — its paws are larger than cars, and its head reaches mid-building height. This matches the prompt’s description of “giant” and “like a kaiju.” Proportions are internally consistent within the surreal context. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a miniature house resting on an open human palm. The house is appropriately small relative to the hand, fitting comfortably within the palm’s width and length. The fingers and thumb frame the house naturally, and the scale appears realistic for a “miniature” object held in a hand. No proportional inconsistencies are visible. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows an elephant, two horses, a dog, a cat, and a mouse arranged in a line. The relative sizes are generally correct: the elephant is largest, followed by the horses, then the dog, cat, and mouse smallest. However, there are two horses instead of one as specified in the prompt. Despite this extra animal, the core requirement of correct proportional sizing among the listed animals is met. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image displays a table setting with a plate, fork, knife, wine glass, salt and pepper shakers, and loose peppercorns. The relative sizes appear realistic: the plate is appropriately sized for a dinner setting, the cutlery is proportional to the plate, the wine glass is standard height relative to the other items, and the shakers are correctly sized as condiment containers. The loose peppercorns are tiny compared to the shakers, which is accurate. All proportions are consistent with a real-wo
PASS Evaluator (Qwen 3.5 122B)
The image displays a farm scene with a barn, tractor, farmer, cow, chickens, and fence post. The relative sizes appear consistent: the barn is the largest structure, the tractor is appropriately sized for a farm vehicle, the farmer is human-sized relative to the tractor, the cow is large compared to the chickens, and the fence post is in proper proportion to the animals and structures. All objects maintain realistic scale relationships. VERDICT: PASS
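The six Spatial Reasoning subcategory percentages can be cross-checked against the PASS/FAIL badges listed above. The sketch below assumes the listing shows every evaluation in the category and that percentages are rounded half up. Both are assumptions, and the helper name rounded_rate is hypothetical.

```python
# (passed, total, reported %) per subcategory, counted from the badges above.
spatial = {
    "Attributes Binding": (9, 9, 100),
    "Compositionality": (8, 9, 89),
    "Counting": (5, 9, 56),
    "Negation": (5, 9, 56),
    "Relative Position": (11, 12, 92),
    "Scale & Proportions": (7, 9, 78),
}

def rounded_rate(passed: int, total: int) -> int:
    """Pass percentage rounded half up to an integer."""
    return int(100 * passed / total + 0.5)

# Subcategories whose recomputed percentage disagrees with the reported one.
mismatches = [name for name, (p, t, reported) in spatial.items()
              if rounded_rate(p, t) != reported]

# Category score when all 57 verdicts are pooled.
overall = rounded_rate(sum(p for p, _, _ in spatial.values()),
                       sum(t for _, t, _ in spatial.values()))

# Two subcategories with the lowest pass rate (ties keep listing order).
weakest = sorted(spatial, key=lambda n: spatial[n][0] / spatial[n][1])[:2]

print(mismatches, overall, weakest)
```

Under these assumptions every reported subcategory figure is reproduced, the pooled score equals the 79% headline, and Counting and Negation are the weakest subcategories.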
Human realism 91%
Faces & Expressions 100%
PASS Evaluator (Qwen 3 VL)
The face is symmetric with natural, correctly proportioned features. The warm, natural smile is clearly readable, with genuine crinkles around the eyes. Lighting enhances the joyful expression without distortion. Skin texture and freckles appear authentic. No noticeable AI artifacts or anatomical errors. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The face is symmetric with natural, correctly rendered features. The expression is calm and serious, with subtle furrows indicating contemplation. Lighting enhances texture without distortion. No anatomical anomalies or unnatural artifacts are present. The prompt’s intent is clearly achieved. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The face is symmetric with natural, correctly proportioned features. The teenager’s expression is relaxed and neutral, with no discernible distortions. Eyes are clear, lips are softly closed, and lighting is even. The prompt’s intent is fully met: a realistic, calm headshot of a teen. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The face shows clear disgust: nose is wrinkled, lips are curled, and brows are furrowed. Features are symmetric and anatomically correct. No distortions are present. The emotion is unmistakably readable and strongly conveyed. All requested elements are accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Face is symmetric with natural features. Eyes are wide, eyebrows raised, and mouth slightly open — clearly conveying surprise. Lighting creates realistic highlights and shadows. No visible distortions. All requested elements are accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Face is symmetric with natural, correctly rendered features. Downcast eyes and slightly drawn mouth convey quiet melancholy effectively. Distant gaze is implied by the downward look. No significant distortions. Lighting enhances the somber mood without compromising anatomical accuracy. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The face is symmetric with natural, aged features: deep wrinkles, white hair, and warm, kind eyes. The smile is clearly readable and conveys genuine warmth. No distortions are present; details like skin texture and lighting are realistic. The image accurately fulfills the prompt’s requirements. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The child’s face is symmetric with natural features: round cheeks, large eyes, and tear streams clearly visible. The crying expression is vividly rendered with furrowed brows and an open mouth. No distortions are present. The prompt’s key elements are accurately captured. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Face is symmetric with realistic aging features: age spots, thinning gray hair, and deep wrinkles. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No distortions; glasses sit naturally. All prompt elements are accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Faces show clear, symmetric features with no distortions. Emotions are vividly readable: left man laughs with head back, center man looks shocked with wide eyes, right man rolls eyes with smirk. Lighting enhances expressions naturally. All three expressions match the prompt precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Faces are symmetric with natural features. Father beams proudly, mother smiles gently, teenager looks bored, child grins excitedly—all emotions clearly readable. No distortions. Lighting and composition enhance realism. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Faces are symmetric with correct features, no distortions. Emotions are clearly readable: left man shows frustration (furrowed brows), center woman shows pleased (smiling), right man shows confusion (tilted head, raised eyebrows). All three expressions match the prompt precisely. VERDICT: PASS
Full Body 100%
PASS Evaluator (Qwen 3 VL)
The woman stands upright with arms at her sides, wearing a white t-shirt and jeans, matching the prompt. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions are visible. All elements align with the requested description. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The man stands straight, facing the camera, with hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The person’s full body is visible from head to feet in a neutral standing pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still in a park. No distortions are evident. The scene matches the prompt accurately. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The woman’s limbs are correctly proportioned and count is accurate. Her running pose is anatomically plausible, with dynamic motion blur in hair and dress enhancing the sense of movement. No distortions in limb structure or proportions are visible. The action is clearly recognizable as running through a park. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The man’s limbs are correctly counted and proportioned. His crouched mid-air pose with bent knees is anatomically plausible for jumping over a puddle. The action is clearly recognizable, with water splashing realistically beneath him. No significant distortions are present. The composition and lighting are naturalistic. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The dancer’s limbs and proportions are anatomically correct. The pose—a dynamic spin with one leg extended and dress flowing outward—is clearly recognizable. Motion blur on the dress enhances the spinning effect. No distortions in limb count or proportions. The action matches the prompt precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The figure shows a person in a full-body yoga tree pose: standing on one leg, the other foot pressed to the inner thigh, arms raised overhead in prayer position. Limb count and proportions are anatomically correct. The pose is clearly recognizable and executed with proper alignment. No significant distortions are visible. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The climber has correct limb count and proportions. Arms and legs are extended to different holds as described, with realistic muscle definition and joint angles. The pose is anatomically plausible for rock climbing. No significant distortions are visible. The action is clearly recognizable as dynamic climbing. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The cellist’s posture is anatomically correct: legs encircle the instrument, bow arm is extended, and hand position is accurate for playing. Limb count and proportions are natural. The pose is clearly recognizable as a seated cello performance. No distortions are visible. All elements align with the prompt. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The dancers’ limbs are correctly proportioned and anatomically accurate. The woman is dipped backward with intertwined legs, matching the prompt. The pose is a recognizable tango lift. No distortions are visible. All elements align with the requested action. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s mid-air pose are anatomically plausible and match the described action. No distortions are visible. The ball is clearly at the jumping player’s feet. The scene accurately depicts a dynamic soccer tackle. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate and recognizable. Alignment is nearly identical, with minor variations in hand placement and leg angle. No significant distortions are visible. The action is clearly depicted as a handstand with legs extended. VERDICT: PASS
Hands 67%
FAILEvaluator (Qwen 3.5 122B)
The left hand (viewer’s right) shows 5 fingers, but the thumb appears fused or malformed with the index finger. The right hand (viewer’s left) also has 5 fingers, but the pinky is slightly curled unnaturally and the thumb lacks clear definition. Overall, while finger count is mostly correct, anatomical realism is compromised — especially in the left hand’s thumb/index region. Not fully natural or relaxed as prompted. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The image shows a woman with both hands extended forward, palms up. Each hand clearly displays five fingers — thumbs and four fingers — with natural curvature and positioning consistent with an open, palms-up gesture. No extra, missing, or malformed fingers are visible. The anatomy is correct and matches the prompt’s requirement for “all ten fingers clearly visible” and “exactly 5 per hand.” VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The left hand (on the right arm) shows 4 fingers clearly, with the thumb likely hidden — anatomically plausible. The right hand (on the left arm) is blurred and partially obscured; only 2–3 fingers are vaguely visible, making it impossible to confirm 5 fingers. The grip appears natural for crossed arms. However, due to insufficient visibility of the right hand’s fingers, we cannot verify “exactly 5 per hand” as required. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
The image shows a left hand (not right) holding the mug. The thumb is positioned correctly on top of the handle, but the fingers are malformed: there appear to be 4 fingers visible, with the pinky and ring finger fused or indistinct, and the middle finger seems abnormally short or missing. Anatomical structure is incorrect. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
The right hand gripping the pen shows 5 fingers, but the thumb is unnaturally fused with the index finger, and the pinky appears malformed. The left hand resting on the table has 5 fingers but they are slightly distorted and lack clear separation. The pen grip is not anatomically correct due to the fused thumb/index appearance. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The left hand on the fretboard shows four fingers clearly, but the thumb is not visible — however, this is anatomically normal for guitar playing (thumb often behind neck). The right hand strumming area shows only 3–4 fingers due to motion blur and angle; the pinky is obscured or out of frame. No clear anatomical anomalies — fingers are proportionate and positioned correctly for guitar technique. Motion blur explains missing finger visibility. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The hand making the peace sign has five fingers: two extended (index and middle), and three curled (ring, pinky, thumb). The thumb is correctly positioned curled inward, not extended or missing. No extra or fused fingers are visible. The gesture is anatomically correct for a right hand making a peace sign. All five digits are present and properly articulated. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows a single hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt exactly. Anatomically, the hand appears correct: five fingers total (one thumb + four curled fingers), proper skin texture, natural creases, and realistic lighting/shadowing. No anomalies or extra/missing digits are visible. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows one hand making a gesture. The index, middle, and ring fingers are extended upward. The thumb is curled across the palm, and the pinky finger is also curled inward — this matches the description of “counting to three” with those three fingers extended. Anatomically, the hand structure appears correct: five fingers total (three extended, two curled), proper joint bends, natural skin texture, and no extra or missing digits. No anomalies detected. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
Upon close examination of the handshake: The hand on the left (belonging to the person in the tie) shows 4 fingers and a thumb gripping the other hand — that’s 5 total, anatomically correct. The hand on the right (person in white shirt) also displays 5 fingers: thumb visible on top, four fingers wrapped around the left hand’s grip. No extra or missing digits; no fused or distorted fingers. Grip is natural for a professional handshake. All criteria met. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows two hands meeting in a high-five gesture. The left hand (from viewer’s perspective) clearly displays 5 fingers, all extended and properly aligned. The right hand also shows 5 fingers, though the thumb is partially obscured by the other hand — still anatomically plausible for a high-five. No extra or missing fingers are visible. Gesture is natural and correct. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows four hands interacting with a gift box. The left person’s hands each show 5 fingers (including thumbs), and the right person’s hands also each show 5 fingers. Grips appear anatomically plausible — thumbs and fingers are positioned naturally around the box. No obvious deformities or extra/missing digits. Lighting and focus are soft but sufficient to confirm finger count and positioning. VERDICT: PASS
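The 67% Hands score is consistent with the twelve verdicts listed above (8 PASS, 4 FAIL). The following is a minimal sketch of how such a category score could be derived, assuming each evaluator response ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker. The function names `extract_verdict` and `pass_rate` are hypothetical, and this is not the benchmark's actual scoring code.

```python
import re

# Matches the verdict marker that closes each evaluator response.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)")


def extract_verdict(evaluator_text):
    """Return 'PASS' or 'FAIL' from the last verdict marker, or None if absent."""
    matches = VERDICT_RE.findall(evaluator_text)
    return matches[-1] if matches else None


def pass_rate(verdicts):
    """Percentage of PASS verdicts, rounded to the nearest whole number."""
    scored = [v for v in verdicts if v in ("PASS", "FAIL")]
    if not scored:
        raise ValueError("no scored verdicts")
    return round(100 * scored.count("PASS") / len(scored))


# Verdicts of the twelve Hands evaluations, in the order listed above.
hands = ["FAIL", "PASS", "FAIL", "FAIL", "FAIL", "PASS",
         "PASS", "PASS", "PASS", "PASS", "PASS", "PASS"]

print(pass_rate(hands))  # 67
```

In this sketch, a response whose text is cut off before the marker yields `None` from `extract_verdict` and is left out of the rate rather than being guessed.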
Multi-Subject 100%
PASSEvaluator (Qwen 3 VL)
Left woman: short blonde hair, wearing red leather jacket. Right woman: long black hair, wearing blue dress. Both visually distinct and match prompt descriptions precisely. No discrepancies found. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Person 1: Bearded man with glasses, wearing a green sweater. Person 2: Clean-shaven man in a black hoodie, seen from the back. Both are visually distinct and match the prompt’s descriptions. All features are accurately rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Person 1: Tall, curly red hair, denim overalls. Person 2: Short, straight brown hair, yellow sundress. Both visually distinct and match descriptions. Features are accurately rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Bald man in suit: present, side profile. Silver-haired woman in red blouse: present, facing forward. Young man with dreadlocks in denim jacket: present, center-right. Woman in hijab and green dress: present, right side. All are visually distinct and match descriptions. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Tall woman: pink bob haircut, playing bass guitar. Bearded man: bandana, playing drums. Slim man: glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Man: wears sunhat, backpack, no poles. Woman: braided hair, uses hiking poles. Teenager: baseball cap, holds water bottle. All three are visually distinct and match described features. VERDICT: PASS
Truthfulness 82%
Photorealism 100%
PASSEvaluator (Qwen 3.6 27B)
The image displays excellent material fidelity. The glass shows accurate refraction and condensation droplets. The silver spoon exhibits sharp, realistic reflections of the environment. The wooden table has a convincing grain and texture, while the linen napkin shows appropriate fabric weave and softness. The apple’s skin reflects light naturally with subtle imperfections. All materials render with correct visual properties. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum texture with appropriate metallic reflections. The notebook displays realistic leather grain and stitching details. The ceramic mug shows a matte, slightly porous surface distinct from the smooth metal. The glass paperweight demonstrates correct refraction and specular highlights. All materials are visually distinct and realistic. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image accurately renders the distinct material properties requested. The stainless steel pot exhibits sharp, distorted reflections typical of polished metal. The glass bottle shows correct translucency and refraction of the yellow oil within. The wooden board displays a natural grain texture, and the cotton towel has a soft, fibrous appearance. The lighting creates realistic specular highlights and shadows across all surfaces, demonstrating a high level of physical accuracy in material rende
Physics & Reflections 83%
PASSEvaluator (Qwen 3.6 27B)
The shadow is cast to the right and slightly downward, which is consistent with a light source from the upper left. The shadow’s shape is an oval, appropriately distorted by perspective on the flat surface. The lighting on the sphere shows a highlight on the upper-left side, matching the shadow direction. The color and softness of the shadow are physically plausible for a single light source. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The lighting originates from the right side, consistent with the bright window. Consequently, the shadow is cast to the left, which is physically correct. The shadow’s shape accurately mirrors the mug and handle, and its sharpness aligns with the direct sunlight. The warm color temperature is consistent across the scene. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image shows a person standing on a grassy hill with the sun directly behind them, creating a silhouette. However, the shadow is extremely long and stretches toward the viewer, indicating a low sun angle (sunrise or sunset), not midday. At midday, shadows are short and directly beneath the subject. This contradicts the prompt’s requirement of a “short shadow directly below them.” The shadow’s length and direction are physically inconsistent with midday lighting. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The reflection is physically impossible. The woman is facing the mirror, so her reflection should show her front. However, the reflection shows her front while the real person’s back is visible — this is correct orientation. But the reflection should be a mirror image (left-right reversed). Here, the reflection appears identical in pose and detail to a frontal view, not a mirrored one. More critically, the reflection shows the front of the outfit while the real person’s back is shown — which is
PASSEvaluator (Qwen 3.6 27B)
The reflection is physically plausible. It correctly mirrors the cat’s posture and orientation, with appropriate vertical inversion. The polished marble floor provides a realistic reflective surface, and the reflection shows slight softness consistent with real-world optics. Lighting direction matches between the cat and its reflection, and shadows align logically. No distortions or color mismatches are evident. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection on the glass table is physically plausible. It correctly mirrors the vase and stems with appropriate vertical inversion and slight distortion consistent with a reflective surface. The lighting direction matches the warm, golden ambient light, casting soft shadows that align with the scene’s depth. Colors in the reflection are slightly muted, as expected on glass. No directional or orientation errors are present. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are vertically inverted and correctly aligned with the tree trunks, extending downward into the water. The gentle ripples distort the reflection realistically, creating horizontal waviness that breaks up the image slightly, consistent with a calm but not perfectly still lake. The color of the reflection matches the warm sunset tones, appearing slightly darker as expected due to light absorption by the water surface. The direction and orientation are physically accurate. VERDICT:
PASSEvaluator (Qwen 3.6 27B)
The reflection is inverted, which is physically correct for a puddle. The ripples from raindrops create appropriate distortion, warping the reflected buildings. The color palette is consistent with a twilight setting, showing a natural shift in the reflected light. The orientation and direction of the reflection align with the implied position of the skyline above. The physics of the water surface and light interaction appear accurate. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is vertically inverted and correctly positioned below the horizon line. The color is appropriately muted and darker than the actual mountains, consistent with light absorption in water. The surface shows subtle ripples that distort the reflection slightly, adding realism. The mist layer at the water's edge blends naturally with the reflection. All physical aspects of the reflection are accurate. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image displays a glass sphere on a mirror surface partially submerged in water. The reflection on the mirror is sharp and correctly oriented, while the sphere exhibits proper refraction of the background. The water ripples distort the reflection of the sphere and mirror edge realistically. The lighting is consistent, with highlights matching the sun's position. The physics of light interaction—refraction, reflection, and distortion—are handled accurately without obvious artifacts. VERDICT:
PASSEvaluator (Qwen 3.6 27B)
The reflections are physically plausible. The glass floor accurately mirrors the chrome sculptures and the cityscape, maintaining correct orientation and perspective. The high reflectivity of the chrome surfaces captures the environment realistically, including the blue tones from the windows and the bright highlights from the ceiling lights. The light beams cast distinct shadows and reflections that align with the light sources, demonstrating a coherent understanding of lighting and material pr
PASSEvaluator (Qwen 3.6 27B)
The image demonstrates high physical plausibility. The sunlight enters from the left, casting a sharp shadow to the right of the glass. The reflection on the polished marble is correctly oriented and distorted by the surface texture. The refraction through the cut crystal creates realistic caustic light patterns (starbursts) on the counter. The lake outside shows natural reflections of the sky and trees, consistent with a calm water surface. VERDICT: PASS
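The midday-shadow failure in this section rests on a simple geometric relation: on flat ground, the length of a shadow equals the object's height divided by the tangent of the sun's elevation angle. A small sketch of that relation follows. The 1.7 m height and the two angles are illustrative assumptions, not values taken from the benchmark or its prompts.

```python
import math


def shadow_length(height_m, sun_elevation_deg):
    """Shadow length on flat ground for an object of the given height.

    sun_elevation_deg is the sun's angle above the horizon, in degrees.
    """
    if not 0 < sun_elevation_deg <= 90:
        raise ValueError("sun elevation must be in (0, 90] degrees")
    return height_m / math.tan(math.radians(sun_elevation_deg))


person_height = 1.7  # metres, assumed for illustration

high_sun = shadow_length(person_height, 75)  # sun high in the sky, near midday
low_sun = shadow_length(person_height, 10)   # sun close to the horizon

# A high sun gives a shadow much shorter than the person;
# a low sun gives a shadow several times the person's height.
print(high_sun < person_height < low_sun)  # True
```

This is why a long shadow stretching toward the viewer points to a low sun angle rather than midday.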
World Knowledge 75%
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy. The intricate iron lattice structure, the distinct arches at the base, and the three observation levels are clearly visible and correctly proportioned. The perspective is a low-angle shot looking up, which aligns with the "clear day" prompt given the blue sky. There are no obvious artifacts or structural distortions typical of AI generation. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high fidelity. The central onion dome, four surrounding minarets, and the symmetrical layout are architecturally accurate. The perspective from the reflecting pool is correct, capturing the iconic reflection. The lighting suggests sunrise or sunset, casting a warm glow consistent with real-world photography of this landmark. No obvious AI artifacts or structural distortions are present. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The Statue of Liberty is recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The tablet correctly displays “July IV MDCCLXXVI.” The pedestal, island setting, and Manhattan skyline background are consistent with reality. No significant architectural or factual inaccuracies are present. The image meets the prompt’s requirements for realism and accuracy. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The instrument is recognizable as a shamisen, but it contains significant factual inaccuracies. Most notably, the neck features frets (metal bars), whereas traditional shamisen are fretless. Additionally, the body shape is an octagonal box rather than the traditional tachi-gata or tsuru-gata forms, and the bridge design is non-standard. These deviations from the "accurate" design requested make it factually incorrect. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image depicts a highly detailed, photorealistic brass instrument resembling an astrolabe. It features recognizable components like a rete with star pointers and a tympan with engraved degree markings. The lighting, textures, and metallic reflections are convincing, meeting the criteria for a real-world subject. While specific astronomical alignments might be artistic rather than scientifically precise, the object is clearly recognizable and structurally accurate to the prompt's description.
PASSEvaluator (Qwen 3.6 27B)
The image accurately depicts a Japanese kintsugi bowl. The ceramic texture, the specific style of gold lacquer repair highlighting the cracks, and the overall composition are visually consistent with real-world examples of this art form. There are no architectural or factual inaccuracies; the subject is clearly recognizable and correctly rendered. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image is a stylized, cartoon-style diagram of the water cycle. It correctly depicts the core components: evaporation from the ocean, condensation in clouds, precipitation as rain, and collection in rivers flowing back to the sea. The labels are mostly accurate, though "Ocean Ocle Cycle" contains a typo ("Ocle" instead of "Water"). Despite the artistic style and minor text error, the subject is clearly recognizable and factually represents the water cycle process. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image depicts a surreal, fantasy landscape rather than a scientific diagram. It shows a crater lake overflowing into a deep chasm, which does not represent a subduction zone or tectonic plate boundary. The "layered mantle" is depicted as stylized red and grey rock strata, lacking geological accuracy. The composition is visually striking but fails to accurately represent the requested real-world geological subject. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image is not factually accurate. The labels are nonsensical gibberish (e.g., "Legh," "Rigta," "Artiovo," "Righucimim") rather than correct anatomical terms. Additionally, the placement of the labels does not correspond to the actual structures they point to (e.g., "Mitral" points to the aorta). Therefore, it fails the criteria for factual accuracy. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image depicts a butterfly that closely matches the description of a Morpho menelaus. It features the characteristic iridescent blue dorsal wings and brown ventral wings with prominent eyespots. The body morphology, including the antennae and thorax, appears anatomically correct for a butterfly. The lighting and texture are realistic, showing the metallic sheen typical of Morpho species. There are no obvious factual or architectural inaccuracies in the depiction. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image accurately depicts the interior of the Hagia Sophia, featuring the massive central dome with its radiating ribs, the pendentives, and the semi-domes. The large Islamic calligraphy medallions are correctly positioned alongside the Byzantine mosaics, including the depiction of Christ in the apse. The architectural details, lighting, and textures appear realistic and consistent with the actual structure. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image depicts a bismuth crystal with high fidelity. It accurately captures the characteristic geometric hopper crystal formation, displaying the distinct stepped, staircase-like structure typical of this mineral. The iridescent oxide layers are rendered realistically, showing the expected metallic sheen and vibrant color shifts (gold, pink, blue) caused by thin-film interference. The lighting and texture appear natural and physically plausible. VERDICT: PASS
Professional Studio 78%
Camera & Lighting 75%
FAILEvaluator (Gemma 4 26B)
The image features a wide-angle perspective with a circular vignette that creates an extreme fish-eye effect rather than "slight barrel distortion." While the prompt asked for slight distortion, the image provides an extreme, heavy distortion that obscures the edges of the frame entirely. However, the lighting and the sense of expansive space are well-represented. Because the distortion is extreme rather than "slight," it deviates from the specific instruction. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image successfully follows the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a red petal. The depth of field is very shallow, with the dewdrop and the immediate edge of the petal in sharp focus while the background and the rest of the petal are softly blurred. The lighting is bright and natural, creating a clear reflection within the droplet. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is flat and top-down, with no tilt or angle. The depth of field is deep, keeping the entire scene from the center trees to the outer roads in sharp focus. The lighting is even and consistent with a high-angle daylight shot. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image features a portrait with a strong, directional light source. However, it fails to meet the specific requirement for Rembrandt lighting. In true Rembrandt lighting, a characteristic light triangle appears on the shadowed cheek. In this image, the light is a broad, harsh beam that illuminates the entire side of the face, leaving the cheek in deep shadow without the signature triangle. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a landscape with a deep orange sky and a bright sun low on the horizon, characteristic of golden hour. The lighting creates long, dramatic shadows across the rolling hills and a soft, warm glow throughout the scene. The perspective provides a sense of depth through the layered hills, and the light rays (lens flare) enhance the atmospheric effect. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a high-key portrait with a bright, white background and a luminous, airy quality. The lighting is soft and even, resulting in minimal shadows on the subject's face. The perspective is a close-up portrait, and the depth of field is shallow, keeping the focus on the subject's features while maintaining the bright, overexposed aesthetic requested. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. The perspective is a close-up portrait shot. The depth of field is shallow, creating a creamy bokeh background with soft, circular light orbs. Most importantly, the lighting features a strong warm rim light (backlighting) that highlights the subject's hair and profile, effectively separating her from the blurred garden background. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. The subject is sharp and in focus, while the background features a shallow depth of field with soft, out-of-focus city lights (bokeh). The lighting pattern, including the rim lighting on the subject's hair and the warm street lamps, creates a professional street photography aesthetic consistent with a 50mm f/1.8 lens. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image fails on several key technical aspects. The prompt specifies a shot "from above," but the image is taken from a low, side-angle perspective. The depth of field is quite deep, showing the window frame clearly, which contradicts the shallow depth of field expected from an f/2.8 aperture. While the lighting comes from the left and has a warm temperature, the perspective error is a fundamental failure to meet the prompt's core camera instructions. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image perfectly captures the tilt-shift effect. It features a high-angle perspective that makes the city street look like a miniature model. The depth of field is highly selective, with a sharp focus strip on the mid-ground buildings and street, while the foreground and background are heavily blurred. The colors are highly saturated and vibrant, matching the prompt's description of a miniature-style photograph. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a dark, moody scene with prominent horizontal light streaks. There is a clear blue light streak above an orange light streak, both stretching horizontally across the frame. This directly matches the prompt's requirement for "blue and orange anamorphic lens flare streaking horizontally across a dark moody scene." The lighting pattern and color scheme are perfectly executed according to the instructions. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up portrait with a clear split lighting pattern: the left side of the face is brightly illuminated with a warm, golden light, while the right side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect. The perspective is a direct, centered portrait, and the lighting pattern is executed precisely as described. VERDICT: PASS
Color Precision 83%
PASSEvaluator (Gemma 4 26B)
The image features a bright, vivid red sports car parked on a gray asphalt road. The red color is highly saturated and unambiguous, exactly as described in the prompt. The road surface is a consistent gray color. All color requirements are met perfectly. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a woman wearing a vibrant royal blue evening dress. The dress color is a deep, rich blue that perfectly matches the "royal blue" description. The background is a neutral, light gray color, which also aligns with the prompt's requirement for a "neutral background." All color specifications from the prompt are accurately represented in the image. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a field of bright, pure sunflower yellow flowers. The petals are a vibrant, saturated yellow, and the sky is a clear, bright blue. The colors perfectly match the descriptions provided in the prompt. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a bedroom with a wall painted in a vibrant burnt orange color, which closely matches the hex code #FF6B35 and the description "burnt orange." The furniture, including the bed frame, nightstand, and dresser, is white as requested. The scene is illuminated by natural daylight coming through a window. All color requirements are met. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image features a forest scene with brown tree trunks. The foliage consists of various shades of green, including some lighter, brighter green leaves in the foreground. While the overall tone is dark, the foliage is not a uniform or precise "#2D5F2D (dark muted green)"; it contains much lighter, more vibrant green hues that deviate from the specific color requested. VERDICT: FAIL
FAILEvaluator (Gemma 4 26B)
The image features walls in a deep, dark blue color. The prompt specifically requested a "medium blue" with the hex code #4A90D9. The color in the image is significantly darker and more saturated than the requested medium blue shade. The warm wooden floor and white trim are correctly represented. However, because the primary wall color fails to match the specified hex value and description, the criterion is not met. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The walls are a warm cream/beige color, the sofa is a dark navy blue, and the accent pillows are a bright, vibrant red. These colors align perfectly with the hex values and descriptions provided in the prompt. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The background is a deep, dark navy blue consistent with #1A1A2E. The watch strap and casing match the soft gold/tan tone of #F0C27F. The earbuds are a clean, pure white (#FFFFFF). All three color requirements from the prompt are successfully met. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image accurately reflects the requested color palette. The background is a near-white (#FAFAFA), the main central shape is a vibrant orange-red (#FF4500), and the thin accent lines outlining the shape are a dark charcoal/near-black (#333333). All three specified colors are present and correctly applied to the described elements. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a smooth gradient transitioning from a bright red on the left to a bright blue on the right. The transition passes through a purple middle section as requested. The colors match the descriptions: the left side is a vibrant red (#FF0000), the right side is a vibrant blue (#0000FF), and the center is purple. VERDICT: PASS
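Why a red-to-blue gradient passes through purple can be shown with a short interpolation sketch. It assumes simple per-channel linear interpolation in sRGB; the image generator may blend colors differently, so this is illustrative only, and the helper names are hypothetical.

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (R, G, B) tuple of ints in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (R, G, B) tuple back to an uppercase '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def lerp_color(start, end, t):
    """Linearly interpolate each channel; t=0 gives start, t=1 gives end."""
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    # int(value + 0.5) rounds half up for the non-negative values used here
    return rgb_to_hex(tuple(int(x + (y - x) * t + 0.5) for x, y in zip(a, b)))


midpoint = lerp_color("#FF0000", "#0000FF", 0.5)
print(midpoint)  # prints "#800080", a purple
```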
PASS Evaluator (Gemma 4 26B)
The image contains the specific hex codes requested. At the bottom, the color matches #FF6B35 (a vibrant orange). At the top, the color matches #1B1464 (a deep indigo/dark blue). The gradient transitions from the orange at the bottom to the deep indigo at the top as specified. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which aligns with #00C9FF. On the right, the color transitions into a light mint green, which aligns with #92FE9D. The gradient flows horizontally from left to right as requested. VERDICT: PASS
Photorealism 67%
PASS Evaluator (Gemma 4 26B)
The image successfully captures the requested studio aesthetic. The lighting is soft, and the shadow is natural, creating a realistic sense of depth on the white background. The texture of the red leather and the construction of the laces and stitching appear consistent and high-quality. There are no obvious AI artifacts, such as warped geometry or nonsensical textures, making it look like a professional product photograph. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image fails to meet the criteria for a professional product shot. The glass bottle lacks a cap, which is a significant omission for a perfume bottle. More importantly, the physics of the glass and reflections are highly flawed. The reflection on the surface is disconnected and distorted, and the light refraction through the glass is inconsistent and unrealistic. It looks like a low-quality AI generation rather than a real photograph. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image successfully meets the prompt requirements. The green apple features realistic textures, subtle specular highlights, and a natural-looking stem. The lighting is soft and directional, creating a realistic, soft shadow on the seamless white background. There are no obvious AI artifacts, such as warping or unnatural textures, and the image effectively mimics a professional studio product shot. VERDICT: PASS
Graphical design 92%
Data Visualisation 67%
PASS Evaluator (Qwen 3 VL)
The bar chart correctly displays monthly sales for January–June with values 12K, 18K, 15K, 22K, 19K, 25K. Axes are labeled ("Month" and "Sales (K)"), bars are color-coded, and the title "Monthly Sales 2025" is present. All text is readable and the chart type matches the request. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The chart is a pie chart with four colored slices. Company A (35%) is shown twice, incorrectly. Company B is labeled 25% but its slice is 25% of the circle. Company C is 20%, and "Others" is 20%. The percentages don't sum to 100% (35+35+25+20=115%). Labels and colors are readable, but data is inaccurate. VERDICT: FAIL
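The arithmetic behind this verdict is easy to reproduce. The sketch below uses only the four percentages from the evaluator's own sum; the variable names are hypothetical, and the check is an illustration rather than part of the benchmark.

```python
# Slice percentages exactly as listed in the evaluator's sum (35+35+25+20)
slice_percentages = [35, 35, 25, 20]

total = sum(slice_percentages)
excess = total - 100          # how far the labels overshoot a full circle
is_valid_pie = total == 100   # a pie chart's slices should sum to 100%

print(f"total={total}% excess={excess} valid={is_valid_pie}")
```

A total of 115% means the labels cannot describe a single pie, which is the inaccuracy the evaluator flags.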
PASS Evaluator (Qwen 3 VL)
The chart correctly displays a line graph titled "Weekly Traffic" with labeled axes and grid lines. Data points match the requested values: 1200, 1500, 1100, 1800, 2000, 1700, 1400 for Mon–Sun. All elements are clearly readable and the chart type is appropriate. VERDICT: PASS
Layout & Design 100%
PASS Evaluator (Qwen 3 VL)
The image features "MOVIE NIGHT" in bold at the top, a large popcorn pile centered, and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear with title dominant, popcorn central, and tagline below. All required elements are present and correctly positioned. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image displays a guitar illustration, the headline "LIVE MUSIC FEST" in large, bold text at the top, and "Tickets available now" as a tagline at the bottom. Text is readable and correctly placed with proper visual hierarchy. All required layout elements are present and clearly rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The poster features an open book illustration, the title "JOIN THE BOOK CLUB" prominently at the top, and the tagline "Meets every Wednesday" correctly placed below it. Text is readable and hierarchy is clear. All required layout elements are present and correctly positioned. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image shows a hero header with a man on a phone, followed by a centered headline “Welcome to Our Platform,” subtitle “Discover Exclusive Benefits Today,” and a “Sign Up” button. A footer with three social media icons (Facebook, Instagram, Twitter) is present at the bottom. All required layout elements are correctly placed with clear visual hierarchy. Text is readable and properly positioned. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image shows a newsletter layout with a header, two-column body (text left, image right), a highlighted quote block, and an unsubscribe footer. Text is readable and correctly placed. All required elements are present with appropriate visual hierarchy. The layout is clean and functional. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image shows a menu with a logo header (“The Garden Bistro”), three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address and hours. Text is readable, correctly placed, and visually hierarchized. All required layout elements are present and clearly defined. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Left page has large hero photo. Right page shows two-column article, pull quote with quote marks, author bio (“Author: Jane Doe, Freelance Journalist”), and page number (12). Text is readable and correctly placed with proper visual hierarchy. All specified layout elements are present and correctly implemented. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image shows a dashboard UI with a top navigation bar, search field, 2x2 feature cards with icons and labels, recent activity list, and bottom tab bar. Text is readable and correctly placed. All required layout zones are present with appropriate visual hierarchy. The design is clean and functional. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Left page has full-bleed photo. Right page shows headline, body copy in two columns, inset image, and page number. Text is readable and correctly placed. Layout hierarchy is clear. All specified elements are present and correctly arranged. VERDICT: PASS
Style Diversity 92%
PASS Evaluator (Qwen 3 VL)
The image clearly depicts a golden retriever in a garden, rendered with thick, visible brushstrokes and rich texture characteristic of oil painting. The style is unmistakable, with impasto techniques and textured surfaces throughout. All key elements of the requested style are present and well-executed. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The image shows a golden retriever in a garden, rendered in a stylized, cartoonish manner. It features large, expressive eyes and simplified forms, aligning with Japanese anime aesthetics. However, the style lacks flat colors—noticeable gradients and shading are present on the dog’s fur and background. The overall look is more painterly than flat. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
The image clearly depicts a golden retriever in a garden using a pixel art style. Visible square pixels form the dog and background, and a limited color palette is evident. However, the 32x32 grid resolution is not discernible due to image scaling. The core pixel art style and subject are recognizable. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly embodies Art Deco: bold geometric patterns, gold accents on black facades, ornamental symmetry in building facades and street lamps, and stylized forms. The palette is predominantly gold and black as requested. The composition emphasizes verticality and decorative motifs. All key visual characteristics are present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly depicts a woman with a parasol rendered in Impressionism. It features soft, visible brushstrokes, dappled light filtering through foliage, and a pastel palette of pinks, greens, and yellows. The style is unmistakably recognizable, capturing the movement and light characteristic of the movement. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns for shading, and flat, vivid primary colors (red, yellow, blue). Facial features are stylized with high contrast and simplified forms. Background uses solid color blocks with halftone dots. All requested visual characteristics are unmistakably present. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly depicts a forest in a watercolor style, featuring visible paper texture, soft bleeds, and translucent washes. Lighting mimics photographic realism with directional sunlight and depth. Detail is rendered with painterly softness, not photographic sharpness, but the core watercolor aesthetic and atmospheric lighting are present. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly blends cyberpunk (neon tech, futuristic city) with Art Nouveau (flowing floral borders, organic lines). The woman’s glowing suit and environment use neon colors and tech elements, while the ornate, flower-filled frame and swirling motifs strongly evoke Art Nouveau. Key visual characteristics are present and well-integrated. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly depicts a medieval castle rendered in ukiyo-e style: flat planes of color, bold outlines, and stylized clouds. The sky and lighting show photorealistic gradients and shadows, contrasting with the woodblock aesthetic. Key ukiyo-e characteristics like simplified forms and dramatic perspective are present. The fusion of styles is intentional and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly shows raw concrete walls, ceiling, and floor — core brutalist elements. Soft, rounded pastel furniture (pink, mint, purple) contrasts beautifully with the industrial textures. Warm, diffused light from recessed ceiling fixtures and windows enhances the mood. All requested visual characteristics are present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Sparse composition with a single sunflower in a cracked vase on a wooden table. Muted, desaturated tones dominate, punctuated by the flower’s yellow. Strong chiaroscuro lighting creates dramatic shadows and visual tension. The cracked wall and wilting flower enhance unease. Negative space is ample and intentional. The style aligns with anxious minimalism’s emotional weight and restrained aesthetic. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image clearly embodies nostalgic futurism with retro-futuristic 1960s space-age aesthetics. Key elements include chrome robots, mushroom-shaped domes, orbital trains, and cosmic murals—all rendered in a melancholic, warm-toned patina with visible decay. The style is unmistakable and fully aligns with the prompt’s visual requirements. VERDICT: PASS
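The percentages shown next to each section heading are consistent with a simple pass fraction rounded to the nearest whole percent. The sketch below assumes that scoring rule (the page does not state it) and uses the verdicts listed in the sections above; the function and variable names are hypothetical.

```python
def pass_rate(verdicts):
    """Percentage of PASS verdicts, rounded half up to a whole number."""
    passed = sum(v == "PASS" for v in verdicts)
    return int(100 * passed / len(verdicts) + 0.5)


# Verdicts transcribed from the evaluator entries in each section
sections = {
    "Photorealism": ["PASS", "FAIL", "PASS"],
    "Data Visualisation": ["PASS", "FAIL", "PASS"],
    "Layout & Design": ["PASS"] * 9,
    "Style Diversity": ["PASS", "FAIL"] + ["PASS"] * 10,
}

rates = {name: pass_rate(v) for name, v in sections.items()}

# Assumption: "Graphical design" pools Data Visualisation and Layout & Design
graphical_design = pass_rate(
    sections["Data Visualisation"] + sections["Layout & Design"]
)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"Graphical design (pooled): {graphical_design}%")
```

Under these assumptions the computed values reproduce the displayed 67%, 67%, 100%, and 92% figures, and pooling the two graphical sub-sections gives 11 of 12, which rounds to the displayed 92%.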