ImageBench V1
192 evaluations across 6 categories
Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself. Learn more about the methodology.
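Each judge rationale listed on this page normally ends with a "VERDICT: PASS" or "VERDICT: FAIL" line. As a minimal sketch of how such a label could be read back out of the free-text rationale, the snippet below assumes that closing-line convention; the function name `extract_verdict` is hypothetical and is not part of the benchmark's tooling.

```python
import re
from typing import Optional

# Assumed convention: the rationale ends with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(rationale: str) -> Optional[str]:
    """Return "PASS" or "FAIL" if the rationale ends with a verdict line, else None."""
    match = VERDICT_RE.search(rationale.strip())
    return match.group(1) if match else None


print(extract_verdict("All characters are intact and legible. VERDICT: PASS"))
print(extract_verdict("Text is not fully readable or accurate. VERDICT: FAIL"))
print(extract_verdict("The core concepts are all present and combined coherently."))
```

A `None` result marks a rationale with no closing verdict, for example one that was cut off, and is a natural candidate for manual inspection of the image.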

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 47%
Typography Style 100%
Typography Style measures whether fal/bria/fast preserves requested lettering styles, font mood, and text texture such as typewriter ink or neon signage.
PASS (Evaluator: Qwen 3 VL)
The text “CHAPTER ONE” is spelled correctly and rendered in a bold, blocky font resembling vintage typewriter lettering. The ink appears slightly uneven and textured, matching the “slightly uneven ink” description. The paper shows aged, yellowed edges, enhancing the vintage feel. The style convincingly mimics a typewriter output on aged paper. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Text reads "TODAY'S SPECIAL" correctly. Letters mimic chalk strokes with uneven thickness and smudges, and chalk dust is visible below. Dark green background matches chalkboard. Style convincingly simulates white chalk writing. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The word "OPEN" is correctly spelled and rendered in glowing red LED pixels, matching the described medium. Letters are formed by illuminated dots, with visible bezel and diffused background, convincingly simulating an LED sign. Lighting and texture details are authentic. VERDICT: PASS
Writing accuracy 33%
Writing accuracy measures whether fal/bria/fast renders the exact requested words, numbers, labels, and short phrases without spelling or layout errors.
PASS (Evaluator: Qwen 3 VL)
The word "HELLO" is fully spelled correctly and is clearly readable in bold red letters on a white background. All characters are intact and legible. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a plain cream background. All letters are intact and clearly legible. No misspellings or illegible parts detected. The design matches the intended specification exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The word "SERENITY" is fully spelled correctly and is large, centered, and clearly readable against the solid light blue background. No letters are missing or distorted. The text meets all specified criteria without errors. VERDICT: PASS
FAIL (Evaluator: Qwen 3 VL)
The second line reads "joxfmpshigh" — "fox" is misspelled as "jox", and "jumps high" is rendered as "fmpshigh", missing letters and altering the intended phrase. Text is not fully readable or accurate. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
Text shows multiple errors: "sunset" is misspelled as "sustet", "brings" is correct, "down" is correct, but "a new ahead" should be "a new day". "Ahead" is incorrect. All words are readable, but content is inaccurate. VERDICT: FAIL
PASS (Evaluator: Qwen 3 VL)
The text reads: "Stars shine in the dark night" — all words are spelled correctly and fully legible. The phrase matches the intended text exactly. Background is a gradient from off-white to dark blue with star effects, not a solid off-white. VERDICT: PASS
FAIL (Evaluator: Qwen 3 VL)
Text contains multiple errors: “item” is misspelled as “iteme”; “Terms conditions apply. obve fns” is garbled; “Teres and Sore zore for for Se detlls..” is illegible. All text is not readable or correctly spelled. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
Text errors: “Limited svva.” (misspelled “seating”), “RSVP require_” (incomplete), “Required. retard. Pmised. stanignd retiiable.” (garbled, nonsensical). All text below the main content is illegible or incorrect. Core message not accurately conveyed. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
Text contains multiple errors: “fine” misspelled as “tine”; “dining” split across lines; “Black tie optional” misrendered as “Bors tic robtiol.”; “Doors open at 6 PM” as “opon open. a6 PM..”. All text is illegible or incorrect. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
Text contains multiple misspellings and illegible parts: “THE LAST LAST” (redundant), tagline “One. ship. one Vcs. No and baick.” (incorrect), credits like “Thup ugle R7” and “Jsc ssmroc” are gibberish. Names and roles are garbled. No text is fully readable or correctly spelled as intended. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
Multiple text elements are misspelled or illegible: “Cammabby” (should be “Cambodia”), “Dr. Maria Santos” distorted, “Climate Action” garbled, “Top 10 Trails” misread, “$6.99” as “607.00” with “Tker” nonsense. Barcode area shows “Issue 247” but bottom-left says “Issue 227”. Many words are unrecognizable. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
Text contains multiple misspellings and illegible sections. “Unfiltered” is garbled; “No additives or preservatives” is unreadable; “Best before: 06/2026” is misrendered as “06/2066”; “Store in cool, dry place” is garbled; “Product of USA” is partially illegible. Core text is not readable as intended. VERDICT: FAIL
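The Text Rendering figures can be checked by hand from the verdict labels in this category. The sketch below assumes the category score is the pooled pass rate over every evaluation in the category; that reading is inferred from the numbers shown (a plain average of 100% and 33% would give roughly 67%, not 47%) and is not taken from the benchmark's stated methodology. The helper name `pass_rate` is hypothetical.

```python
def pass_rate(passes: int, total: int) -> int:
    """Return the pass rate as a whole-number percentage."""
    return round(100 * passes / total)


# (passes, total) counted from the PASS/FAIL labels in each subcategory
typography_style = (3, 3)
writing_accuracy = (4, 12)

# Pool the raw counts instead of averaging the two percentages.
pooled_passes = typography_style[0] + writing_accuracy[0]
pooled_total = typography_style[1] + writing_accuracy[1]

print(pass_rate(*typography_style))            # 100
print(pass_rate(*writing_accuracy))            # 33
print(pass_rate(pooled_passes, pooled_total))  # 47
```

Pooling weights each evaluation equally, so a subcategory with more prompts (here Writing accuracy, with 12) pulls the category score further than one with fewer.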
Spatial Reasoning 75%
Attributes Binding 89%
Attributes Binding measures whether fal/bria/fast attaches the right colors, materials, and properties to the correct requested objects.
PASS (Evaluator: Qwen 3.5 122B)
The image shows two objects on a white surface: a red sphere (ball) and a blue cube. Both are correctly colored and shaped as described. They are positioned side by side. No attributes are swapped — the ball is red, the cube is blue, both are appropriately sized relative to each other for this composition. All prompt requirements are met. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is correctly orange, round, and has a textured surface with a stem. Both objects are appropriately sized relative to each other and placed on the board as described. No attribute swaps or misbindings are present. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows two vehicles parked side-by-side on a street. The vehicle on the left is a yellow sedan with a "TAXI" sign on top, matching the description of a yellow taxi. The vehicle on the right is a long, black luxury car with multiple doors and tinted windows, matching the description of a black limousine. Both objects have their attributes (color, type) correctly bound without any swaps. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains three shapes on a white background: a large red circle, a small blue triangle, and a medium green star. All attributes (color, size, shape) are correctly assigned to each object with no swaps. The arrangement matches the prompt’s description. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows three candles on a white shelf against a dark background. 1. Left candle: Tall, thin, white — matches “tall thin white candle” 2. Middle candle: Short, fat, red — matches “short fat red candle” 3. Right candle: Medium height, spiral shape, yellow — matches “medium spiral yellow candle” All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image contains a big brown horse, a small white rabbit, and two black cats (not one). The prompt requested one medium-sized black cat, but two are present. All other attributes (color, size, shape) are correctly bound to their respective objects. Since the quantity of cats is incorrect, the core criteria is not fully met. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image contains a pink fire truck with correct details (ladder, wheels, shape). A large blue banana is present with correct curvature and stem. A green spherical object (sun) appears in the sky. All requested objects are correctly colored and shaped without attribute swaps. The surreal landscape context is also maintained. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains: 1. Purple carrot – correct color and shape. 2. Red tree with red leaves – correct color and form. 3. White watermelon sliced open – correct internal color and cut state, though exterior is green (typical for watermelon). All attributes are correctly bound to their respective objects. No swaps or mismatches observed. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains an orange elephant, a yellow penguin, and a silver object that resembles a strawberry or ornament. However, there is an additional unrequested orange bird-like figure with a silver hat in the background. The core requested objects are present with correct colors and shapes. The silver object is ambiguous but plausibly interpreted as a stylized strawberry. Since all requested items are correctly attributed and no swaps occurred, the core criteria is met. VERDICT: PASS
Compositionality 100%
Compositionality measures whether fal/bria/fast combines subjects, actions, outfits, props, and scene details into one coherent requested image.
PASS (Evaluator: Qwen 3.5 122B)
The image contains all described elements: 1. Tabby cat: Present with distinct stripes and markings. 2. Oversized red sunglasses: Clearly visible on the cat’s face, fitting the description. 3. Coherent combination: The sunglasses sit naturally on the cat’s face, creating a believable and humorous effect. All components are accurately represented and integrated. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all described elements: 1. Golden retriever: Present and clearly identifiable. 2. Tiny cowboy hat: Present on the dog’s head, proportionally sized appropriately. 3. Coherent combination: The hat sits naturally on the dog, with no visual inconsistencies or distortions. All components are accurately rendered and logically integrated. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a cartoon penguin standing on a sandy beach with the ocean and sky in the background. The penguin is holding a colorful rainbow umbrella. All described elements are present: 1. Penguin - Present 2. Holding - Present (penguin is holding the umbrella) 3. Colorful umbrella - Present (rainbow-colored) The elements are coherently combined in a logical scene. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a figure in full medieval armor, including helmet, breastplate, and gauntlets, seated at a modern office desk. The knight’s gloved hands are positioned on a laptop keyboard, with the screen displaying text. A notebook and pen rest nearby. All prompt elements — medieval knight, full armor, desk, modern laptop, typing action — are present and coherently combined. No logical or visual contradictions exist. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all required elements: 1. Roman gladiator - Present (wearing helmet, armor, shield) 2. Taking a selfie - Present (holding phone up, arm extended) 3. Smartphone - Present (black rectangular device in hand) 4. Colosseum - Present (arched stone architecture in background) All elements are coherently combined in a single scene. The gladiator is positioned in front of the Colosseum while taking a selfie with a smartphone. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all required elements: 1. Astronaut in a spacesuit: Present (white suit, helmet, backpack). 2. Riding a bicycle: Present (astronaut is seated on and pedaling a red bicycle). 3. Through a park: Present (green grass, trees, and park lamp post in background). The concepts are coherently combined in a single scene. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all described elements: 1. Octopus - Present (blue, central figure) 2. Top hat - Present (black, on octopus head) 3. Monocle - Present (gold-rimmed, over one eye) 4. Chess board/pieces - Present (octopus tentacles interacting with pieces) 5. Moon surface - Present (cratered gray background) 6. Earth in background - Present (blue planet visible at top) All elements are coherently combined in a surreal composition. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
Analysis of elements: 1. Dinosaur: Present (T-Rex head/body). 2. Business suit: Present (grey suit, white shirt, red tie). 3. PowerPoint presentation: Partially present. The laptop screen shows colorful bars resembling a presentation, but the dinosaur is holding the laptop out rather than standing at a podium or screen. 4. Modern office: Present (background blur, desk, office supplies). 5. Drinking coffee: Present (holding a metal mug). The core concepts are all present and combined coherently.
PASS (Evaluator: Qwen 3.5 122B)
The image contains: - Robot dressed as a pirate: Present (metallic face, eye patch, pirate hat, coat) - Paper boat: Present (white origami-style boat) - Lake of lava: Present (glowing red-orange water with reflections) - Juggling three kittens: Present (three white cats in mid-air around the robot) All elements are coherently combined in a surreal, dramatic scene. VERDICT: PASS
Counting 44%
Counting measures whether fal/bria/fast creates the exact number of requested objects without adding or missing items.
PASS (Evaluator: Qwen 3.5 122B)
The image displays exactly three red apples placed on a wooden surface. The apples are arranged in a triangular formation, with one in front and two behind. The count matches the prompt’s request precisely. No extra or missing objects are present. The wooden texture is consistent and realistic. Lighting and shadows appear natural. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image displays exactly five yellow rubber ducks arranged in a horizontal row on a white shelf, consistent with a bathroom setting. Each duck is clearly visible and matches the description. The count of ducks is precisely five, matching the prompt’s requirement. No extra or missing objects are present. All criteria are fully satisfied. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows five white candles standing on a dark surface, not four as requested. All candles are lit and arranged in a staggered formation. Since the count does not match the prompt’s requirement of exactly four candles, the core criterion is not met. VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The image shows two orange cats and two black dogs sitting on a grassy lawn. The prompt requested 2 orange cats and 3 black dogs. While the cat count is correct, the dog count is off by one — only two dogs are visible, not three. Therefore, the object counts do not exactly match the prompt. VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The image shows 4 red roses and 4 white daisies (3 in the vase, 1 on the table). The prompt requested exactly 2 white daisies. Since the count of daisies does not match the requested number, the core criterion is not met. VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The image shows 1 blue butterfly resting on the fence, which matches the request. However, there are 3 yellow butterflies visible in the background — but they are not resting on the fence; they appear to be flying or hovering above it. The prompt specifically required all 3 yellow butterflies to be “resting on a garden fence,” which is not accurate. Therefore, the count and positioning do not exactly match the request. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a round fishbowl with a wooden rim and base. Inside, there are multiple goldfish swimming. Counting them: one on the far left, one near center-left, one in front center, one behind center, one on right side, one near bottom right, one partially visible behind right fish, one small one near bottom left, and one more faintly visible in background — totaling 9 goldfish. The bowl is large and round as described. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a fruit bowl with: - 3 bananas (visible on the left) - 4 oranges (three clearly visible, one partially hidden behind others) - 2 apples (one green-red, one mostly red-green) All counts match the prompt’s request exactly. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image contains 15 balloons, not 11 as requested. The balloons are colorful and floating against a clear blue sky, matching the description except for the count. Since the prompt specifically asked for exactly 11 balloons and there are 15, the core criterion is not met. VERDICT: FAIL
Negation 78%
Negation measures whether fal/bria/fast respects what should be absent from the image, not only what should be present.
PASS (Evaluator: Qwen 3.5 122B)
The image shows a tropical beach scene with white sand, turquoise water, palm trees, and a small boat floating offshore. The sky is blue with scattered clouds. There are no people visible anywhere in the scene — on the beach, in the water, or in the boat. The excluded element (people) is fully absent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a light-colored cat lying on a beige couch. The cat is looking directly at the camera with green eyes. There is no collar visible around its neck — the fur appears smooth and uninterrupted from chin to shoulders. The excluded element (collar) is truly absent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows an empty classroom with rows of wooden desks and a chalkboard on the wall. There are no students visible in the room. The excluded element (students) is truly absent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a sandy beach with numerous palm trees, scattered seashells in the foreground, and a blue-and-white striped beach umbrella. The background consists of sky and more trees; there is no visible water, ocean, or sea horizon line. The excluded element (water/ocean) is truly absent. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows an office workspace with a desk, chair, bookshelves, and a lamp. However, there is clearly a computer monitor (screen) on the desk — a white all-in-one desktop computer with its back facing the viewer. Since the prompt explicitly excluded “computers or screens,” and one is present, the image fails to meet the exclusion criterion. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a kitchen scene with a stove, multiple pots and pans, and a wooden cutting board placed on the stovetop. There is no visible food in any of the containers or on surfaces — only empty cookware and utensils. The excluded element (“food”) is indeed absent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a pepperoni pizza with tomato sauce, basil leaves, and a golden-brown crust. There is no visible cheese layer — the red sauce is directly beneath the pepperoni and basil, with no melted or shredded dairy product present. The surface texture beneath toppings appears to be sauce only, consistent with the exclusion of cheese. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a silver car body with windows, doors, and headlights clearly visible. The wheels are absent — the wheel wells are empty, showing only the inner arches. This matches the exclusion criteria: no wheels are present. The car is rendered as a shell or chassis without tires or rims. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image clearly shows a human face with eyes, nose, and mouth — as expected. However, the prompt explicitly excluded “eyebrows,” and in this image, well-defined, natural-looking eyebrows are prominently present above each eye. Since the excluded element (eyebrows) is visibly present, the image fails to meet the exclusion criterion. VERDICT: FAIL
Relative Position 92%
Relative Position measures whether fal/bria/fast places objects in the requested spatial relationship, such as above, below, inside, or next to another object.
PASS (Evaluator: Qwen 3.5 122B)
The image shows a cat positioned directly on the top surface of a cardboard box. The cat’s paws are resting on the box flaps, and its body is elevated above the surrounding floor and scattered cardboard pieces. The spatial relationship “on top of” is accurately represented. No conflicting positional descriptors are present. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a wooden chair with a red ball positioned directly underneath it, between the front legs. The spatial relationship described in the prompt — “a red ball underneath a wooden chair” — is accurately represented. The ball is clearly below the seat and within the footprint of the chair’s legs. All positional descriptors are correct. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a white coffee mug positioned to the left of an open book. Both objects are resting on a wooden desk surface. The spatial arrangement matches the prompt’s description: the mug is next to the book, and both are on the desk. All positional relationships described in the prompt are accurately represented. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a tree on the left side of the frame. The branch is positioned in front of and to the left of a green park bench. The spatial relationships described — bird on branch, branch extending from tree, tree next to bench — are all accurately represented. All positional descriptors (on, next to) are correct. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a cat sitting on a white pillow, which is placed on a wooden chair. The spatial relationships described in the prompt are accurately represented: the cat is on the pillow, and the pillow is on the chair. All positional relationships are correct. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a white candle placed directly on top of a stack of three books. To the right of the book stack is a potted green plant. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors (on, next to) are correct. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows five chess pieces on a board, but the spatial arrangement does not match the prompt. The king is centered, with a queen to its left and a bishop to its right — this part is correct. However, there is no rook “above” the king; instead, a knight is positioned in front (below) the king, and another knight is to the right of the bishop. The prompt’s “rook above” and “knight below” are inaccurate or misassigned. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a desk scene with a laptop centered on the desk. A white coffee mug is positioned to the left of the laptop. A smartphone lies to the right of the laptop. A desk lamp is situated above and to the right, casting light downward. A closed notebook rests in front of (below) the laptop on the desk surface. All specified spatial relationships are accurately represented. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. The horse is on the far left. To its right is a cow (partially behind the horse’s head). In the center is a sheep. To the right of the sheep is a goat. On the far right is a dog. All specified spatial relationships match the prompt exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a red barn in the foreground, large and prominent. Behind it, slightly to the left, is a windmill — smaller than the barn but still clearly visible, fitting “midground at medium scale.” In the far background, snow-capped mountains appear smaller due to perspective, consistent with “background at small scale.” Depth cues (overlap, size reduction, atmospheric haze) support correct spatial layering. All positional relationships are accurately rendered. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image correctly depicts a beach scene with a sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease appropriately with distance, maintaining proper spatial relationships as described. All positional relationships (on, under, left, right, above, below) are accurate. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image correctly depicts a city park scene with the specified spatial arrangement. In the foreground, there is a park bench on the left and several pigeons on the grass. The midground features a large fountain with water jets. The background consists of multiple skyscrapers of varying heights. All described positional relationships are accurate: the bench and pigeons are in front of the fountain, which is in front of the buildings. VERDICT: PASS
Scale & Proportions 44%
Scale & Proportions measures whether fal/bria/fast keeps people, vehicles, buildings, and everyday objects at believable relative sizes.
FAIL (Evaluator: Qwen 3.5 122B)
The person standing next to the bus appears unnaturally small — roughly knee-height to the vehicle’s door — which violates realistic human-to-bus scale. A typical adult should reach at least to the bus’s window sill or door handle, not mid-thigh. This distortion breaks the “realistic relative scale” requirement. VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The ladybug appears disproportionately large relative to the leaf, which is inconsistent with realistic proportions. In nature, ladybugs are small insects that would occupy only a fraction of a typical leaf’s surface area. Here, the beetle dominates the frame and seems unnaturally oversized for its environment, violating the prompt’s requirement for “realistic proportions.” VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The child’s hands are gripping the basketball, but the ball appears disproportionately large relative to the child’s torso and arms — it nearly spans the width of their chest and extends beyond their hand span unnaturally. In reality, a basketball held by a young child would appear smaller in proportion to their body. This violates the prompt’s requirement for “correct relative sizes.” VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a miniature figurine standing on the rim of a white cup, which is placed on a wooden surface. The figurine’s size relative to the cup is consistent with the description of a “tiny person” — it appears appropriately scaled as if it were a small toy or model. The proportions between the figure and the cup are believable and visually coherent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a cat that is disproportionately large compared to the surrounding city buildings. The cat’s head reaches the height of several building stories, and its body spans the width of the street — consistent with the “giant cat like a kaiju” description. The perspective and scale are intentionally exaggerated to match the prompt’s fantasy concept. Relative sizes are internally consistent within the surreal context. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, with proportions that suggest it is a small model or toy. The fingers and palm appear natural in size relative to the house, and the background (beach/ocean) provides context without distorting scale. The size relationship is visually consistent and plausible. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows five animals in a line: elephant, horse, dog, cat, and mouse. While they are arranged from largest to smallest, the proportions are not realistic. The dog, cat, and mouse are standing upright on hind legs like humans, which distorts their natural size relationships. Additionally, the dog appears disproportionately large compared to the cat and mouse — in reality, a dog is often larger than a cat, but here the scale feels exaggerated and inconsistent with real-world biology. The h
FAIL (Evaluator: Qwen 3.5 122B)
The image displays a table setting with a plate, two forks, a knife, a wine glass, and two shakers. However, there are significant proportion issues. The salt/pepper shakers are disproportionately large compared to the plate and cutlery. The small glass container (possibly for peppercorns) is also oddly sized relative to the other items. The forks appear unusually long compared to the plate diameter. These inconsistencies violate realistic relative sizing. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a miniature farm diorama with a barn, tractor, farmer, cow, chicken, and fence. All elements are scaled consistently as model figures — the farmer is appropriately sized relative to the cow and tractor, the chicken is small compared to the cow, and the barn fits the scene proportionally. The entire setup appears intentionally crafted as a scaled model, not a real-life scene, so proportions are internally consistent within that context. VERDICT: PASS
Human realism 79%
Faces & Expressions 92%
Faces & Expressions measures whether fal/bria/fast creates realistic faces with the requested age, emotion, expression, and identity cues.
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with natural features; smile appears warm and genuine. No obvious distortions. Eyes, nose, and mouth are proportionally balanced. Lighting and expression convey the intended cheerful, natural emotion. Clothing and background are contextually appropriate. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The face is symmetric with correctly rendered features: eyes, nose, and mouth are proportionate and aligned. The expression conveys calm seriousness, with neutral lips and steady gaze. No noticeable distortions or anatomical anomalies. Lighting and texture are natural. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The face is symmetric with correctly proportioned features. The teenager’s expression is neutral and relaxed, matching the prompt. No significant distortions are visible. Lighting and detail are natural, enhancing realism. The image successfully conveys the requested emotion and composition. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is largely symmetric with correctly rendered features. Disgust is clearly conveyed: nose is wrinkled, lips are curled downward, and brows are furrowed. Slight asymmetry in brow furrowing and minor skin texture artifacts are present but don’t obscure the emotion. Overall, the expression matches the prompt accurately. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The face is largely symmetric with correctly rendered features. Eyes are wide, eyebrows raised, and mouth slightly open—clearly conveying surprise. Slight unnatural smoothness in skin texture and overly perfect symmetry hint at AI generation, but core emotional expression and anatomical correctness are intact. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with correctly rendered features. Eyes are downcast, mouth slightly drawn, gaze distant — all aligning with the prompt. No noticeable distortions. Lighting and expression effectively convey quiet melancholy. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with natural, aged features: deep wrinkles, white hair, and warm smile. Eyes convey kindness. No significant distortions. Expression is clearly readable as joyful and genuine. All prompt elements are accurately rendered. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The child’s face is symmetric with round cheeks and large eyes. Tears are not visibly streaming down, but the downturned mouth and furrowed brow convey clear distress. No major distortions are present. The prompt’s key elements are met: age-appropriate features and readable crying emotion. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with natural wrinkles, age spots, and thinning gray hair. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align well with prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Faces show clear, symmetric features with no distortions. Emotions are vividly readable: left man laughs with head back, center woman looks shocked, right woman smirks with rolled eyes. All three expressions match the prompt precisely. Lighting and detail are natural and high quality. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Faces are mostly symmetrical with correct features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored (pouting lips), child grins excitedly. No significant distortions. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
Faces are mostly symmetric with correct features. The leftmost person shows clear frustration (furrowed brows). The central person’s expression is ambiguous, not clearly pleased. The rightmost person’s confused look is weak (tilted head not prominent). The woman behind appears to be whispering, not nodding. Emotions are not clearly or fully aligned with the prompt. VERDICT: FAIL
Full Body 83%
Full Body measures whether fal/bria/fast renders full-body people with natural anatomy, clothing, pose, and proportions.
PASS · Evaluator (Qwen 3 VL)
The woman stands upright with arms at her sides, wearing a white t-shirt and jeans — matching the prompt. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions are visible. Clothing details and posture are natural and well-rendered. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The man stands straight, facing forward, with hands in pockets — perfectly matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The pose is clearly recognizable and natural. All criteria are satisfied. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The person’s full body is visible from head to feet in a neutral standing pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still in a park. No significant distortions are present. The image aligns with the prompt’s requirements. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The woman’s limbs are correctly proportioned and count is accurate. Her running pose is anatomically plausible, with one leg forward, the other back, arms bent and swinging. Hair and clothing show motion blur consistent with running. No significant distortions are visible. The action is clearly recognizable as running through a park. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms slightly out—matches the described mid-air jump over a puddle. The action is recognizable and anatomically plausible. No significant distortions are visible. The image convincingly depicts the requested scene. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The dancer has correct limb count and proportions. The pose is a dynamic spin with one leg extended, dress flowing outward, and she appears airborne. Anatomical structure is accurate, with no visible distortions. The action is clearly recognizable as a spinning movement. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The figure has two legs and correct proportions. The tree pose is recognizable: standing on one leg, the other foot pressed to the inner thigh, arms raised overhead in prayer position. No major anatomical distortions are visible. The pose is stable and natural-looking. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The climber has correct limb count and proportions. Pose is anatomically plausible for rock climbing, with arms and legs extended to holds. The action is recognizable as dynamic climbing. No major distortions; muscles and joints appear natural. Gear placement is realistic. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
The cellist’s posture and limb count are anatomically plausible. The bow arm is extended, and the instrument rests between the legs as described. However, the cello’s scroll and neck appear unnaturally elongated and misaligned with the body, and the bow’s angle suggests an awkward grip. The pose is recognizable but distorted in key structural details. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
The pose shows a tango embrace with the man dipping the woman backward, but her legs are not intertwined as described. Her right leg is bent, left leg extended, and feet are separate. Anatomical proportions are mostly correct, though the woman’s left arm appears unnaturally elongated and her right hand’s grip is distorted. The action is recognizable but not fully accurate per prompt. VERDICT: FAIL
PASS · Evaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s pose is anatomically plausible, though his foot appears unnaturally positioned near the ball. The jumping player’s pose is dynamic and recognizable as a tackle attempt. No major distortions; shadows and grass texture are realistic. The action matches the prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Both gymnasts have correct limb count and proportions. Their synchronized handstand pose is anatomically plausible, with legs extended and bodies aligned. The action is clearly recognizable as a handstand with one leg slightly lifted. No major distortions are visible. VERDICT: PASS
Hands 58%
Hands measures whether fal/bria/fast draws anatomically plausible hands, fingers, and gestures.
FAIL · Evaluator (Qwen 3.5 122B)
The left hand (viewer’s left) has 5 fingers, but the right hand (viewer’s right) clearly shows 6 fingers — an extra digit between the ring and pinky finger. This is a clear anatomical anomaly inconsistent with natural human anatomy. The gesture otherwise appears relaxed, but the extra finger violates the “exactly 5 per hand” criterion. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The woman’s hands are extended palms-up, fingers spread. Each hand clearly shows five fingers: thumb, index, middle, ring, and pinky. No extra, missing, or fused fingers. Anatomical structure and gesture appear natural and correct. All ten fingers visible as specified. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The left hand (on the right side of the image) shows 4 fingers clearly, with the thumb likely hidden or merged. The right hand (on the left side) also appears to show only 4 distinct fingers — the pinky is missing or fused. Both hands lack full anatomical correctness: finger counts are incorrect, and the grip/gesture doesn’t match natural crossed-arm positioning. Anomalies include missing digits and unnatural finger alignment. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The hand gripping the mug handle shows anatomical issues. The thumb appears fused or improperly positioned relative to the index finger. The fingers curling around the mug are indistinct and lack clear separation — some appear merged or malformed. There are not clearly 5 distinct fingers visible; the anatomy is distorted. This fails the requirement for correct hand structure and finger count. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The woman’s right hand is visible, gripping a pen. It shows 5 fingers: thumb, index, middle, ring, and pinky — all appear anatomically correct with natural positioning for writing. The left hand is not visible in the frame. Since only one hand is shown and it meets the criteria (5 fingers, correct grip), the core requirement is satisfied. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The left hand on the fretboard shows only four fingers clearly visible, with the thumb positioned behind the neck — anatomically plausible for guitar playing. The right hand near the soundhole appears to have all five fingers present, though partially obscured by motion or angle. No gross anatomical anomalies (e.g., extra fingers, fused digits) are evident. Grip and gesture align with typical guitar-playing posture. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows two hands, each making a peace sign. Each hand has five fingers visible: two extended (index and middle) and three curled (thumb, ring, pinky). The anatomy appears correct — no extra or missing fingers, natural joint bends, and proper thumb positioning. No anomalies detected. Both hands match the described gesture accurately. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows one hand giving a thumbs-up gesture. The thumb is extended upward, and four fingers are curled into a fist — anatomically correct for this gesture. All five digits (thumb + 4 fingers) are present and properly positioned. No anomalies detected. The hand appears natural and consistent with human anatomy. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows one hand making a gesture with three fingers extended: index, middle, and ring fingers. The thumb and pinky are curled inward. This matches the prompt’s description of “counting to three” with those specific fingers extended. Anatomically, the hand structure appears correct — five fingers total, proper joint bends, natural skin texture, and realistic lighting. No anomalies detected. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
Upon close examination, the left hand (from viewer’s perspective) shows 5 fingers clearly. The right hand also displays 5 fingers, though one thumb is partially obscured by the grip — still anatomically plausible. Fingers interlock naturally; no extra or missing digits. Grip appears realistic for a handshake. Minor blurring doesn’t obscure anatomy. Core criteria met. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows two hands meeting in a high-five gesture. The left hand (from the viewer’s perspective) has 5 fingers clearly visible and anatomically correct. The right hand also displays 5 fingers, but the thumb appears to be fused or merged with the index finger — an anatomical anomaly. Additionally, some fingers on the right hand seem to overlap unnaturally, suggesting AI generation artifacts. While the intent is clear, the execution fails anatomical accuracy. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
Upon close examination, the hands show anatomical inconsistencies. The left hand (gray sleeve) has fingers that appear fused or malformed at the tips, and the thumb positioning is unnatural. The right hand (red sleeve) has a thumb that seems to merge with the index finger, and finger counts are ambiguous due to blending. Grip mechanics are not fully realistic. VERDICT: FAIL
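The 58% shown for Hands is consistent with a simple tally of the trailing VERDICT tags in this block: 7 PASS out of 12. The sketch below shows one way such a tally could be computed, assuming the score is the PASS count over the total, rounded to a whole percent. The function names and the sample strings are illustrative and are not part of ImageBench.

```python
import re

# Matches the tag that closes each judge output, e.g. "... VERDICT: PASS".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")

def extract_verdict(judge_text):
    """Return 'PASS' or 'FAIL' from the trailing tag, or None if it is missing or cut off."""
    match = VERDICT_RE.search(judge_text)
    return match.group(1) if match else None

def pass_rate(verdicts):
    """Whole-number pass percentage over the verdicts that could be read."""
    scored = [v for v in verdicts if v is not None]
    if not scored:
        raise ValueError("no readable verdicts")
    return round(100 * scored.count("PASS") / len(scored))

# Verdict labels of the 12 Hands evaluations, in the order listed above.
hands_verdicts = ["FAIL", "PASS", "FAIL", "FAIL", "PASS", "PASS",
                  "PASS", "PASS", "PASS", "PASS", "FAIL", "FAIL"]

print(extract_verdict("No anomalies detected. VERDICT: PASS"))  # PASS
print(extract_verdict("Lighting is consistent. VERDICT:"))      # None (truncated tag)
print(pass_rate(hands_verdicts))                                # 58
```

Returning None for a cut-off tag keeps truncated judge outputs out of the denominator instead of silently counting them as failures; whether the benchmark itself handles them that way is not stated.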
Multi-Subject 83%
Multi-Subject measures whether fal/bria/fast keeps multiple people distinct while preserving their requested attributes, outfits, and positions.
PASS · Evaluator (Qwen 3 VL)
Left woman: short blonde hair, red jacket over white top. Right woman: long black hair, blue sleeveless dress. Both visually distinct and match prompt descriptions precisely. No missing elements or mismatches. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Person 1: Bearded, wears black-rimmed glasses, green sweater over white collared shirt. Person 2: Clean-shaven, wears black hoodie. Both visually distinct, matching prompt descriptions. Cafe setting confirmed. All features accurately rendered. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Two individuals are present. Left: tall, curly red hair, denim overalls over a white tee. Right: shorter, straight brown hair, yellow sundress. Both are visually distinct and match descriptions precisely. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
The image shows four people, but only three match the prompt: a man with dreadlocks in a denim jacket (left), a woman with silver hair in a red blouse (center-left), and a woman with a hijab in a green dress (right). The “bald man in a suit” is missing. The woman with silver hair wears a black blazer, not a red blouse. VERDICT: FAIL
PASS · Evaluator (Qwen 3 VL)
Left: Tall woman with short pink hair, playing bass. Center: Man with beard and red bandana, on drums. Right: Slim man with glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Left: Man with sunhat and backpack. Center: Woman with braided hair and hiking poles. Right: Teenager with baseball cap and water bottle. All three are visually distinct and match described features. VERDICT: PASS
Truthfulness 63%
Photorealism 100%
Photorealism measures whether fal/bria/fast makes materials, surfaces, and real-world objects look visually plausible and distinct.
PASS · Evaluator (Qwen 3.6 27B)
The image demonstrates high-fidelity rendering of the requested materials. The glass exhibits accurate refraction and transparency, distorting the background wood grain realistically. The silver spoon shows sharp, coherent specular highlights consistent with polished metal. The linen napkin displays a convincing woven texture with soft, natural folds. The apple has a realistic skin texture with subtle subsurface scattering and a distinct highlight. The wooden table provides a solid, textured bas
PASS · Evaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum finish with appropriate metallic reflections. The notebook displays realistic leather texture and stitching. The ceramic mug shows a matte glaze with subtle sheen, while the glass paperweight demonstrates accurate refraction, internal reflections, and transparency. The wooden desk surface has a natural grain pattern. All materials are visually distinct and physically plaus
PASS · Evaluator (Qwen 3.6 27B)
The stainless steel pot shows appropriate metallic reflections, the glass bottle of olive oil displays correct translucency and refraction, the wooden cutting board has a natural matte texture, and the cotton towel exhibits soft fabric folds with realistic light absorption. All materials render with accurate visual properties—reflections, translucency, and textures are consistent with real-world physics. VERDICT: PASS
Physics & Reflections 75%
Physics & Reflections measures whether fal/bria/fast follows basic physical cues such as shadows, reflections, gravity, transparency, and contact points.
PASS · Evaluator (Qwen 3.6 27B)
The image displays a red sphere on a white surface. The shadow is cast to the right and slightly back, which is consistent with a light source positioned to the upper left. The shadow's shape is an ellipse, correctly representing the perspective distortion of a sphere's shadow on a flat plane. The lighting on the sphere itself also matches this direction, with the highlight on the upper left and the sphere darkening towards the bottom right. The physics are accurate. VERDICT: PASS
FAIL · Evaluator (Qwen 3.6 27B)
The lighting originates from the left, casting a shadow to the right of the mug. The prompt explicitly requested a shadow "to the left," which is physically impossible given the light source direction shown. Additionally, the shadow shape is distorted and disconnected from the mug's base, appearing as a floating blob rather than a coherent projection. This violates physical plausibility regarding shadow orientation and geometry. VERDICT: FAIL
PASS · Evaluator (Qwen 3.6 27B)
The shadow is cast directly beneath the figure, consistent with midday lighting. Its shape accurately mirrors the person’s stance and clothing folds, showing realistic distortion on the textured ground. The shadow’s darkness and sharpness align with bright, overhead sunlight. No directional inconsistencies or unnatural reflections are present. The physical plausibility of the shadow’s position, orientation, and form meets the prompt’s requirements. VERDICT: PASS
FAIL · Evaluator (Qwen 3.6 27B)
The reflection is physically impossible. The woman stands with her back to the mirror, yet the reflection shows her front-facing. A mirror reflects what is in front of it; therefore, the reflection should show her back, not her front. Additionally, the reflection is not laterally reversed as expected (e.g., the belt buckle orientation and hair parting do not mirror correctly). This violates basic optical physics. VERDICT: FAIL
PASS · Evaluator (Qwen 3.6 27B)
The reflection is correctly oriented vertically below the cat, matching its pose. The distortion is appropriate for a polished surface, appearing slightly blurred and elongated. The color and brightness of the reflection are consistent with the subject, showing a darker, muted version of the black cat. The shadow direction aligns with the lighting, and the overall effect is physically plausible for a glossy marble floor. VERDICT: PASS
PASS · Evaluator (Qwen 3.6 27B)
The reflection on the glass table is physically plausible. It correctly mirrors the vase and flowers with appropriate vertical inversion. The reflection shows a slight blur and reduced contrast compared to the real objects, which accurately simulates the optical properties of a glass surface. The alignment is precise, and the colors match the source objects, maintaining consistency in lighting and orientation. VERDICT: PASS
PASS · Evaluator (Qwen 3.6 27B)
The reflections are vertically inverted and correctly positioned beneath the trees. The gentle ripples on the water surface create appropriate horizontal distortion, breaking up the reflection into wavy segments. The color temperature matches the warm sunset light, and the reflections appear slightly darker than the source objects, consistent with water absorption. The physics of light and reflection are accurately represented. VERDICT: PASS
PASS · Evaluator (Qwen 3.6 27B)
The reflection is vertically inverted, matching the skyline above. The water surface introduces appropriate ripples and distortion, softening the reflected buildings. Colors in the puddle appear slightly more saturated and shifted toward cooler tones, consistent with light interacting with water. The perspective aligns correctly with the camera angle. No physical inconsistencies are present. VERDICT: PASS
PASS · Evaluator (Qwen 3.6 27B)
The reflection is vertically inverted and correctly positioned below the horizon line, matching the mountain range's orientation. The water surface shows slight ripples, creating a natural distortion that softens the reflected image compared to the sharp reality above. The colors in the reflection are appropriately muted and darker due to light absorption and scattering in the water, which is physically accurate. The lighting and shadow directions align perfectly between the scene and its reflec
PASS · Evaluator (Qwen 3.6 27B)
The image displays a glass sphere on a reflective surface with a water puddle. The reflection of the sphere is correctly inverted and positioned directly beneath it. The water ripples distort the reflection realistically, creating caustic-like patterns and warping the image as expected in fluid dynamics. The refraction through the sphere shows an inverted background, which is physically accurate for a glass lens. The lighting and shadows are consistent with the light source direction. VERDICT: PASS
FAIL · Evaluator (Qwen 3.6 27B)
The reflections on the chrome sphere are physically plausible, showing distorted environmental mapping of the room and windows consistent with a convex mirror. The reflection on the glass table below is correctly oriented and slightly blurred, indicating depth. However, the object lacks a distinct contact shadow or grounding shadow on the glass surface, making it appear to float rather than rest on the table. This violates basic lighting physics. VERDICT: FAIL
PASS · Evaluator (Qwen 3.6 27B)
The reflections and refractions are physically plausible. The wine glass exhibits realistic distortion of the background through its curved, faceted surface. The reflection on the polished marble counter is sharp and correctly oriented, mirroring the glass’s base and stem. The lake in the background shows a soft, diffuse reflection consistent with water under daylight. Shadows fall naturally from the window light source. No directional or color inconsistencies are present. VERDICT: PASS
World Knowledge 42%
World Knowledge measures whether fal/bria/fast depicts recognizable places, landmarks, flags, and known objects with accurate real-world details.
PASS · Evaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy, including the iron lattice structure, three observation levels, and the characteristic arch at the base. The background shows a realistic Parisian skyline under a clear blue sky with scattered clouds. No obvious distortions or AI artifacts are present. The subject is clearly recognizable and factually consistent with the real-world landmark. VERDICT: PASS
PASS · Evaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high fidelity. The central onion dome, four surrounding minarets, and the symmetrical arched entrances (iwans) are accurately rendered. The perspective from the reflecting pool is correct, showing the iconic symmetry and the red sandstone pathway. The architectural details, including the smaller domes and the base platform, align with the real-world structure. No significant hallucinations or distortions are present. VERDICT: PASS
FAIL · Evaluator (Qwen 3.6 27B)
The Statue of Liberty is recognizable, but the pedestal is architecturally inaccurate. The real pedestal features a distinct octagonal upper section with arched windows and a specific stone texture, whereas this image shows a simplified, rectangular base with generic columns and incorrect window placements. The surrounding landscape also lacks the expected water context of Liberty Island. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.6 27B)
The image fails to depict a shamisen. The instrument shown has a round, lute-like body and a fretted neck, resembling a hybrid of a ukulele and a zither, rather than the shamisen’s characteristic rectangular body and fretless neck. Additionally, the strings appear to be more than three, and the bachi is placed awkwardly across the strings in a non-functional manner. These inaccuracies make the subject unrecognizable as a shamisen. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.6 27B)
The image depicts a brass instrument that strongly resembles an astrolabe, featuring concentric rings, a central alidade, and engraved markings. However, the text and symbols are nonsensical gibberish rather than accurate astronomical or degree markings, which is a common artifact in AI-generated images. While the general form is recognizable, the lack of factual accuracy in the engravings and the slightly distorted geometry of the moving parts prevent it from being a truly accurate representati
PASS · Evaluator (Qwen 3.6 27B)
The image depicts a ceramic bowl with a crackle glaze pattern filled with gold, accurately representing the aesthetic of kintsugi. The subject is clearly recognizable as a bowl, and the gold lines are distinct. While traditional kintsugi repairs specific break lines rather than a uniform crackle pattern (which is often a glaze style), this is a common artistic interpretation in AI generation and remains factually recognizable as the requested subject. The lighting and textures appear realistic.
FAIL · Evaluator (Qwen 3.6 27B)
The image depicts a water cycle scene with ocean, clouds, rain, and land, but the labels are completely illegible gibberish, failing the "detailed diagram" requirement. Furthermore, the visual representation is inaccurate: it shows water spraying upwards from the ocean surface and land (resembling geysers or fountains) rather than the correct scientific process of evaporation (water vapor rising). This misrepresents the core mechanism of the water cycle. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.6 27B)
The image depicts a stylized, artistic interpretation rather than a scientifically accurate cross-section. While it shows mountains and a trench-like feature, it lacks clear tectonic plate boundaries, subduction zones, mantle layers, or a volcanic arc structure as defined in geology. The red lines appear decorative rather than indicative of magma flow or plate movement. The composition is more illustrative than educational or factual. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.6 27B)
The image is clearly recognizable as a human heart illustration. However, it contains significant anatomical inaccuracies. The aorta is depicted as a single arch with three branches, but the pulmonary trunk and its bifurcation are incorrectly merged or mislabeled. The superior and inferior vena cavae are not clearly distinguished, and the coronary arteries/veins on the surface appear overly stylized and not entirely accurate in their branching patterns. While visually appealing, it fails strict
PASS · Evaluator (Qwen 3.6 27B)
The image depicts a butterfly with the characteristic iridescent blue dorsal wings and brown ventral patterns with eyespots of a Morpho species. The body morphology, including antennae and wing venation, appears anatomically correct. The coloration and pattern distribution are consistent with real-world Morpho butterflies, particularly resembling Morpho menelaus or closely related species. No significant inaccuracies are observed in the depiction. VERDICT: PASS
PASS · Evaluator (Qwen 3.6 27B)
The image accurately depicts the interior of the Hagia Sophia, showcasing its iconic central dome, semi-domes, and pendentives. The presence of both Byzantine mosaics and large Islamic calligraphy medallions is historically and architecturally correct for this site. The structural details, lighting, and perspective appear realistic and consistent with known photographs of the landmark. No significant architectural inaccuracies are visible. VERDICT: PASS
FAIL · Evaluator (Qwen 3.6 27B)
The image depicts a bismuth crystal with the characteristic geometric hopper formation and staircase structure. However, the coloration is inaccurate; real bismuth crystals display vibrant, rainbow-like iridescence due to oxide layers, whereas this specimen appears muted, brownish, and dusty. The lighting and texture also suggest an AI-generated artifact rather than a high-quality photograph of a real mineral specimen. VERDICT: FAIL
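The 63% reported for Truthfulness matches pooling every verdict in its three subcategories (17 PASS out of 27) rather than averaging the three subcategory percentages, which would give about 72%. The sketch below reproduces both numbers from the PASS/FAIL labels listed above. The pooling rule is inferred from these figures and is an assumption, not a documented ImageBench formula; the helper name `pct` is illustrative.

```python
from fractions import Fraction

# (passes, total) counted from the PASS/FAIL labels in each Truthfulness subcategory.
subcategories = {
    "Photorealism": (3, 3),
    "Physics & Reflections": (9, 12),
    "World Knowledge": (5, 12),
}

def pct(passes, total):
    """Pass rate as a whole-number percentage, rounded half up (exact arithmetic)."""
    return int(Fraction(100 * passes, total) + Fraction(1, 2))

per_subcategory = {name: pct(p, t) for name, (p, t) in subcategories.items()}

# Pooled: every evaluation counts once, so larger subcategories weigh more.
total_passes = sum(p for p, _ in subcategories.values())
total_evals = sum(t for _, t in subcategories.values())
pooled = pct(total_passes, total_evals)

# Unweighted mean of the subcategory rates, shown only for contrast.
mean_of_rates = sum(Fraction(p, t) for p, t in subcategories.values()) / len(subcategories)
unweighted = int(100 * mean_of_rates + Fraction(1, 2))

print(per_subcategory)  # {'Photorealism': 100, 'Physics & Reflections': 75, 'World Knowledge': 42}
print(pooled)           # 63
print(unweighted)       # 72
```

Because Photorealism has only 3 evaluations against 12 in each of the other two, its perfect score moves the pooled figure far less than it would move an unweighted average.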
Professional Studio 93%
Camera & Lighting 100%
Camera & Lighting measures whether fal/bria/fast follows specific lens, angle, focus, exposure, and lighting instructions.
PASS · Evaluator (Gemma 4 26B)
The image successfully captures a wide-angle shot of a modern living room. The perspective shows an expansive space with a deep field of view. Most importantly, the image exhibits clear barrel distortion at the edges, particularly visible in the curved lines of the ceiling and the warping of the walls, which matches the prompt's specific requirement for "slight barrel distortion." The lighting is soft and natural, consistent with a modern interior. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image successfully captures a close-up macro perspective of a single dewdrop resting on a petal. The depth of field is shallow, with the petal and dewdrop in sharp focus while the background is heavily blurred. The lighting is soft and natural, highlighting the texture of the petal and the clarity of the water droplet. All specified camera and lighting elements are present. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It provides a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is top-down, the depth of field is consistent with a high-altitude drone shot (everything is in relatively sharp focus), and the lighting is even, suggesting a bright, overcast, or midday sun. The presence of cars and trees is clearly depicted. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image successfully implements the requested lighting. There is a distinct light source from the side, creating a moody atmosphere with deep shadows. Most importantly, there is a clear "triangle of light" on the shadowed cheek (the subject's right cheek), which is the hallmark of Rembrandt lighting. The depth of field is shallow, focusing on the facial features, and the perspective is a standard portrait shot. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image perfectly captures the requested "golden hour" aesthetic. The sun is low on the horizon, creating a bright orange sky and a soft, warm glow across the landscape. The low angle of the sun produces long, distinct shadows stretching across the grassy field toward the viewer. The perspective is a wide landscape shot with a deep depth of field, ensuring both the foreground grass and distant trees are visible. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a high-key portrait with a bright, white, seamless background. The lighting is soft and even, creating an airy and luminous feel with minimal shadows on the subject's face and neck. The perspective is a standard eye-level portrait, and the depth of field is shallow, keeping the focus sharply on the person's features while the background remains a clean, bright void. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image successfully captures the requested elements. The shallow depth of field (bokeh) is evident in the blurred garden and trees in the background. There is a clear warm rim light effect on the subject's hair and shoulders, creating separation from the background. The perspective is a classic portrait shot consistent with an 85mm lens. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image successfully implements the requested camera settings. The subject (the man in the foreground) is sharp and in focus, while the background exhibits a shallow depth of field with a significant blur. The city lights in the background are rendered as soft, out-of-focus bokeh, accurately mimicking a 50mm f/1.8 lens. The lighting and perspective are consistent with a street photograph taken during twilight. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image matches the prompt's requirements. The perspective is a top-down shot (shot from above). The lighting is directional, coming from the left, creating soft shadows on the right side of the bowl and utensils. There is a subtle vignette around the edges. The color temperature is warm, and the depth of field is relatively shallow, consistent with a 35mm f/2.8 setting. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image successfully implements the tilt-shift effect. It features a distinct selective focus strip along the center of the street, with heavy blurring (bokeh) in the foreground and background to create the miniature model aesthetic. The colors are highly saturated, and the perspective captures the urban canyon effectively. The depth of field manipulation perfectly matches the prompt's requirements for a tilt-shift photograph. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image features horizontal blue and orange light streaks against a dark background, which matches the description of "blue and orange anamorphic lens flare streaking horizontally across a dark moody scene." The lighting pattern and colors are accurately represented. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. The lighting is a classic dramatic split lighting pattern: the left side of the subject's face (viewer's left) is brightly illuminated, while the right side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect. The perspective is a standard portrait shot, and the depth of field is shallow, keeping the focus sharply on the facial features. VERDICT: PASS
Color Precision 92%
Color Precision measures whether fal/bria/fast renders requested colors clearly and consistently across subjects, products, and backgrounds.
PASS · Evaluator (Gemma 4 26B)
The image features a bright, vivid red sports car parked on a gray asphalt road. The red color is highly saturated and unambiguous, exactly as described in the prompt. The road is a standard gray asphalt. All color requirements are met perfectly. VERDICT: PASS
PASS · Evaluator (Gemma 4 26B)
The image shows a woman wearing a deep, vibrant blue evening dress. The color of the dress is a true royal blue, and the background is a solid, neutral beige/tan color. The colors perfectly match the specific requirements of the prompt. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a field of sunflowers with bright, vibrant yellow petals, accurately representing "sunflower yellow." The sky above is a clear, bright blue, matching the prompt's description. All color requirements specified in the prompt are clearly and accurately depicted in the image. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a bedroom wall with a deep, burnt orange color that closely matches the specified hex code #FF6B35. The furniture, including the bed frame, dresser, wardrobe, desk, and chair, is consistently white as requested. The lighting appears to be natural daylight, and the overall color scheme adheres strictly to the prompt's requirements. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The foliage in the image consists of a dark, muted green that aligns well with the description of #2D5F2D. The tree trunks are a light brown/tan color, which matches the "brown tree trunks" requirement. The overall color palette is consistent with a dark, muted forest scene as requested. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the prompt's color requirements. The walls are a medium blue that aligns with the hex code #4A90D9. The floor is a warm-toned wood, and the baseboards and door frames are white, matching the "white trim" description. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The walls are a warm cream/off-white, the sofa is a dark navy blue, and the accent pillows are a bright, vibrant red. All three primary color requirements from the prompt are clearly visible and match the descriptions provided. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The background is a deep dark blue, consistent with #1A1A2E. The earbuds are pure white (#FFFFFF). However, the watch does not match the requested color. The prompt specifies a "soft gold" (#F0C27F) watch, but the image shows a watch with a brown leather strap and a gold-toned case. The strap color is a significant deviation from the requested gold color. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image follows the color specifications closely. The background is a near-white color consistent with #FAFAFA. The main circular shape is a vibrant orange-red, matching the description of #FF4500. The thin accent lines are a dark charcoal/black color, consistent with #333333. All three specified colors are accurately represented in the graphic design. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a smooth gradient transitioning from a red color on the left to a blue color on the right. The center of the image transitions through a purple hue, as requested. The colors on the far left and far right appear to match the specified red (#FF0000) and blue (#0000FF)-like tones. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image displays a vertical gradient. At the bottom, the color is a bright orange, which aligns well with the requested #FF6B35. At the top, the color is a very dark, deep indigo/navy, which matches the requested #1B1464. The transition between the orange and the deep indigo is smooth and follows the specified color scheme accurately. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a horizontal gradient. On the left, the color is a bright cyan, which aligns with #00C9FF. On the right, the color transitions into a light mint green, which aligns with #92FE9D. The gradient flows smoothly from left to right as requested. VERDICT: PASS
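The gradient prompts above name their endpoint colors as hex codes, and one asks for a purple midpoint between #FF0000 and #0000FF. As a rough reference for what such a gradient would contain, the sketch below linearly interpolates between two hex colors channel by channel. It assumes plain per-channel interpolation in RGB, which is only one way to build a gradient and is not necessarily what the model or the judge uses; the helper names are hypothetical.

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints back to '#RRGGBB'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def lerp_color(start, end, t):
    """Blend two hex colors per channel; t=0 gives start, t=1 gives end.

    Assumption: simple linear interpolation on 8-bit RGB channels.
    """
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    return rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))


# Midpoint of the red-to-blue gradient named in the evaluation above.
midpoint = lerp_color("#FF0000", "#0000FF", 0.5)
print(midpoint)
```

Under this assumption the midpoint has equal red and blue components and no green, which is a purple, in line with the judge's description of the center of that gradient.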
Photorealism 67%
Photorealism measures whether fal/bria/fast produces polished product and studio imagery with realistic materials, lighting, shadows, and finish.
PASS Evaluator (Gemma 4 26B)
The image successfully achieves a high-quality studio look. The lighting is soft and consistent, creating a natural shadow that grounds the shoe. The textures of the leather and fabric appear realistic, and the edges are sharp and well-defined. There are no obvious AI artifacts, such as warped laces, nonsensical stitching, or melting textures. It passes as a professional product photograph at first glance. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image contains significant AI artifacts that prevent it from passing as a real photograph. The internal straw/dip tube is physically impossible; it curves unnaturally and appears to "float" or break through the glass structure. Additionally, the reflections on the glass and the liquid surface are inconsistent and distorted. The geometry of the bottle's neck and the way the light interacts with the liquid are physically nonsensical. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image successfully meets the prompt requirements. It features a high-quality, realistic green apple on a seamless white background. The lighting is soft and directional, creating a natural-looking highlight on the left and a subtle, realistic shadow on the right. The texture of the skin and the stem detail are sharp and convincing, with no obvious AI artifacts or distortions visible. It looks like a professional studio product shot. VERDICT: PASS
Graphical design 54%
Data Visualisation 33%
Data Visualisation measures whether fal/bria/fast produces charts with correct labels, proportions, colors, and readable values.
PASS Evaluator (Qwen 3 VL)
The chart shows six bars with values 12K, 18K, 15K, 22K, 19K, 25K, matching the requested data. Bars are color-coded and labeled with months (Jan–Jun). Title "Monthly Sales 2025" is present. Axes are labeled. All elements are readable and correctly formatted. Chart type is a bar chart as requested. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The chart is a pie chart, but labels and percentages are incorrect. "Company A" is 35% (correct), "Company B" is 25% (correct), "Company C" is mislabeled as 30% (should be 20%), and "Others" is 20% (correct). Labels are illegible or misspelled. The chart type matches, but data is inaccurate and unreadable. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The chart misrepresents the data: axes are mislabeled (e.g., "1200" appears twice on Y-axis), days are illegible (e.g., "Chs", "Dang"), and values are unreadable scribbles. The title "Weekly wslster" is incorrect. The chart type is a line graph, but the visual elements are garbled and unrecognizable. Core requirements for readability and accuracy are not met. VERDICT: FAIL
Layout & Design 33%
Layout & Design measures whether fal/bria/fast organizes posters, menus, covers, and other designed compositions with the requested hierarchy and elements.
PASS Evaluator (Qwen 3 VL)
The poster features a large popcorn image centered, with "MOVIE NIGHT" in bold at the top and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed. Visual hierarchy is clear: title dominates top, popcorn central, tagline bottom. All specified layout elements are present and accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The poster features the headline "LIVE MUSIC FEST" in large, bold text with a guitar illustration integrated into the lettering. Below, "Tickets available now" appears in a smaller bar. Text is readable and correctly placed. Visual hierarchy is clear, with headline dominating, tagline secondary. All required elements are present and correctly styled. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The poster features an open book illustration and the title "JOIN THE BOOK CLUB" prominently displayed with correct visual hierarchy. However, the tagline reads "Meets every Wemssday" (spelled incorrectly) and is positioned above the title, not as intended. The "Free" box is misplaced in the top-right corner. Text is readable but placement and accuracy fail the prompt. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image lacks a clear hero header image at the top; instead, it shows a banner with unreadable text. The “Sign Up” headline is present but below a non-hero section. The CTA button is visible, and social icons appear at the bottom. However, all text is gibberish or illegible, violating the prompt’s requirement for readable, correctly placed text and proper layout hierarchy. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a newsletter layout with a header banner, two-column body (text left, image right), a highlighted quote block, and a footer with "Unsscrirbe" and "INssoripe". All elements are visually present. However, all text is gibberish placeholder (e.g., "Dapsimke", "Orhrm", "Incdeleon_ma"), making it unreadable and violating the requirement for correct, readable text placement. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a menu layout with a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with address/hours. However, all text is gibberish or placeholder (e.g., “Niinpe y’tauu”), making it unreadable and violating the prompt’s requirement for correct, legible text placement. Visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
Left page features a large hero photo as required. Right page shows two columns, a pull quote, and a sidebar with author bio. Page number is visible. However, all text is gibberish and unreadable, violating the requirement for readable text. Layout structure is correct, but content fails functional criteria. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a mobile UI with a top navigation bar, search field, 2x2 grid of feature cards, and bottom tab bar — all present. However, all text is gibberish or unreadable, and icons are misaligned or distorted. Layout hierarchy is visually intact but content is invalid. Core requirement (readable, correctly placed text) is not met. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
Left page has full-bleed photo. Right page shows headline “FASHION TRENDS,” body copy in two columns, inset image, and styled page numbers. Text is readable and correctly placed. Layout hierarchy is clear and matches prompt. All described elements are present and correctly implemented. VERDICT: PASS
Style Diversity 75%
Style Diversity measures whether fal/bria/fast can shift between visual styles like oil painting, pixel art, watercolor, and technical illustration.
PASS Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever sitting in a garden setting. The visual style strongly resembles an oil painting, with visible, textured brushstrokes throughout the background foliage and on the dog’s fur. The color blending and impasto-like texture are consistent with traditional oil painting techniques. The key characteristics requested — oil painting style, visible brushstrokes, and rich texture — are all present and clearly recognizable. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image displays a golden retriever in a garden setting with stylized, anime-like features including large expressive eyes and clean outlines. The coloring is relatively flat with minimal shading gradients, consistent with anime aesthetics. The character design emphasizes cuteness and clarity typical of Japanese animation. While not perfectly matching every anime sub-style, the core requested characteristics are clearly present and recognizable. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image depicts a golden retriever in a garden setting, but the requested pixel art style is not clearly recognizable. While the image has a blocky, low-resolution appearance, it lacks the defining characteristics of true pixel art: a strict 32x32 grid and a deliberately limited color palette. The rendering appears more like a heavily compressed or blurred photograph than intentional pixel art with constrained colors and precise grid alignment. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image strongly exhibits the requested Art Deco style. Key characteristics are clearly present: the color palette is strictly gold and black; the architecture features geometric shapes, stepped forms, and ornamental symmetry typical of the era; and the streetlamp and pavement patterns reinforce the decorative, stylized aesthetic. The visual language is consistent with the prompt. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Impressionist characteristics: visible, soft brushstrokes throughout the composition, dappled light effects created by broken color patches in the foliage and ground, and a predominantly pastel palette with gentle pinks, blues, and greens. The woman and parasol are rendered with loose, painterly technique rather than sharp detail, matching the requested style. Background figures and environment are abstracted into colorful strokes, consistent with Impressionism. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Pop Art characteristics: bold black outlines define the subject, Ben-Day dots are used extensively for shading and texture (especially on the face and neck), and flat vivid colors dominate — notably the yellow background, blue collar, and red dot shading. The stylized, graphic quality matches the requested aesthetic precisely. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image displays a forest scene with soft, blended edges and a textured, painterly quality that mimics watercolor paper. There are visible bleeds and washes of color, particularly in the background mist and foliage, while maintaining realistic lighting and detail in the foreground trees and stream. The overall aesthetic successfully combines photographic realism with watercolor techniques. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully embodies the requested "cyberpunk Art Nouveau" style. Key characteristics are clearly present: the futuristic city setting with neon colors (pinks, cyans) fulfills the cyberpunk aspect, while the flowing organic lines, floral borders, and ornate, curved structures reflect the Art Nouveau influence. The combination of tech elements and natural motifs is well-executed. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image displays a medieval castle with stylized, cartoon-like architecture and flat coloring, which does not align with the intricate line work, textured paper effects, or specific compositional rules of traditional ukiyo-e woodblock prints. While the sky has some painterly qualities, it lacks photorealism and instead appears illustrative. The overall aesthetic is more akin to modern digital concept art or anime backgrounds rather than the requested hybrid style. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits soft brutalism: raw concrete walls and pillars dominate the background, while rounded pastel furniture (pink, mint, cream) fills the foreground. Warm diffused lighting casts gentle glows on surfaces, enhancing texture contrast. All key visual characteristics — brutalist materials, pastel curves, ambient warmth — are present and well-integrated. The style is unmistakable and faithfully rendered. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image exhibits a sparse composition with a single cigarette in an ashtray against a large, muted background, fitting the "anxious minimalism" description. The colors are desaturated and the negative space is prominent. However, the object itself is structurally flawed—the cigarette appears to be melting into or fused with the ashtray in an unnatural way, and the smoke lacks realistic flow. While the mood is correct, the execution of the subject matter is poor. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image strongly exhibits the requested "nostalgic futurism" style. Key characteristics are clearly present: the 1960s retro-futuristic aesthetic is evident in the vintage car, old television, and record player. The cityscape features stylized, space-age architecture with sleek, tapered spires. The color palette perfectly matches the "melancholic warm-toned patina" requirement, utilizing soft teals, oranges, and sepia tones to create a hazy, dreamlike atmosphere. VERDICT: PASS
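The percentages shown next to each category appear to be simple pass rates over the verdicts listed under it. The sketch below recomputes them from PASS/FAIL tallies counted by hand from the verdicts above. It also assumes that the Graphical design figure pools the verdicts of Data Visualisation, Layout & Design, and Style Diversity; that pooling rule is an inference from the numbers, not something the page states.

```python
# (passed, total) tallies counted from the verdicts listed in this section.
verdicts = {
    "Color Precision": (11, 12),
    "Photorealism": (2, 3),
    "Data Visualisation": (1, 3),
    "Layout & Design": (3, 9),
    "Style Diversity": (9, 12),
}


def pass_rate(passed, total):
    """Return the pass rate as a whole-number percentage."""
    return round(100 * passed / total)


rates = {name: pass_rate(p, t) for name, (p, t) in verdicts.items()}

# Assumption: the parent category pools all verdicts of its subcategories
# instead of averaging the subcategory percentages.
graphical = ["Data Visualisation", "Layout & Design", "Style Diversity"]
pooled_passed = sum(verdicts[name][0] for name in graphical)
pooled_total = sum(verdicts[name][1] for name in graphical)
graphical_rate = pass_rate(pooled_passed, pooled_total)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"Graphical design (pooled): {graphical_rate}%")
```

Pooling gives 13 passes out of 24 verdicts, which rounds to the 54% shown for Graphical design, whereas averaging the three subcategory percentages would give about 47%.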