ImageBench V1: 192 evaluations across 6 categories
Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself.
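Since every judge explanation on this page ends with a "VERDICT: PASS" or "VERDICT: FAIL" marker, a reader collecting the labels for their own review could pull them out with something like the sketch below. The function name `extract_verdict` is hypothetical and not part of the benchmark; the sketch assumes the marker always sits at the very end of the explanation, and returns None when an explanation is cut off before it.

```python
import re

# Assumption: the marker is the last thing in a complete judge explanation,
# as in the entries shown on this page.
_VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(explanation):
    """Return 'PASS' or 'FAIL' from the trailing marker, or None if it is missing."""
    match = _VERDICT_RE.search(explanation.strip())
    return match.group(1) if match else None


# A complete explanation (tail of one Typography Style entry).
complete = "Spelling is correct. The medium convincingly matches the prompt. VERDICT: PASS"
# An explanation that was cut off before the marker.
cut_off = "The setting includes appropriate details like a sword on the desk and European-style bui"

print(extract_verdict(complete))
print(extract_verdict(cut_off))
```

An entry that yields None has no machine-readable label in its text, so it is a natural candidate for manual inspection.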

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 87%
Typography Style 100%
Typography Style measures whether fal/krea/v2-large preserves requested lettering styles, font mood, and text texture such as typewriter ink or neon signage.
PASS · Evaluator (Qwen 3 VL)
Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with blocky, slightly uneven ink impressions on aged paper texture. Ink variation and paper grain convincingly simulate typewriter output. All prompt elements are accurately rendered. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Text “TODAY’S SPECIAL” is rendered in white chalk on a dark green surface, with visible chalk dust and smudges. Lettering shows authentic chalk texture and slight imperfections. Spelling is correct. The medium convincingly matches the prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The word "OPEN" is correctly spelled and rendered in a glowing red pixel-style LED matrix, matching the prompt’s description. The letters exhibit the characteristic blocky, dot-matrix appearance of digital displays. The medium is convincingly depicted with appropriate lighting and contrast against a dark background. VERDICT: PASS
Writing accuracy 83%
Writing accuracy measures whether fal/krea/v2-large renders the exact requested words, numbers, labels, and short phrases without spelling or layout errors.
PASS · Evaluator (Qwen 3 VL)
The word "HELLO" is fully spelled correctly and is clearly readable against the white background. All letters are intact and legible. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The word "ADVENTURE" is spelled correctly and fully readable. Letters are distinct, bold, and centered on a plain cream background. No misspellings or illegible parts. Text matches the intended specification exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The word "SERENITY" is fully spelled correctly and is clearly legible in a stylized script. The text is centered on a solid light blue background, matching the description. No misspellings, missing letters, or illegibility are present. All criteria are satisfied. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
All words are spelled correctly and fully readable: "The", "quick", "brown", "fox", "jumps", "high". Text is clean black on white background. No misspellings or illegible parts. Matches the intended text exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Text reads: "Every sunset brings a new dawn ahead" — matches intended phrase exactly. All words are spelled correctly and fully legible in cursive script. No misspellings or missing words. Background is pale yellow as specified. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The text reads "Stars shine brightest in the dark night" with correct spelling and full readability. The word "brightest" is stylized with star graphics but remains fully legible. All words match the intended phrase exactly. No misspellings or missing words are present. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
All text matches the intended message exactly: “SUMMER SALE”, “Up to 50% off all items this weekend only”, and “Terms and conditions apply. See store for details.” No misspellings, missing words, or illegible parts. Font sizes and layout align with specifications. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
All text is spelled correctly and fully readable. “GRAND OPENING”, “Join us for food, music, and fun”, and “RSVP required. Limited seating available.” match the intended text exactly. No misspellings, missing words, or illegible parts detected. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM.". No misspellings, missing words, or illegible parts. Font styles and sizes align with description. Text is fully readable. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
Text is mostly legible but the credits block is heavily distorted and illegible. “Directed by James Harlow,” “Starring Elena Voss, Marcus Chen, and Sofia Reyes,” “Music by Thomas Grant,” and “A Paramount Pictures Release” are unreadable due to severe pixelation and distortion. “Coming Summer 2025” is readable. Core text elements are corrupted. VERDICT: FAIL
PASS · Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "NATIONAL EXPLORER", "The Hidden Temples of Cambodia — page 34", "Interview: Dr. Maria Santos on Climate Action", "Top 10 Trails You Must Hike This Year", and "$6.99" with "247 March 2025". No misspellings or illegible parts detected. All elements are fully readable and accurate. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
Text contains errors: "havveeted" (should be "harvested"), "apiaries" (correct), "addctives" (should be "additives"). "Store in cool, dry place" is correct. All other text matches intended content. Misspellings and missing words prevent full accuracy. VERDICT: FAIL
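The percentage next to each subcategory is consistent with a simple pass rate over the verdicts listed under it. The sketch below reproduces the Typography Style and Writing accuracy figures from the verdicts shown above (3 of 3 and 10 of 12). Pooling the two lists also gives 13 of 15, which rounds to the 87% shown for Text Rendering, but whether the benchmark actually aggregates this way, and how it rounds, are assumptions; `pass_rate` is a hypothetical helper.

```python
def pass_rate(verdicts):
    """Percentage of PASS verdicts, rounded to the nearest whole number."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passed / len(verdicts))


# Verdicts in the order they appear on this page.
typography_style = ["PASS", "PASS", "PASS"]
writing_accuracy = ["PASS"] * 9 + ["FAIL", "PASS", "FAIL"]

print(pass_rate(typography_style))                     # 100
print(pass_rate(writing_accuracy))                     # 83
print(pass_rate(typography_style + writing_accuracy))  # 87 (pooled, an assumption)
```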
Spatial Reasoning 90%
Attributes Binding 100%
Attributes Binding measures whether fal/krea/v2-large attaches the right colors, materials, and properties to the correct requested objects.
PASS · Evaluator (Qwen 3.5 122B)
The image shows two objects on a white surface: a red sphere (ball) and a blue cube. The red ball is on the left, the blue cube on the right — matching the prompt’s description. Colors, shapes, and spatial arrangement are correctly assigned with no swaps. Both objects appear appropriately sized relative to each other. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a green apple and an orange pumpkin on a wooden surface. The apple is correctly green, round, and has a stem. The pumpkin is orange, ribbed, and has a stem — all matching expected attributes. No color, size, or shape swaps are present. Both objects are appropriately rendered with correct bindings. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows two vehicles on a street. The vehicle on the right is yellow and has the shape and features of a taxi (including a roof sign). The vehicle on the left is black, elongated, and has the shape of a limousine. Both are parked next to each other. All attributes (color, size, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains three shapes on a white background:
- A large red circle (top left)
- A small light blue triangle (bottom left)
- A medium dark green star (bottom right)
All attributes (color, size, shape) are correctly bound to the intended objects as described in the prompt. No swaps or errors are present. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows three candles on a shelf:
1. Left: Tall, thin, white candle (matches description).
2. Center: Short, fat, red candle in a red holder (matches description).
3. Right: Medium-height, spiral-shaped, yellow candle (matches description).
All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a large brown horse with a lighter mane and tail, correctly matching the description. A small white rabbit is present near the horse’s front legs, matching its attributes. A medium-sized black cat with a white chest patch stands to the right, also matching its description. All objects are correctly bound to their specified attributes — no swaps or errors in color, size, or shape. Background is a meadow as requested. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a pink fire truck with a blue banana attached to its front bumper. A large green sun is visible in the sky above rolling hills. All requested objects are present with correct color attributes: pink for the fire truck, blue for the banana, and green for the sun. No attribute swaps are observed. The surreal landscape context is also appropriately rendered. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows three objects: a purple carrot with green leaves, a red tree with red leaves and a brown trunk, and a watermelon that is red on the outside with dark stripes and white inside (sliced open). All attributes match the prompt exactly — colors, shapes, and slicing are correctly assigned to each object. No swaps or mismatches observed. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains an orange elephant, a yellow penguin, and a silver strawberry — all correctly colored and shaped as described. The elephant is large, the penguin small, and the strawberry medium-sized with green leaves. No attributes are swapped: colors match objects, shapes are accurate, and arrangement fits “on a table” (implied by surface). All criteria met. VERDICT: PASS
Compositionality 100%
Compositionality measures whether fal/krea/v2-large combines subjects, actions, outfits, props, and scene details into one coherent requested image.
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements:
- Tabby cat: Present (brown/grey striped fur pattern)
- Oversized red sunglasses: Present (bright red frames, large relative to cat’s face)
- Coherent combination: Yes; sunglasses sit naturally on the cat’s face, no anatomical distortions or blending errors.
All prompt requirements are met with visual coherence. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains:
- Golden retriever: Present and clearly depicted.
- Cowboy hat: Present, black, and worn by the dog.
- “Tiny” hat: The hat appears proportionally normal for the dog’s head, not obviously tiny. This may be a minor deviation.
All core elements are present and coherently combined. The “tiny” descriptor is subjective but not critically violated. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains:
- A penguin: Present, clearly depicted with black and white coloring, yellow beak and feet.
- Holding: Present, the penguin’s flipper is wrapped around the umbrella handle.
- Colorful umbrella: Present, red and orange panels visible.
- Coherent combination: Yes, all elements are logically and visually integrated.
VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements:
1. Medieval knight in full armor - Present (helmet, chainmail, plate armor, red cape)
2. Sitting at a desk - Present (wooden desk with items)
3. Typing on a modern laptop - Present (black laptop with hands on keyboard)
The concepts are coherently combined with the knight's gloved hands typing on the laptop, creating a logical fusion of medieval and modern elements. The setting includes appropriate details like a sword on the desk and European-style bui
PASS · Evaluator (Qwen 3.5 122B)
The image contains all required elements:
1. Roman gladiator - Present (chainmail, helmet with crest, shield)
2. Selfie action - Present (arm extended holding phone)
3. Smartphone - Present (modern device in hand)
4. Colosseum - Present (arched structure in background)
All components are coherently combined in a single scene. The anachronistic juxtaposition is intentional and executed clearly. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all required elements:
1. Astronaut in a spacesuit - Present (white suit with helmet, backpack, and patches)
2. Riding a bicycle - Present (astronaut is seated on and pedaling a black bicycle)
3. Through a park - Present (tree-lined path with greenery, flowers, and blue sky)
All elements are coherently combined in a single scene with proper perspective and lighting. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements:
- Octopus: Present (red, cartoon-style)
- Top hat: Present (black, on octopus’s head)
- Monocle: Present (gold-rimmed, over one eye)
- Chessboard and pieces: Present (white pieces, board on lunar surface)
- Moon surface: Present (cratered, gray terrain)
- Earth in background: Present (large, detailed globe behind)
All elements are coherently combined in a single scene. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains:
- Dinosaur: Present (T-Rex-like head on human body)
- Business suit: Present (dark suit, white shirt, tie)
- Coffee: Present (holding a coffee cup)
- Modern office: Present (glass walls, ceiling lights, office setting)
- PowerPoint presentation: Present (large screen showing slides with dinosaur images)
All elements are present and coherently combined. The dinosaur is dressed professionally, holding coffee, and appears to be presenting in an office environment with a large d
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements:
1. Robot dressed as a pirate - Present (wearing tricorn hat and coat)
2. Paper boat - Present (folded paper construction)
3. Lake of lava - Present (glowing red molten background)
4. Three kittens - Present (three tabby kittens in mid-air)
5. Juggling action - Present (kittens positioned as if being juggled)
All concepts are coherently combined in a surreal but visually consistent scene. VERDICT: PASS
Counting 89%
Counting measures whether fal/krea/v2-large creates the exact number of requested objects without adding or missing items.
PASS · Evaluator (Qwen 3.5 122B)
The image displays exactly three red apples placed on a weathered wooden table surface. The apples are arranged in a loose triangular formation. The count matches the prompt’s request precisely: three red apples, on a wooden table. No extra or missing objects. Lighting and texture appear natural, supporting realism. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows five yellow rubber ducks arranged in a row on a white bathroom shelf. Each duck is identical in appearance, with orange beaks and black eyes. The count of ducks matches the requested number exactly. The setting and arrangement also align with the prompt’s description. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image displays exactly four white pillar candles arranged in a row on a dark surface. Each candle is unlit with visible wicks. The count matches the prompt’s request precisely: four white candles, standing upright, on a dark background. No extra or missing objects. All criteria are met. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows five animals sitting on grass: three black dogs (two larger, one smaller) and two orange cats. The count matches the prompt exactly — 3 black dogs and 2 orange cats. All are seated together on a lawn with yellow flowers in the background. No extra or missing subjects. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains exactly 4 red roses and 2 white daisies arranged in a clear glass vase, placed on a light-colored surface (table). The background is textured brown. All requested elements are present and correctly counted. No extra or missing flowers. The composition matches the prompt precisely. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows four butterflies resting on a wooden fence. From left to right: one blue butterfly, followed by three yellow butterflies. The count matches the prompt exactly — 1 blue and 3 yellow butterflies. All are positioned on what appears to be a garden fence with a natural background. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a round fishbowl with blue water and seaweed. Counting the goldfish: there are 10 goldfish visible, not 9 as requested. The fish are orange and white, swimming in various directions. Since the count does not exactly match the prompt’s requirement of “exactly 9,” this fails the core criterion. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The image shows a fruit bowl with:
- 3 bananas (visible, with brown spots)
- 4 oranges (clearly visible in front and middle)
- 2 apples (red, behind the oranges)
All counts match the prompt exactly: 3 bananas, 4 oranges, 2 apples. No extra or missing fruits. The arrangement is natural and the counts are accurate. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a cluster of colorful balloons against a blue sky. Counting them: 1 red (top left), 1 silver, 1 pink (top right), 1 white, 1 pink (center), 1 yellow, 1 green, 1 small red (bottom right), 1 large red (bottom left), 1 red (left middle), and 1 blue (partially hidden behind white and pink). That’s 11 balloons total. The count matches the prompt exactly. VERDICT: PASS
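The Counting criterion is an exact match: one extra or missing object fails the evaluation. For readers tallying objects themselves, the comparison might be sketched as below. This is not the judge's implementation (the verdicts come from a VLM); `count_mismatches` is a hypothetical helper, and the observed numbers are the tallies the judge reported for the fruit bowl and goldfish entries above.

```python
from collections import Counter


def count_mismatches(requested, observed):
    """Return {object: (requested, observed)} for every object whose counts differ."""
    requested, observed = Counter(requested), Counter(observed)
    return {
        name: (requested[name], observed[name])
        for name in sorted(set(requested) | set(observed))
        if requested[name] != observed[name]
    }


# Fruit bowl entry: every count matches, so nothing is reported.
fruit = count_mismatches(
    {"banana": 3, "orange": 4, "apple": 2},
    {"banana": 3, "orange": 4, "apple": 2},
)
# Goldfish entry: 9 requested, 10 counted by the judge.
goldfish = count_mismatches({"goldfish": 9}, {"goldfish": 10})

print(fruit)     # {}
print(goldfish)  # {'goldfish': (9, 10)}
```

An empty result corresponds to a PASS under the exact-count rule; any entry in the result corresponds to a FAIL.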
Negation 89%
Negation measures whether fal/krea/v2-large respects what should be absent from the image, not only what should be present.
PASS · Evaluator (Qwen 3.5 122B)
The image shows a tropical beach scene with clear turquoise water, white sand, and distant mountains under a blue sky. There are no people visible anywhere in the frame — not on the beach, in the water, or in the distance. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a fluffy, light-colored cat lying on a textured couch. The cat has no visible collar around its neck — the fur is uninterrupted and smooth in that area. No collar, leash, or accessory is present. The excluded element (collar) is truly absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows an empty classroom with rows of wooden desks and chairs facing a large chalkboard with faint writing on it. There are no students visible in the room — the space is completely unoccupied by people. The excluded element (“students”) is truly absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a sandy beach covered with numerous seashells in the foreground. In the midground, there is a red floral beach umbrella with some items (possibly towels or bags) underneath. The background features a dense line of palm trees against a bright sky. There is no visible water, ocean, or sea in any part of the image — only sand, shells, umbrella, and trees. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a clean office workspace with a wooden desk, black rolling chair, tall bookshelf filled with books, and large arched window. There is no computer, monitor, laptop, tablet, or any screen visible on the desk or elsewhere. A mug, glass, pen, and small black object (possibly a phone dock or case) are present — none qualify as screens. The excluded elements are fully absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a kitchen scene with a stainless steel pot in the foreground on a wooden surface, a cutting board to its right, and a stove with another pot and a pan in the background. No food items are visible anywhere in the image — no ingredients, cooked meals, or edible substances. The excluded element (“food”) is truly absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a close-up of a pizza with a visible crust, red tomato sauce, and numerous pepperoni slices. The right side and bottom are covered with fresh green basil leaves. There is no visible melted or shredded cheese anywhere on the pizza — the sauce and toppings are directly on the dough. The excluded element (cheese) is truly absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a silver car body with windows, doors, and headlights clearly visible. The wheels are absent — the wheel wells are empty, exposing internal suspension components. This matches the prompt’s exclusion criteria: no wheels are present. All other specified elements (windows, doors, headlights) are included as required. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a sketch of a human face with eyes, nose, and mouth. However, prominent dark eyebrows are clearly drawn above each eye — they are thick, textured, and integral to the facial expression. Since the prompt explicitly excluded eyebrows, and they are visibly present, the core constraint is violated. VERDICT: FAIL
Relative Position 92%
Relative Position measures whether fal/krea/v2-large places objects in the requested spatial relationship, such as above, below, inside, or next to another object.
PASS · Evaluator (Qwen 3.5 122B)
The image shows a fluffy cat positioned directly on the upper surface of a brown cardboard box. The cat is centered on the box, with its paws resting on the top edge. The spatial relationship described — “on top of” — is accurately represented. No other positional relationships are specified or contradicted. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a red ball positioned directly beneath a wooden chair, with the chair’s legs framing the ball. The spatial relationship “underneath” is accurately depicted — the ball is on the floor, and the chair sits above it. No conflicting or incorrect positional descriptors are present. All specified spatial relationships match the visual content. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a steaming coffee mug with orange floral patterns positioned to the right of an open book. The book lies flat on a surface, likely a desk, with its pages visible and slightly blurred in the background. The spatial relationship described — “a coffee mug next to an open book on a desk” — is accurately represented: the mug is beside the book, both rest on the same surface, and no conflicting positional claims (e.g., above, under) are violated. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a tree. The tree is positioned next to a park bench, which is visible in the background to the left of the tree. The spatial relationships described in the prompt are accurately represented in the image. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a tabby cat sitting on a light-colored pillow. The pillow is placed on the seat of a wooden chair with vertical slats in the backrest. All spatial relationships described — cat on pillow, pillow on chair — are accurately represented. The cat is centered on the pillow, which rests flat on the chair seat. No positional errors are present. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a candle in a brass holder placed on top of a stack of books. The books are positioned to the right of a potted plant with long green leaves. All specified spatial relationships — “candle on top of books” and “books next to potted plant” — are accurately represented. The arrangement matches the prompt’s description. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows five white chess pieces on a black-and-white checkered board. The king is in the center. To its left is the queen, to its right is the bishop, above it is the rook, and below it is the knight. All spatial relationships described in the prompt are accurately represented in the image. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a desk scene with a laptop in the center, coffee mug to the left, and lamp above — all matching the prompt. However, the phone is positioned to the *left* of the laptop (not right), and the notebook is below — which matches. Since the phone’s position contradicts the prompt (“phone to the right”), the spatial relationships are not fully correct. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The image shows five animals in a grassy field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. The horse is on the far left. The cow is to the right of the horse. The sheep is centered. The goat is to the right of the sheep. The dog is on the far right. All specified spatial relationships match the prompt exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a red barn in the foreground, large and detailed, with a windmill positioned to its right in the midground at a smaller scale. Snow-capped mountains appear in the background, rendered small due to perspective. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). Depth cues like size reduction and atmospheric blur are consistent. All positional relationships are correctly represented. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image correctly depicts a beach scene with a sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease appropriately with distance, maintaining proper spatial relationships as described. All positional relationships (on, under, left, right, above, below) are accurate. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background — matching the prompt’s layered spatial description. The bench is on the left, pigeons are scattered on the ground (below the bench), the fountain is centered behind them, and buildings rise above in the distance. All positional relationships align with the prompt. VERDICT: PASS
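When inspecting a Relative Position image by hand, a left/right/above/below claim can be checked against rough object locations. The sketch below is one way to do that and is not how the benchmark judges images (the verdicts come from a VLM). The `Box` type, the `relation_holds` helper, and the pixel coordinates are all hypothetical, chosen only to mirror the failing desk scene in which the phone ended up left of the laptop instead of right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (origin at top-left)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def relation_holds(subject, relation, reference):
    """Compare box centers; image y grows downward, so 'above' means smaller y."""
    sx, sy = subject.center
    rx, ry = reference.center
    checks = {
        "left_of": sx < rx,
        "right_of": sx > rx,
        "above": sy < ry,
        "below": sy > ry,
    }
    return checks[relation]


# Made-up coordinates for illustration only.
laptop = Box(40, 40, 60, 60)
phone = Box(10, 45, 20, 55)

print(relation_holds(phone, "right_of", laptop))  # False: the prompt's relation is violated
print(relation_holds(phone, "left_of", laptop))   # True: what the image actually shows
```

Comparing centers is a deliberate simplification; relations such as "inside" or "next to" would need overlap or distance tests instead.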
Scale & Proportions 67%
Scale & Proportions measures whether fal/krea/v2-large keeps people, vehicles, buildings, and everyday objects at believable relative sizes.
FAIL · Evaluator (Qwen 3.5 122B)
The person in the foreground appears disproportionately small compared to the school bus, suggesting an unrealistic scale relationship. In reality, a person standing next to a school bus would appear much larger relative to the vehicle than shown here. The perspective and depth cues do not adequately compensate for this size discrepancy, making the proportions inconsistent with real-world expectations. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The ladybug appears appropriately sized relative to the leaf it is resting on. The leaf shows natural venation and serrated edges, and the insect’s scale matches real-world proportions — a typical ladybug is about 5–8 mm, and this one fits comfortably on a leaf that would be several centimeters wide. No anatomical or scaling distortions are evident. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The child’s hands are holding the basketball, and the ball appears appropriately sized relative to the child’s torso and arms — consistent with a real-world basketball held by a young child. No anatomical or object-scale distortions are visible. The proportions are believable and physically plausible. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a miniature figurine of a person positioned on the rim of a coffee cup, appearing to look down into the liquid. The scale relationship between the tiny person and the cup is consistent with the prompt’s description — the person is appropriately small relative to the cup, and their posture (leaning forward, looking down) matches the intent. Lighting and shadows enhance the realism of the size contrast. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image depicts a giant tabby cat walking between skyscrapers, with the cat’s body towering over the buildings. The scale is intentionally exaggerated to match the “kaiju” description — the cat’s legs and torso dwarf the high-rises, and its head reaches above them. This aligns with the prompt’s intent for dramatic size contrast. No internal inconsistencies in proportion are visible. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a miniature house resting in the palm of a human hand. The scale is consistent: the house fits comfortably within the hand, with fingers curled slightly around it, indicating proper size relationship between the objects. No anatomical or proportional distortions are visible. The lighting and depth of field also support realistic spatial relationships. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows five animals in a line: elephant, horse, dog, cat, and mouse. Their relative sizes are correctly scaled according to real-world proportions — the elephant is largest, followed by the horse, then dog, cat, and finally the tiny mouse. All are standing upright and aligned horizontally with consistent perspective and lighting. No anatomical or scaling errors are visible. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a plate with a wine glass placed directly on top of it, which is physically unrealistic for a standard dinner setting. Additionally, the fork and knife appear disproportionately large relative to the plate, and the salt shaker seems oversized compared to typical tableware proportions. The peppercorn is also unusually large relative to the other objects. These inconsistencies violate realistic size relationships. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a farm scene with a barn, tractor, farmer, cow, and chicken. However, the proportions are inconsistent: the tractor is unrealistically large compared to the farmer and cow; the chicken is nearly as tall as the cow’s legs; and the barn appears too small relative to the tractor. Additionally, no fence post is visible despite being mentioned in the prompt. These scaling errors violate the requirement for “correct proportional scale.” VERDICT: FAIL
Human realism 83%
Faces & Expressions 92%
Faces & Expressions measures whether fal/krea/v2-large creates realistic faces with the requested age, emotion, expression, and identity cues.
PASS · Evaluator (Qwen 3 VL)
Face is mostly symmetric with natural features and freckles. Smile appears warm and genuine, clearly conveying happiness. Slight asymmetry in eye shape and minor facial distortion are present but subtle. Overall, the emotion is readable and the portrait meets the prompt’s intent. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural, correctly proportioned features. Expression conveys calm seriousness with subtle brow lines and neutral mouth. No visible distortions. Skin texture and hair detail appear authentic. Lighting and focus enhance realism. Meets all criteria for the prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with correctly proportioned features. Expression is neutral and relaxed, matching the prompt. No visible distortions or anomalies. Lighting and focus are natural, enhancing clarity. The teenager’s gaze is steady and composed. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
Face is asymmetric with distorted features—eyes squeezed shut, nose wrinkled, lips pulled back showing teeth. Emotion (disgust/anger) is strongly conveyed but not cleanly aligned with prompt’s “disgust” description. Skin texture and lighting add realism but don’t resolve structural asymmetry. Core emotion is readable, though facial structure is unnaturally warped. VERDICT: FAIL
PASS · Evaluator (Qwen 3 VL)
Face is largely symmetric with correct anatomical features. The expression clearly conveys surprise: wide eyes, raised eyebrows, and slightly open mouth. Slight asymmetry in eyebrow lift and eye dilation is natural, not distorted. Lighting and focus enhance the emotional intent without artifacts. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is largely symmetric with natural features. Eyes are downcast, lips slightly drawn, gaze distant — clearly conveying quiet melancholy. No obvious distortions. Lighting and texture are realistic. Emotion is strongly readable and aligned with prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural, aged features: deep wrinkles, white hair, and warm smile. Eyes convey kindness and gentle emotion. No significant distortions; lighting and texture are realistic. Facial structure and expression align perfectly with the prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows, downturned mouth, and visible tears. No significant distortions are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The face shows natural asymmetry and realistic aging details: age spots, thinning gray hair, and deep wrinkles. The skeptical expression is clearly conveyed through downturned lips and narrowed eyes. Slight distortion exists in the glasses’ reflection and minor facial warping, but overall features are coherent and emotionally expressive. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Faces are symmetric and feature-accurate for anime style. Emotions are clearly readable: left figure laughs with head back, center looks shocked with wide eyes, right rolls eyes with smirk. No distortions. All three expressions match the prompt precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Faces are symmetric with correct anime-style features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored, child grins excitedly. No distortions; expressions match prompt. All four characters are present and identifiable. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Faces are symmetric with correct anatomical features. Emotions are clearly readable: left man shows frustration (furrowed brow, downturned mouth), center man nods with pleased smile, right woman looks confused (tilted head, furrowed brow). No distortions. All three expressions match the prompt. VERDICT: PASS
Full Body 92%
Full Body measures whether fal/krea/v2-large renders full-body people with natural anatomy, clothing, pose, and proportions.
PASSEvaluator (Qwen 3 VL)
The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No significant distortions are visible. All elements align with the prompt. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The figure has two arms and two legs, with normal proportions relative to torso and head. Hands are visibly in pockets, and posture is upright. No anatomical distortions are present. The pose matches the prompt exactly. All elements are clearly recognizable and correctly rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The person’s full body is visible from head to feet, standing in a neutral pose on grass beside a path. Limb count and proportions are anatomically correct. The pose is recognizable as standing still, facing forward. No significant distortions are present. The clothing and background are consistent with a park setting. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The figure shows a woman in motion blur, consistent with running. Limb count and proportions appear anatomically correct despite motion blur. The pose—arms swinging, torso leaning, hair flowing—clearly suggests running. No obvious distortions in limb structure. Motion blur effectively conveys dynamic movement as requested. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The man’s limbs are correctly counted and proportioned. His bent-knee jump over the puddle is anatomically plausible and recognizable. The reflection shows a distorted but coherent silhouette. No major distortions in body structure or pose are evident. The action matches the prompt precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The dancer’s limbs are correctly counted and proportioned. The pose—dynamic spin with one leg extended—is anatomically plausible and recognizable. Fabric flows naturally around the body, suggesting motion. Slight softness in edges and hair suggests AI generation, but no major distortions. The action and description are accurately rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The silhouette shows a person in a tree pose: one leg standing, other foot pressed to inner thigh, arms raised overhead. Limb count and proportions appear correct. The pose is recognizable despite being a silhouette. No anatomical distortions are visible due to the lack of detail. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The climber has two arms and two legs, with proportions and pose consistent with rock climbing. Limbs are extended to different holds as described. No anatomical distortions are visible. The action is clearly recognizable as dynamic rock climbing. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The cellist has correct limb count and proportions. The pose shows legs around the instrument and bow arm extended, matching the prompt. The cello is held properly between the knees, and the bow is drawn across strings. No anatomical distortions are present. The action is clearly recognizable as playing the cello. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s pose is dynamic and anatomically plausible; the jumping player’s posture is also realistic. The ball is at the sliding player’s feet, matching the prompt. No obvious distortions. The action is clearly recognizable as a mid-tackle scenario. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both gymnasts show correct limb count and proportions. Their synchronized handstands are anatomically accurate, with legs straight and feet together, heads aligned. Pose is clearly recognizable as a handstand. No distortions observed. Alignment is nearly identical. VERDICT: PASS
Not generated
Model refused to generate this prompt (HTTP 422).
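The 92% in the Full Body header matches 11 PASS verdicts out of 12 prompts, which suggests the refused prompt stays in the denominator. The page does not state this rule, so the sketch below treats it as an assumption; `pass_rate` and the outcome strings are hypothetical names, not part of the benchmark.

```python
def pass_rate(outcomes):
    """Percentage of prompts judged PASS, rounded to a whole number.

    Assumption: anything other than "PASS" (including "NOT_GENERATED")
    counts against the model and stays in the denominator.
    """
    if not outcomes:
        raise ValueError("no outcomes to score")
    passed = sum(1 for outcome in outcomes if outcome == "PASS")
    return round(100 * passed / len(outcomes))


# Full Body as listed above: 11 PASS verdicts plus one refused prompt.
full_body = ["PASS"] * 11 + ["NOT_GENERATED"]
print(pass_rate(full_body))  # 92
```

Dropping the refused prompt from the denominator would give 11 of 11, or 100%, which does not match the header.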
Hands 58%
Hands measures whether fal/krea/v2-large draws anatomically plausible hands, fingers, and gestures.
FAILEvaluator (Qwen 3.5 122B)
The left hand (viewer’s right) shows 5 fingers, but the thumb is oddly positioned and lacks natural curvature. The right hand (viewer’s left) also appears to have 5 fingers, yet the pinky is slightly fused or malformed at the tip. Both hands show unnatural spacing and slight distortion in finger joints — not fully anatomically correct. Minor anomalies present. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
Upon close inspection, the left hand (on the viewer’s left) clearly shows five fingers: thumb, index, middle, ring, and pinky — all anatomically correct. The right hand (viewer’s right) also displays five fingers, though slightly blurred due to depth of field; no extra or missing digits are evident. Both hands are palms-up, fingers spread naturally. No anomalies detected. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The left hand (on the right side of the image) shows 5 fingers clearly and correctly positioned. The right hand (on the left side of the image) also shows 5 fingers, but the thumb is partially obscured and appears to merge slightly with the index finger — a minor anatomical ambiguity. Overall, both hands have 5 fingers visible or implied, and the crossed-arm gesture is anatomically plausible. No major anomalies. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The image shows a left hand (not right) holding the mug by its body, not the handle. The thumb is on top, and four fingers are wrapped around the side — but one finger appears fused or missing a joint, and the pinky is unusually short or absent. Anatomically incorrect grip and finger count anomaly present. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The visible right hand shows five fingers: thumb, index, middle, ring, and pinky. The grip on the pen appears anatomically correct — fingers are positioned naturally for writing, with the pen resting between thumb and index finger, supported by the middle finger. No anomalies detected in finger count or grip. Left hand is not visible, so no evaluation possible there. Since only one hand is shown and it meets criteria, core requirement is satisfied. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The left hand (fretting) shows four fingers clearly visible on the fretboard, with the thumb likely behind the neck — anatomically plausible. The right hand (strumming) is blurred but appears to have 4–5 fingers in a natural strumming position. No obvious extra or missing fingers; gestures are consistent with guitar playing. Minor blur doesn’t obscure anatomy enough to fail. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The hand displays five fingers total: index and middle fingers extended in a V-shape, while thumb, ring, and pinky are curled inward — anatomically correct for a peace sign. No extra or missing fingers; proportions and joint creases appear natural. Grip is consistent with human anatomy. No anomalies detected. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows a single hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — consistent with the prompt. Anatomically, this is correct: one thumb + four fingers = five digits total on the hand. No anomalies or extra/missing fingers are visible. The grip and gesture match the description precisely. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The hand in the foreground shows four fingers extended (index, middle, ring, and pinky), with the thumb tucked or obscured — not matching the prompt’s request for “index, middle, and ring fingers extended” (which implies only three). Additionally, the thumb appears malformed or fused with the palm, lacking clear separation. Anatomical correctness is compromised. The gesture does not align with the prompt. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
Upon close examination, the handshake shows anatomical inconsistencies. The hand on the left (blue shirt) appears to have only four visible fingers gripping the other hand, with the thumb positioned unusually. The hand on the right (gray suit) also shows distorted finger alignment — some fingers appear fused or misaligned, and the thumb placement is unnatural. The grip does not reflect a natural, anatomically correct handshake. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
Upon close examination, the left hand (darker skin) shows 5 fingers clearly. The right hand (lighter skin) also displays 5 fingers. However, the gesture is not anatomically correct for a high-five: the hands are overlapping in an unnatural way — the right hand’s thumb and index finger appear to be merging with or passing through the left hand’s palm area. Additionally, the right hand’s pinky finger seems slightly distorted or fused with the ring finger. These anomalies violate anatomical plausib
PASSEvaluator (Qwen 3.5 122B)
Upon close examination: The left hand (dark sleeve) shows 4 fingers clearly visible, with thumb positioned correctly — likely 5 total. The right hand (light sleeve) also appears to have 5 fingers, though one is partially obscured. Grip posture is natural and anatomically plausible for handing over a box. No obvious deformities or extra/missing digits. Minor occlusion doesn’t invalidate correctness. VERDICT: PASS
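Most evaluator notes end with a `VERDICT:` marker, but some notes on this page are cut off before it (the high-five note above stops mid-word). A minimal sketch, assuming one wants to tally verdicts from text like this: read the marker when it is present and fall back to the displayed PASS/FAIL label otherwise. `resolve_verdict` and the sample strings are hypothetical, not part of the benchmark tooling.

```python
import re

# Matches the closing marker used in the evaluator notes.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def resolve_verdict(label, note):
    """Return the verdict stated in the note, or the displayed label
    when the note is truncated and carries no complete marker."""
    matches = VERDICT_RE.findall(note)
    if matches:
        # Use the last marker in case the note mentions one earlier.
        return matches[-1]
    return label


complete = "No anomalies detected. VERDICT: PASS"
truncated = "These anomalies violate anatomical plausib"
print(resolve_verdict("PASS", complete))
print(resolve_verdict("FAIL", truncated))
```

With the labels shown above, Hands has 7 PASS and 5 FAIL entries, which agrees with the 58% header.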
Multi-Subject 100%
Multi-Subject measures whether fal/krea/v2-large keeps multiple people distinct while preserving their requested attributes, outfits, and positions.
PASSEvaluator (Qwen 3 VL)
Left woman: short blonde hair, wears red fringed jacket over white top and jeans. Right woman: long black hair, wears blue dress with zipper detail. Both are visually distinct and match described features exactly. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Left man: beard, glasses, green sweater — matches description. Right man: clean-shaven, black hoodie, black cap — matches description. Both visually distinct, seated at cafe. All prompt features accurately represented. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Person 1: Tall, curly red hair, denim overalls over striped shirt, giving thumbs up. Person 2: Shorter, straight brown hair, yellow sundress. Both visually distinct and match descriptions precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Bald man in suit: present, distinct. Woman with silver hair in red blouse: present, distinct. Young man with dreadlocks in denim jacket: present, distinct. Woman with hijab in green dress: present, distinct. All four match descriptions precisely and are visually separable. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Tall woman: pink short hair, playing bass. Man: beard, bandana, on drums. Slim man: glasses, playing keyboard. All three are visually distinct and match descriptions. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Three hikers are present and visually distinct. Left: man with sunhat and backpack. Center: woman with braided hair and hiking poles. Right: teenager with baseball cap and water bottle visible on backpack. All features match the prompt exactly. VERDICT: PASS
Truthfulness 74%
Photorealism 100%
Photorealism measures whether fal/krea/v2-large makes materials, surfaces, and real-world objects look visually plausible and distinct.
PASSEvaluator (Qwen 3.6 27B)
The image accurately renders the requested materials. The glass shows correct refraction and transparency, the spoon exhibits realistic metallic reflections, the napkin displays a convincing linen weave, and the apple has a smooth, waxy skin with appropriate highlights. The wooden table provides a textured, rustic background with natural grain patterns. All elements demonstrate accurate light behavior and material properties. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum finish with appropriate metallic reflections. The notebook shows a textured leather cover with visible stitching. The mug has the characteristic smooth, glossy sheen of ceramic. The paperweight displays accurate glass properties, including refraction and sharp specular highlights. The wooden desk surface is also distinct. All materials are visually distinct and realistic.
PASSEvaluator (Qwen 3.6 27B)
The image accurately renders the distinct material properties requested. The stainless steel pot exhibits realistic metallic reflections and brushed texture. The glass bottle shows appropriate translucency and refraction of the olive oil within. The wooden cutting board displays natural grain patterns, and the cotton towel has a soft, fibrous texture with correct light absorption. All surfaces interact with light in a physically plausible manner. VERDICT: PASS
Physics & Reflections 83%
Physics & Reflections measures whether fal/krea/v2-large follows basic physical cues such as shadows, reflections, gravity, transparency, and contact points.
PASSEvaluator (Qwen 3.6 27B)
The image displays a red sphere on a white surface. The lighting originates from the upper left, creating a highlight on the sphere’s upper-left side and casting a soft shadow to the lower right. This shadow direction is consistent with the light source. The shadow’s shape and gradient are natural, with no distortion or color bleeding. The reflection is minimal, as expected on a matte surface. All elements align with physical plausibility. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The lighting is consistent with a strong light source from the right, casting a sharp, well-defined shadow to the left. The shadow's shape accurately matches the mug's silhouette, including the handle. The reflection on the liquid surface is subtle and realistic. The high contrast and grainy texture enhance the "sunny windowsill" atmosphere without breaking physical laws. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The shadow is cast directly beneath the figure, consistent with a high sun position at midday. The orientation and length are physically plausible for the lighting conditions. The shadow’s edges are slightly soft but realistic given the outdoor environment. No distortions or color anomalies are present. The shadow aligns correctly with the person’s stance and the light source direction. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The reflection in the mirror is physically implausible. A mirror directly behind the woman should reflect her front, not her back. Additionally, the reflection shows a recursive effect (a reflection within a reflection) that doesn't align with the spatial setup. The orientation and content of the reflection contradict basic optical physics for this scene. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The reflection is physically implausible. It appears as a dark, solid silhouette rather than a true optical reflection of the cat’s fur texture and lighting. Additionally, the shadow/reflection extends downward in a way that suggests both a cast shadow and a mirror reflection simultaneously, which is inconsistent with a single light source and polished surface. The orientation and distortion also lack realism for a marble floor reflection. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly oriented (inverted vertically), matches the object's position, and exhibits appropriate distortion and slight blurring consistent with a reflective glass surface. The color and lighting of the reflection align with the real object, and the transparency of the vase and stems is preserved in the reflection. No directional or structural inconsistencies are present. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are vertically oriented and mirror the tree line, which is physically correct. The water surface shows gentle ripples that distort the reflection appropriately. The colors in the reflection are slightly darker and muted compared to the sky and trees, consistent with how water absorbs and scatters light. The overall effect is realistic and plausible for a calm lake at sunset. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are physically plausible. The orientation is correctly inverted, mirroring the skyline above. The distortion is consistent with a shallow puddle surface, showing slight waviness and blurring. The color shift is also realistic; the reflection appears warmer and darker than the actual buildings, which is typical for water reflections under overcast or twilight conditions. The alignment of the reflected structures matches their positions in the skyline. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly inverted vertically, matching the mountain peaks and slopes. The color is subtly muted and darker, consistent with light absorption in water. The surface is still, creating a mirror-like effect with minimal distortion, which aligns with the prompt’s description of a “still alpine lake.” No directional or orientation errors are present. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image displays a glass sphere on a reflective surface next to a pool of water. The reflection of the sphere is correctly inverted and positioned directly below it, consistent with a mirror surface. The refraction through the sphere distorts the background texture realistically. The water pool shows appropriate surface ripples and reflections. The lighting and shadows are consistent with the scene's geometry. The physical interactions between light, glass, water, and the mirror are rendered a
PASSEvaluator (Qwen 3.6 27B)
The reflections are physically plausible. The chrome surface accurately mirrors the room’s windows and grid patterns with appropriate distortion due to curvature. The glass floor shows a clear, slightly blurred reflection of the sculpture, consistent with real-world optics. Shadows are soft and directional, matching ambient light from the windows. Color fidelity is maintained — cool tones reflect naturally without unnatural saturation or inversion. No obvious artifacts break immersion. VERDICT:
PASSEvaluator (Qwen 3.6 27B)
The wine glass casts a clear, correctly oriented reflection on the polished marble surface, showing appropriate distortion and fading. The refraction through the glass is visible, subtly warping the background view. The lake’s surface reflects sunlight with realistic specular highlights, consistent with the light source direction. Shadows are soft and logically placed. All optical effects align with physical expectations for this scene. VERDICT: PASS
World Knowledge 58%
World Knowledge measures whether fal/krea/v2-large depicts recognizable places, landmarks, flags, and known objects with accurate real-world details.
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy. The iconic iron lattice structure, the three distinct levels (including the top platform and antenna), and the arched base are all correctly rendered. The perspective is a low-angle shot against a clear blue sky, matching the prompt's description of a "clear day." The structural details, such as the crisscrossing beams and the shape of the arches, are factually correct and recognizable. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high architectural accuracy, including the central dome, four minarets, and symmetrical layout. The reflection in the pool is rendered realistically with appropriate water distortion. The black-and-white tone enhances the structural details without compromising recognizability. No significant inaccuracies are present; the subject is clearly identifiable and faithful to real-world architecture. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The pedestal’s stone structure and crenellations are also faithful to the real-world monument. Minor cloud cover doesn’t detract from architectural fidelity. No significant inaccuracies detected in key features. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The instrument is recognizable as a shamisen, but it contains significant factual inaccuracies. The body is depicted as a boxy, rectangular shape rather than the traditional rounded, drum-like form. The bachi (plectrum) is shown resting awkwardly on the neck instead of being held or placed separately. Additionally, the tuning pegs appear structurally inconsistent with traditional designs. These deviations from the authentic design lead to a failure in accuracy. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image depicts a brass instrument resembling an astrolabe, but it fails the accuracy test. The text and numbers engraved on the rings are nonsensical gibberish, a common AI artifact. Furthermore, the mechanical structure of the rete (the star map overlay) is distorted and lacks the precise geometric logic of a real medieval astrolabe. While the texture and lighting are convincing, the core subject is not factually accurate. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image accurately depicts a Japanese kintsugi bowl, showing cracked ceramic repaired with visible gold lacquer lines. The texture of the ceramic and the irregular, branching pattern of the gold repairs are realistic and consistent with traditional kintsugi aesthetics. No architectural or factual inaccuracies are present. The subject is clearly recognizable and aligns perfectly with the prompt’s description. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image is a stylized, cartoon-like illustration rather than a detailed scientific diagram. While it depicts key water cycle components (evaporation arrows, cloud, rain, river), the arrows are ambiguous—some point up, some down, and one river arrow points inland, contradicting natural flow. The lack of labels, precise directional flow, and schematic clarity makes it inaccurate for educational or factual representation. It’s visually appealing but not factually rigorous. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image accurately depicts the geological features of a convergent plate boundary. It clearly shows the oceanic plate subducting beneath the continental plate, forming a deep ocean trench. The resulting volcanic arc on the continental side is correctly positioned above the subduction zone. The layered representation of the mantle and crust is visually distinct and scientifically consistent with standard educational diagrams of plate tectonics. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image depicts a human heart in a classic anatomical illustration style. It correctly shows the four chambers, the thick muscular wall of the left ventricle, and the major vessels like the aorta and pulmonary artery. However, there are significant anatomical inaccuracies. The superior vena cava is depicted with an impossible branching structure at its origin. The pulmonary veins are not clearly or correctly shown entering the left atrium. The internal valve structures and chordae tendineae ar
FAILEvaluator (Qwen 3.6 27B)
The image depicts a butterfly with the characteristic blue dorsal wings and brown ventral wings of a Morpho species. However, the anatomy is incorrect; the wings are fused or malformed at the body, lacking distinct separation between forewings and hindwings. The eyespots on the ventral side are also irregularly shaped and placed. While the color palette is correct, the structural inaccuracies make it fail the "factually accurate" criterion. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image accurately depicts the interior of the Hagia Sophia, showcasing its iconic central dome, pendentives, and semi-domes. The presence of large Islamic calligraphy medallions alongside faded Byzantine mosaics is historically and architecturally correct for this site. The perspective and lighting effectively capture the grandeur and complex structure of the building. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image depicts a bismuth crystal with high fidelity. It accurately portrays the characteristic geometric hopper crystal formation and the distinct "staircase" structure typical of this mineral. The iridescent oxide layers are rendered realistically, displaying the expected spectrum of metallic blues, purples, and golds. The subject is clearly recognizable and factually accurate. VERDICT: PASS
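The Truthfulness header (74%) is consistent with pooling every evaluation in its three subcategories rather than averaging the three percentages. The sketch below checks that arithmetic using counts tallied from the PASS/FAIL labels on this page. The pooling rule is inferred from the numbers and is not stated by the benchmark, and `pooled_rate` is a hypothetical helper.

```python
def pooled_rate(counts):
    """counts maps subcategory name -> (passed, total).

    Pools all evaluations before dividing, so larger subcategories
    weigh more than smaller ones.
    """
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return round(100 * passed / total)


# Tallied from the labels above (assumed complete for these subcategories).
truthfulness = {
    "Photorealism": (3, 3),
    "Physics & Reflections": (10, 12),
    "World Knowledge": (7, 12),
}

print(pooled_rate(truthfulness))  # 74

# For comparison: the unweighted mean of the three subcategory rates.
mean_of_rates = round(
    sum(100 * p / t for p, t in truthfulness.values()) / len(truthfulness)
)
print(mean_of_rates)  # 81
```

Only the pooled figure (20 of 27) matches the header. The unweighted mean would overstate the score because Photorealism has just three evaluations.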
Professional Studio 85%
Camera & Lighting 83%
Camera & Lighting measures whether fal/krea/v2-large follows specific lens, angle, focus, exposure, and lighting instructions.
PASSEvaluator (Gemma 4 26B)
The image successfully implements the requested camera effects. It features a wide-angle perspective that captures an expansive living room. Most notably, it exhibits significant barrel distortion at the edges, creating a curved, "fisheye-like" effect on the walls and ceiling as specified. The lighting shows natural light patterns hitting the furniture and walls, consistent with a modern interior shot. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a red rose petal. The depth of field is shallow, with the background and the upper parts of the rose being softly blurred, emphasizing the dewdrop. The lighting is bright and natural, creating a realistic reflection within the droplet. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image fails to meet the prompt's core requirement of an "aerial bird's-eye view looking straight down." Instead, the perspective is a high-angle oblique view (a three-quarter view). Because the camera is looking at an angle rather than straight down, the geometry of the roundabout and the orientation of the cars are skewed. The lighting is consistent with a bright day, but the perspective error is a fundamental failure. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image successfully implements Rembrandt lighting. There is a distinct, small triangle of light visible on the shadowed cheek (the subject's right cheek, viewer's left). The lighting is directional, creating a high-contrast, moody atmosphere with deep shadows. The perspective is a standard portrait close-up, and the shallow depth of field keeps the focus on the facial features, matching the requested aesthetic perfectly. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt's requirements. The sky is a vibrant orange, characteristic of golden hour. The lighting creates long, warm shadows cast by the fence and the barn across the foreground. There is a soft, warm glow across the landscape, and the depth of field is appropriate for a landscape photograph, keeping the foreground and midground elements in focus. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a high-key portrait with a bright white background and extremely minimal shadows. The lighting is airy, luminous, and overexposed in a way that creates a soft, ethereal aesthetic. The subject is a person in a profile/three-quarter view, and the overall brightness adheres strictly to the "high-key" and "minimal shadows" requirement. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. The depth of field is shallow, creating a creamy bokeh background typical of an 85mm f/1.4 lens. The lighting features a clear warm rim light on the subject's hair and shoulder, effectively separating her from the blurred garden background. The perspective is a classic portrait shot. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image successfully implements the requested camera settings. The subject is sharp and in focus, while the background exhibits a shallow depth of field characteristic of a 50mm f/1.8 lens. The background elements, including the buildings and pedestrians, are softly blurred. The lighting is natural and consistent with an outdoor street setting. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image fails to meet several key criteria. The prompt specifies a shot "from above" (top-down/flat lay perspective), but the image is shot from a low-angle, eye-level perspective. Additionally, the lighting is bright and even across the scene, lacking the specific directional "natural window light from the left" and the "slight vignette" requested. The depth of field is also relatively deep, not showing the shallow depth expected from f/2.8. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image successfully implements the tilt-shift effect. It features a distinct selective focus strip where the middle ground (the street level with cars) is sharp, while the foreground and the large building in the background are heavily blurred. This creates the intended "miniature model" look. The colors are also highly saturated, and the high-angle perspective enhances the toy-like appearance. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image features a dark, moody scene with a prominent horizontal light streak. The streak contains both blue and orange tones, effectively mimicking the requested anamorphic lens flare. The lighting is cinematic, and the depth of field is shallow, with the foreground and background appearing out of focus. The horizontal orientation of the light streaks matches the prompt's description perfectly. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up portrait with dramatic split lighting. One side of the face is brightly lit, while the other side is cast in deep, heavy shadow, creating a high-contrast chiaroscuro effect. The lighting pattern is sharp and distinct, successfully dividing the face into light and dark halves. VERDICT: PASS
Color Precision 92%
Color Precision measures whether fal/krea/v2-large renders requested colors clearly and consistently across subjects, products, and backgrounds.
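Several prompts in this category specify colors as hex codes such as #FF6B35. The verdicts here come from a VLM judge reading the image, not from pixel measurements. For readers who want a rough numeric cross-check when inspecting images themselves, the sketch below converts a hex code to RGB and measures its distance to a sampled pixel. `hex_to_rgb`, `rgb_distance`, and the sample pixel are hypothetical illustrations, and plain RGB distance is only a crude proxy for perceived color difference.

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' (leading '#' optional) to an (R, G, B) tuple."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a 6-digit hex color")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two RGB triples (0 means identical)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


target = hex_to_rgb("#FF6B35")
print(target)  # (255, 107, 53)

# Hypothetical pixel sampled by hand from a generated image.
sampled = (250, 112, 60)
print(rgb_distance(target, sampled))
```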
PASSEvaluator (Gemma 4 26B)
The image features a bright red sports car parked on a gray asphalt road. The red color of the car is vivid, saturated, and unambiguous, exactly as described in the prompt. The road is a neutral gray, providing a clear contrast. All color requirements are met perfectly. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The woman is wearing a vibrant, deep blue dress that perfectly matches the description of "royal blue." The background is a plain, light grey/off-white, which qualifies as a "neutral background." All color requirements specified in the prompt are accurately represented in the image. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image features a field of sunflowers with petals that are a pale, creamy yellow rather than a "pure sunflower yellow." The sky is a clear, bright blue, which matches the prompt. However, because the flowers lack the vibrant, saturated yellow color specifically requested, the color accuracy for the flowers is not fully met. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image features a bedroom with a prominent wall painted in a vibrant, saturated orange. This color closely aligns with the described "burnt orange" (#FF6B35). The furniture, including the bed, nightstand, and lounge chair, is consistently white as requested. The lighting creates natural daylight effects across the room. All color requirements for the wall and furniture are met. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a forest scene with foliage in various shades of green and brown tree trunks. The foliage displays a dark, muted green tone that aligns well with the requested color #2D5F2D. The tree trunks are clearly visible and are a natural brown color. The colors in the image accurately reflect the specific color requirements provided in the prompt. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features walls in a medium blue color that closely aligns with the requested #4A90D9. The floor is a light-colored wood, which can be described as a warm wooden floor, and the baseboard/trim is white. All color requirements specified in the prompt are accurately represented in the image. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The walls are a warm cream color matching #F5E6CC, the sofa is a dark navy consistent with #2C3E50, and the accent pillows feature a bright red pattern matching #E74C3C. All color requirements and hex values are correctly represented in the scene. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The background is a deep, dark navy blue consistent with #1A1A2E. The watch features a soft gold metallic finish matching #F0C27F. The earbuds are a clean, pure white consistent with #FFFFFF. All color requirements from the prompt are met. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a near-white background, a vibrant orange-red main shape, and dark charcoal accent lines. The background is a very light off-white, consistent with #FAFAFA. The main shape is a bright orange-red, matching the description of #FF4500. The accent lines are a dark charcoal/black, consistent with #333333. The colors accurately reflect the specified hex values and descriptions. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image displays a smooth horizontal gradient. On the far left, the color is a vibrant red, which matches the #FF0000 specification. On the far right, the color is a deep blue, matching the #0000FF specification. The transition through the middle shows a smooth blend through purple, exactly as described in the prompt. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a vertical gradient. The bottom color is a vibrant orange, which closely matches the requested #FF6B35. The top color is a very dark, deep indigo/navy, which aligns with the requested #1B1464. The transition through the middle colors also follows a natural sunset progression. The colors accurately reflect the specific hex values and names provided in the prompt. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a horizontal banner with a smooth gradient. On the left side, the color is a bright, vibrant cyan, which aligns well with #00C9FF. On the right side, the color transitions into a soft, pale mint green, which matches the description of #92FE9D. The gradient transition is smooth and follows the specified color direction accurately. VERDICT: PASS
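The hex values quoted in the Color Precision evaluations can be checked numerically rather than by eye. The following is a minimal sketch, not part of the ImageBench tooling: the helper names `hex_to_rgb` and `lerp_rgb` are hypothetical, and plain linear interpolation in RGB is an assumption about what a "smooth blend" means. It converts a hex code to RGB and computes the midpoint of the #FF0000 to #0000FF gradient:

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def lerp_rgb(start, end, t):
    """Linearly interpolate two RGB tuples; t=0 gives start, t=1 gives end."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


# Endpoints taken from the red-to-blue gradient evaluation.
red = hex_to_rgb("#FF0000")
blue = hex_to_rgb("#0000FF")

# Midpoint of the gradient under the linear-interpolation assumption.
midpoint = lerp_rgb(red, blue, 0.5)
print(midpoint)  # (128, 0, 128)

# The "burnt orange" quoted for the bedroom wall and the sunset gradient.
print(hex_to_rgb("#FF6B35"))  # (255, 107, 53)
```

The midpoint (128, 0, 128) is a purple, which is consistent with the evaluator's description of the blend passing through purple.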
Photorealism 67%
Photorealism measures whether fal/krea/v2-large produces polished product and studio imagery with realistic materials, lighting, shadows, and finish.
PASS Evaluator (Gemma 4 26B)
The image successfully captures the requested studio aesthetic. The lighting is soft, and the shadow is realistic for a product shot. The textures of the suede and the laces appear natural. There are no obvious AI artifacts or structural distortions visible; the shoe maintains a coherent shape and the laces are rendered correctly. It passes as a high-quality professional product photograph. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image fails to pass as a real photograph. There are significant AI artifacts regarding the physics of light and glass. The internal geometry of the bottle is nonsensical; the way the glass edges and liquid lines intersect is inconsistent and physically impossible. The reflection on the surface is also distorted and does not accurately match the base of the bottle. The overall structure of the glass looks "melted" and lacks the sharp, precise edges of a real product shot. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image is a high-quality studio shot of a green apple. The lighting is soft and realistic, creating a natural-looking highlight on the skin and a subtle, soft shadow on the white background. The texture of the apple skin, including the small lenticels, is well-rendered. There are no obvious AI artifacts or distortions; the stem and the indentation at the top look natural and physically accurate. It passes as a real photograph. VERDICT: PASS
Graphical design 58%
Data Visualisation 33%
Data Visualisation measures whether fal/krea/v2-large produces charts with correct labels, proportions, colors, and readable values.
FAIL Evaluator (Qwen 3 VL)
The chart correctly displays a bar graph titled "Monthly Sales 2025" with labeled axes and months Jan–Jun. Bars are color-coded uniformly (teal). However, no specific sales values (12K, 18K, etc.) are labeled on the bars or axis, making the exact data unreadable. The chart type matches, but critical data is missing. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
The pie chart correctly displays all four categories with accurate percentages: Company A (35%), Company B (25%), Company C (20%), and Others (20%). Each slice has a distinct color and clear, readable labels with percentages. The chart type matches the requested pie chart visualization. All elements are properly rendered and legible. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The chart correctly displays a line graph titled "Weekly Traffic" with days Mon-Sun and data points 1200, 1500, 1100, 1800, 2000, 1700, 1400. Grid lines, axis labels, and readable annotations are present. However, the y-axis scale is mislabeled — it shows "2000" twice and "3000" incorrectly positioned, suggesting a rendering error. The chart type matches the request. VERDICT: FAIL
Layout & Design 44%
Layout & Design measures whether fal/krea/v2-large organizes posters, menus, covers, and other designed compositions with the requested hierarchy and elements.
PASS Evaluator (Qwen 3 VL)
The image features a large popcorn bucket centered, with "MOVIE NIGHT" in bold at the top and "Every Friday at 8 PM" at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the popcorn as the focal point and text anchoring top and bottom. All required layout elements are present and correctly positioned. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image features a guitar illustration, the headline "LIVE MUSIC FEST" in large, bold text with correct visual hierarchy, and the tagline "Tickets available now" below it. Text is readable and correctly placed. All required layout elements are present and properly arranged. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The poster features the title "JOIN THE BOOK CLUB" prominently at the top in large, bold text. Below it, "Meets every Wednesday" is clearly visible in smaller font. An open book illustration occupies the lower half, centered beneath the text. Visual hierarchy is correct, and all text is readable and properly placed per the prompt. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The image shows a hero header, headline, subtitle, “Sign up” button, and footer with three social icons — all present with correct visual hierarchy. However, the headline and subtitle text is gibberish and unreadable, failing the text readability requirement. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image lacks a proper newsletter layout. It has a header image, but no two-column body section with text on the left and image on the right. The quote block is present but misaligned. All text is gibberish and unreadable. The footer link is present but nonsensical. No visual hierarchy matches the prompt. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a menu with a logo header, three category sections (labeled "Mertises," "Piesterts," "Desserts"), and prices. However, category labels are not clearly "Appetizers," "Mains," or "Desserts" — they are misspelled or fabricated. Text is readable but mislabeled, and no footer with address/hours exists. Layout hierarchy is present but content is inaccurate. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The left page features a large hero photo of a coastal landscape, matching the prompt. The right page has a single column of text, a pull quote, and a sidebar with author bio, but lacks the specified two-column article layout. Text is present but unreadable due to gibberish placeholder content. Layout hierarchy is partially correct but incomplete. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
The image shows a mobile UI mockup with a top search bar, a 2x2 grid of feature cards with icons, a labeled "Events" list section, and a bottom tab bar. All described layout zones are present. Text is readable and correctly placed. Visual hierarchy is consistent with standard UI design. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The left page has a full-bleed photo as required. The right page shows a headline, body copy in two columns, and page number “04” at bottom. However, the headline and body text are gibberish and unreadable, violating the requirement for readable, correctly placed text. Layout structure is present but text fails core functionality. VERDICT: FAIL
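Each evaluator comment on this page normally ends with a "VERDICT: PASS" or "VERDICT: FAIL" marker. The sketch below shows one way such a marker could be pulled out of a comment for your own tallying. It is an illustration only: `parse_verdict` is a hypothetical helper, not a function from the benchmark, and it returns `None` when a comment carries no final verdict (for example, if the text was cut off).

```python
import re

# Match a verdict marker at the very end of the comment.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(comment):
    """Return 'PASS' or 'FAIL' if the comment ends with a verdict, else None."""
    match = VERDICT_RE.search(comment.strip())
    return match.group(1) if match else None


complete = "Layout structure is present but text fails core functionality. VERDICT: FAIL"
cut_off = "Layout structure is present but text fails core"

print(parse_verdict(complete))  # FAIL
print(parse_verdict(cut_off))   # None
```

Anchoring the pattern to the end of the string avoids picking up the word "VERDICT" if it happens to appear earlier in a comment.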
Style Diversity 75%
Style Diversity measures whether fal/krea/v2-large can shift between visual styles like oil painting, pixel art, watercolor, and technical illustration.
PASS Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever sitting in a garden setting. The style is unmistakably painterly, mimicking oil painting techniques with visible, textured brushstrokes throughout the fur, grass, and foliage. The color blending and lack of photographic sharpness confirm the requested artistic style is successfully rendered. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever in a garden setting. The art style matches the requested Japanese anime aesthetic with flat color shading, clean line work, and notably large, expressive eyes with prominent highlights. The character design and rendering techniques are consistent with anime conventions. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image depicts a golden retriever in a garden setting, but it fails to meet the specific stylistic requirements. The requested "32x32 grid" implies a very low-resolution, blocky aesthetic where individual pixels are large and distinct. This image appears to be a higher-resolution pixelation or a blurred image that mimics pixel art without adhering to the strict grid constraint. The visual characteristics of true 32x32 pixel art are not clearly recognizable. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Art Deco characteristics: strong geometric shapes in the building facades and street layout, a dominant gold and black color palette, and ornamental symmetry in the central building’s decorative frieze. The stylized trees and vintage cars also fit the era’s aesthetic. While not perfectly symmetrical overall, the core visual language of Art Deco is unmistakably present and recognizable. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested Impressionist style. Key characteristics are present: soft, visible brushstrokes define the forms and background; dappled light is evident in the sunlit patches on the dress and grass; and the palette consists of soft pastels—blues, whites, greens, and yellows. The overall effect matches the prompt’s description of an Impressionist painting. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Pop Art characteristics: bold black outlines define the figure, flat vivid primary colors (yellow background, blue jacket, red accents) dominate the palette, and Ben-Day dots are visibly applied across the entire surface to create shading and texture. The stylized, graphic rendering of the man’s face aligns with classic Pop Art portraiture. All requested visual elements are present and recognizable. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image successfully mimics a watercolor painting style with visible brush strokes, bleeds, and paper texture effects around the edges. The forest scene has a soft, diffused quality typical of watercolor, with layered transparency and color bleeding. However, it lacks true photorealism — the tree bark and background are stylized rather than detailed like a photograph. While the watercolor aesthetic is clear, the “photorealistic” component is missing. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested cyberpunk Art Nouveau style. Key characteristics are present: neon color palette (pinks, purples, teals), futuristic architecture with flowing organic lines, floral borders framing the scene, and a central female figure blending tech elements (visor, sleek suit) with natural motifs (floral hair extensions). The fusion of technology and nature in an ornate, glowing aesthetic matches the prompt precisely. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image displays a medieval castle with strong ukiyo-e woodblock print characteristics: bold outlines, flat shading, and stylized cloud patterns reminiscent of Hokusai or Hiroshige. However, the sky and lighting are not photorealistic — they retain the graphic, illustrative quality of the woodblock style throughout. The prompt requires a *combination* of ukiyo-e castle + photorealistic sky/lighting, which is not achieved. The entire image remains stylistically unified in the woodblock aesthetic.
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits soft brutalism: raw concrete dominates the architecture (walls, ceiling, coffee table, curved partition), while rounded pastel furniture (white sofa with mint and lavender cylindrical cushions) provides contrast. Warm diffused light emanates from pastel-shaded wall sconces in arched niches, enhancing the mood. The skylight adds natural illumination. All key visual characteristics of the requested style are present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image exhibits a sparse composition with a single armchair and an apple against a vast, empty white background, creating uncomfortable negative space. The colors are muted — dusty pink, gray, and off-white — and the brushwork is soft and painterly, evoking a quiet, anxious stillness. Visual tension arises from the isolation of objects and the stark emptiness. These elements align closely with “anxious minimalism.” VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image displays a cityscape with rounded, dome-topped buildings and curved streetlights that strongly evoke 1960s retro-futurism. The entire scene is bathed in a monochromatic, warm orange/sepia tone that creates a nostalgic, melancholic atmosphere. The soft, hazy rendering further enhances the vintage feel. The requested style is clearly recognizable and its key visual characteristics are present. VERDICT: PASS
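The percentages shown next to each category appear to be consistent with a simple pass rate over the evaluations listed beneath it. The page does not state the formula, so the sketch below rests on two assumptions: that the listing for each category is complete, and that the score is the share of PASS verdicts rounded to the nearest whole percent. The counts are tallied by hand from the PASS and FAIL labels above; `pass_rate` is a hypothetical helper name.

```python
# (passed, total) per category, counted from the PASS/FAIL labels listed above.
results = {
    "Data Visualisation": (1, 3),
    "Layout & Design": (4, 9),
    "Style Diversity": (9, 12),
}


def pass_rate(passed, total):
    """Share of PASS verdicts as a whole-number percentage."""
    return round(100 * passed / total)


for name, (passed, total) in results.items():
    print(f"{name}: {pass_rate(passed, total)}%")

# Assumption: the group score pools all evaluations in its categories.
group_passed = sum(p for p, _ in results.values())
group_total = sum(t for _, t in results.values())
print(f"Graphical design: {pass_rate(group_passed, group_total)}%")
```

Under these assumptions the sketch prints 33%, 44%, and 75% for the three categories and 58% for the pooled group (14 of 24), matching the figures displayed on the page.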