ImageBench V1 —
192 evaluations across 6 categories
Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself.

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 47%
Typography Style 100%
Typography Style measures whether local/flux-2-klein-9b preserves requested lettering styles, font mood, and text texture such as typewriter ink or neon signage.
PASS · Evaluator (Qwen 3 VL)
Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with blocky, slightly uneven ink strokes and visible paper texture. Aged paper background shows creases, stains, and wear. Ink appears imperfectly saturated, matching “slightly uneven ink” description. All elements convincingly simulate a typewritten document on aged paper. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Text reads “TODAY’S SPECIAL” correctly. Letters show white chalk texture, smudges, and dust particles on dark green board, convincingly mimicking chalk writing. Spelling and medium match prompt precisely. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The word "OPEN" is correctly spelled and rendered in a glowing, pixel-style LED matrix with distinct red dots forming each letter. The style convincingly mimics a classic LED sign, with uniform brightness and clear pixel structure. The red glow and reflection on the surface enhance realism. VERDICT: PASS
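The judge responses above end with a `VERDICT: PASS` or `VERDICT: FAIL` marker. A minimal sketch of how such a marker could be pulled out of the free-text response follows. The function name `parse_verdict` and the regular expression are illustrative assumptions, not the benchmark's actual parser.

```python
import re
from typing import Optional

# Assumption: the verdict marker is the last thing in the judge's response,
# formatted as "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(judge_text: str) -> Optional[str]:
    """Return "PASS" or "FAIL" from the trailing marker, or None if it is missing."""
    match = VERDICT_RE.search(judge_text.strip())
    return match.group(1) if match else None
```

Returning `None` instead of guessing keeps a truncated or malformed response from being silently counted as either a pass or a fail.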
Writing accuracy 33%
Writing accuracy measures whether local/flux-2-klein-9b renders the exact requested words, numbers, labels, and short phrases without spelling or layout errors.
PASS · Evaluator (Qwen 3 VL)
The word "HELLO" is fully spelled correctly and is perfectly legible against the white background. All letters are intact and clearly formed. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. All letters are intact and clearly legible. No misspellings or missing parts. The design matches the intended specification exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The word "SERENITY" is spelled correctly and fully readable, centered on a solid light blue background. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
The word "brown" is misspelled as "boøøx" with distorted characters. "jumps high" is correct. Text is readable but contains a clear spelling error. The core requirement of correct spelling is not met. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
The text reads "Every sunset brings a new down ahead". The word "down" incorrectly replaces "dawn". This is a misspelling/incorrect word. The intended phrase "Every sunset brings a new dawn ahead" is not fully accurate. VERDICT: FAIL
PASS · Evaluator (Qwen 3 VL)
The text "Stars shine brightest in the dark night" is fully spelled correctly and clearly legible against the off-white background. No words are missing or misspelled. The font is consistent and high-contrast, ensuring readability. All characters are intact and properly formed. VERDICT: PASS
FAIL · Evaluator (Qwen 3 VL)
The word "tems" is misspelled (should be "items"). All other text is correctly spelled and readable. The fine print is present but contains a minor error. The core requirement is met only if all text is correct; since "tems" is wrong, the accuracy fails. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
The text "seetting" is misspelled; it should be "seating". All other words are correctly spelled and fully readable. The intended disclaimer contains an error that affects accuracy. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
Text contains errors: "fm" instead of "fine", "fne" instead of "fine". "GALA" has a decorative accent mark, but "GALA" is correct. "2025" is correctly displayed. Footnote is accurate. Misspellings and illegible parts prevent full correctness. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
“MARUS CHEN” is misspelled (should be “Marcus”). “PARMOUT PICTURES RELEASE” is misspelled (should be “PARAMOUNT PICTURES RELEASE”). All other text is correct and readable. Minor errors in credits block prevent full accuracy. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
Text errors: “Cambodia” misspelled as “Camboodia”; “Maria Santos” as “Marria Santos”; “Climate Action” as “Climate Acize”; “This Sa Yeah” instead of “This Year”. “Issue 247” is correct. Multiple words are misspelled or garbled. VERDICT: FAIL
FAIL · Evaluator (Qwen 3 VL)
“Unflirteted” is misspelled (should be “Unfiltered”). “harserted” is misspelled (should be “harvested”). “Northwewst” is misspelled (should be “Northwest”). “aditives” is misspelled (should be “additives”). “presertaives” is misspelled (should be “preservatives”). Multiple critical errors in spelling and content. VERDICT: FAIL
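The 33% shown for Writing accuracy is consistent with a plain pass rate over the twelve verdicts listed above (4 PASS, 8 FAIL). The page does not state its formula, so the sketch below is an assumption: `pass_rate` is a hypothetical helper, and rounding to a whole percent is a guess at how the displayed figures are produced.

```python
from typing import Sequence


def pass_rate(verdicts: Sequence[str]) -> int:
    """Percentage of "PASS" verdicts, rounded to the nearest whole percent."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    passed = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passed / len(verdicts))


# Verdicts transcribed from the two Text Rendering subcategories, in page order.
typography_style = ["PASS"] * 3
writing_accuracy = ["PASS"] * 3 + ["FAIL"] * 2 + ["PASS"] + ["FAIL"] * 6

print(pass_rate(typography_style))                     # 100
print(pass_rate(writing_accuracy))                     # 33
print(pass_rate(typography_style + writing_accuracy))  # 47
```

Under this assumption the three results line up with the 100%, 33%, and 47% figures reported for Typography Style, Writing accuracy, and Text Rendering.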
Spatial Reasoning 77%
Attributes Binding 67%
Attributes Binding measures whether local/flux-2-klein-9b attaches the right colors, materials, and properties to the correct requested objects.
PASS · Evaluator (Qwen 3.5 122B)
The image shows two objects on a white surface. On the left is a red sphere (ball). On the right is a blue cube. Both objects are correctly colored and shaped as specified. They are positioned side by side. All attributes (color, shape) are correctly bound to their respective objects with no swaps. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a green apple and an orange pumpkin placed on a wooden cutting board. The apple is correctly colored green and has the typical round shape with a stem. The pumpkin is orange, round, and has a stem as expected. Both objects are appropriately sized relative to each other and the board. No attributes are swapped or misassigned. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a yellow taxi and a black limousine parked next to each other on a street. The taxi is correctly yellow and has the typical taxi shape with a roof sign. The limousine is correctly black and has the elongated shape characteristic of a limousine. Both vehicles are positioned side-by-side as requested. All attributes (color, size, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows three distinct shapes on a white background. The large red shape is a semicircle (half-circle), not a full circle as requested. The blue triangle is small and correctly shaped. The green star is medium-sized and correctly shaped. Since the prompt specifically asked for a "large red circle" and the image contains a semicircle, the shape attribute is incorrectly bound. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The image shows three candles on a wooden shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium-height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a big brown horse with a white star on its forehead, standing in a meadow. A small white rabbit is sitting beside it. A medium-sized black cat is also present, sitting next to the rabbit. All objects match their described attributes: color, size, and shape are correctly bound. No swaps or mismatches observed. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains a pink fire truck with correct shape and features (ladder, lights). A large blue banana is present, correctly colored and shaped. A green sun with a face and rays is visible, matching the description. All attributes are correctly bound to their respective objects with no swaps. The surreal landscape context is also maintained. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image contains: 1. A purple carrot with green/purple stems — matches “purple carrot”. 2. A tree with red leaves and brown trunk — matches “red tree with red leaves”. 3. A watermelon sliced open, but it has green rind and pink/red flesh — does NOT match “white watermelon”. The watermelon’s color is incorrect per the prompt. Attributes are mostly bound correctly except for the watermelon’s color. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows three objects on a wooden table: 1. Orange elephant – correct color and shape. 2. Yellow penguin – correct color, but the object is clearly a chick or duckling (not a penguin). 3. Silver strawberry – correct color and shape, with green leaves. The penguin is misidentified; it’s not a penguin. This violates the “specific attributes” requirement. VERDICT: FAIL
Compositionality 100%
Compositionality measures whether local/flux-2-klein-9b combines subjects, actions, outfits, props, and scene details into one coherent requested image.
PASS · Evaluator (Qwen 3.5 122B)
The image contains all required elements: - Tabby cat: Present (brown/black striped fur with white muzzle/chest) - Oversized red sunglasses: Present (large red frames with dark lenses covering eyes) - Coherent combination: Yes — sunglasses fit naturally on the cat’s face, no anatomical distortions or blending errors. All prompt elements are accurately and coherently rendered. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains: - Golden retriever: Present and clearly depicted. - Tiny cowboy hat: Present, though not particularly "tiny" — it fits proportionally on the dog’s head. - Coherent combination: Yes, the hat sits naturally on the dog’s head without visual glitches. All prompt elements are present and coherently combined. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains: 1. A penguin - Present (cartoon-style, black and white with orange beak/feet) 2. Holding - Present (penguin’s flipper grips umbrella handle) 3. Colorful umbrella - Present (rainbow-colored panels: red, blue, yellow, green, orange, purple) All elements are clearly depicted and coherently combined. The penguin is centrally positioned under the open umbrella, which it holds upright. No contradictions or missing components. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a medieval knight in full armor sitting at a wooden desk, typing on a modern laptop. All described elements are present: the knight’s armor is detailed and complete, the desk is rustic and period-appropriate, and the laptop is clearly modern (Apple logo visible). The combination is coherent and visually integrated. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all required elements: 1. Roman gladiator - Present (wearing armor, helmet with red crest, shield) 2. Taking a selfie - Present (holding smartphone up, smiling at camera) 3. Smartphone - Present (black phone in hand) 4. Colosseum - Present (ancient Roman amphitheater structure in background) All elements are coherently combined in a single scene. The gladiator is actively taking a selfie with a modern smartphone while standing inside the Colosseum. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements: 1. Astronaut in a spacesuit - Present (white suit with helmet, American flag patch) 2. Riding a bicycle - Present (mountain bike being ridden) 3. Through a park - Present (trees, grass, benches, paved path) The unusual concept is coherently combined - the astronaut appears to be genuinely riding through a realistic park setting with proper lighting and shadows. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements: - Octopus: Present, central figure. - Top hat: Present, worn by the octopus. - Monocle: Present, worn by the octopus. - Chess: Present, being played on a board. - Moon surface: Present, depicted as the ground. - Earth in background: Present, two Earth-like planets visible. All elements are coherently combined. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all requested elements: 1. Dinosaur: Present (T-Rex head). 2. Business suit: Present (dark suit, white shirt, blue tie). 3. PowerPoint presentation: Present (screen with slides and pointer). 4. Modern office: Present (desks, chairs, windows, plants). 5. Drinking coffee: Present (holding a white cup). All elements are coherently combined in a single scene. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image contains all described elements: 1. Robot dressed as a pirate - Present (metallic body, pirate hat, coat) 2. Paper boat - Present (white origami-style boat) 3. Lake of lava - Present (glowing orange liquid with cracks) 4. Three kittens - Present (one on each hand, one floating above) 5. Juggling action - Present (kittens positioned as if being juggled) All elements are coherently combined in a single scene. VERDICT: PASS
Counting 44%
Counting measures whether local/flux-2-klein-9b creates the exact number of requested objects without adding or missing items.
PASS · Evaluator (Qwen 3.5 122B)
The image displays exactly three red apples placed on a wooden surface. The apples are arranged in a row, and the count matches the prompt’s request precisely. No other objects are present that contradict the description. The visual content aligns perfectly with the specified criteria. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image displays exactly five yellow rubber ducks arranged in a row on a white bathroom shelf. The count matches the prompt’s request precisely. All ducks are identical in appearance, facing forward, with orange beaks and black eyes. No extra or missing objects are present. The setting (bathroom shelf, tiled wall, mirror) aligns with context. Core criteria fully met. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image displays exactly four white pillar candles arranged in a row on a dark, textured surface. Each candle is unlit and upright, with visible wicks. The count matches the prompt’s request precisely: four white candles, standing on a dark surface. No extra or missing objects are present. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows 1 orange cat and 2 black dogs sitting on a grassy lawn. The prompt requested 2 orange cats and 3 black dogs. The counts do not match — there is one fewer cat and one fewer dog than specified. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a bouquet with multiple red roses — at least 5–6 visible, not 4. There are also 2–3 white daisies (one fully visible, one partially behind it, and possibly a third in the back). The vase is glass and on a wooden table, which matches. However, the rose count exceeds the requested “4,” and daisy count may be slightly off. Since the prompt specified exact counts and they don’t match precisely, this fails the core criterion. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The image shows a wooden garden fence with butterflies resting on it. There is exactly one blue butterfly on the left side of the fence. There are three yellow butterflies: one large in the center, one smaller to its right, and another small one further right near the fence post. The count matches the prompt’s request of “1 blue butterfly and 3 yellow butterflies.” All objects are clearly visible and correctly counted. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a round glass fishbowl containing goldfish and decorative stones at the bottom. Counting the goldfish: there are 8 clearly visible goldfish swimming in various orientations. The prompt requested exactly 9 goldfish. Since only 8 are present, the count does not match the requirement. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a fruit bowl with: - 2 bananas (not 3) - 4 oranges (correct) - 1 apple (not 2) The counts do not match the requested quantities. Specifically, there is one fewer banana and one fewer apple than specified. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows 10 colorful balloons floating against a clear blue sky. The prompt requested exactly 11 balloons. Since the count is off by one, the core criterion is not met. VERDICT: FAIL
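The Counting verdicts above hinge on an exact match between requested and observed quantities, so a single missing object is enough to fail. The sketch below illustrates that comparison using the fruit bowl case (3 bananas, 4 oranges, 2 apples requested; 2, 4, and 1 reported by the judge). `count_mismatches` is a hypothetical helper, and the observed counts are the judge's own reading of the image, which can itself be wrong.

```python
from typing import Dict, Tuple


def count_mismatches(
    requested: Dict[str, int], observed: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Map each object whose counts differ to (requested, observed)."""
    names = sorted(set(requested) | set(observed))
    return {
        name: (requested.get(name, 0), observed.get(name, 0))
        for name in names
        if requested.get(name, 0) != observed.get(name, 0)
    }


# Counts taken from the fruit bowl evaluation above.
requested = {"banana": 3, "orange": 4, "apple": 2}
observed = {"banana": 2, "orange": 4, "apple": 1}

mismatches = count_mismatches(requested, observed)
verdict = "PASS" if not mismatches else "FAIL"
print(mismatches)  # {'apple': (2, 1), 'banana': (3, 2)}
print(verdict)     # FAIL
```

Objects missing from either side are treated as a count of zero, so an extra, unrequested object type would also register as a mismatch in this sketch.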
Negation 56%
Negation measures whether local/flux-2-klein-9b respects what should be absent from the image, not only what should be present.
PASS · Evaluator (Qwen 3.5 122B)
The image shows a tropical beach scene with white sand, turquoise water, and palm trees leaning over the shore. The sky is blue with scattered clouds. There are footprints in the sand and some seaweed washed up near the waterline. No people are visible anywhere in the frame — no bodies, limbs, clothing, or silhouettes. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows an orange and white cat lying on a gray couch. The cat is looking at the camera with its front paws extended. There is no collar visible around the cat’s neck — the fur appears uninterrupted and clean in that area. The excluded element (collar) is truly absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows an empty classroom with wooden desks and chairs arranged in rows facing a green chalkboard. There are no students visible in the room. The excluded element (students) is truly absent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a sandy beach with palm trees, seashells in the foreground, and a green beach umbrella with lounge chairs. The background is a clear blue sky meeting the horizon line where sand ends — there is no visible water, ocean, sea, or any body of liquid. The excluded element (water/ocean) is truly absent. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows an office workspace with a wooden desk, a beige office chair, a tall bookshelf filled with books, and a black desk lamp. On the desk, there is a closed laptop (which could be considered a computer/screen when open) and some notebooks. Since a laptop is present — even if closed — it technically violates the exclusion of “no computers or screens.” VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The image shows a kitchen scene with a gas stove, several stainless steel pots and pans, wooden cutting boards, and a folded cloth. The burners are lit with blue flames. There is no visible food in any of the pots, pans, or on the cutting boards. The excluded element — food — is indeed absent. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a pepperoni pizza with tomato sauce, basil leaves, and a golden-brown crust. However, it clearly contains melted cheese covering the sauce and beneath the pepperoni slices — visible as the white/yellowish layer with browned spots. Since the prompt explicitly excluded cheese, and cheese is prominently present, the core criterion is violated. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a silver car with windows, doors, and headlights as described. However, the excluded element — wheels — is NOT truly absent. The rear wheel is clearly visible. The front wheel area is missing or obscured (possibly due to rendering error), but since at least one wheel is present, the exclusion criterion is violated. VERDICT: FAIL
FAIL · Evaluator (Qwen 3.5 122B)
The image shows a human face with eyes, nose, and mouth. However, the excluded element — eyebrows — are clearly present. Both eyebrows are well-defined, dark, and naturally shaped above each eye. Since the prompt explicitly required “no eyebrows,” their presence violates the exclusion criterion. VERDICT: FAIL
Relative Position 92%
Relative Position measures whether local/flux-2-klein-9b places objects in the requested spatial relationship, such as above, below, inside, or next to another object.
PASS · Evaluator (Qwen 3.5 122B)
The image shows a cat sitting directly on the top surface of a cardboard box. The cat is positioned centrally on the box, with its body oriented slightly to the right. The box is on the floor, and the cat is above it. All spatial relationships described in the prompt are accurately represented. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on the floor between the chair legs. The spatial relationship described — “a red ball underneath a wooden chair” — is accurately represented. All positional descriptors (underneath) are correct and visually confirmed. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a coffee mug positioned to the right of an open book on a wooden desk. The book is open and lies flat, with the mug placed beside it. The spatial relationship described in the prompt — “a coffee mug next to an open book on a desk” — is accurately represented. All positional relationships (on, next to) are correct. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a large tree trunk. The tree is positioned next to a park bench, which sits on grass near a paved path. The spatial relationships described — bird “on” branch, branch extending “from” tree, tree “next to” bench — are all accurately represented in the image. No positional errors detected. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a cat sitting on a pillow, which is placed on a wooden chair. The spatial relationships described in the prompt are accurately represented: the cat is on the pillow, and the pillow is on the chair. All positional relationships (on, under) are correct. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a lit candle placed directly on top of a stack of books. The stack of books is positioned to the left of a potted plant. All specified spatial relationships — “candle on top of books” and “books next to potted plant” — are accurately represented. The arrangement matches the prompt’s description precisely. VERDICT: PASS
FAIL · Evaluator (Qwen 3.5 122B)
The image shows five chess pieces on a board, but the spatial arrangement does not match the prompt. The prompt specifies: king in center, queen left, bishop right, rook above, knight below. In the image: there is no queen or bishop visible — instead, we see a rook (left), king (center), knight (right), pawn (below), and another king (above). The pieces and their positions are incorrect relative to the prompt’s description. VERDICT: FAIL
PASS · Evaluator (Qwen 3.5 122B)
The image shows a desk scene with a laptop centered, a coffee mug to its left, and a smartphone to its right — all correct. A desk lamp is positioned above and to the right of the laptop, casting light downward — consistent with “lamp above.” Below the laptop, an open notebook lies flat on the desk — matching “notebook below.” All specified spatial relationships are accurately represented. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows five animals in a field arranged left to right as follows: a brown horse, a brown-and-white cow, a white sheep, a black-and-brown goat, and a tan dog. The spatial relationships match the prompt exactly — horse on far left, cow to its right, sheep in center, goat to sheep’s right, dog on far right. All positional descriptors are accurate. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a red barn in the foreground on the right, a windmill in the midground to the left, and snow-capped mountains in the background spanning the horizon. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). Depth perspective is consistent. All positional relationships are correctly represented. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image correctly shows a sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. All objects are positioned as described: sandcastle closest to viewer, surfer further out on water, ship farthest away on horizon line. Sizes decrease appropriately with distance. Spatial relationships match the prompt exactly. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial arrangement matches the prompt: foreground elements are largest and closest, midground fountain is centered and smaller, background buildings are smallest and farthest. All positional relationships (on, under, left, right, above, below) are correctly represented. VERDICT: PASS
Scale & Proportions 100%
Scale & Proportions measures whether local/flux-2-klein-9b keeps people, vehicles, buildings, and everyday objects at believable relative sizes.
PASS · Evaluator (Qwen 3.5 122B)
The image shows a man standing next to a yellow school bus. The relative scale appears realistic — the man is appropriately sized compared to the bus, neither too large nor too small. His height relative to the bus’s door and windows matches real-world proportions. No obvious distortions or AI artifacts affect the size relationship. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The ladybug appears appropriately sized relative to the leaf it is resting on. The leaf shows detailed venation and texture consistent with a macro photograph, and the insect’s scale matches real-world proportions for a ladybug on a typical plant leaf. No anatomical or dimensional inconsistencies are visible. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The child’s hands are holding the basketball in a natural, proportionate manner. The ball appears appropriately sized relative to the child’s torso and arms — not unnaturally large or small. Fingers wrap around the ball realistically, and the overall scale matches real-world expectations for a young child holding a standard basketball. No anatomical or object-size distortions are evident. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a tiny figurine of a person standing on the rim of a coffee cup, looking down into the liquid. The size relationship is consistent with the prompt — the person is miniature relative to the cup, which is itself a normal-sized coffee cup on a saucer. The proportions are visually coherent and match the described scenario. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image depicts a giant tabby cat walking down a city street, towering over cars and buildings. The cat’s head reaches several stories high, consistent with the "kaiju" description. Its paws are larger than vehicles, and its body spans the width of the road. The surrounding skyscrapers appear appropriately scaled relative to the cat’s exaggerated size. All proportions align with the prompt’s intent. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, with proportions that suggest it is a small model or dollhouse. The fingers and palm appear realistically sized relative to the house, maintaining believable spatial relationships. No obvious distortions or scale inconsistencies are present. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows an elephant, horse, dog, cat, and mouse arranged in a line from largest to smallest, which aligns with real-world proportional sizes. The elephant is largest, followed by the horse, then the dog, cat, and mouse — each progressively smaller as expected. While the scene is fantastical (these animals wouldn’t naturally stand together), the relative scaling between them is accurate. VERDICT: PASS
PASS · Evaluator (Qwen 3.5 122B)
The image shows a table setting with plates, cutlery, wine glasses, and shakers. The relative sizes appear mostly consistent — the wine glass is appropriately taller than the plates, cutlery is proportional to the plates, and salt/pepper shakers are smaller than the plates but larger than scattered peppercorns. However, the two knives placed side-by-side next to the foreground plate are nearly identical in size and shape, which is unusual for a standard place setting (typically one dinner knife
PASS · Evaluator (Qwen 3.5 122B)
The image displays a farm scene with a barn, tractor, farmer, cow, and chickens. The relative sizes appear consistent: the barn is the largest structure, the tractor is appropriately sized relative to the farmer and cow, and the chickens are small in comparison to the other elements. The fence post is also proportionate to the farmer and animals. All objects maintain realistic scale relationships. VERDICT: PASS
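The page does not say how the six subcategory results roll up into the 77% shown for Spatial Reasoning. The sketch below compares two plausible aggregations using the PASS counts tallied from the verdict labels above: a pooled rate over all 57 evaluations, and an unweighted mean of the six subcategory rates. Both function names are hypothetical, and the choice of candidate formulas is an assumption.

```python
from typing import Dict, Tuple

# (passed, total) per subcategory, tallied from the PASS/FAIL labels on this page.
spatial_reasoning: Dict[str, Tuple[int, int]] = {
    "Attributes Binding": (6, 9),
    "Compositionality": (9, 9),
    "Counting": (4, 9),
    "Negation": (5, 9),
    "Relative Position": (11, 12),
    "Scale & Proportions": (9, 9),
}


def pooled_rate(results: Dict[str, Tuple[int, int]]) -> float:
    """Pass percentage over all evaluations, ignoring subcategory boundaries."""
    passed = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    return 100 * passed / total


def macro_rate(results: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of the per-subcategory pass percentages."""
    return sum(100 * p / t for p, t in results.values()) / len(results)


print(round(pooled_rate(spatial_reasoning)))  # 77
print(round(macro_rate(spatial_reasoning)))   # 76
```

With these tallies the pooled rate (44 of 57) rounds to the reported 77%, while the unweighted mean rounds to 76%, so the displayed category score looks more like a pooled pass rate. This is an inference from the numbers, not a documented rule.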
Human realism 88%
Faces & Expressions 100%
Faces & Expressions measures whether local/flux-2-klein-9b creates realistic faces with the requested age, emotion, expression, and identity cues.
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural features: eyes, nose, and mouth are well-proportioned. Warm, genuine smile is clearly readable, with crinkled eyes and relaxed cheeks. No obvious distortions. Lighting and hair detail enhance realism. Meets prompt criteria for a natural, warm smile in a close-up portrait. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural, correctly proportioned features. Expression conveys calm seriousness through neutral mouth, direct gaze, and subtle brow lines. No visible distortions. Lighting and detail are realistic. Meets prompt requirements. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The face is symmetric with correctly proportioned features. The teenager’s expression is neutral and relaxed, matching the prompt. No significant distortions are visible. Eyes, nose, and mouth are natural and well-defined. Lighting and focus are appropriate for a headshot. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The face shows clear disgust: nose wrinkled, lips curled, and brows furrowed. Features are symmetric and anatomically correct. No distortions; skin texture and fine details are natural. Emotion is unmistakably readable. Minor blemishes are present but don’t affect facial integrity. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The face is symmetric with correctly rendered features. The expression clearly conveys surprise: eyebrows are raised, eyes are wide, and mouth is slightly open. No distortions are present. The emotion is vivid and readable. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural, correctly rendered features. The downcast eyes, slightly drawn mouth, and distant gaze clearly convey quiet melancholy. No significant distortions or artifacts. Lighting and texture are realistic. The emotion is unmistakably readable and well-executed. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural, aged features including deep wrinkles and white hair. Warm, genuine smile with crinkled eyes clearly conveys kindness. No distortions; lighting and texture are realistic. Emotion and prompt details are accurately rendered. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows, parted lips, and visible tears. No distortions are present. Features align well with the prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Face is symmetric with natural wrinkles and age spots. Thinning gray hair and reading glasses are accurate. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align well with prompt. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Faces are symmetric with clear, stylized features. Emotions are distinctly readable: left person laughs with head back, center looks shocked with wide eyes, right person smirks while rolling eyes. No significant distortions. Art style is consistent and expressive. All three expressions match the prompt accurately. VERDICT: PASS
PASS · Evaluator (Qwen 3 VL)
Faces show natural symmetry and correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, child grins excitedly. Teenager’s expression is ambiguous but not visibly distorted. All faces are well-defined without AI artifacts. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Faces are symmetric with correct anatomical features. Emotions are clearly readable: left man shows frustration (furrowed brow), center woman shows pleased (smiling, chin resting), right man shows confusion (tilted head, puzzled gaze). No distortions or unnatural artifacts. All three expressions match the prompt accurately. VERDICT: PASS
Full Body 100%
Full Body measures whether local/flux-2-klein-9b renders full-body people with natural anatomy, clothing, pose, and proportions.
PASS Evaluator (Qwen 3 VL)
The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is clearly recognizable as relaxed standing. No distortions or anomalies are present. All elements match the prompt precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The man stands straight, facing forward, with hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are present. All elements match the prompt precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The person’s full body is visible from head to feet, standing in a neutral pose on a park path. Limb count and proportions are anatomically correct. The pose is recognizable as standing still, hands slightly curled at sides. No significant distortions are present. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The woman’s limbs are correctly proportioned and count is accurate. Her running pose is anatomically plausible, with one leg forward, the other back, arms bent and swinging. Hair and clothing show motion blur consistent with running. No distortions are visible. The action is clearly recognizable as mid-stride running. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms out for balance—matches the described mid-air jump over a puddle. The action is clearly recognizable, with realistic motion blur and splash effects. No anatomical distortions are visible. The reflection in the puddle adds authenticity. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The dancer’s limbs are correctly proportioned and anatomically plausible. One leg is extended in a dynamic spin, the other supports weight en pointe. The dress flows outward as described. No distortions in body structure or pose are evident. The action is clearly recognizable as a spinning dance move. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The person has two legs, correct proportions, and performs the tree pose accurately: standing on one leg, the other foot pressed to the inner thigh, arms raised overhead in prayer position. No distortions or anatomical errors are visible. The pose is clearly recognizable and matches the prompt. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The climber has two arms and two legs, with proportions and pose consistent with realistic rock climbing. Limbs are extended to different holds as described. No anatomical distortions are visible. The action is clearly recognizable as dynamic climbing. All elements align with the prompt. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, bow arm is extended, and the cello is properly supported. Proportions are natural, and the pose is clearly recognizable as a seated cello performance. No distortions are visible. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The dancers’ limbs are correctly counted and proportioned. The woman is dipped backward with intertwined legs, matching the prompt. The pose is anatomically plausible and recognizable as a tango move. No significant distortions are visible. The composition and lighting are natural. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s airborne pose are anatomically plausible and match the described action. The ball is at the sliding player’s feet, and the dynamic interaction is recognizable. No distortions are present. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate, with legs straight, feet together, and hands supporting weight. Alignment is nearly identical, though slight variations in head tilt and arm positioning are natural. No distortions are visible. The action is clearly recognizable as a synchronized handstand. VERDICT: PASS
Hands 58%
Hands measures whether local/flux-2-klein-9b draws anatomically plausible hands, fingers, and gestures.
PASS Evaluator (Qwen 3.5 122B)
The left hand (viewer’s left) shows 4 fingers clearly; the thumb is partially obscured but likely present. The right hand (viewer’s right) also shows 4 fingers clearly; thumb is visible and separate. Both hands appear to have 5 digits anatomically, though one finger per hand is slightly hidden or blended due to angle/pose. No gross anomalies — grip is relaxed, fingers naturally spread as described. Core criteria met. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The woman’s hands are extended palms-up with all ten fingers clearly visible. Each hand has exactly five fingers: thumb, index, middle, ring, and pinky — all anatomically proportioned and correctly positioned. No extra, missing, or malformed digits. Gesture is natural and consistent with the prompt. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The left hand (on the right side of image) shows 4 fingers clearly — thumb is tucked or obscured. The right hand (on left side of image) also shows 4 fingers; thumb is not visible. Neither hand displays exactly 5 visible fingers. Grip is anatomically plausible for crossed arms, but finger count fails the prompt’s requirement. Anomaly: missing visible thumbs on both hands. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The right hand gripping the mug shows four fingers and a thumb, totaling five digits — anatomically correct. The left hand is not visible in the image. Since only one hand is shown and it meets the criteria of 5 fingers with proper grip, the core requirement is satisfied. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The right hand gripping the pen shows 5 fingers, but the thumb and index finger appear fused or malformed at the tip, and the pen grip is awkward. The left hand resting on the page also shows 5 fingers, but they are slightly distorted with unnatural spacing and curvature. While the count is technically 5 per hand, the anatomy and grip are not fully correct. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The right hand (strumming) shows only 4 fingers clearly, with the thumb partially visible — but the pinky appears fused or missing. The left hand (fretting) has 5 fingers, but the index finger is oddly bent and the ring/pinky fingers appear merged or distorted. Anatomical correctness fails on both hands due to unnatural finger positioning and possible fusion/omission. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The hand making the peace sign shows two fingers extended (index and middle) and three curled (thumb, ring, pinky) — anatomically correct for a right-hand peace gesture. However, the thumb is visibly curled inward but appears to be fused or malformed with the index finger base, lacking clear separation. The other fingers are proportionate and correctly positioned. No extra or missing fingers overall (5 total), but the thumb-index junction is anomalous. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt exactly. Anatomically, the hand appears correct: five fingers total (thumb + four curled), proper knuckle alignment, natural skin texture, and no extra or missing digits. No anomalies detected. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows one hand with four fingers extended (thumb, index, middle, ring) and the pinky curled — not matching the prompt’s description of “index, middle, and ring fingers extended” for counting to three. The thumb is also extended, making it a four-finger gesture. Anatomically, the hand structure appears correct, but the gesture does not match the prompt. Only one hand is visible; no second hand to evaluate. Core criteria (counting to three with specified fingers) is not met. VERDICT: FA
PASS Evaluator (Qwen 3.5 122B)
Upon close examination of the handshake:
- The left hand (from viewer’s perspective) shows 5 fingers clearly visible and correctly positioned.
- The right hand also displays 5 fingers, with natural curvature and grip alignment consistent with a firm handshake.
- No extra, missing, or malformed fingers are present.
- Thumb placement and finger interlock appear anatomically plausible for a professional handshake.
VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows two hands meeting in a high-five gesture. The left hand (from viewer’s perspective) clearly has 5 fingers extended and anatomically correct. The right hand also displays 5 fingers, but the thumb is partially obscured behind the other hand — however, it is still visible at the edge. All fingers are extended as required. No anatomical anomalies or extra/missing digits are present. Gesture matches prompt. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows four hands interacting with a gift box. The left person’s hands (two visible) each show 5 fingers — thumbs and fingers are clearly defined and anatomically correct. The right person’s hands (two visible) also each show 5 fingers; grip is natural, no extra or missing digits. No anomalies detected. All hands appear realistic and correctly formed. VERDICT: PASS
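The 58% shown for Hands is consistent with a plain pass rate over the twelve verdict labels listed above (7 PASS, 5 FAIL). A minimal Python sketch of that arithmetic follows; the function name `pass_rate_percent` is illustrative, and rounding to the nearest whole percent is an assumption about how the page displays scores.

```python
from typing import Sequence


def pass_rate_percent(verdicts: Sequence[str]) -> int:
    """Return the share of PASS labels as a whole-number percentage."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    passes = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passes / len(verdicts))


# Labels copied in order from the Hands entries above.
hands_verdicts = [
    "PASS", "PASS", "FAIL", "PASS", "FAIL", "FAIL",
    "FAIL", "PASS", "FAIL", "PASS", "PASS", "PASS",
]

print(pass_rate_percent(hands_verdicts))  # 7 of 12 labels are PASS
```

With only twelve prompts per subcategory, a single flipped verdict moves the score by roughly eight points, which is worth keeping in mind when comparing subcategories.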
Multi-Subject 100%
Multi-Subject measures whether local/flux-2-klein-9b keeps multiple people distinct while preserving their requested attributes, outfits, and positions.
PASS Evaluator (Qwen 3 VL)
Left woman: short blonde hair, wearing red jacket over dark top. Right woman: long black hair, wearing blue dress. Both visually distinct and match prompt descriptions precisely. No missing or mismatched elements. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Both men are present and visually distinct. Left: bearded, wearing glasses and a green sweater. Right: clean-shaven, wearing a black hoodie. Features match prompt exactly. No visual artifacts or mismatches. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Tall person: curly red hair, denim overalls (over tank top). Short person: straight brown hair, yellow floral sundress. Both visually distinct and match descriptions precisely. No missing elements or mismatches. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All four individuals are present and visually distinct. Left: bald man in suit. Next: woman with silver hair in red blouse. Center: young man with dreadlocks in denim jacket. Right: woman with hijab in green dress. Features match prompt exactly. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Left: Tall woman with short pink hair, playing bass guitar. Center: Man with beard and bandana, seated at drum kit. Right: Slim man with glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Man: wears sunhat and backpack. Woman: has braided hair, uses hiking poles, wears backpack. Teenager: wears baseball cap, carries water bottle, wears backpack. All three are visually distinct and match described features precisely. VERDICT: PASS
Truthfulness 74%
Photorealism 100%
Photorealism measures whether local/flux-2-klein-9b makes materials, surfaces, and real-world objects look visually plausible and distinct.
PASS Evaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The glass exhibits correct refraction and transparency, distorting the wood grain behind it. The silver spoon shows sharp, accurate specular reflections. The wooden table displays a convincing grain texture, and the linen napkin has a realistic woven fabric appearance. The apple’s skin reflects light naturally. All visual properties align with the prompt's requirements for accurate texture and light behavior. VERDICT: PA
PASS Evaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum finish with appropriate metallic reflections. The notebook displays realistic leather grain and texture. The ceramic mug has a matte, speckled surface consistent with pottery, while the glass paperweight demonstrates accurate refraction and transparency. The wooden desk shows natural grain patterns. All materials are visually distinct and realistic. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image accurately renders the distinct material properties requested. The stainless steel pot exhibits sharp, high-contrast reflections consistent with polished metal. The glass bottle shows correct translucency and refraction of the olive oil within. The wooden cutting board displays natural grain and matte texture, while the cotton towel features soft folds and diffuse light absorption typical of fabric. All surfaces interact with light realistically. VERDICT: PASS
Physics & Reflections 92%
Physics & Reflections measures whether local/flux-2-klein-9b follows basic physical cues such as shadows, reflections, gravity, transparency, and contact points.
PASS Evaluator (Qwen 3.6 27B)
The image displays a red sphere on a white surface. The lighting originates from the upper left, evidenced by the bright specular highlight on the sphere's upper-left quadrant. Consequently, the shadow is cast to the lower right, which is physically consistent with the light source direction. The shadow’s shape and softness are realistic for a spherical object under directional lighting. The color and orientation are accurate. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The lighting is consistent with a strong light source from the upper right, casting a sharp, dark shadow to the left of the mug. The shadow’s shape accurately mirrors the mug’s form, including the handle. Reflections on the coffee surface and the glossy ceramic are natural. The window frame and outdoor scene show appropriate depth and lighting. No distortions or implausible elements detected. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The shadow is cast directly beneath the subject, consistent with a midday sun position. The shadow’s shape accurately mirrors the person’s stance and clothing, with no distortion or unnatural stretching. The darkness and soft edges are appropriate for bright outdoor lighting. No reflections are expected in this scene, and none appear. Overall, the shadow behavior is physically plausible and aligns with the prompt’s description. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflection in the mirror is physically plausible. It accurately depicts the woman’s back, matching her posture and outfit (beige top, dark jeans). The orientation is correctly reversed relative to the room’s layout — for example, the bed and window in the reflection align with expected spatial inversion. Lighting and shadows are consistent with ambient indoor light, and no distortion or color mismatch is evident. The mirror frame and floor perspective also support realism. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly oriented vertically below the cat, matching its pose and position. The slight blurring and reduced contrast in the reflection accurately simulate the optical properties of a polished surface. The lighting direction is consistent, with highlights on the floor aligning with the light source. The colors are appropriately muted in the reflection compared to the real subject. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflection on the glass table is physically plausible. It correctly mirrors the vase and flowers with appropriate vertical inversion and slight distortion due to the glass surface. The reflection’s brightness and color match the original objects, and its position aligns with the light source and table geometry. No obvious artifacts or inconsistencies are present. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflections are vertically inverted, matching the trees' orientation. The color palette accurately mirrors the sunset sky and dark silhouettes. The water surface shows gentle ripples that distort the reflection slightly, consistent with a calm lake. The lighting and shadow directions are coherent with a sunset scene. No physical anomalies are present. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflection in the puddle is physically plausible. It correctly inverts the scene vertically, showing the buildings and sky upside down. The distortion is consistent with a shallow water surface, exhibiting slight waviness and blurring that mimics real-world refraction. The color shift is subtle but present, with the reflection appearing slightly darker and more muted than the direct view, which aligns with how light behaves when reflecting off water. The orientation matches the perspective o
PASS Evaluator (Qwen 3.6 27B)
The reflection is physically plausible, showing a clear inverted mirror image of the mountains. The orientation is correct, and the subtle muting of colors in the water compared to the sky and peaks aligns with realistic light absorption and scattering in water. The stillness of the lake allows for a sharp reflection, which is consistent with the calm surface depicted. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image displays a glass sphere on a reflective surface with a clear, inverted reflection below it, consistent with optical physics. The refraction within the sphere correctly inverts the background scene. The lighting and shadows align logically with the light source. The water reflection in the foreground is coherent with the environment. All optical phenomena—refraction, reflection, and shadow—are physically plausible and accurately rendered. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The reflections on the chrome sculpture are physically plausible, showing distorted environmental mapping of the windows and room. The reflection on the glass floor is correctly oriented and aligned with the sculpture above. However, the shadows are missing; there is no cast shadow from the sculpture onto the floor or surrounding surfaces, which is physically impossible given the strong lighting from the windows. This lack of grounding shadows makes the scene physically implausible. VERDICT: FA
PASS Evaluator (Qwen 3.6 27B)
The reflections and refractions are physically plausible. The marble countertop shows a clear, slightly distorted reflection of the glass stem and base, consistent with a polished surface. The glass itself exhibits realistic refraction, bending the background view through its curved shape and cut patterns. The lake’s surface reflects the sky and clouds naturally, with appropriate shimmering highlights. Shadows align with the light source from the window. No directional or color inconsistencies a
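Several judge outputs on this page are cut off: some stop mid-sentence before any verdict, and others end in a clipped token such as `VERDICT: FA`. The Python sketch below shows one way to recover a verdict from such text. It assumes the clipping is a display truncation and that the verdict always appears at the very end of the output; `parse_verdict` is a hypothetical helper, not part of the benchmark's tooling.

```python
import re
from typing import Optional

# Match a trailing "VERDICT: <letters>" token at the end of the judge output.
_VERDICT_RE = re.compile(r"VERDICT:\s*([A-Za-z]+)\s*$")


def parse_verdict(judge_text: str) -> Optional[str]:
    """Return "PASS" or "FAIL", or None when no usable verdict is present.

    A clipped token is accepted when it is a prefix of exactly one label,
    e.g. "FA" resolves to "FAIL" and "PA" resolves to "PASS".
    """
    match = _VERDICT_RE.search(judge_text.strip())
    if match is None:
        return None
    token = match.group(1).upper()
    for label in ("PASS", "FAIL"):
        if label.startswith(token):
            return label
    return None
```

When the function returns `None`, the badge shown next to the evaluator name is the only remaining signal for that entry, so those cases are the ones most worth checking against the image by hand.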
World Knowledge 50%
World Knowledge measures whether local/flux-2-klein-9b depicts recognizable places, landmarks, flags, and known objects with accurate real-world details.
PASS Evaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy. The iron lattice structure, the three distinct levels (including the top platform and antenna), and the arched base are all correctly rendered. The perspective is realistic, showing the tower against a clear blue sky with surrounding trees and buildings in the background. There are no visible distortions or structural anomalies. The subject is clearly recognizable and factually accurate. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high fidelity. The central dome, four minarets, and the symmetrical layout are architecturally accurate. The perspective from the reflecting pool is correct, showing the iconic reflection. The surrounding red sandstone structures and cypress trees are also present and correctly placed. The lighting and textures appear realistic, contributing to the overall authenticity of the scene. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, raised torch, spiked crown, and tablet. The pedestal and base structure align with real-world architecture. Minor details like the surrounding water, island vegetation, and visitor crowds enhance realism. No significant factual or architectural inaccuracies are present. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a shamisen, but it contains significant factual inaccuracies. The instrument shown has four tuning pegs and four strings, whereas a traditional shamisen is strictly a three-stringed instrument. Additionally, the bridge and tailpiece assembly appear structurally confused, with the bridge floating oddly and the tailpiece lacking the traditional frog-shaped design. These deviations from the standard three-stringed design make it factually inaccurate. VERDICT: FAIL
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a brass instrument that resembles an astrolabe but contains significant factual inaccuracies. The text and numbers are garbled, nonsensical, and inconsistent (e.g., mixed Roman/Arabic numerals, random letters like "S", "N", "W" placed illogically). The rete and tympan structures are visually confused and do not match historical astrolabe designs. These errors indicate AI generation rather than a real-world accurate depiction. VERDICT: FAIL
PASS Evaluator (Qwen 3.6 27B)
The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The subject is clearly recognizable as a Japanese kintsugi bowl. The lighting, texture, and perspective are realistic, and there are no architectural or factual inaccuracies in the representation of the object. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The image depicts the water cycle, but the text labels are nonsensical gibberish (e.g., "Evapiration fexr," "Contrtiad," "Rennn"). This makes the diagram factually inaccurate and fails to meet the requirement for a "detailed diagram" with recognizable information. VERDICT: FAIL
FAIL Evaluator (Qwen 3.6 27B)
The image depicts the correct geological structure of a convergent boundary, including subduction, a trench, and a volcanic arc. However, the text labels are nonsensical gibberish (e.g., "Wuame Nats," "Convergent Buiderly") rather than accurate scientific terminology. This indicates a failure in text rendering, making the diagram factually inaccurate as a labeled educational resource. VERDICT: FAIL
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a human heart, but the labels are nonsensical (e.g., "Aotaa," "Pminulke," "Maidto"), indicating a failure in text rendering. Anatomically, the illustration is also flawed: the aorta is incorrectly shown as a large, bulbous chamber on the left side rather than the central arching vessel. The pulmonary artery and veins are mislabeled and misplaced. The overall structure lacks the precision required for an accurate anatomical illustration. VERDICT: FAIL
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a butterfly with dorsal wings showing iridescent blue and brown edges, and ventral wings with brown patterns and eyespots. However, the Morpho menelaus is known for its entirely iridescent blue dorsal wings without the prominent brown borders seen here. Additionally, the ventral side of Morpho menelaus typically has a more uniform brown pattern with fewer distinct eyespots compared to what is shown. The body morphology also appears slightly off, with an unusual texture and colo
PASS Evaluator (Qwen 3.6 27B)
The image accurately depicts the interior of the Hagia Sophia, featuring the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics. The architectural elements are correctly represented, and the combination of Islamic and Byzantine art is evident. The details such as the mosaics, calligraphy, and structural features align with the real-world subject. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image depicts a bismuth crystal with high fidelity. It accurately portrays the characteristic geometric hopper crystal formation, featuring the distinct "staircase" structure caused by faster growth at the edges. The iridescent oxide layers are rendered with realistic coloration and banding patterns typical of this mineral. The lighting and texture appear natural, making the subject easily recognizable and factually accurate. VERDICT: PASS
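The Truthfulness figure of 74% matches what you get by pooling every verdict in its three subcategories (20 PASS out of 27) rather than averaging the three subcategory percentages. The Python sketch below shows both calculations. The counts are tallied from the PASS/FAIL labels in this section, and the pooling rule itself is an inference from the numbers, not a documented part of the methodology.

```python
from typing import Dict, Tuple

# (passes, total) per subcategory, tallied from the labels in this section.
truthfulness: Dict[str, Tuple[int, int]] = {
    "Photorealism": (3, 3),
    "Physics & Reflections": (11, 12),
    "World Knowledge": (6, 12),
}


def pooled_percent(counts: Dict[str, Tuple[int, int]]) -> int:
    """Pass rate over all prompts, so larger subcategories weigh more."""
    passes = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return round(100 * passes / total)


def mean_of_subcategories(counts: Dict[str, Tuple[int, int]]) -> int:
    """Unweighted mean of per-subcategory rates, shown for contrast."""
    rates = [100 * p / t for p, t in counts.values()]
    return round(sum(rates) / len(rates))


print(pooled_percent(truthfulness))         # pooled over 27 prompts
print(mean_of_subcategories(truthfulness))  # each subcategory weighted equally
```

The two numbers differ here because Photorealism has only three prompts, so weighting it equally with the twelve-prompt subcategories would lift the category score noticeably.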
Professional Studio 93%
Camera & Lighting 92%
Camera & Lighting measures whether local/flux-2-klein-9b follows specific lens, angle, focus, exposure, and lighting instructions.
PASS Evaluator (Gemma 4 26B)
The image successfully implements the requested camera effects. It features a wide-angle perspective that emphasizes the expansive space of the modern living room. Most notably, it includes a heavy circular vignette/distortion effect at the edges, which mimics the "slight barrel distortion" requested (though the effect is quite pronounced, it fulfills the intent). The lighting is bright and even, consistent with a modern interior. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up macro shot of a single, prominent dewdrop on a red rose petal. The depth of field is extremely shallow, with the central droplet in sharp focus while the surrounding petal and other smaller droplets are softly blurred. The lighting is natural and highlights the transparency and refractive properties of the water. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is flat and top-down, with no vanishing point distortion. The depth of field is consistent with a high-altitude drone shot, keeping the entire scene in sharp focus. The lighting is even and natural, consistent with daylight. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image features a portrait with a moody atmosphere and a shallow depth of field. However, it fails the specific lighting requirement. Rembrandt lighting is characterized by a distinct triangle of light on the shadowed cheek. In this image, the light is a broad, harsh side-light that illuminates a large portion of the face, lacking the characteristic small triangle of light on the dark side. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. The lighting is a rich golden hour glow with a vibrant orange sky. The sun is low on the horizon, creating long, soft shadows across the rolling hills and foreground vegetation. The perspective provides a deep sense of scale, and the lighting pattern is consistent with a sunset/sunrise scenario, featuring a soft glow and warm atmospheric haze. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a high-key portrait with a bright white background and a very bright, luminous quality. The lighting is soft and even, resulting in minimal shadows on the subject's face. The perspective is a standard portrait shot, and the overall aesthetic is airy and bright as requested. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. The perspective is a classic portrait shot. The depth of field is shallow, creating a creamy bokeh background with large, soft light orbs. Most importantly, the lighting features a strong, warm rim light (backlighting) that creates a glowing halo around the subject's hair and shoulders, effectively separating her from the blurred garden background. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's technical requirements. The subject is sharp and in focus, while the background features a shallow depth of field with beautiful, soft bokeh from the city lights. The perspective is consistent with a 50mm lens, and the lighting creates a realistic street photography atmosphere with a clear distinction between the sharp subject and the blurred urban environment. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image matches all specified criteria. The perspective is a top-down shot (shot from above). The lighting is natural and directional, coming from the left side, creating soft shadows on the right. The color temperature is warm, and there is a subtle vignette around the edges. The depth of field is consistent with a 35mm f/2.8 setting, with a gentle blur on the edges of the bowl and the wooden background. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly captures the tilt-shift effect. It features a distinct selective focus strip where the middle ground is sharp, while the foreground and background are heavily blurred, creating a miniature model aesthetic. The colors are highly saturated, and the high-angle perspective enhances the "toy-like" appearance of the city street. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a dark, moody scene with a cinematic quality. Most importantly, it displays prominent horizontal anamorphic lens flares in both blue and orange, streaking across the center of the frame. The lighting and perspective align with the requested cinematic aesthetic. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's requirements. It features a close-up portrait with dramatic split lighting. One side of the face is brightly illuminated, highlighting skin texture and detail, while the other side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect against a black background. The perspective is a direct, intimate portrait, and the depth of field is shallow, focusing sharply on the eye and facial features. VERDICT: PASS
Color Precision 92%
Color Precision measures whether local/flux-2-klein-9b renders requested colors clearly and consistently across subjects, products, and backgrounds.
PASS Evaluator (Gemma 4 26B)
The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is highly saturated, intense, and unambiguous, exactly as described in the prompt. The gray of the road is also clearly visible. All color requirements are perfectly met. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a woman wearing a vibrant, deep blue evening dress. The color of the dress is a true royal blue, and the background is a solid, neutral beige/tan color. Both the dress color and the background color strictly adhere to the prompt's specifications. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the color requirements specified in the prompt. The flowers are a vibrant, pure sunflower yellow, and the sky is a clear, bright blue. There are no deviations in the color palette from the requested "sunflower yellow" and "clear blue." VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a bedroom with a prominent accent wall in a vibrant burnt orange color. The color closely matches the hex code #FF6B35, which is a saturated, warm orange-red. The furniture, including the bed frame, nightstands, and lampshades, is white as requested. The lighting is bright and natural, consistent with the "natural daylight" requirement. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The foliage in the image consists of various shades of green, ranging from very light, pale mint/teal in the background to darker, more saturated greens in the foreground. The hex code #2D5F2D represents a specific, dark, muted forest green. While the image contains dark green elements, it also features many lighter, brighter, and more vibrant green tones that do not match the specific, singular color requirement of the prompt. VERDICT: FAIL
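The judge's objection here is that only part of the foliage sits near #2D5F2D. One rough way to quantify that kind of claim is to measure what fraction of sampled pixels falls within a tolerance of the target color. The Python sketch below does this with Euclidean distance in RGB. The function names, the tolerance value, and the sample pixels are all made up for illustration; nothing here reflects how the benchmark's judge actually scores color.

```python
import math
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def hex_to_rgb(hex_code: str) -> RGB:
    """Convert "#RRGGBB" to an (R, G, B) tuple of 0-255 integers."""
    h = hex_code.lstrip("#")
    return (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))


def fraction_near(pixels: Iterable[RGB], target_hex: str, tolerance: float) -> float:
    """Share of pixels whose RGB distance to the target is within tolerance."""
    target = hex_to_rgb(target_hex)
    pixel_list = list(pixels)
    if not pixel_list:
        raise ValueError("need at least one pixel")
    near = sum(1 for p in pixel_list if math.dist(p, target) <= tolerance)
    return near / len(pixel_list)


# Hypothetical foliage samples: two dark forest greens, two pale mint tones.
sample_pixels = [(45, 95, 45), (50, 100, 50), (170, 230, 200), (120, 200, 120)]
share = fraction_near(sample_pixels, "#2D5F2D", tolerance=30.0)
```

RGB distance is a crude proxy for perceived color difference, so a perceptual color space would be a better choice for anything beyond a quick sanity check.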
PASS Evaluator (Gemma 4 26B)
The image accurately reflects all color specifications. The walls are a medium blue that closely matches the hex code #4A90D9. The floor is a warm-toned wood, and the baseboard/ceiling trim is white. All elements of the color prompt are present and correctly rendered. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the specified colors:
- The walls are a warm cream/off-white, matching #F5E6CC.
- The sofa is a dark navy blue, matching #2C3E50.
- The accent pillows are a bright, vibrant red, matching #E74C3C.
All color requirements from the prompt are met precisely. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image matches the color requirements. The background is a deep dark blue consistent with #1A1A2E. The watch features a soft gold/champagne color (#F0C27F) on the face and a tan strap. The earbuds and their charging case are pure white (#FFFFFF). All specified colors are accurately represented in the flat-lay composition. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately follows the color specifications. The background is a near-white (#FAFAFA), the main central shape is a vibrant orange-red (#FF4500), and the thin accent lines are a dark charcoal color (#333333). All three requested colors are clearly visible and match the prompt's descriptions. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image displays a smooth horizontal gradient. On the far left, the color is a vibrant red (#FF0000). As it moves toward the right, it transitions through a deep purple/magenta middle section and ends in a bright blue (#0000FF) on the far right. The transition is even and matches the specified color sequence and hex values perfectly. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a vertical gradient. The bottom color is a bright, saturated orange, which aligns well with the requested #FF6B35. The top of the image transitions into a very dark, near-black shade. While the prompt requested a deep indigo (#1B1464), the top of the image appears almost black, which is a common way for AI to interpret very dark indigo in a gradient. The overall color progression matches the intent. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the description of #00C9FF. On the right, the color is a light, pale mint green, which matches the description of #92FE9D. The transition between the two colors is a smooth gradient as requested. VERDICT: PASS
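Several Color Precision verdicts above compare what the judge sees against a hex code, and one accepts a near-black top where deep indigo (#1B1464) was requested. The verdicts come from a VLM judging by eye; nothing here indicates that the benchmark runs a numeric check. As a rough illustration of how such a claim could be sanity-checked by hand, the sketch below converts hex codes to RGB and measures the straight-line distance between two colors. The helper names `hex_to_rgb` and `rgb_distance` are hypothetical, and plain RGB distance is only a crude proxy for perceived difference.

```python
import math
from typing import Tuple


def hex_to_rgb(code: str) -> Tuple[int, int, int]:
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints in 0..255."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {code!r}")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def rgb_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colors in RGB space.

    Assumption: RGB distance is used purely for illustration; it is not
    perceptually uniform and is not the benchmark's scoring method.
    """
    ra, ga, ba = hex_to_rgb(a)
    rb, gb, bb = hex_to_rgb(b)
    return math.sqrt((ra - rb) ** 2 + (ga - gb) ** 2 + (ba - bb) ** 2)


if __name__ == "__main__":
    # Requested deep indigo versus pure black (the "almost black" the judge saw).
    print(rgb_distance("#1B1464", "#000000"))
    # For scale: the two endpoints of the red-to-blue gradient prompt.
    print(rgb_distance("#FF0000", "#0000FF"))
```

To apply this to a generated image, one would first have to sample a representative pixel or region average, a step this sketch deliberately leaves out.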
Photorealism 100%
Photorealism measures whether local/flux-2-klein-9b produces polished product and studio imagery with realistic materials, lighting, shadows, and finish.
PASS Evaluator (Gemma 4 26B)
The image successfully achieves the studio-quality look requested. The lighting is soft and consistent, creating a natural, gentle shadow beneath the shoe. The textures of the leather and laces are sharp and realistic. There are no obvious AI artifacts, such as warped logos, nonsensical stitching, or melting shapes. The white background is clean, and the overall composition is professional and high-quality. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image successfully captures the requested aesthetic. The glass textures, liquid transparency, and reflections on the black surface are highly realistic. There are no obvious AI artifacts; the geometry of the bottle and the way the straw/tube behaves within the liquid look physically accurate. The lighting is clean and professional, meeting the "studio-quality" requirement perfectly. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image successfully meets the prompt requirements. The green apple features realistic textures, including subtle lenticels and a natural-looking stem. The lighting is soft and directional, creating a realistic highlight on the skin and a soft, natural shadow on the white background. There are no obvious AI artifacts, such as warping or unnatural textures, and it successfully mimics a high-quality studio product photograph. VERDICT: PASS
Graphical design 58%
Data Visualisation 0%
Data Visualisation measures whether local/flux-2-klein-9b produces charts with correct labels, proportions, colors, and readable values.
FAIL Evaluator (Qwen 3 VL)
The chart mislabels months (e.g., “Jarry” for January, “Jule” for June) and shows incorrect values (29K instead of 25K). The y-axis label is garbled (“Aaunegauty”), and the legend (“Salles”) doesn’t match the chart’s color scheme. The requested values and labels are not correctly represented. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The chart is a pie chart with correct percentages (35%, 25%, 20%, 20%) and labels. However, “Company B” is mislabeled as “Ompany C” in its slice, and “Company C” is incorrectly labeled as “Company C 20%” in a slice that visually belongs to “Others.” The chart type matches the request, but critical labeling errors invalidate accuracy. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The chart shows a line graph titled "Weekly Traffic" with grid lines and axis labels. However, the x-axis labels are incorrect (showing "Daik" and repeated values like 1800, 2000), and the y-axis values are misaligned with the data points. The requested data sequence (1200, 1500, 1100, 1800, 2000, 1700, 1400) is not accurately represented. VERDICT: FAIL
Layout & Design 44%
Layout & Design measures whether local/flux-2-klein-9b organizes posters, menus, covers, and other designed compositions with the requested hierarchy and elements.
PASS Evaluator (Qwen 3 VL)
The image features a large popcorn bucket centered, with "MOVIE NIGHT" in bold yellow at the top and "Every Friday at 8PM" in white at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the popcorn as the focal point and text anchoring top and bottom. All required layout elements are present and accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The poster features a guitar illustration integrated into the headline "LIVE MUSIC FEST," with "Tickets available now" correctly placed below. Text is readable and properly sized, maintaining visual hierarchy. All required elements are present and correctly positioned. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The poster features an open book illustration centered below the title "JOIN THE BOOK CLUB," which is prominently displayed in large, bold text. The tagline "Meets every Wednesday" is clearly visible and correctly placed beneath the book. Visual hierarchy is maintained with appropriate sizing and positioning. All required elements are present and legible. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The image shows a hero header with a person using a laptop, followed by a headline, subtitle, and “Sign Up” button — all correctly placed. The footer contains three social media icons. However, the text is gibberish (placeholder), making it unreadable and violating the prompt’s requirement for readable text. Layout structure is correct, but text fails readability. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a newsletter layout with a header banner, two-column body (text left, image right), a highlighted quote block, and a footer unsubscribe link. However, all text is gibberish placeholder text, making it unreadable. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with address/hours. However, all text is gibberish, unreadable placeholder text, not actual menu items or correct information. The layout structure is present but content is invalid. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
The left page features a large hero photo of a woman. The right page has a two-column article, a pull quote (“Pullbäckan tänd...”), a sidebar with “Ramme för identitet” and author bio, and page numbers (13, 14). Text is readable and correctly placed per layout. All described zones are present with appropriate visual hierarchy. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The UI includes a top navigation bar, search field, 2x2 grid of feature cards with icons, recent activity list, and bottom tab bar. Text is present but mostly garbled and unreadable. Layout hierarchy is structurally correct, but illegible text violates core usability requirements. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
Left page has full-bleed photo; right page shows three-column layout with headline, body copy, inset image, and page numbers. Text is present but illegible due to gibberish placeholder characters. Visual hierarchy is structurally correct, but readability fails. Layout elements are present as described. VERDICT: FAIL
Style Diversity 83%
Style Diversity measures whether local/flux-2-klein-9b can shift between visual styles like oil painting, pixel art, watercolor, and technical illustration.
PASS Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever in a garden setting. The style is unmistakably that of an oil painting, featuring thick, visible brushstrokes (impasto technique), rich color blending, and textured surfaces that mimic canvas work. The lighting and composition further enhance the painterly aesthetic. All key visual characteristics requested are present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested Japanese anime style. Key characteristics are present: the golden retriever has large, expressive eyes with prominent highlights, simplified facial features, and clean line art. The colors are flat and vibrant without complex shading gradients, consistent with anime aesthetics. The overall composition and rendering match the prompt’s stylistic requirements. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever in a garden setting. The pixel art style is unmistakable, with visible pixelation and blocky forms throughout. However, the grid appears much finer than 32x32 — likely 64x64 or higher — and the color palette is not particularly limited, showing many shades of green, pink, yellow, and brown. While the core style is recognizable, the specific technical constraints (32x32 grid, limited palette) are not accurately met. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Art Deco characteristics: strong geometric shapes, ornamental symmetry, and a gold/black palette. The buildings feature stepped forms and stylized patterns typical of the style. The central emblem, decorative border, and streamlined vehicles reinforce the aesthetic. While some text elements appear nonsensical (common in AI art), the core visual style is accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Impressionist characteristics: soft, visible brushstrokes throughout the foliage and dress; dappled sunlight filtering through trees onto the subject and ground; and a pastel palette dominated by pinks, yellows, and muted greens. The overall effect is painterly and atmospheric, matching the requested style precisely. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Pop Art characteristics: bold black outlines define the figure, flat vivid colors (yellow skin, blue eyes, magenta/purple hair highlights) dominate, and Ben-Day dots are prominently used for shading and texture across the face and background. The composition is stylized and graphic, matching the requested aesthetic precisely. All key visual elements of the specified style are present and well-executed. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image clearly exhibits watercolor characteristics: visible paper texture, soft color bleeds, and translucent layering typical of the medium. However, it lacks the "photorealistic" detail and lighting requested. The rendering is stylized and illustrative rather than photo-realistic. While the watercolor style is present, the core requirement of combining it with photorealism is not met. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image successfully combines cyberpunk and Art Nouveau styles. The central figure and background feature neon colors, futuristic tech elements, and a cyberpunk cityscape. The composition is framed by distinct flowing organic lines and prominent floral borders, fulfilling the Art Nouveau requirement. The requested style is clearly recognizable with all key visual characteristics present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully combines ukiyo-e woodblock print aesthetics (flat colors, stylized clouds, Japanese text, textured paper effect) with a photorealistic sky gradient and lighting. The castle architecture blends medieval European elements with Japanese roof styles, fitting the hybrid prompt. Key visual characteristics of both styles are clearly present and well-integrated. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits soft brutalism: raw concrete walls and ceiling with visible formwork marks, paired with rounded, pastel-colored furniture in muted pink and green tones. Warm diffused lighting enhances the cozy yet industrial atmosphere. All key visual characteristics from the prompt are present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully embodies the requested style of "anxious minimalism." The composition is sparse, featuring only a cup, a small fruit, and a paper on a table against a vast, empty wall. The negative space dominates the upper two-thirds, creating the requested sense of unease or tension. The color palette is muted (grays, beiges, soft browns), and the brushwork appears deliberate yet slightly raw, enhancing the subtle visual tension. The key characteristics are clearly present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully embodies the requested "nostalgic futurism" style. Key characteristics are clearly present: the architecture features classic 1960s space-age elements like saucer-shaped structures and sleek spires. The color palette is dominated by a warm, sepia-toned patina that evokes nostalgia. Additionally, the inclusion of multiple moons, flying saucers, and the vintage paper texture overlay perfectly aligns with the retro-futuristic aesthetic. VERDICT: PASS
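The percentages shown next to each heading can be reproduced from the PASS/FAIL labels listed in this section. The sketch below is a minimal check, assuming that a sub-category score is the share of PASS labels rounded to the nearest whole percent, and that the "Graphical design" figure pools all evaluations from its three sub-categories. That pooling rule is an assumption: it yields 58%, whereas averaging the three sub-category rates would give about 43%, but the page does not state its formula here. The function name `pass_rate` and the verdict lists are transcribed or invented for illustration.

```python
from typing import Dict, List


def pass_rate(verdicts: List[str]) -> int:
    """Share of 'PASS' labels as a whole-number percentage."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passes = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passes / len(verdicts))


# Labels transcribed from the entries above, in page order.
graphical_design: Dict[str, List[str]] = {
    "Data Visualisation": ["FAIL"] * 3,
    "Layout & Design": ["PASS", "PASS", "PASS", "FAIL", "FAIL",
                        "FAIL", "PASS", "FAIL", "FAIL"],
    "Style Diversity": ["PASS", "PASS", "FAIL", "PASS", "PASS", "PASS",
                        "FAIL", "PASS", "PASS", "PASS", "PASS", "PASS"],
}
color_precision: List[str] = ["PASS"] * 4 + ["FAIL"] + ["PASS"] * 7

# Assumed pooling: concatenate every verdict under the parent category.
pooled: List[str] = [v for labels in graphical_design.values() for v in labels]

if __name__ == "__main__":
    for name, labels in graphical_design.items():
        print(name, pass_rate(labels))
    print("Graphical design (pooled)", pass_rate(pooled))
    print("Color Precision", pass_rate(color_precision))
```

Because the labels themselves come from VLM judges, these rates inherit any judging mistakes; the arithmetic only confirms that the headline numbers are consistent with the listed verdicts.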