ImageBench V1
192 evaluations across 6 categories
Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself.

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 80%
Typography Style 100%
Typography Style measures whether local/krea-2-raw preserves requested lettering styles, font mood, and text texture such as typewriter ink or neon signage.
PASS (Evaluator: Qwen 3 VL)
Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with uneven, slightly smudged ink strokes and distressed edges. Background shows aged paper texture. All elements align with prompt’s medium and spelling requirements. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Text reads "TODAY'S SPECIAL" correctly. Letters show convincing chalk texture and smudges on dark green board, matching the "white chalk on chalkboard" prompt. Chalk dust effect is implied by texture. No obvious artifacts or mismatched medium. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The word "OPEN" is rendered in glowing red pixel-style letters, matching an LED display. Letters are clearly formed with dot-matrix precision, and spelling is correct. The style convincingly emulates a classic LED sign with bright, individual pixels and ambient glow. VERDICT: PASS
Writing accuracy 75%
Writing accuracy measures whether local/krea-2-raw renders the exact requested words, numbers, labels, and short phrases without spelling or layout errors.
PASS (Evaluator: Qwen 3 VL)
The word "HELLO" is fully spelled correctly and is perfectly legible against the white background. All five letters are intact, clear, and properly formed. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a plain cream background. All letters are distinct and legible. No misspellings or missing words. The visual matches the intended specification exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The word "SERENITY" is spelled correctly and is fully readable, centered on a solid light blue background. All letters are distinct and legible. No misspellings or missing parts. The text matches the intended specification exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Text reads: "The quick brown fox jumps high". All words are spelled correctly and fully readable against the white background. No misspellings or illegible parts. Matches the intended text exactly. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully legible against the pale yellow background. No misspellings or missing words detected. Font is clear and contrast is sufficient for readability. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Text reads: "Stars shine brightest in the dark night" — matches intended phrase exactly. All words are spelled correctly and fully legible against the clean off-white background. No misspellings or missing words detected. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
All text is spelled correctly and fully readable. Headline: "SUMMER SALE". Body: "Up to 50% off all items this weekend only". Fine print: "Terms and conditions apply. See store for details." No misspellings or illegible parts detected. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
All text matches the intended message exactly. "GRAND OPENING" is correct. Subtitle "Join us for food, music, and fun" is fully readable. Disclaimer "RSVP required. Limited seating available." is accurate. No misspellings, missing words, or illegible parts detected. Text is clear and correctly rendered. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM." Every word is spelled correctly and fully readable in all three text blocks. No misspellings or illegible parts detected. VERDICT: PASS
FAIL (Evaluator: Qwen 3 VL)
The tagline and title are correct. Credits block has multiple misspellings and garbled text (e.g., “Ma aramouns GAETV V”, “Thiula GTrisicı”). “Music by Thomas Grant” is missing. “Paramount Pictures Release” is partially illegible. Core text is not fully readable or accurate. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
All intended text is rendered as gibberish or distorted, not readable. “Cambbida” is misspelled. The barcode area shows “Issue 247 | March 2025 | $6.99” correctly, but the surrounding text is illegible nonsense. No actual words match the intended text. VERDICT: FAIL
FAIL (Evaluator: Qwen 3 VL)
“Unfliltered” and “Northweist” are misspellings. “preserivatives” is misspelled. “Northwest” and “preservatives” are correct in the description but misspelled in the text. All other text matches. Key errors prevent full accuracy. VERDICT: FAIL
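The Text Rendering percentages above can be reproduced from the verdict labels. The sketch below is illustrative only: the tallies are counted by hand from the PASS/FAIL labels in this section, and the function names are hypothetical. It assumes the category score pools all evaluations rather than averaging the subcategory scores, which is an inference from the displayed numbers, not a documented rule of the benchmark.

```python
from fractions import Fraction

# (passes, evaluations) counted from the verdict labels listed above.
TEXT_RENDERING = {
    "Typography Style": (3, 3),
    "Writing accuracy": (9, 12),
}


def pooled_rate(tallies):
    """Pass rate over all evaluations in the category (micro average)."""
    passes = sum(p for p, _ in tallies.values())
    total = sum(t for _, t in tallies.values())
    return Fraction(passes, total)


def macro_rate(tallies):
    """Unweighted mean of the subcategory pass rates (macro average)."""
    rates = [Fraction(p, t) for p, t in tallies.values()]
    return sum(rates, Fraction(0)) / len(rates)


def percent(rate):
    """Convert an exact fraction to a percentage."""
    return float(rate * 100)


if __name__ == "__main__":
    print(percent(pooled_rate(TEXT_RENDERING)))  # 80.0, matches "Text Rendering 80%"
    print(percent(macro_rate(TEXT_RENDERING)))   # 87.5, does not match
```

Only the pooled figure (12 passes out of 15 evaluations) agrees with the displayed 80%, so the pooled reading looks like the more plausible one here.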
Spatial Reasoning 77%
Attributes Binding 89%
Attributes Binding measures whether local/krea-2-raw attaches the right colors, materials, and properties to the correct requested objects.
PASS (Evaluator: Qwen 3.5 122B)
The image shows two objects on a white surface: 1. A red sphere (ball) — correct color and shape. 2. A blue cube — correct color and shape. Both are side by side, as requested. No attribute swaps: red is on the ball, blue on the cube; shapes match descriptions. Size is not specified in prompt, so irrelevant. All core attributes correctly bound. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is orange, round, and has a stem. All attributes (color, shape) are correctly bound to the respective objects with no swaps. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a yellow taxi and a black limousine parked side by side on a street. The taxi is correctly colored yellow with typical taxi markings, including a "T" logo and checkered patterns. The limousine is black, elongated, and has the characteristic shape of a limo. Both vehicles are appropriately sized and shaped for their types. All attributes are correctly bound to the respective objects. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains: - A large red circle (correct color, size, shape) - A small blue triangle (correct color, size, shape) - A medium green star (correct color, size, shape) All attributes are correctly bound to their respective objects with no swaps. The arrangement on a white canvas is also accurate. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows three candles on a shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. The arrangement and visual properties align precisely with the prompt. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a big brown horse, a small white rabbit, and a medium-sized black cat in a meadow. All attributes (color, size, shape) are correctly bound to the right objects with no swaps. The horse is indeed big and brown, the rabbit is small and white, and the cat is medium-sized and black. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains a pink fire truck with an extended ladder, a large blue banana positioned in front of it, and a green sun in the sky. All objects match their requested colors and shapes: the fire truck is pink, the banana is blue, and the sun is green. The surreal landscape is also present with orange grass and a light blue sky. No attribute swaps are observed. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image contains a purple carrot with green leaves, a red tree with red leaves, and a white round object (possibly a melon or pumpkin) next to a sliced watermelon that is red with black seeds and green rind. The watermelon is not white — it’s red inside, which contradicts the prompt’s “white watermelon” requirement. The white object is separate and not identified as a watermelon. Thus, attribute binding is incorrect for the watermelon. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image contains three objects on a white surface: an orange elephant-shaped object, a yellow penguin-shaped object, and a silver strawberry-shaped object. All colors and shapes match the prompt’s description exactly — orange for the elephant, yellow for the penguin, silver for the strawberry. No attributes are swapped or misassigned. The arrangement is on a table-like surface with background context suggesting a dining setting. VERDICT: PASS
Compositionality 78%
Compositionality measures whether local/krea-2-raw combines subjects, actions, outfits, props, and scene details into one coherent requested image.
PASS (Evaluator: Qwen 3.5 122B)
The image contains all described elements: 1. Tabby cat: Present (brown/black striped fur pattern visible) 2. Oversized red sunglasses: Present (large red frames covering eyes) 3. Coherent combination: Present (glasses sit naturally on cat's face) All components are accurately rendered and logically integrated. The cat's anatomy, fur texture, and accessory placement appear realistic and well-composed. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows a golden retriever with a straw cowboy hat on its head. The dog’s fur is golden, and it has a happy expression with tongue out. The hat is proportionally large relative to the dog’s head — not “tiny” as specified in the prompt. While all core elements (golden retriever, cowboy hat) are present and coherently combined, the size descriptor “tiny” is inaccurate. Since the prompt explicitly includes “tiny,” and that detail is not met, the full description is not fulfilled. VERDICT:
PASS (Evaluator: Qwen 3.5 122B)
The image contains: - A penguin: Present (black and white body, yellow beak/feet, purple eye) - Holding: Present (right flipper grips umbrella handle) - Colorful umbrella: Present (rainbow-colored panels with gradient effects) All elements are coherently combined — the penguin is upright, holding the umbrella naturally over its head. The colors are vivid and match the “colorful” descriptor. No missing or contradictory elements. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a figure in full medieval armor, including helmet, chainmail, and plate armor, seated at a wooden desk. The knight’s gloved hands are positioned on a modern laptop keyboard as if typing. All described elements — medieval knight, full armor, desk, modern laptop, typing action — are present and coherently combined. The juxtaposition is clear and visually consistent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all required elements: 1. Roman gladiator - Present (helmet with red crest, armor, cape) 2. Selfie with smartphone - Present (holding phone in selfie position) 3. Colosseum - Present (ancient amphitheater structure in background) All elements are coherently combined in a single scene. The gladiator is positioned in front of the Colosseum while taking a selfie with a modern smartphone, creating the requested anachronistic combination. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all required elements: 1. Astronaut in a spacesuit - Present (white suit with helmet, American flag patch, life support pack) 2. Riding a bicycle - Present (blue bicycle being ridden) 3. Through a park - Present (trees with autumn leaves, grass, paved path) All elements are coherently combined in a single scene. The astronaut is actively riding the bicycle on a park path with appropriate environmental context. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all described elements: - Octopus: Present (orange, with tentacles) - Top hat: Present (black, on octopus head) - Monocle: Present (on octopus eye) - Chess game: Present (board and pieces on table) - Moon surface: Present (cratered gray terrain) - Earth in background: Present (large blue planet) All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image contains all described elements: - Dinosaur: Present (green T-Rex head/body) - Business suit: Present (blue suit with red tie) - PowerPoint presentation: Present (screen with text behind dinosaur) - Modern office: Present (glass windows, desk, laptop) - Drinking coffee: Present (holding white coffee cup) All elements are coherently combined in a single scene. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
Let’s examine each element from the prompt: 1. Robot dressed as a pirate — ✅ Present (wearing pirate hat with skull, red vest, striped shirt). 2. Sailing a paper boat — ✅ Present (white origami-style boat on water). 3. On a lake of lava — ❌ Not present — it’s a calm blue lake with green islands and trees; no lava. 4. Juggling three kittens — ⚠️ Partially present — robot is holding/tossing three kittens, but they’re floating in air around him, not clearly “juggling” in motion; still, visually ac
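Each judge response on this page ends with a `VERDICT: PASS` or `VERDICT: FAIL` line, but two of the Compositionality responses above are cut off before the verdict word. The sketch below shows one way a reader might extract the verdict from the response text and fall back to the displayed label when the text is truncated. The function names and the fallback rule are assumptions for illustration and are not taken from the benchmark's own tooling.

```python
import re
from typing import Optional, Tuple

# Matches the closing verdict line used by the judge responses on this page.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(judge_text: str) -> Optional[str]:
    """Return the last explicit verdict in the text, or None if absent."""
    matches = VERDICT_RE.findall(judge_text)
    return matches[-1] if matches else None


def resolve(label: str, judge_text: str) -> Tuple[str, bool]:
    """Return (verdict, confirmed_by_text).

    Assumed rule: when the response text carries no verdict (for example
    because it was truncated), trust the displayed label and flag the
    record as unconfirmed so it can be inspected by hand.
    """
    parsed = parse_verdict(judge_text)
    if parsed is None:
        return label, False
    return parsed, True


if __name__ == "__main__":
    complete = "All elements are coherently combined in a single scene. VERDICT: PASS"
    truncated = "the full description is not fulfilled. VERDICT:"
    print(resolve("PASS", complete))   # ('PASS', True)
    print(resolve("FAIL", truncated))  # ('FAIL', False)
```

Records flagged as unconfirmed are natural candidates for the manual image inspection recommended at the top of the page.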
Counting 67%
Counting measures whether local/krea-2-raw creates the exact number of requested objects without adding or missing items.
PASS (Evaluator: Qwen 3.5 122B)
The image displays exactly three red apples positioned on a light-colored wooden surface. The apples are arranged in a triangular formation, with one in the foreground left, one in the foreground right, and one slightly behind in the center. All visual elements match the prompt’s request precisely. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows exactly five yellow rubber ducks arranged in a row on a wooden bathroom shelf. Each duck is identical in appearance, with red beaks and black eyes. The count matches the prompt’s request precisely: five ducks, no more, no less. All are positioned as described. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image displays exactly four white candles, each with a lit flame, standing upright on a dark surface against a black background. The count matches the prompt’s request precisely: four candles, white in color, standing on a dark surface. No extra or missing objects are present. All visual elements align with the specified criteria. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows 1 orange cat and 3 black dogs sitting on grass. The prompt requested 2 orange cats and 3 black dogs. The dog count matches, but the cat count is off by one. Since the core requirement of exact object counts is not met, this fails the specification. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a glass vase on a table containing flowers. Counting the blooms: there are 4 red roses and 2 white daisies, which exactly matches the requested composition. The stems and leaves are visible in the clear vase filled with water. All specified objects are present in the correct quantities. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows four butterflies total: one blue butterfly and three yellow butterflies. They are all resting on a dark brown garden fence with vertical slats. The count matches the prompt exactly — 1 blue and 3 yellow butterflies. All are clearly visible and correctly positioned on the fence structure. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows a round fishbowl with multiple goldfish. Counting carefully: there are 8 clearly visible goldfish. One is partially obscured behind another near the center, but even counting that, it totals 8 — not 9. The prompt requested exactly 9 goldfish. Since the count does not match, the core criterion fails. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a fruit bowl with 3 bananas, 4 oranges, and 2 apples — matching the requested counts exactly. All objects are clearly visible and correctly categorized. No extra or missing items. The composition aligns precisely with the prompt’s specifications. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows a cluster of colorful balloons against a blue sky. Counting carefully: purple (1), green (1), red (1), blue (2), yellow (1), pink (1), orange (1), and one partially hidden behind the red balloon — that’s 9 clearly visible, plus at least 1 more peeking out, totaling 10. The prompt requested exactly 11. Since the count does not match, this fails the requirement. VERDICT: FAIL
Negation 56%
Negation measures whether local/krea-2-raw respects what should be absent from the image, not only what should be present.
PASS (Evaluator: Qwen 3.5 122B)
The image depicts a serene tropical beach scene with golden sand, turquoise water, and palm fronds framing the top against a clear blue sky. There are no people visible anywhere in the frame — no figures on the beach, in the water, or under the palms. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a white and gray cat lying on a beige couch. The cat has no collar around its neck — the fur is uninterrupted from head to body. No collar, leash, or accessory is visible. The exclusion criterion (“without a collar”) is fully satisfied. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows an empty classroom with rows of wooden desks and chairs facing a large green chalkboard at the front. There are no students visible anywhere in the room. The space is completely unoccupied by people, matching the exclusion criteria. Lighting fixtures hang from the ceiling, and there’s a podium centered before the board. All elements align with the prompt’s constraints. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a sandy beach scene with two palm trees, a beach umbrella, and several seashells scattered on the sand. The sky is clear blue, and there is no visible water or ocean in the frame — the horizon fades into sky without any sea line. All excluded elements (water/ocean) are absent as required. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows an office workspace with a desk, two chairs, a bookshelf, and a lamp. However, on the desk, there is a black object that strongly resembles a computer monitor or screen — it has a rectangular shape, a stand, and appears to be electronic equipment typically associated with computers. This violates the explicit exclusion in the prompt. VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The image shows a kitchen scene with pots, pans, a stove, and a cutting board — all as described. However, on the left side of the counter, there is a glass bottle containing what appears to be dried beans or grains — which qualifies as “food.” Since the prompt explicitly excluded food, and this element is present, the exclusion criterion is violated. VERDICT: FAIL
FAIL (Evaluator: Qwen 3.5 122B)
The image shows a pepperoni pizza with tomato sauce, basil leaves, and a golden crust. However, there is clearly melted cheese visible beneath the pepperoni and around the edges — it’s orange-yellow, bubbly, and integrated into the sauce layer. Since the prompt explicitly forbade cheese, and cheese is present, the exclusion criterion is violated. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The image shows a 3D model of a car body with windows, doors, and headlights clearly visible. The wheels are absent — only the wheel wells are present, with no tires or rims attached. This matches the exclusion criteria: “no wheels” is satisfied. The car body is rendered in grayscale against a neutral background, consistent with a clay render or untextured 3D model. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image clearly shows a human face with eyes, nose, and mouth — as expected. However, the prompt explicitly excluded “eyebrows,” and in this image, well-defined, dark eyebrows are prominently visible above each eye. Since the excluded element (eyebrows) is present, the image fails to meet the exclusion criterion. VERDICT: FAIL
Relative Position 100%
Relative Position measures whether local/krea-2-raw places objects in the requested spatial relationship, such as above, below, inside, or next to another object.
PASS (Evaluator: Qwen 3.5 122B)
The cat is clearly positioned on top of the cardboard box, with its paws resting on the upper surface. The box supports the cat from below, and there are no conflicting spatial relationships. All described positional relationships match the visual content. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on a rug between the chair legs. The spatial relationship “underneath” is accurately depicted — the ball is clearly below the chair’s structure and contained within its footprint. All described positional relationships are correct. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a coffee mug positioned to the right of an open book, both resting on a flat surface that appears to be a desk. The book is in the foreground, slightly angled, with text visible but blurred. The mug sits behind and to the right of the book’s open pages. All spatial relationships described — “next to,” “on a desk” — are accurately represented. No positional errors detected. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a tree. The tree is positioned next to a park bench, which is visible in the background to the left. All described spatial relationships are accurate: the bird is on the branch, the branch extends from the tree, and the tree is next to the bench. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a cat sitting on a white pillow, which is placed on a wooden chair. The spatial relationships described in the prompt are accurately represented: the cat is on the pillow, and the pillow is on the chair. All positional relationships are correct. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a candle placed directly on top of a stack of four books. To the left of the book stack is a potted plant with green leaves in a black pot. All specified spatial relationships — “candle on top of books” and “books next to potted plant” — are accurately represented. The arrangement matches the prompt’s description precisely. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows five chess pieces arranged on a 3x3 grid. The black king is in the center square. To its left is a black queen. To its right is a white bishop. Above the king is a black rook. Below the king is a white knight. All spatial relationships described in the prompt — left, right, above, below, and center — are accurately represented in the image. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a top-down view of a desk. The laptop is centered. A coffee mug is to the left of the laptop. A phone is to the right. A lamp is positioned above (top-right corner). A notebook is below (bottom center). All specified spatial relationships — left, right, above, below, and center — are accurately represented as described in the prompt. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. This matches the prompt’s specified spatial relationships exactly — horse on far left, cow to its right, sheep in center, goat to sheep’s right, dog on far right. All positional descriptors are accurate. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a red barn on the right side in the foreground, appearing large. A windmill stands to the left in the midground, smaller than the barn but clearly visible. Snow-capped mountains are in the distant background, appearing small due to perspective. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). All positional relationships are consistent with depth perspective. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image correctly depicts a beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease with distance as specified. All described positional relationships are accurate. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image correctly shows a park bench with pigeons in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial arrangement matches the prompt: bench and pigeons are closest to the viewer, fountain is behind them, and skyscrapers are furthest away. All positional relationships (foreground/midground/background) are accurate. VERDICT: PASS
Scale & Proportions 67%
Scale & Proportions measures whether local/krea-2-raw keeps people, vehicles, buildings, and everyday objects at believable relative sizes.
PASS (Evaluator: Qwen 3.5 122B)
The person’s height relative to the bus appears realistic — roughly 1/3 to 1/4 of the bus’s height, which is consistent with real-world proportions. The wheel size, window spacing, and body panels also scale appropriately. No obvious distortions or unnatural sizing are present. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The ladybug appears disproportionately large relative to the leaf it rests upon. In reality, a ladybug is small enough that multiple could fit on a leaf of this apparent size. Here, the insect occupies nearly half the visible leaf surface, violating realistic proportion expectations. The leaf’s curvature and texture also suggest it should be much larger than the bug shown. VERDICT: FAIL
PASS (Evaluator: Qwen 3.5 122B)
The child’s arm and hand are holding the basketball in a natural, proportionate way. The ball appears appropriately sized relative to the child’s torso and limb — consistent with real-world scale for a youth basketball. No anatomical or object-size distortions are visible. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a tiny person standing on the rim of a coffee cup, which aligns with the prompt’s description. The person is proportionally small relative to the cup, and their posture suggests they are looking down into it. The scale relationship between the person and the cup is consistent and visually coherent. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image depicts a tabby cat that is disproportionately large compared to the surrounding skyscrapers, consistent with the "giant" and "kaiju" description. The cat towers over the buildings and fills the street, creating a clear and intentional size relationship where the feline is the dominant, massive figure. The proportions are internally consistent with the prompt's surreal premise. VERDICT: PASS
PASS (Evaluator: Qwen 3.5 122B)
The image shows a miniature house resting on an open human hand. The house is appropriately scaled to fit within the palm, with its base spanning roughly from the thumb to the middle finger. The proportions between the hand and the house are realistic for a “miniature” representation — the house appears small enough to be held, yet detailed enough to show windows, doors, and roof features. No anatomical or scaling inconsistencies are visible. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows an elephant, horse, dog, cat, and mouse arranged in a line. However, the proportions are incorrect: the mouse is nearly as tall as the dog’s paw, the cat is disproportionately large relative to the horse’s leg, and the dog appears too small compared to the horse. Real-world size relationships are not preserved — the mouse should be tiny, the cat small, the dog medium, the horse large, and the elephant largest. These distortions violate the prompt’s requirement for correct proport
PASS (Evaluator: Qwen 3.5 122B)
The image shows a dinner table setting with a plate, fork, knife, wine glass, salt shaker, and peppercorn shaker. The relative sizes appear realistic: the plate is appropriately large compared to the cutlery, the wine glass is proportionate to the plate, and the shakers are small but correctly sized for tabletop condiments. No obvious scaling errors are present. VERDICT: PASS
FAIL (Evaluator: Qwen 3.5 122B)
The image shows a farm scene with a barn, tractor, farmer, cow, chicken, and fence post. However, the proportions are inconsistent: the chicken in the foreground is nearly as tall as the fence post, and the farmer standing beside the tractor appears disproportionately small compared to the vehicle. The cow also seems oversized relative to the tractor and barn. These scale inconsistencies violate the prompt’s requirement for correct proportional relationships. VERDICT: FAIL
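As a consistency check on the Spatial Reasoning numbers, the sketch below recomputes each displayed percentage from the PASS/FAIL labels listed in this section. The tallies are counted by hand from those labels, the names are hypothetical, and rounding to the nearest whole percent is an assumption about how the page displays its scores.

```python
# subcategory -> (passes, evaluations, displayed percent), counted from the labels above
SPATIAL_REASONING = {
    "Attributes Binding": (8, 9, 89),
    "Compositionality": (7, 9, 78),
    "Counting": (6, 9, 67),
    "Negation": (5, 9, 56),
    "Relative Position": (12, 12, 100),
    "Scale & Proportions": (6, 9, 67),
}
CATEGORY_DISPLAYED = 77  # "Spatial Reasoning 77%"


def rounded_percent(passes: int, total: int) -> int:
    """Percentage rounded half-up, using integer arithmetic only."""
    return (200 * passes + total) // (2 * total)


def mismatches(tallies):
    """Names of subcategories whose displayed percent differs from the recomputed one."""
    return [
        name
        for name, (passes, total, shown) in tallies.items()
        if rounded_percent(passes, total) != shown
    ]


def pooled_percent(tallies) -> int:
    """Category percent when all evaluations are pooled together."""
    passes = sum(p for p, _, _ in tallies.values())
    total = sum(t for _, t, _ in tallies.values())
    return rounded_percent(passes, total)


if __name__ == "__main__":
    print(mismatches(SPATIAL_REASONING))      # []
    print(pooled_percent(SPATIAL_REASONING))  # 77
```

Under these assumptions every subcategory percentage agrees with its labels, and pooling the 44 passes over 57 evaluations reproduces the displayed 77%.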
Human realism 79%
Faces & Expressions 92%
Faces & Expressions measures whether local/krea-2-raw creates realistic faces with the requested age, emotion, expression, and identity cues.
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with natural, well-proportioned features. Smile is warm and clearly conveys happiness, matching the prompt. No obvious distortions; skin texture and lighting are realistic. Slight AI smoothing on cheeks and eyes is minimal and not distracting. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with natural features, no obvious distortions. Expression is calm and serious, matching the prompt. Skin texture and hair details appear realistic. Lighting and composition support a professional portrait. All elements align with the requested depiction. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The face is symmetric with correctly proportioned features. Eyes, nose, and mouth are well-placed and natural. The expression is neutral and relaxed, matching the prompt. No significant distortions or artifacts are visible. Lighting and skin texture appear realistic. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is largely symmetric with correctly rendered features. Disgust is clearly conveyed: nose is wrinkled, lips are curled, brows are furrowed. Slight asymmetry in eyebrow shape and minor texture artifacts are present but don’t obscure the emotion. Overall, the intended expression is unmistakable and well-executed. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with correctly rendered features. Eyes are wide, eyebrows raised, mouth slightly open — all clearly conveying genuine surprise. No noticeable distortions. Lighting and texture are natural. The emotion is unmistakably readable and matches the prompt precisely. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The face is symmetric with correctly rendered features: downcast eyes, slightly drawn mouth, and distant gaze. The melancholy emotion is clearly readable through facial expression and lighting. No significant distortions are present. The stylized, painterly rendering enhances the somber mood without compromising anatomical coherence. VERDICT: PASS
PASS (Evaluator: Qwen 3 VL)
The face is symmetric with natural, aged features: deep wrinkles, white hair, and a warm, genuine smile. Eyes crinkle slightly, conveying kindness. No distortions or unnatural elements are present. The emotion is clearly readable as joyful and warm. All prompt requirements are accurately fulfilled. VERDICT: PASS
FAIL (Evaluator: Qwen 3 VL)
The child’s face is symmetric with round cheeks and large eyes, matching the prompt. However, no tears are visible, and the expression appears surprised or curious, not crying. Facial features are well-defined without distortions. The emotion does not match the requested “crying” state. VERDICT: FAIL
PASS (Evaluator: Qwen 3 VL)
Face is symmetric with natural wrinkles, age spots, and thinning gray hair. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align well with prompt. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Faces show natural asymmetry and correct anatomical features. Emotions are clearly readable: laughing (left), shocked (center), rolling eyes with smirk (right). No significant distortions. All three expressions match the prompt accurately. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Faces show clear symmetry and correct anatomical features. Emotions are distinctly readable: father beams, mother smiles gently, teenager looks bored, child grins excitedly. No noticeable distortions. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Faces show clear, symmetric features with no distortions. The left person displays frustration (furrowed brows, hand on head), the center person shows pleasure (smiling, nodding), and the right person shows confusion (tilted head, furrowed brow). All emotions are clearly readable and match the prompt. VERDICT: PASS
Full Body 92%
Full Body measures whether local/krea-2-raw renders full-body people with natural anatomy, clothing, pose, and proportions.
PASSEvaluator (Qwen 3 VL)
The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is recognizable and relaxed. No significant distortions observed. All elements match the prompt accurately. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The man stands straight, facing forward, with both hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The person’s limbs are correctly counted and proportioned. The neutral standing pose is anatomically accurate, with hands by sides and feet flat. No distortions are visible. The full body from head to feet is shown in a park setting. The pose matches the prompt description. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Limb count and proportions are anatomically correct. Pose is a natural running stride with arms bent and hair flowing backward, matching the prompt. Motion is convincingly rendered in hair and clothing. No distortions or unnatural anatomy. The action is clearly recognizable as running through a park. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The man’s full body is captured mid-air, knees bent, arms out for balance, and feet above a puddle — matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The pose is clearly recognizable as a jump over water. All criteria are met. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The dancer’s pose is dynamic, with one leg extended and the dress flowing outward, matching the prompt. Limb count and proportions appear anatomically correct. Motion blur on hair and dress enhances the spinning effect. No obvious distortions in body structure. The action is clearly recognizable as a spin. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The person stands on one leg with the other foot pressed to the inner thigh, arms raised overhead in prayer position — classic tree pose. Limb count and proportions are accurate. No visible distortions. Pose is clearly recognizable and anatomically correct. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
The climber has correct limb count and proportions. Pose is anatomically plausible for rock climbing, with arms and legs extended to holds. No obvious distortions. Climbing gear and rope are realistic. The action matches the prompt: clinging to a cliff face with limbs on holds. VERDICT: PASS
FAILEvaluator (Qwen 3 VL)
The cellist’s posture and limb count are anatomically correct. The instrument rests between his legs, and his bow arm is extended. However, his left leg appears unnaturally bent and positioned, failing to fully support the cello as described. The pose is recognizable but distorted in leg positioning. VERDICT: FAIL
PASSEvaluator (Qwen 3 VL)
The dancers’ limbs are correctly counted and proportioned. The man dips the woman backward with intertwined legs, matching the prompt. The pose is anatomically plausible and recognizable as a tango move. No significant distortions are visible. The composition captures the dynamic motion accurately. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s pose is anatomically plausible, and the jumping player’s posture is dynamic and realistic. The ball is near the jumping player’s feet, matching the prompt. No significant distortions are visible. The action is clearly recognizable as a mid-tackle scenario. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically plausible and recognizable. Legs are straight, feet together, arms supporting weight. Slight asymmetry in head/neck angle and minor leg separation visible. No major distortions. Core criteria met. VERDICT: PASS
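The page does not state how the subcategory percentages are derived. A minimal sketch, assuming each score is simply the share of PASS verdicts rounded to a whole percent, reproduces the Full Body figure from the twelve verdicts listed above. The helper name `pass_rate` is hypothetical and not part of ImageBench.

```python
def pass_rate(verdicts):
    """Return the share of PASS verdicts as a whole-number percentage."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passes = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passes / len(verdicts))


# Full Body verdicts in the order listed above: 11 PASS, 1 FAIL (the cellist).
full_body = ["PASS"] * 8 + ["FAIL"] + ["PASS"] * 3
print(pass_rate(full_body))  # 92
```

Under the same assumption, 5 PASS verdicts out of 12 would give 42%.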
Hands 42%
Hands measures whether local/krea-2-raw draws anatomically plausible hands, fingers, and gestures.
FAILEvaluator (Qwen 3.5 122B)
Upon close examination, the left hand (viewer’s right) shows only 4 fingers — the thumb is missing or fused. The right hand (viewer’s left) appears to have 5 fingers, but the pinky is oddly shortened and merged with the ring finger. Neither hand displays naturally spread, relaxed fingers; both appear stiff and anatomically distorted. Anomalies include missing/merged digits and unnatural positioning. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The image shows a woman with both hands extended forward, palms facing the viewer. Each hand clearly displays five fingers — thumb, index, middle, ring, and pinky — totaling ten fingers. The thumbs are positioned naturally, slightly curled inward toward the center, which is anatomically plausible for this gesture. No extra or missing fingers; no fusion or distortion. Gesture appears intentional and correct. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The left hand (on the right arm) shows 5 fingers, but the right hand (on the left arm) has only 4 visible fingers — the thumb is missing or obscured unnaturally. The grip appears slightly stiff and the finger placement on the opposite arm is not fully anatomically natural, especially the way the right hand’s fingers lie flat without clear knuckle curvature. This violates the “exactly 5 per hand” criterion. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
The visible right hand gripping the mug handle shows only four fingers clearly — thumb, index, middle, and ring finger — with the pinky either hidden or missing. The grip itself appears anatomically plausible for holding a mug, but the finger count is incomplete. Since the prompt requires exactly 5 fingers per hand and this is not met, the image fails the anatomical accuracy criterion. VERDICT: FAIL
FAILEvaluator (Qwen 3.5 122B)
The right hand gripping the pen shows 5 fingers, but the thumb and index finger appear fused or malformed at the tip. The left hand resting on the notebook is blurred and partially visible — only 3–4 fingers are discernible due to depth-of-field blur, not clear anatomical error. However, the right hand’s grip is anatomically questionable: the pen is held with an unnatural pinch, and finger positioning suggests AI distortion. Core criterion (anatomically correct grip + 5 fingers per hand) is not
FAILEvaluator (Qwen 3.5 122B)
The left hand gripping the fretboard shows only four fingers clearly visible, with the thumb positioned correctly behind the neck. The right hand strumming area is blurred but appears to show a natural strumming motion; however, finger count is indistinct due to motion blur and angle. No obvious anatomical anomalies are present, but incomplete finger visibility prevents confirming “exactly 5 per hand.” Core criteria not fully met. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The hand in the image displays five fingers total: two extended (index and middle) forming a peace sign, and three curled (thumb, ring, pinky). The anatomy appears correct — no extra or missing fingers, natural joint bends, and proper spatial arrangement. The gesture is clearly a right hand making a peace sign as described. No anomalies detected. VERDICT: PASS
PASSEvaluator (Qwen 3.5 122B)
The image shows a single hand making a thumbs-up gesture. The thumb is extended upward, and four fingers are curled into the palm — this matches the prompt’s description. Anatomically, it is correct for a thumbs-up: only the thumb is extended; the other four fingers are curled. There are no anomalies or extra/missing fingers. The hand is stylized but structurally accurate for the gesture. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
The image shows a single hand with all five fingers extended — thumb, index, middle, ring, and pinky — in an open palm gesture. The prompt asked for “counting to three” with only index, middle, and ring fingers extended. This is not what’s shown. Additionally, the hand has 5 fingers (anatomically correct), but the gesture does not match the prompt. Core criteria failed. VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
Upon close examination, the left hand (on the right side of the image) shows a thumb and four fingers — anatomically correct. The right hand (on the left side) also displays five fingers: thumb, index, middle, ring, and pinky — all visible and properly positioned in the handshake. No extra or missing fingers; grip is natural and professional. All criteria met. VERDICT: PASS
FAILEvaluator (Qwen 3.5 122B)
Upon close inspection, the left hand (from viewer’s perspective) appears to have 5 fingers, but the right hand shows only 4 clearly visible fingers — the thumb is present, but one finger seems missing or merged. The gesture resembles a high-five, but anatomical accuracy fails due to the missing digit on the right hand. This violates the prompt’s requirement of “all fingers extended on both hands.” VERDICT: FAIL
PASSEvaluator (Qwen 3.5 122B)
The image shows two pairs of hands interacting with a small gift box. The person in red (giver) has both hands visible — left hand clearly shows 4 fingers and thumb gripping the box; right hand is partially obscured but appears to have 5 digits. The receiver’s hands: right hand (foreground) shows 4 fingers + thumb gripping correctly; left hand is mostly hidden. All visible fingers appear anatomically correct, no extra or missing digits. Grip is natural and consistent with handing/receiving a gif
Multi-Subject 100%
Multi-Subject measures whether local/krea-2-raw keeps multiple people distinct while preserving their requested attributes, outfits, and positions.
PASSEvaluator (Qwen 3 VL)
Left woman: short blonde hair, wearing red jacket. Right woman: long black hair, wearing blue dress. Both are visually distinct and match descriptions exactly. Features are clear and accurate. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Left man: beard, glasses, green sweater — matches. Right man: clean-shaven, black hoodie — matches. Both visually distinct, in cafe setting. All described features accurately rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Tall person: curly red hair, denim overalls over striped shirt. Short person: straight brown hair with bangs, yellow sundress. Both visually distinct and match descriptions. All features accurately rendered. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
All four individuals are present and visually distinct. Left: bald man in suit/tie. Next: woman with silver hair in red blouse. Center: young man with dreadlocks in denim jacket. Right: woman with hijab in green dress. Features match prompt exactly. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Tall woman: pink short hair, playing bass — matches. Drummer: bearded, wearing bandana — matches. Slim man: glasses, playing keyboard — matches. All three are visually distinct and correctly positioned. Features align precisely with prompt description. VERDICT: PASS
PASSEvaluator (Qwen 3 VL)
Man: wears sunhat, backpack, holds hiking poles. Woman: braided hair, backpack, holds hiking poles. Teenager: baseball cap, backpack, holds water bottle. All three are visually distinct and match described features. VERDICT: PASS
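Each evaluator note on this page ends with a `VERDICT: PASS` or `VERDICT: FAIL` token, and some notes are cut off before that token is complete. A minimal sketch of reading the verdict back out of a note, assuming the token always sits at the very end of the text; `parse_verdict` is a hypothetical helper, not the benchmark's own parser. It returns `None` for a truncated note, so the caller can fall back to the PASS/FAIL label shown above the note.

```python
import re

# Matches a complete verdict token at the end of an evaluator note.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(note):
    """Return "PASS" or "FAIL", or None if the note has no complete verdict."""
    match = VERDICT_RE.search(note.strip())
    return match.group(1) if match else None


print(parse_verdict("All described features accurately rendered. VERDICT: PASS"))
print(parse_verdict("Minor inconsistencies exist. VERDIC"))
```

The first call yields `PASS` and the second yields `None`, because the truncated token never matches.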
Truthfulness 63%
Photorealism 100%
Photorealism measures whether local/krea-2-raw makes materials, surfaces, and real-world objects look visually plausible and distinct.
PASSEvaluator (Qwen 3.6 27B)
The image demonstrates high-fidelity material rendering. The glass exhibits correct refraction and transparency, distorting the background wood grain. The spoon shows sharp, accurate specular reflections of the apple and environment. The apple’s skin has a realistic waxy sheen with subsurface scattering effects. The linen napkin displays a convincing woven texture with soft, diffuse lighting. The wooden table provides a consistent grain pattern and warm tone. All materials behave physically corr
PASSEvaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum texture with appropriate metallic sheen. The notebook displays realistic leather grain and thickness. The mug has the correct matte ceramic finish, and the glass paperweight demonstrates accurate refraction, reflection, and translucency. The wood desk surface also appears natural. All materials are visually distinct and realistic. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image successfully renders the distinct material properties requested. The stainless steel pot exhibits sharp, metallic reflections. The glass bottle shows appropriate translucency and refraction of the oil inside. The wooden cutting board displays a natural grain texture, and the cotton towel has a visible woven fabric texture. The lighting interacts correctly with each surface, creating realistic highlights and shadows. VERDICT: PASS
Physics & Reflections 83%
Physics & Reflections measures whether local/krea-2-raw follows basic physical cues such as shadows, reflections, gravity, transparency, and contact points.
PASSEvaluator (Qwen 3.6 27B)
The image displays a red sphere on a white surface. The shadow is cast to the right and slightly behind the sphere, which is consistent with a light source positioned to the upper left. The shadow's shape is an ellipse, correctly representing the perspective distortion of a sphere's shadow on a plane. The lighting on the sphere itself also matches this direction, with the brightest highlight on the upper-left side. The physics of the shadow and lighting are accurate. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The shadow is cast to the left, consistent with sunlight coming from the right. The reflection in the window glass is physically plausible, showing the mug’s interior and liquid. However, the reflection appears slightly distorted and misaligned — the mug’s position in the reflection doesn’t perfectly mirror its real-world placement relative to the window frame. Also, the shadow’s edge is too soft for “sharp” as described. Minor inconsistencies exist but don’t break plausibility entirely. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The shadow is cast directly beneath the subject, consistent with midday lighting. The shape accurately mirrors the person’s stance and clothing folds without distortion. The shadow’s darkness and soft edges are realistic for bright outdoor conditions. No directional inconsistencies or unnatural reflections are present. The scene’s lighting and shadow behavior align with physical expectations. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The reflection is physically implausible. The woman is facing the mirror, so her reflection should show her front. However, the reflection displays her back, while the real woman’s back is visible to the viewer. This violates basic mirror physics. Additionally, the reflection’s pose and orientation do not match a true mirror image — it appears as a rotated or duplicated figure rather than a correct reflection. Shadows and lighting also lack consistency with the scene. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The reflection is vertically inverted and positioned directly beneath the cat, consistent with a polished floor. The distortion increases with distance from the contact point, which is physically accurate for such surfaces. Colors and lighting in the reflection match the subject. The orientation and direction are correct. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is vertically inverted and positioned directly beneath the vase, consistent with a glass surface. The colors and shapes of the flowers and vase are mirrored accurately, with slight blurring and distortion typical of reflections on glass. The lighting and shadows align with the scene’s illumination, and no directional inconsistencies are present. The reflection’s clarity and orientation support physical plausibility. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are vertically inverted and correctly aligned with the tree trunks, maintaining proper orientation. The color palette accurately mirrors the sunset sky, appearing slightly darker in the water as expected due to light absorption. The gentle ripples distort the reflection naturally, creating a realistic texture on the water surface. The physics of the reflection, including direction and distortion, are consistent with a calm lake at sunset. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is vertically inverted, which is physically correct for a puddle. The orientation matches the skyline above. There is a noticeable color shift toward teal/green in the water, consistent with light absorption and reflection physics. Mild distortion is present, especially near the edges and around objects like poles, which aligns with real-world puddle behavior. The reflection’s clarity and alignment with the scene are plausible. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly inverted vertically, matching the mountain range's orientation. The color is subtly muted and darker than the actual mountains, consistent with light absorption in water. The surface shows slight ripples, creating natural distortion rather than a perfect mirror image. The reflection aligns perfectly with the shoreline, and the lighting direction is consistent across the scene. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image displays a glass sphere on a reflective surface with a pool in the background. The refraction through the sphere correctly inverts the background scene, which is physically accurate. The reflection on the mirror surface is sharp and directly beneath the sphere, maintaining correct orientation and alignment. The lighting and shadows are consistent with an outdoor setting. The water reflection in the background is also plausible. All optical phenomena are rendered realistically. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The reflections are physically plausible. The glass floor accurately mirrors the sculpture and windows with correct orientation and slight transparency. The chrome surfaces exhibit realistic environmental mapping, distorting the room’s geometry consistent with their curved shapes. Lighting is coherent, with highlights matching the window positions. No directional inconsistencies or impossible distortions are present. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The reflection on the marble counter is physically plausible, showing a clear, inverted image of the glass with appropriate distortion and color matching. The lake’s surface also reflects the sky and trees realistically. However, the prompt specifically asks to check shadows/reflections for physical plausibility — and while reflections are well-rendered, there are no visible cast shadows from the glass onto the counter, which would be expected under natural lighting. This omission makes the scen
World Knowledge 33%
World Knowledge measures whether local/krea-2-raw depicts recognizable places, landmarks, flags, and known objects with accurate real-world details.
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy, including the iron lattice structure, three levels, and surrounding trees. However, there is a noticeable anomaly: a tall, slender tower-like structure visible through the arch at the base, which does not exist in reality. This suggests an AI-generated artifact or hallucination. Despite this minor flaw, the core subject remains recognizable and largely accurate. VERDICT: PASS
PASSEvaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high fidelity. The central onion dome, four surrounding minarets, and the symmetrical layout are architecturally accurate. The perspective from the reflecting pool is correct, capturing the iconic reflection. The intricate details, such as the arches and decorative elements, are well-rendered. The lighting and colors appear natural, enhancing the realism of the scene. Overall, the image is a faithful representation of the Taj Mahal. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The Statue of Liberty is clearly recognizable with accurate green patina, torch, crown, and tablet. However, the pedestal and base are architecturally inaccurate — they appear simplified, blocky, and lack the correct neoclassical details (e.g., missing columns, incorrect window placement, unnatural textures). The surrounding island and water also show AI artifacts (e.g., overly smooth surfaces, unnatural crowd lines). While the statue itself is mostly correct, the structural inaccuracies in the
FAILEvaluator (Qwen 3.6 27B)
The image depicts a shamisen and a bachi, which are recognizable. However, the instrument is not factually accurate. The headstock features four tuning pegs arranged in a cross pattern, whereas a traditional shamisen has only three strings and three pegs. Additionally, the body shape is overly symmetrical and lacks the characteristic asymmetry and traditional materials (like cat or dog skin) of a real shamisen. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image depicts an object resembling an astrolabe, but it fails the accuracy test. The text and symbols are nonsensical gibberish rather than accurate Latin or Arabic inscriptions. The geometric structures of the rete and tympan are distorted and illogical, lacking the precise mathematical layout of a real instrument. The "engraved degree markings" are blurry and inconsistent. While the general shape is recognizable, the specific details are hallucinated, making it factually inaccurate. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image depicts a ceramic bowl with a network of cracks filled with gold, accurately representing the Japanese art of kintsugi. The subject is clearly recognizable as a bowl, and the repair technique is visually consistent with real-world examples—irregular fracture lines highlighted with metallic gold against an off-white glaze. Lighting and texture appear realistic. No architectural or factual inaccuracies are present. VERDICT: PASS
FAILEvaluator (Qwen 3.6 27B)
The image depicts the water cycle, but the text labels are nonsensical gibberish (e.g., "Syop," "Vannl wanle," "Eveporration"). This makes the diagram factually inaccurate and unreadable as an educational tool. The core requirement for a "detailed diagram" is not met because the information conveyed is corrupted. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image depicts a geological cross-section with distinct layers and a subduction zone. However, it fails the "real-world subject" criterion due to its low-fidelity, cartoon-like rendering. The textures are flat and artificial, lacking the complexity of real geological formations. While the structural elements (subducting plate, magma chamber, volcanic arc) are present, the overall aesthetic is that of a basic educational diagram rather than a realistic or high-quality visualization. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image is a stylized, cartoon-like illustration rather than a realistic anatomical drawing. While the four chambers and major vessels (aorta, pulmonary arteries/veins) are present and roughly positioned, key details are inaccurate: valves are not clearly depicted or labeled, coronary vessels are oversimplified and misplaced, and vessel connections lack anatomical precision. The artistic style sacrifices factual accuracy for visual simplicity. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image depicts a butterfly that resembles a Morpho, but it is not factually accurate. The prompt specifies a Morpho menelaus, which has iridescent blue dorsal wings and brown ventral wings with eyespots. However, the image shows both dorsal and ventral patterns simultaneously on the same side, which is biologically impossible. Additionally, the wing patterns are overly vibrant and stylized, deviating from realistic morphology. VERDICT: FAIL
FAILEvaluator (Qwen 3.6 27B)
The image is not factually or architecturally accurate. While it captures the general grandeur of the Hagia Sophia, it contains significant hallucinations. The central dome’s calligraphy medallion is stylized and incorrect in form. The pendentives feature large, dark angelic figures that do not match the actual Byzantine mosaics (which depict Christ, the Virgin, and prophets). The lighting and structural details are overly dramatic and inconsistent with reality. VERDICT: FAIL
PASSEvaluator (Qwen 3.6 27B)
The image depicts a bismuth crystal with high fidelity. It accurately portrays the characteristic geometric hopper crystal formation, featuring distinct staircase structures and sharp, angular edges. The iridescent oxide layers are rendered with realistic coloration, displaying the typical rainbow hues (blues, purples, yellows) caused by thin-film interference. The subject is clearly recognizable and factually accurate regarding its mineralogical properties. VERDICT: PASS
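The Truthfulness headline of 63% is not the plain average of its three subcategory scores (100%, 83%, 33%). A short sketch, assuming the category score pools every verdict in the category before taking the pass rate, compares the two calculations. The counts are tallied from the verdicts listed above; the function names are hypothetical.

```python
# (passes, total) per subcategory, counted from the verdicts listed above.
truthfulness = {
    "Photorealism": (3, 3),
    "Physics & Reflections": (10, 12),
    "World Knowledge": (4, 12),
}


def pooled_rate(counts):
    """Pass rate over all verdicts in the category, as a whole percent."""
    passes = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return round(100 * passes / total)


def mean_of_rates(counts):
    """Unweighted average of the subcategory pass rates, as a whole percent."""
    rates = [100 * p / t for p, t in counts.values()]
    return round(sum(rates) / len(rates))


print(pooled_rate(truthfulness))    # 63
print(mean_of_rates(truthfulness))  # 72
```

Pooling gives 17 of 27, which matches the 63% shown for Truthfulness, whereas the unweighted mean gives 72%. Under pooling, a subcategory with only three prompts, such as Photorealism, carries a quarter of the weight of a twelve-prompt subcategory.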
Professional Studio 74%
Camera & Lighting 83%
Camera & Lighting measures whether local/krea-2-raw follows specific lens, angle, focus, exposure, and lighting instructions.
PASSEvaluator (Gemma 4 26B)
The image successfully implements a wide-angle perspective that emphasizes the expansive space of the living room. There is a noticeable barrel distortion at the edges, particularly visible in the curvature of the walls and the ceiling edges, which matches the prompt's requirement. The lighting is bright and even, consistent with a modern interior shot. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a rose petal. The depth of field is extremely shallow, with the dewdrop and the immediate edge of the petal in sharp focus, while the background and foreground petals are heavily blurred. The lighting is soft and natural, highlighting the transparency and refractive properties of the droplet. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is flat and top-down, with no tilt or angle. The lighting is even and diffuse, consistent with a high-angle shot, and the depth of field is deep, keeping the entire scene from the center roundabout to the surrounding trees in sharp focus. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image successfully implements Rembrandt lighting. There is a clear, distinct triangle of light on the shadowed cheek (the subject's right cheek), which is the hallmark of this lighting style. The lighting creates a moody, high-contrast atmosphere with a dark background, and the depth of field is appropriate for a portrait, keeping the focus on the subject's face. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly captures the requested elements. The sky is a vibrant orange, creating a strong golden hour atmosphere. The lighting pattern features long, dramatic shadows stretching across the foreground grass, consistent with a low sun angle. The perspective provides a clear sense of depth, moving from the textured foreground to the silhouetted trees and the distant, hazy mountains. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image fails to meet the prompt's requirements. While it attempts a bright aesthetic, the subject is a tiny, low-resolution square centered in a massive white void, which is not a standard portrait composition. The lighting is flat and lacks the "luminous" and "airy" quality described; instead, it looks like a poorly cropped, low-quality snapshot with washed-out colors. It lacks the professional high-key studio quality requested. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. The portrait features a shallow depth of field with a very creamy, blurred background (bokeh) characteristic of an 85mm f/1.4 lens. There is a clear, warm rim light effect on the subject's hair and shoulders, creating a distinct separation from the garden background, which fulfills the backlit requirement. VERDICT: PASS
FAILEvaluator (Gemma 4 26B)
The image fails to meet the prompt's core requirement. The prompt specifies a "subject sharp" with a shallow depth of field. However, in this image, the central subject (the person walking away) is completely out of focus and blurry. A 50mm f/1.8 lens should keep the subject sharp while blurring the background; here, both the subject and the background are blurry, resulting in a failed execution of the requested depth of field. VERDICT: FAIL
PASSEvaluator (Gemma 4 26B)
The image matches the prompt's requirements. The perspective is a top-down shot (flat lay). The depth of field is shallow, with the edges of the plate and the background window slightly out of focus, consistent with f/2.8. The lighting is clearly coming from the window on the left, creating a natural light pattern. The color temperature is warm, and there is a subtle vignette around the edges of the plate. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image successfully implements the tilt-shift effect. It features a distinct selective focus strip where the mid-ground is sharp, while the foreground and background are heavily blurred to create a miniature model appearance. The colors are highly saturated, and the perspective effectively mimics a high-angle miniature look. The lighting and depth of field perfectly match the prompt's requirements. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a dark, moody scene with a shallow depth of field. Most importantly, it displays a prominent horizontal anamorphic lens flare with distinct blue and orange color streaks across the center of the frame, exactly as described. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The image perfectly matches the prompt's description. It features a portrait with dramatic split lighting, where one side of the face is brightly illuminated and the other side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect against a black background. The perspective is a standard portrait view, and the lighting pattern is exactly as specified. VERDICT: PASS
Color Precision 75%
Color Precision measures whether local/krea-2-raw renders requested colors clearly and consistently across subjects, products, and backgrounds.
PASSEvaluator (Gemma 4 26B)
The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is highly saturated and unambiguous, matching the prompt's description perfectly. The road is a consistent gray, and the yellow line is also visible. All color requirements specified in the prompt are accurately represented. VERDICT: PASS
PASSEvaluator (Gemma 4 26B)
The woman is wearing a vibrant, deep blue evening dress that accurately represents the "royal blue" color requested. The background is a solid, light beige/off-white color, which perfectly matches the description of a "neutral background." All color requirements from the prompt have been met. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's color requirements. The flowers are a vibrant, pure sunflower yellow, and the sky is a clear, bright blue. There are no deviations from the specified color descriptions. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The wall color is a vibrant, saturated orange that closely aligns with the requested burnt orange (#FF6B35) description. The furniture, including the bed, lamp base, and nightstand, is white as specified. The scene also features natural daylight coming from the window. All color requirements from the prompt are accurately represented in the image. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image fails to meet the color requirements. The prompt specifies a dark muted green (#2D5F2D) for the foliage. However, the image has a heavy yellowish-olive tint, making the foliage appear much more yellowish and bright than the requested dark muted green. Additionally, the text overlay in the image (#D5FFD) does not match the requested hex code (#2D5F2D), further indicating a color mismatch. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The wall color is a medium blue that matches the specified hex code #4A90D9 (as indicated by the text overlay in the image). The floor is a warm wooden color, and the trim is white, adhering to all color requirements in the prompt. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image matches the color requirements. The walls are a warm cream color, the sofa is a dark navy (#2C3E50), and the accent pillows are a bright red (#E74C3C). Although there is a text overlay of a different hex code in the corner, the actual colors of the objects in the scene correctly reflect the prompt's specific color instructions. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image fails to follow the color instructions. The background is a very dark blue, but the text overlays are incorrect and do not match the prompt's requirements. The watch color is a muted gold, but the text below it says "#F0C27F" (which is a different hex code than the prompt's gold). The earbuds are white, but the text below them says "#FAFFEF" instead of the requested "#FFFFFF". The prompt's specific hex values and color names are not correctly represented in the image's text or the actu
FAIL Evaluator (Gemma 4 26B)
The image contains text labels that appear to be intended as hex codes, but they do not match the prompt's requirements. The prompt specifies the main shape as #FF4500 (orange-red), but the label in the image reads "#FF4500" while the background is labeled "#FAFAFA" and the accent line is labeled "#333333". However, the actual colors used in the graphic are slightly off from the requested hex values (the orange is more of a pure orange than the specified orange-red). Most importantly, the labels
PASS Evaluator (Gemma 4 26B)
The image shows a smooth horizontal gradient. On the far left, the color is a pure red (#FF0000). On the far right, the color is a pure blue (#0000FF). The transition between these two colors passes through a purple/magenta middle section, exactly as described in the prompt. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image displays a vertical gradient. The bottom color is a vibrant orange, which matches the hex code #FF6B35 provided in the prompt and visible in the bottom-left corner. The top color is a deep, dark indigo/navy, which matches the hex code #1B1464 provided in the prompt and visible in the bottom-right corner. The gradient transition accurately follows the specified color scheme. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the hex code #00C9FF. On the right, the color is a light mint green, which matches the hex code #92FE9D. The gradient transitions smoothly between these two specific colors as requested. VERDICT: PASS
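Several of the Color Precision verdicts above turn on whether a rendered color or an in-image label matches a requested hex code. The benchmark relies on VLM judges for this, not on pixel measurement, so the following is only a hypothetical complementary check: a minimal sketch that parses hex codes and measures their Euclidean distance in RGB space. The helper names `hex_to_rgb` and `rgb_distance` are assumptions made for illustration, and plain RGB distance is a rough proxy rather than a perceptual metric.

```python
def hex_to_rgb(code):
    """Parse a '#RRGGBB' string into an (R, G, B) tuple of ints."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        # Rejects malformed labels such as the 5-digit "#D5FFD" reported above.
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two hex colors in RGB space (0 means identical)."""
    return sum((x - y) ** 2 for x, y in zip(hex_to_rgb(a), hex_to_rgb(b))) ** 0.5


# Requested earbud color vs. the label text the evaluator reported in the image.
label_gap = rgb_distance("#FFFFFF", "#FAFFEF")
print(f"#FFFFFF vs #FAFFEF: {label_gap:.1f}")

# The requested foliage green parses cleanly; the reported overlay text does not.
print(hex_to_rgb("#2D5F2D"))
try:
    hex_to_rgb("#D5FFD")
except ValueError as err:
    print("malformed label:", err)
```

A small distance such as the one between #FFFFFF and #FAFFEF shows why a label can be textually wrong while the color it names is visually close, which is the distinction some of the FAIL verdicts above depend on.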
Photorealism 33%
Photorealism measures whether local/krea-2-raw produces polished product and studio imagery with realistic materials, lighting, shadows, and finish.
FAIL Evaluator (Gemma 4 26B)
The image fails to look like a real photograph. The sneaker lacks realistic texture, appearing as a flat, monochromatic 3D render rather than a physical object made of fabric or leather. There is a complete absence of material detail, stitching, or subtle highlights expected in studio photography. The lighting is unnaturally uniform, and the shadow is a generic, soft blur that doesn't interact realistically with the shoe's base. VERDICT: FAIL
FAIL Evaluator (Gemma 4 26B)
The image fails to meet the criteria for a studio-quality product shot. While the subject is clear, there are significant AI artifacts. The glass bottle has unnatural, warped edges and inconsistent thickness. Most notably, the straw/dip tube is a single, thin, floating line that lacks physical realism. The reflection on the surface is also poorly rendered and lacks the complex light interactions expected in a high-end professional photograph. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image successfully meets the prompt requirements. The green apple features realistic skin texture, subtle specular highlights, and a natural-looking stem. The lighting is soft and directional, creating a gentle highlight on the upper left and a realistic, soft contact shadow underneath. The seamless white background is clean and professional, characteristic of a high-quality studio product shot. There are no obvious AI artifacts or distortions. VERDICT: PASS
Graphical design 50%
Data Visualisation 0%
Data Visualisation measures whether local/krea-2-raw produces charts with correct labels, proportions, colors, and readable values.
FAIL Evaluator (Qwen 3 VL)
The chart misrepresents data: y-axis is labeled "Januuary" (spelled wrong), x-axis labels are incorrect (e.g., "18K", "15K", "Junge" instead of months). Values shown (12K, 18K, 22K, 19K, 25K) are incomplete or mismatched. Title is correct, but axes and labels are inaccurate and unreadable. Chart type is bar chart, but data presentation is flawed. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The pie chart uses correct colors and percentages visually, but labels are inaccurate: “Company C” is mislabeled as “Company C ur wet ker:” and “Company B 25%” is missing from the chart’s slice labels. The title also mislabels “Company B” as “Corhlans21%.” These errors violate the prompt’s requirements for accurate labels and percentages. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The chart is a line graph titled "Weekly Traffic" with axes and grid lines. However, axis labels and tick values are garbled (e.g., "Won Soly Veiilor", "Wiv 50 Daly Traibe Sun"), and the data points do not align correctly with the requested values. The chart type matches, but critical textual elements are unreadable and inaccurate. VERDICT: FAIL
Layout & Design 22%
Layout & Design measures whether local/krea-2-raw organizes posters, menus, covers, and other designed compositions with the requested hierarchy and elements.
PASS Evaluator (Qwen 3 VL)
The image features a large popcorn bucket centered, with "MOVIE NIGHT" in bold white text at the top and "Every Friday at 8 PM" in black text at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the popcorn as the focal point and text anchoring top and bottom. All required layout elements are present. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The guitar illustration and "Tickets available now" tagline are present and correctly placed. The headline "LIVE MUSIC FEST" uses proper visual hierarchy with large, bold text. However, the top text is garbled nonsense, not matching the prompt. The core layout and readable elements are mostly correct, but the garbled text violates the prompt. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The poster includes an open book illustration and the main title "JOIN THE BOOK CLUB" with correct visual hierarchy. The tagline "Meets every Wednesday" is present below the book. However, the title text is incorrect ("Moob s Wour Book Club:" instead of "JOIN THE BOOK CLUB"), and the website URL is garbled. The core layout elements are present but text is inaccurate. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a landing page with a hero header, headline/subtitle, "Sign Up" button, and footer with social icons — all present. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. The visual hierarchy is intact, but content failure renders it non-functional. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a newsletter layout with a banner header, two-column body (text left, image right), a centered quote block, and a footer with an unsubscribe link. However, all text is gibberish placeholder text, unreadable and incorrectly placed. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address and hours. However, all text is gibberish or unreadable, appearing as random characters or reversed text. The layout structure is correct, but content is not legible or meaningful. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
The layout matches the prompt: left page has a large hero photo; right page has two columns, a pull quote, sidebar with author bio, and page number. Text is present but unreadable due to gibberish placeholder characters. Visual hierarchy is correct, but content is not legible. Core layout structure is intact. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The image shows a mobile app UI with a top navigation bar, search field, 2x2 grid of feature cards with icons, "Recent" activity list, and bottom tab bar — all present. However, all text is garbled and unreadable, violating the requirement for readable, correctly placed text. Layout hierarchy is visually intact but text illegibility undermines functionality. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a magazine spread with a full-bleed left photo and a right page featuring three columns, a headline, body copy, a smaller inset image, and styled page numbers. Text is present but appears as gibberish, unreadable placeholder characters. The layout structure is visually correct, but the text fails the readability requirement. VERDICT: FAIL
Style Diversity 83%
Style Diversity measures whether local/krea-2-raw can shift between visual styles like oil painting, pixel art, watercolor, and technical illustration.
PASS Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever in a garden setting. The artistic style strongly resembles an oil painting, with visible, directional brushstroke textures throughout the fur and background. The colors are rich and saturated, and the blending mimics traditional oil techniques. The key visual characteristics requested—oil painting style, visible brushstrokes, and rich texture—are all present and clearly recognizable. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested Japanese anime style. Key characteristics are present: the golden retriever has large, expressive eyes with prominent white highlights, simplified facial features, and flat, unshaded colors. The background elements (flowers, bushes) are also rendered in a consistent flat-color anime aesthetic. The overall composition matches the prompt’s stylistic requirements. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever in a garden setting using pixel art. The grid structure is visible, and the resolution appears to be 16x16 or 32x32 (counting the squares reveals a 16x16 grid, not 32x32 as requested). The color palette is limited, primarily using greens and oranges. However, the prompt specifically requested a 32x32 grid, and the image is visibly 16x16. This is a clear deviation from the specified technical constraint. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image strongly exhibits Art Deco characteristics: strong geometric lines, vertical emphasis in skyscrapers, symmetrical composition, and a striking gold-and-black palette. The ornamental framing and stylized urban elements align with the requested aesthetic. While not historically precise, it clearly evokes the Art Deco style as interpreted for modern AI art. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested Impressionist style. Key characteristics are present: soft, visible brushstrokes create a painterly texture; dappled light filters through the parasol and background; and the palette is dominated by pastels—pale blues, pinks, yellows, and greens. The subject’s form is softly defined rather than sharply outlined, consistent with Impressionism. The overall effect matches the prompt’s artistic intent. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested Pop Art style. It features bold black outlines defining the figure and facial features. The entire image is rendered using Ben-Day dots for shading and texture, visible in the skin tones, hair, and clothing. The color palette consists of flat, vivid primary colors: a bright yellow background, blue suit, and red dots on the skin. All key visual characteristics are present and accurately executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully combines photorealistic forest details (tree bark, leaf litter, lighting) with visible watercolor characteristics. The edges show characteristic bleeds and blooms where pigment meets paper. The white background has a discernible paper texture. The green canopy shows translucent washes typical of watercolor. The core stylistic requirements are clearly met. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully combines cyberpunk and Art Nouveau styles. The central figure is a woman in futuristic gear within a neon cityscape (cyberpunk). This is framed by an ornate border featuring flowing organic lines, intricate patterns, and prominent floral motifs at the corners (Art Nouveau). The color palette utilizes vibrant neons alongside the decorative style. The requested style is clearly recognizable and its key characteristics are present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully combines ukiyo-e woodblock print aesthetics (flat colors, bold outlines, stylized clouds, Japanese text box) with a photorealistic sky gradient and lighting effects on the castle. The castle itself is rendered in a semi-realistic style that blends with the traditional print elements. Key characteristics of both requested styles are clearly present and harmoniously integrated. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits soft brutalism: raw concrete walls and ceiling with visible texture, paired with rounded, pastel-colored furniture (pink and cream). Warm diffused light enters from the left window, casting gentle shadows and enhancing the serene mood. The color palette and material contrast align precisely with the prompt. All key visual characteristics are present and well-executed. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image exhibits a sparse composition with significant negative space, featuring only two small objects on a white surface against a large, muted beige background. The color palette is subdued and earthy. However, the rendering style appears to be photorealistic or hyper-realistic rather than a distinct "painting" style. While the composition fits "minimalism," the specific artistic texture and "anxious" quality requested are not clearly recognizable as a painted style. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image displays a cityscape with retro-futuristic architecture, including spires and rounded towers reminiscent of 1960s space-age design. The color palette features warm, muted tones with a yellow-green patina that evokes nostalgia. While the architectural style aligns well with the prompt, the melancholic mood is somewhat subdued by the bright, hazy lighting. Overall, key visual characteristics are present. VERDICT: PASS
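The subcategory percentages in this section can be reproduced from the verdict labels. The sketch below assumes each percentage is the share of PASS verdicts rounded to the nearest whole number, since the page does not state its rounding rule. The name `pass_rate` and the tallies are illustrative, with counts taken by hand from the listings above. It also assumes, as the ordering of the headings suggests but does not confirm, that the "Graphical design" figure pools Data Visualisation, Layout & Design, and Style Diversity.

```python
def pass_rate(verdicts):
    """Share of PASS verdicts as a whole-number percentage (assumed rounding)."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passed = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passed / len(verdicts))


def expand(passes, fails):
    """Turn (pass count, fail count) into a flat list of verdict labels."""
    return ["PASS"] * passes + ["FAIL"] * fails


# (PASS count, FAIL count) per subcategory, counted from the listings above.
tallies = {
    "Color Precision": (9, 3),
    "Photorealism": (1, 2),
    "Data Visualisation": (0, 3),
    "Layout & Design": (2, 7),
    "Style Diversity": (10, 2),
}

rates = {name: pass_rate(expand(p, f)) for name, (p, f) in tallies.items()}

# Assumption: "Graphical design" pools these three subcategories.
pooled = []
for name in ("Data Visualisation", "Layout & Design", "Style Diversity"):
    pooled += expand(*tallies[name])
graphical_rate = pass_rate(pooled)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"Graphical design (pooled): {graphical_rate}%")
```

Under these assumptions the computed values agree with the percentages shown in the headings (75, 33, 0, 22, 83, and 50 for the pooled category). Because the verdicts come from VLM judges, the rates inherit any labelling mistakes in the individual evaluations.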