ImageBench V1: 192 evaluations across 6 categories
Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself.

Prompt: The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink
Text Rendering 80%
Typography Style 100%
Typography Style measures whether local/krea-2-turbo preserves requested lettering styles, font mood, and text texture such as typewriter ink or neon signage.
PASS | Evaluator (Qwen 3 VL)
The text "CHAPTER ONE" is spelled correctly and rendered in a bold, blocky font resembling vintage typewriter lettering. The background mimics aged paper with visible texture. However, the ink lacks the expected slight unevenness or smudging characteristic of typewriter ink. The style is convincing overall but misses the specific "slightly uneven ink" detail. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Text reads “TODAY’S SPECIAL” correctly, spelled and punctuated as requested. Letters exhibit convincing chalk texture, smudges, and uneven edges on a dark green surface resembling a chalkboard. Chalk dust effect is visible around strokes. The medium and style match the prompt precisely. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The word "OPEN" is rendered in glowing red pixel-style letters, matching an LED display. The letters are clearly formed with individual illuminated dots, and the spelling is correct. The style convincingly mimics a classic LED sign with warm glow and dark background. VERDICT: PASS
Writing accuracy 75%
Writing accuracy measures whether local/krea-2-turbo renders the exact requested words, numbers, labels, and short phrases without spelling or layout errors.
PASS | Evaluator (Qwen 3 VL)
The word "HELLO" is fully spelled correctly and is clearly readable in bold, black letters on a white background. All characters are intact and legible. No misspellings or missing parts are present. The text matches the intended specification exactly. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. All letters are intact and legible. No misspellings or missing words. The visual matches the intended text exactly. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The word "SERENITY" is spelled correctly and fully readable, centered on a solid light blue background. All letters are distinct and legible. No misspellings or missing parts. The text matches the intended specification exactly. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All words are spelled correctly and fully readable: "quick", "brown", "fox", "jumps", "high". Text is clean black on white background as intended. No misspellings or illegible parts. Accuracy meets criteria. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully readable against the pale yellow background. No misspellings or illegible parts. Matches intended text exactly. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Text reads: "Stars shine brightest in the dark night". All words are spelled correctly and fully legible against the clean off-white background. No misspellings or missing words detected. Font is clear and high-contrast. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All text matches the intended message exactly: "SUMMER SALE", "Up to 50% off all items this weekend only", and "Terms and conditions apply. See store for details." Every word is spelled correctly and fully readable, with no missing or illegible parts. Font sizes are appropriate for hierarchy. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All text matches the intended content exactly. "GRAND OPENING", "Join us for food, music, and fun", and "RSVP required. Limited seating available." are all spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM." Every word is spelled correctly and fully readable, with no missing or illegible parts. Font styles and sizes align with the description. VERDICT: PASS
FAIL | Evaluator (Qwen 3 VL)
Text errors: "Mcernic" (should be "Elena Voss"), "Pluinis" (illegible/incorrect), "Piclrss Tctures" (should be "Paramount Pictures"). "Starring" is misaligned. "Music by Thomas Grant" is partially obscured. Several words are distorted or unreadable. Core text is not fully accurate. VERDICT: FAIL
FAIL | Evaluator (Qwen 3 VL)
“Cambuida” is misspelled (should be “Cambodia”). All other text is correctly spelled and readable. The barcode and pricing are accurate. The core requirement for full text accuracy is not met due to the single misspelling. VERDICT: FAIL
FAIL | Evaluator (Qwen 3 VL)
“Unfilterred” is misspelled (should be “Unfiltered”). “preservatives” is misspelled (should be “preservatives”). All other text matches the intended content exactly. The misspellings are clear and fully readable. VERDICT: FAIL
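The percentages in this category are consistent with simple PASS counts. The sketch below is illustrative only: `pass_rate` is a hypothetical helper, and the assumptions that scores are rounded to whole percentages and that the category score pools all verdicts from its subcategories are inferred from the numbers shown here, not taken from the benchmark's methodology.

```python
from collections import Counter

def pass_rate(verdicts):
    """Return the share of PASS verdicts as a whole-number percentage.

    Assumption: scores are rounded to the nearest whole percent.
    """
    counts = Counter(verdicts)
    total = counts["PASS"] + counts["FAIL"]
    if total == 0:
        raise ValueError("no PASS/FAIL verdicts to score")
    return round(100 * counts["PASS"] / total)

# Verdict labels as listed above for the two Text Rendering subcategories.
typography_style = ["PASS"] * 3
writing_accuracy = ["PASS"] * 9 + ["FAIL"] * 3

print(pass_rate(typography_style))                     # 100
print(pass_rate(writing_accuracy))                     # 75
# Assumption: the category score pools every verdict (12 of 15 here).
print(pass_rate(typography_style + writing_accuracy))  # 80
```

Pooling reproduces the 80% shown for Text Rendering, whereas averaging the two subcategory percentages would give 87.5%.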
Spatial Reasoning 81%
Attributes Binding 100%
Attributes Binding measures whether local/krea-2-turbo attaches the right colors, materials, and properties to the correct requested objects.
PASS | Evaluator (Qwen 3.5 122B)
The image shows two objects on a white surface: - Left object: Red, spherical (ball) — matches “red ball” - Right object: Blue, cubic (cube) — matches “blue cube” Colors, shapes, and positions are correctly bound with no swaps. Size is comparable and appropriate for “side by side” placement. All prompt attributes are accurately represented. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and in the foreground. The pumpkin is orange, ribbed, and positioned behind the apple. All attributes (color, shape, size) are correctly bound to their respective objects with no swaps. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a yellow taxi and a black limousine parked side by side on a street. The taxi is correctly colored yellow and has typical taxi features including a roof sign. The limousine is correctly colored black and appears elongated as expected for a limousine. Both vehicles are positioned next to each other on what appears to be a city street with buildings in the background. All attributes are correctly bound to their respective objects. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains three distinct shapes on a white background: 1. A large red circle — correct color, size, and shape. 2. A small blue triangle — correct color, size, and shape. 3. A medium green star — correct color, size, and shape. All attributes are correctly bound to their respective objects with no swaps or errors. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows three candles on a shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. Lighting and texture are consistent; no artifacts or misalignments. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a large brown horse with a white facial stripe and black mane, correctly matching "big brown horse." A small white rabbit is present in the foreground, matching "small white rabbit." A medium-sized black cat with green eyes stands beside them, matching "medium-sized black cat." All objects are correctly attributed with no swaps. The setting is a meadow with flowers and grass. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a pink fire truck with correct structural details (ladder, lights, wheels). A large blue banana is present in the background, curving upward. A large green circular object (sun/moon) appears in the sky. All requested objects are present with correct colors and shapes. No attribute swaps observed. The surreal landscape context is also maintained. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains: 1. Purple carrot with green leafy top — correct color and shape. 2. Red tree with red leaves — correct color and structure. 3. White watermelon sliced open — outer rind is white, interior is red with black seeds — matches “white watermelon” as described (unusual but intentional). All attributes are correctly bound to their respective objects. No swaps or misattributions. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains an orange elephant with white tusks and black eyes, a yellow penguin with a white belly and orange beak/feet, and a silver strawberry with green leaves. All objects match the prompt’s color, shape, and identity requirements — no swaps or misattributions. The arrangement on a table-like surface is also correct. VERDICT: PASS
Compositionality 89%
Compositionality measures whether local/krea-2-turbo combines subjects, actions, outfits, props, and scene details into one coherent requested image.
PASS | Evaluator (Qwen 3.5 122B)
The image shows a tabby cat with distinct black and brown markings. It is wearing large, bright red sunglasses that cover its eyes. The sunglasses are oversized relative to the cat’s face, fitting the description. All elements — tabby cat, oversized red sunglasses — are present and coherently combined. No contradictions or missing components. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a golden retriever with a brown cowboy hat on its head. The hat is proportionally small relative to the dog’s head, fitting the “tiny” descriptor. The dog’s expression is cheerful, mouth open, tongue visible — consistent with typical golden retriever portrayals. All prompt elements are present and coherently combined: breed, hat style, and size relationship. No contradictions or missing components. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains: - A penguin: Present (black and white body, orange beak, yellow-orange neck markings) - Holding: Present (penguin’s flipper grips umbrella handle) - Colorful umbrella: Present (rainbow panels: red, green, yellow, orange, blue, purple) All elements are clearly depicted and coherently combined. The penguin is anthropomorphically holding the umbrella in a plausible, stylized manner. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a medieval knight in full armor, including helmet, chainmail, and plate armor with a red cross emblem. The knight is seated at a wooden desk, typing on a modern Apple laptop. All described elements — knight, armor, desk, laptop — are present and coherently combined in a surreal but visually consistent scene. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all required elements: 1. Roman gladiator - Present (wearing helmet with red crest, armor, and red cape) 2. Taking a selfie - Present (holding phone up, looking at screen) 3. Smartphone - Present (modern black smartphone with triple camera) 4. Colosseum - Present (ancient Roman amphitheater structure in background) All elements are coherently combined in a single scene with proper lighting and perspective. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all required elements: 1. Astronaut in a spacesuit: Present (white suit with American flag patch, helmet, backpack). 2. Riding a bicycle: Present (blue bicycle with visible wheels, handlebars, pedals). 3. Through a park: Present (trees, grass, paved path, natural lighting). All elements are coherently combined in a single scene. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all described elements: 1. Octopus - Present (central figure) 2. Top hat - Present (on octopus head) 3. Monocle - Present (over one eye with chain) 4. Chess game - Present (board with pieces) 5. Moon surface - Present (cratered ground) 6. Earth in background - Present (visible planet) All components are coherently combined in a single scene. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains all required elements: 1. Dinosaur - Present (T-Rex head/body) 2. Business suit - Present (dark suit, white shirt, tie) 3. PowerPoint presentation - Present (projector screen with "Povemfiont" and YouTube-like interface) 4. Modern office - Present (conference room, laptop, windows) 5. Drinking coffee - Present (holding white coffee cup) All elements are coherently combined in a single scene. The dinosaur is actively presenting while holding coffee, creating the intended humor
FAIL | Evaluator (Qwen 3.5 122B)
Let’s examine each element from the prompt: 1. Robot dressed as a pirate — ✅ Present (robot with pirate hat, vest, skull emblem). 2. Sailing a paper boat — ✅ Present (white origami-style boat on water). 3. On a lake of lava — ❌ Not present — it’s a calm blue lake with lily pads and trees, not lava. 4. Juggling three kittens — ⚠️ Partially present — three kittens are floating around the robot, but not being juggled; they’re mid-air in playful poses, not actively manipulated by the robot’s hands.
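Most judge responses on this page end with a `VERDICT: PASS` or `VERDICT: FAIL` line, but the last two Compositionality responses are cut off before reaching one. The following is a minimal sketch of how such a trailing verdict could be pulled out of a response while flagging the cut-off cases. The function name `extract_verdict` and the regular expression are assumptions for illustration; the page does not say how the benchmark derives its PASS/FAIL labels.

```python
import re
from typing import Optional

# Assumption: the verdict appears as the literal text "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")

def extract_verdict(response: str) -> Optional[str]:
    """Return the last stated verdict, or None if the response has none."""
    matches = VERDICT_RE.findall(response)
    return matches[-1] if matches else None

complete = "All elements are coherently combined in a single scene. VERDICT: PASS"
cut_off = "The dinosaur is actively presenting while holding coffee, creating the intended humor"

print(extract_verdict(complete))  # PASS
print(extract_verdict(cut_off))   # None
```

Returning `None` rather than guessing keeps responses without a stated verdict visible for manual review.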
Counting 56%
Counting measures whether local/krea-2-turbo creates the exact number of requested objects without adding or missing items.
PASS | Evaluator (Qwen 3.5 122B)
The image displays exactly three red apples with yellow streaks, placed on a wooden surface. The apples are arranged in a triangular formation. The count matches the prompt’s request for “exactly 3 red apples arranged on a wooden table.” No extra or missing objects are present. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows exactly five yellow rubber ducks arranged in a row on a bathroom shelf. The ducks are evenly spaced, facing right, with water droplets visible on them and the surrounding tiles. A soap dispenser is partially visible in the background. The count matches the prompt’s request precisely. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image displays four white candles standing upright on a dark surface. Each candle is unlit, with visible wicks and slight texture variations. The count matches the prompt’s request exactly: four candles, white color, standing position, dark background. No extra or missing objects. All criteria are met precisely. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows exactly 3 black dogs in the back row and 2 orange cats in the front, all sitting on grass. The count matches the prompt’s request precisely: 3 dogs and 2 cats. No extra or missing animals. Composition is clean and aligned with instructions. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image contains exactly 4 red roses and 2 white daisies arranged in a clear glass vase, placed on a patterned tablecloth. All requested objects are present and correctly counted. The composition matches the prompt precisely with no extra or missing elements. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows 1 blue butterfly and 2 yellow butterflies resting on a wooden garden fence with wire. The prompt requested 3 yellow butterflies, but only 2 are visible. Therefore, the count does not exactly match the request. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a round fishbowl containing goldfish. Counting the fish: there are 10 goldfish visible in the bowl, not 9 as requested. The prompt specified “Exactly 9 goldfish,” so the count does not match. All other elements (round bowl, swimming fish) are correct, but the numerical requirement is violated. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a fruit bowl with 3 bananas, 5 oranges, and 2 apples. The prompt requested exactly 3 bananas, 4 oranges, and 2 apples. While the banana and apple counts are correct, there is one extra orange. Therefore, the total count does not exactly match the request. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a cluster of colorful balloons against a blue sky. Counting carefully: there are 12 distinct balloons visible (yellow, blue, red, green, pink, orange, purple — some partially hidden). The prompt requested exactly 11. Since the count is 12, not 11, the requirement is not met. VERDICT: FAIL
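The Counting verdicts above apply an exact-match rule: a single extra or missing object fails the image. As a rough illustration of that comparison step only, the sketch below checks requested counts against the counts a judge reports. `count_mismatches` is a hypothetical helper, and the observed numbers are copied from the fruit bowl evaluation above; in the benchmark itself the counting is done by the VLM judge, not by code like this.

```python
def count_mismatches(requested, observed):
    """Return {name: (requested, observed)} for every object whose count differs."""
    names = sorted(set(requested) | set(observed))
    return {
        name: (requested.get(name, 0), observed.get(name, 0))
        for name in names
        if requested.get(name, 0) != observed.get(name, 0)
    }

# Counts from the fruit bowl evaluation: the prompt asked for 4 oranges, the judge saw 5.
requested = {"banana": 3, "orange": 4, "apple": 2}
observed = {"banana": 3, "orange": 5, "apple": 2}

mismatches = count_mismatches(requested, observed)
print(mismatches)                            # {'orange': (4, 5)}
print("PASS" if not mismatches else "FAIL")  # FAIL
```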
Negation 56%
Negation measures whether local/krea-2-turbo respects what should be absent from the image, not only what should be present.
PASS | Evaluator (Qwen 3.5 122B)
The image shows a tropical beach scene with white sand, turquoise water, palm trees, and a blue sky with clouds. There are footprints in the sand, but no people are visible anywhere in the frame — on the beach, in the water, or in the background foliage. The excluded element (“people”) is fully absent. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a cat lying on a couch. The cat has white and tabby fur, green eyes, and is positioned comfortably between textured pillows. Importantly, there is no collar visible around the cat’s neck — the prompt explicitly excluded “a collar,” and it is indeed absent. All other elements (cat, couch, lying pose) match the prompt’s inclusion criteria. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows an empty classroom with rows of wooden desks and chairs facing a green chalkboard. The chalkboard has faint chalk marks and some writing on the sides. There are no students visible in the room. The excluded element (students) is truly absent. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image clearly shows a sandy beach with palm trees, seashells in the foreground, and a beach umbrella. However, contrary to the exclusion criterion, there is visible water — specifically, turquoise ocean waves breaking along the shoreline in the midground. The ocean stretches across the horizon under a blue sky. Since the prompt explicitly forbade any water or ocean, and it is prominently present, the image fails the constraint. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows an office workspace with a wooden desk, a black office chair, a bookshelf filled with books and binders, and a lit table lamp. There is also a landline telephone on the desk. No computers, monitors, or screens are visible in the scene. The excluded elements are truly absent. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a kitchen scene with pots, pans, a stove, and a cutting board as described. However, the prompt explicitly excluded “food,” and two glass jars on the counter contain visible food items: one with dark granules (possibly spices or coffee) and another with light-colored beans or grains. These are clearly food products. Therefore, the excluded element is present. VERDICT: FAIL
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a pepperoni pizza with tomato sauce, basil leaves, and a crust. However, there is clearly melted cheese visible beneath the pepperoni slices and around the edges — it’s bubbly, golden in places, and integrated into the sauce layer. The prompt explicitly excluded cheese, but cheese is present. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows a silver car body with windows, doors, and headlights clearly visible. The wheel wells are empty — no wheels are present. The front bumper area is exposed, showing internal structure, and the rear wheel arch also lacks a wheel. All excluded elements (wheels) are absent as required. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows a close-up of a human face with clearly visible eyes, nose, and mouth. Importantly, eyebrows are present above both eyes — they are sparse but distinctly drawn or rendered with fine hairs. Since the prompt explicitly excluded eyebrows, and they are visibly present in the image, the exclusion criterion is violated. VERDICT: FAIL
Relative Position 100%
Relative Position measures whether local/krea-2-turbo places objects in the requested spatial relationship, such as above, below, inside, or next to another object.
PASS | Evaluator (Qwen 3.5 122B)
The image shows a cat sitting directly on top of a cardboard box. The cat is positioned centrally on the box’s upper surface, with its paws resting on the box and its body elevated above it. The spatial relationship “on top of” is accurately represented. No other positional descriptors (left, right, under, etc.) are contradicted. All described spatial relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on the floor between the chair’s legs. The spatial relationship “underneath” is accurately depicted — the ball is below the chair’s seat and within its leg structure. All described positional relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a coffee mug positioned to the left and slightly behind an open book, both resting on a desk surface. A laptop is visible in the background to the right. The spatial relationship “coffee mug next to an open book on a desk” is accurately represented — the mug is adjacent to the book, and both are on the desk. No positional errors are present. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a bird perched on a branch that extends from a tree trunk on the left side of the frame. The branch reaches toward the right, above a park bench situated below and to the right of the bird. The spatial relationships described — “bird on branch,” “branch extending from tree,” and “tree next to bench” — are all accurately represented. The bench is beneath and adjacent to the branch’s extension. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a cat sitting on a pillow, and the pillow is placed on a wooden chair. The spatial relationships described in the prompt are accurately reflected in the image: the cat is on the pillow, and the pillow is on the chair. All positional relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a lit candle in a glass holder placed directly on top of a stack of three books. To the right of the books (from the viewer’s perspective) is a potted plant with green leaves in a speckled ceramic pot. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors align with the visual layout. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows five black chess pieces on a checkered board. The king is centered. The queen is to its left, the bishop to its right, the rook above, and the knight below — matching the prompt’s spatial description exactly. All positional relationships are correctly rendered. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a desk scene with a laptop centered, a coffee mug to its left, a phone to its right, a lamp positioned above and to the right of the laptop, and an open notebook with a pen below the laptop. All specified spatial relationships — left, right, above, below — are accurately represented as described in the prompt. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows five animals in a field arranged as described: a horse on the far left, a cow to its right, a sheep centered in front, a goat to the sheep’s right, and a dog seated on the far right. All positional relationships match the prompt exactly — left/right/center placements are accurate with no contradictions. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a red barn on the right side in the foreground, appearing large and dominant. A windmill stands to the left of the barn in the midground, smaller in scale. Snow-capped mountains rise behind both structures in the background, appearing distant and smaller due to perspective. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). All positional relationships are correct. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image correctly shows a sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease with distance as specified. All positional relationships (foreground, midground, background) are accurate and consistent with the prompt. No spatial errors detected. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image correctly shows pigeons and a park bench in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial layering is accurate: pigeons are on and around the bench, the fountain sits behind the bench on grass, and buildings rise behind trees in the distance. All described positional relationships align with the prompt. VERDICT: PASS
Scale & Proportions 78%
Scale & Proportions measures whether local/krea-2-turbo keeps people, vehicles, buildings, and everyday objects at believable relative sizes.
PASS | Evaluator (Qwen 3.5 122B)
The person standing next to the bus appears to be of normal human proportions relative to the vehicle. The height of the person reaches approximately the middle of the bus door, which is consistent with real-world scale. The wheels, windows, and overall dimensions of the bus also appear proportionally correct for a standard school bus. No obvious distortions or scaling errors are present. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The ladybug appears appropriately sized relative to the leaf, consistent with real-world proportions. The leaf’s texture, veins, and dewdrops are detailed and scale-consistent with a macro shot of a small insect on foliage. No exaggerated or distorted sizing is evident. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a child’s torso and arm holding a basketball. The ball appears appropriately sized relative to the child’s hand and body — consistent with real-world proportions for a standard basketball held by a child. No anatomical or scale distortions are visible. The grip, arm angle, and ball placement all look natural and physically plausible. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a person standing on the rim of a large coffee cup, looking down into it. The person is depicted as tiny relative to the cup, which aligns with the prompt’s description of “a tiny person standing on the rim of a coffee cup, looking down into it.” The proportions are consistent with the intended surreal scale — the cup is oversized, and the figure is miniature in comparison. No contradictions in size relationships are present. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image depicts a tabby cat walking down a city street with skyscrapers in the background. However, the cat is not giant relative to the buildings — it appears normal-sized, roughly the size of a real house cat, while the buildings are full-scale skyscrapers. The perspective is low-angle, making the cat appear prominent, but not kaiju-sized. The prompt requires the cat to be “giant” and comparable in scale to the skyscrapers, which is not achieved here. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, with proportions that suggest it is a small model rather than a full-sized structure. The fingers and thumb frame the house naturally, and the perspective supports the intended size relationship. No anatomical or scaling inconsistencies are visible. VERDICT: PASS
FAIL | Evaluator (Qwen 3.5 122B)
The image shows five animals arranged in a line: elephant, dog, cat, mouse, and horse. However, the size relationships are incorrect. The dog is nearly as tall as the horse’s shoulder, and the cat is almost as large as the dog — both unrealistic. The mouse is also disproportionately large relative to the cat. Real-world proportions are not maintained. VERDICT: FAIL
PASS | Evaluator (Qwen 3.5 122B)
The image shows a table setting with a plate, fork, knife, wine glass, salt shaker, and peppercorn shaker. The relative sizes appear realistic: the plate is appropriately larger than the cutlery, the wine glass is proportionate to the plate, and the shakers are smaller than the plate but larger than the cutlery handles. All objects maintain consistent scale relationships typical of a real dining setup. VERDICT: PASS
PASS | Evaluator (Qwen 3.5 122B)
The image shows a farm scene with all requested elements: barn, tractor, farmer, cow, chicken, and fence post. Proportions are generally consistent — the tractor is appropriately large relative to the farmer, the cow is realistically sized compared to both, and the chicken is small in the foreground. The fence post and barn scale also align with real-world expectations. Minor stylization doesn’t break proportionality. VERDICT: PASS
Human realism 79%
Faces & Expressions 92%
Faces & Expressions measures whether local/krea-2-turbo creates realistic faces with the requested age, emotion, expression, and identity cues.
PASS | Evaluator (Qwen 3 VL)
Face is largely symmetric with natural features. Smile appears warm and genuine, with subtle lip curvature and slight eye crinkling. No obvious distortions. Skin texture and lighting look authentic. The emotion of contentment is clearly readable. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with natural, correctly proportioned features. Expression is calm and serious, with subtle lines conveying maturity. No visible distortions. Skin texture and lighting are realistic. The prompt’s requirements are fully met. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The face is symmetric with natural, correctly proportioned features. Eyes, nose, and mouth are aligned and realistic. The neutral, relaxed expression is clearly readable with no distortions. Skin texture, freckles, and hair look authentic. No AI artifacts or unnatural deformities are present. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The face shows clear disgust: nose wrinkled, lips curled, and brows furrowed. Features are symmetric and anatomically correct. Skin texture and fine details (moles, pores) are natural. No distortions or artifacts. Emotion is unmistakably readable. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with natural features. Eyes are wide, eyebrows raised, mouth slightly open — clearly conveying genuine surprise. No distortions; expression is authentic and well-executed. Skin texture and freckles appear realistic. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with realistic features. Eyes are downcast, lips slightly drawn, gaze distant — all aligning with the prompt. Skin texture and freckles add authenticity. No major distortions. Emotion is clearly readable as quiet melancholy. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
The face shows natural asymmetry and deep wrinkles consistent with an elderly woman. White, textured hair frames the face. The warm smile and kind eyes are clearly visible, conveying genuine emotion. No significant distortions; features are anatomically plausible. The image successfully captures the requested expression and age. VERDICT: PASS
FAIL | Evaluator (Qwen 3 VL)
The child’s face is symmetric with round cheeks and large eyes, matching the prompt. However, there are no visible tears streaming down the face; the expression appears neutral or slightly sad, not clearly crying. Facial features are well-defined without distortions. The prompt’s key emotional element (crying) is missing. VERDICT: FAIL
PASS | Evaluator (Qwen 3 VL)
Face is symmetric with realistic wrinkles, age spots, and thinning gray hair. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align with prompt. VERDICT: PASS
PASS | Evaluator (Qwen 3 VL)
Faces show clear, symmetric features with natural proportions. Emotions are vividly readable: left man laughs with head back, center woman looks shocked with wide eyes and hands to cheeks, right man smirks with rolled eyes. No distortions or unnatural artifacts. All three expressions match the prompt precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Faces show natural symmetry and correct anatomical features. Emotions are clearly readable: father beams proudly, mother smiles gently, teenager looks bored (hand on cheek, neutral expression), child grins excitedly. No distortions or unnatural artifacts. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Faces are symmetric with correct features. Left woman shows frustration (furrowed brows). Center man shows pleased expression (smile, gaze). Right woman shows confusion (tilted head, furrowed brow). Emotions are clearly readable. No distortions. VERDICT: PASS
Full Body 92%
Full Body measures whether local/krea-2-turbo renders full-body people with natural anatomy, clothing, pose, and proportions.
PASS Evaluator (Qwen 3 VL)
The figure shows a full-body view of a woman standing upright with arms at her sides, wearing jeans and a white t-shirt. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The man stands straight, facing forward, with hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Limb count and proportions are anatomically correct. The pose is a neutral standing stance, recognizable as requested. No significant distortions are visible. The full body from head to feet is shown, though the head is cropped out. The park setting and clothing match the prompt. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The woman’s limbs and proportions are anatomically correct. Her running pose is dynamic and recognizable, with arms swinging and hair flowing mid-stride. No distortions in limb count or proportions are visible. Motion is convincingly rendered in both hair and clothing. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms outstretched mid-air—accurately depicts jumping over a puddle, with water splashing realistically. No anatomical distortions are visible. The action is clearly recognizable and matches the prompt. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The dancer has correct limb count and proportions. The pose is a dynamic spin with one leg extended, recognizable as ballet. Dress flows outward as described. Slight distortion in the right arm’s bend and hair strands appear overly stylized, but core anatomy and action are accurate. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The person stands on one leg, with the other foot pressed against the inner thigh, arms raised overhead in prayer position. Limb count and proportions are correct. The pose is clearly recognizable as Tree Pose. No anatomical distortions are visible. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The climber has correct limb count and proportions. Her pose—arms extended to holds, legs positioned for support—is anatomically plausible and matches the prompt. No obvious distortions in limbs or joints. The action is clearly recognizable as rock climbing. Gear and holds appear realistic. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The cellist has correct limb count and proportions. His posture is anatomically plausible for playing: seated, legs around the instrument, bow arm extended. The cello rests on a stand, not his legs, which contradicts the prompt. The pose is recognizable as cello playing, though the instrument’s placement is inaccurate. VERDICT: FAIL
PASS Evaluator (Qwen 3 VL)
The dancers’ limbs are correctly counted and proportioned. The woman is dipped backward with intertwined legs, matching the prompt. Anatomical structure is accurate, with no visible distortions. The pose is recognizable as a classic tango embrace with a dip. All elements align with the requested action. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s mid-air pose are anatomically plausible and match the described action. No distortions are visible. The ball is at the feet of the jumping player, as prompted. The scene captures a realistic soccer tackle. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate, with aligned bodies, straight legs, and stable hand placement. No distortions are visible. The action is clearly recognizable as a synchronized handstand. All elements match the prompt precisely. VERDICT: PASS
Hands 42%
Hands measures whether local/krea-2-turbo draws anatomically plausible hands, fingers, and gestures.
PASS Evaluator (Qwen 3.5 122B)
Upon close examination: The left hand (viewer’s right) shows 5 fingers — thumb, index, middle, ring, and pinky — all present and anatomically plausible. The right hand (viewer’s left) also displays 5 fingers with natural spacing and relaxed posture. No extra or missing digits; no fusion or distortion. Grip/gesture matches “relaxed and naturally spread.” All criteria met. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows a woman with both hands extended forward, palms facing the viewer. Each hand clearly displays five fingers: thumb, index, middle, ring, and pinky. The fingers are well-formed, proportionate, and anatomically correct — no extra, missing, or fused digits. The gesture is natural and matches the prompt’s requirement of “palms-up, all ten fingers clearly visible.” No anomalies detected. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows a man with arms crossed. The right hand (on the left side of image) is not fully visible — only part of the forearm and elbow are seen; no fingers are visible. The left hand (on the right side) rests on the right upper arm, showing 4 fingers clearly (thumb is hidden or obscured). Anatomically, the grip is plausible for crossed arms, but finger count is incomplete: one hand has 0 visible fingers, the other has 4. Not exactly 5 per hand as required. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The image shows a right hand gripping a white ceramic mug by its handle. The thumb is positioned on top of the handle, and four fingers are wrapped around the front side of the handle — this is anatomically correct for a right-hand grip. However, only four fingers are visibly distinct; the pinky finger is either obscured or not fully rendered. While the grip itself is plausible, the incomplete visibility/rendering of all five fingers violates the “exactly 5 per hand” criterion. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The right hand gripping the pen shows 5 fingers with anatomically correct positioning for writing. The left hand resting on the notebook also displays 5 fingers in a natural, relaxed position. No extra or missing fingers are visible. The grip is realistic and the hands appear properly proportioned and positioned. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The left hand gripping the fretboard shows only four fingers clearly visible, with the thumb positioned behind the neck — anatomically plausible for guitar playing. However, the right hand (strumming hand) is partially obscured and appears to have only three visible fingers; the pinky and ring finger are either missing or merged unnaturally. This suggests an anatomical anomaly inconsistent with a natural human hand. The grip on the fretboard is mostly correct, but the strumming hand fails the “e
PASS Evaluator (Qwen 3.5 122B)
The image shows a single right hand making a peace sign. The index and middle fingers are extended upward, while the thumb, ring, and pinky fingers are curled inward — totaling 5 fingers visible on one hand. The gesture is anatomically correct for a right hand. No anomalies or extra/missing fingers are present. Since only one hand is shown and it meets the criteria, the core requirement is satisfied. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt exactly. Anatomically, the hand appears correct: five digits total (one thumb + four fingers), proper joint alignment, natural skin texture, and no extra or missing fingers. No anomalies detected. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image shows one hand with all five fingers extended — thumb, index, middle, ring, and pinky — clearly visible. The prompt asked for “counting to three” (index, middle, ring extended), but instead, all five are up. Anatomically, the hand is correct: no extra or missing fingers, natural creases, proper proportions. However, it fails the prompt’s specific gesture requirement. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
Upon close examination, the handshake shows anatomical inconsistencies. The hand on the left (wearing white cuff) appears to have only 4 visible fingers gripping, with the thumb obscured or merged. The hand on the right (blue cuff) also shows ambiguous finger count — possibly 4 fingers visible, with unclear separation between digits. Grip structure is unnatural; fingers seem fused or misaligned. Not all fingers are clearly present or correctly positioned. VERDICT: FAIL
FAIL Evaluator (Qwen 3.5 122B)
The image shows two hands pressed together in a prayer-like or clapping gesture, not a high-five. The left hand (viewer’s left) clearly displays 5 fingers. The right hand (viewer’s right) also shows 5 fingers — thumb, index, middle, ring, and pinky — though the pinky is partially obscured by the left hand’s palm. Anatomically, the grip is plausible for a prayer position but incorrect for a high-five, which requires palms facing each other with fingers spread outward, not interlocked or pressed f
FAIL Evaluator (Qwen 3.5 122B)
Upon close examination, the image shows four hands interacting with a gift box. The two hands on the left (receiving) appear anatomically correct with 5 fingers each. However, the two hands on the right (giving) show anomalies: the top right hand has only 4 visible fingers, and the bottom right hand’s thumb and index finger appear fused or malformed. Grip mechanics are also unnatural. These are clear anatomical errors. VERDICT: FAIL
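The 42% shown for Hands is consistent with a plain pass rate over the twelve verdict labels listed in this subcategory (5 PASS, 7 FAIL). The sketch below reproduces that arithmetic. It assumes the score is the rounded share of PASS labels; the name `pass_rate` and the `HANDS_LABELS` list are illustrative and not part of ImageBench. The labels are taken from the PASS/FAIL badge on each entry rather than the closing "VERDICT:" text, because some judge transcripts are cut off before the verdict.

```python
from collections import Counter

# Verdict labels for the Hands subcategory, in page order
# (transcribed by hand from the entries above).
HANDS_LABELS = [
    "PASS", "PASS", "FAIL", "FAIL", "PASS", "FAIL",
    "PASS", "PASS", "FAIL", "FAIL", "FAIL", "FAIL",
]

def pass_rate(labels):
    """Return the share of PASS labels as a whole-number percentage.

    Assumption: the displayed score is passes / total, rounded to the
    nearest integer. The page does not state this explicitly.
    """
    counts = Counter(labels)
    total = counts["PASS"] + counts["FAIL"]
    if total == 0:
        raise ValueError("no verdicts to score")
    return round(100 * counts["PASS"] / total)

print(pass_rate(HANDS_LABELS))  # 5 of 12 -> 42
```

The same calculation gives 92 for Full Body (11 of 12) and 100 for Multi-Subject (6 of 6), matching the figures shown next to those headings.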
Multi-Subject 100%
Multi-Subject measures whether local/krea-2-turbo keeps multiple people distinct while preserving their requested attributes, outfits, and positions.
PASS Evaluator (Qwen 3 VL)
Left woman: short blonde hair, wearing red jacket over black top. Right woman: long black hair, wearing blue sleeveless dress. Both visually distinct and match prompt descriptions precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Left man: beard, glasses, green sweater — matches description. Right man: clean-shaven, black hoodie — matches description. Both are visually distinct and clearly identifiable. All prompt features are accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Tall person: curly red hair, denim overalls over striped shirt. Short person: straight brown hair, yellow sundress. Both visually distinct and match descriptions. Features are accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
All four individuals are present and visually distinct, matching the prompt: bald man in suit, silver-haired woman in red blouse, young man with dreadlocks in denim jacket, woman in hijab with green dress. Features are accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Tall woman: pink short hair, playing bass. Man: beard, bandana, on drums. Slim man: glasses, playing keyboard. All three are visually distinct and match descriptions precisely. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
Man: wears sunhat and backpack. Woman: has braided hair, holds hiking poles, wears backpack. Teenager: wears baseball cap, holds water bottle, wears backpack. All three are visually distinct and match described features. VERDICT: PASS
Truthfulness 70%
Photorealism 100%
Photorealism measures whether local/krea-2-turbo makes materials, surfaces, and real-world objects look visually plausible and distinct.
PASS Evaluator (Qwen 3.6 27B)
The image demonstrates high-fidelity rendering of all requested materials. The glass exhibits accurate refraction and transparency, distorting the background appropriately. The metal spoon shows sharp, realistic specular reflections. The wooden table has a convincing grain and matte finish, while the linen napkin displays a soft, woven texture with natural folds. The apple’s skin reflects light with a subtle sheen, indicating ripeness. All materials behave correctly under the lighting conditions
PASS Evaluator (Qwen 3.6 27B)
The image successfully renders the requested materials with high fidelity. The laptop exhibits a realistic brushed aluminum finish with appropriate matte reflections. The notebook displays a convincing, textured leather surface with natural creases. The ceramic mug shows a speckled, matte glaze, while the glass paperweight demonstrates accurate refraction and transparency. Each material is visually distinct and physically plausible. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image accurately renders the distinct material properties requested. The stainless steel pot shows sharp, distorted reflections typical of polished metal. The wooden cutting board displays natural grain and diffuse light scattering. The glass bottle exhibits correct translucency and refraction of the oil inside, along with specular highlights on the surface. The cotton towel presents a soft, matte texture with appropriate light absorption. All materials behave physically correctly under the
Physics & Reflections 83%
Physics & Reflections measures whether local/krea-2-turbo follows basic physical cues such as shadows, reflections, gravity, transparency, and contact points.
PASS Evaluator (Qwen 3.6 27B)
The image displays a red sphere on a white surface. The lighting originates from the upper left, evidenced by the bright specular highlight on the sphere's upper-left quadrant. Consequently, a soft shadow is cast to the right and slightly downward, which is physically consistent with the light source direction. The shadow's shape and gradient are natural, and the reflection on the sphere aligns with the lighting setup. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The shadow is cast to the left, consistent with light coming from the right (the window). The shadow’s shape accurately mirrors the mug and handle, with appropriate softness at the edges due to the light source’s size. The reflection in the window glass is faint but plausible, showing a reversed image of the mug. No distortions or color anomalies are present. All elements align with real-world physics. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The shadow is cast to the right and slightly behind the subject, which contradicts the prompt's requirement for a shadow "directly below them" (indicating overhead midday sun). Additionally, the shadow is quite long and distinct, suggesting a lower sun angle rather than midday. The shadow’s shape also appears somewhat distorted and disconnected from the feet, lacking natural grounding. These inconsistencies make the shadow physically implausible for the described lighting condition. VERDICT: FA
PASS Evaluator (Qwen 3.6 27B)
The reflection in the mirror is physically plausible. It correctly shows the front of the woman's outfit (which is the back of her actual body facing the mirror), maintaining proper orientation and perspective. The lighting and shadows are consistent, with no distortions or color mismatches. The reflection accurately mirrors the scene, adhering to the laws of physics for reflections. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflection is vertically inverted and positioned directly beneath the cat, consistent with a polished floor. The orientation matches the subject’s pose, and the slight blurring and reduced contrast in the reflection are physically accurate for such a surface. Colors and lighting align with the real cat, and no directional inconsistencies are present. The marble texture and sheen support the plausibility of the reflection. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly oriented upside down, directly beneath the vase, and exhibits appropriate distortion and slight blurring consistent with a glass surface. The colors and details of the flowers and stems are mirrored accurately, and the lighting conditions support the presence of such a reflection. The shadows cast by the vase and flowers are also consistent with the light source direction. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflections are vertically inverted and aligned with the tree trunks, which is physically correct. The ripples distort the reflection appropriately for a calm but not perfectly still surface. The reflection appears slightly darker than the actual trees, consistent with light absorption in water. The warm sunset tones are preserved in the reflection, matching the sky’s gradient. No directional or orientation errors are present. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a city skyline reflected in a puddle, but the reflection is inverted vertically relative to the scene (the sky is at the bottom, buildings point down). This is physically impossible; a reflection in a horizontal surface like a puddle should mirror the scene upright relative to the horizon line, not flip it upside down. The orientation is fundamentally incorrect. VERDICT: FAIL
PASS Evaluator (Qwen 3.6 27B)
The reflection is physically plausible. It is correctly inverted vertically, matching the mountain range’s orientation. The colors are subtly muted and darker in the water, consistent with light absorption and scattering in real alpine lakes. Minor ripples cause slight distortion, which is realistic for a still but not perfectly glassy surface. Shadows align with the lighting direction on the mountains. No directional or color inconsistencies are present. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image demonstrates high physical plausibility. The glass sphere correctly refracts the background scene (inverting it), while the mirror surface provides a sharp, accurate reflection of the sphere itself. The lighting is consistent, with specular highlights matching the light source direction on both the sphere and its reflection. The water in the background shows appropriate bokeh and reflections. The interaction between the sphere, mirror, and environment is coherent and realistic. VERDIC
PASS Evaluator (Qwen 3.6 27B)
The reflections are physically plausible. The chrome sculpture accurately mirrors the surrounding environment, including the windows and the mirrored wall, with appropriate distortion based on its curved surface. The glass floor creates a clear, inverted reflection of the sculpture and the room, consistent with the light sources from the windows. The shadows and highlights align with the natural lighting coming from the left, enhancing the realism of the scene. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The reflections and refractions are physically plausible. The wine glass shows realistic distortion of the background through its curved, faceted surface. The reflection on the polished marble is sharp and correctly oriented, mirroring the glass’s base and stem. The lake outside displays natural water reflections with appropriate softness and color blending. Shadows align with the light source from the window. All elements maintain consistent direction, orientation, and optical behavior. VERDIC
World Knowledge 50%
World Knowledge measures whether local/krea-2-turbo depicts recognizable places, landmarks, flags, and known objects with accurate real-world details.
PASS Evaluator (Qwen 3.6 27B)
The image depicts the Eiffel Tower with high architectural accuracy. The iconic iron lattice structure, the four arched legs, and the three distinct observation levels are clearly visible and correctly proportioned. The perspective is a low-angle shot looking up, which is common for this landmark. The background shows a clear blue sky and distant buildings consistent with the Parisian skyline (likely the Palais de Chaillot). There are no obvious AI artifacts or structural distortions. VERDICT:
PASS Evaluator (Qwen 3.6 27B)
The image depicts the Taj Mahal with high architectural accuracy, including the central dome, four minarets, and symmetrical layout. The reflecting pool is correctly positioned in the foreground, enhancing the symmetry. The surrounding gardens and cypress trees are also accurately represented. The lighting and shadows appear natural, contributing to the realism of the scene. Overall, the image meets the criteria for recognizability and factual accuracy. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The Statue of Liberty is clearly recognizable and accurately depicted with correct proportions, green patina, torch, crown, and tablet. The pedestal and base architecture align with real-world structure. Minor crowd details at the base are plausible. No significant factual or architectural inaccuracies detected. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a shamisen with a bachi, but the instrument is structurally inaccurate. The body resembles a banjo with a drumhead and rim rather than the traditional wooden or lacquered body of a shamisen. The bridge is a simple bar instead of the traditional curved "kama," and the strings appear to have four tuning pegs rather than the standard three. These deviations make it factually inaccurate. VERDICT: FAIL
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a highly detailed brass astrolabe with recognizable components like the rete, tympan, and alidade. However, the engraved text and numbers are nonsensical gibberish, a common artifact in AI-generated images. While the general form is correct, the lack of accurate, legible inscriptions and degree markings makes it factually inaccurate as a functional scientific instrument. VERDICT: FAIL
PASS Evaluator (Qwen 3.6 27B)
The image depicts a ceramic bowl featuring the distinct aesthetic of kintsugi, with visible cracks repaired using gold lacquer. The bowl displays a traditional Japanese style with blue floral patterns and crackle glaze. The gold lines accurately represent the repair technique described in the prompt. The subject is clearly recognizable and factually accurate to the description of a Japanese kintsugi bowl. VERDICT: PASS
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a stylized, cartoon-like illustration rather than a detailed scientific diagram. While it shows the water cycle components (evaporation, condensation, precipitation, collection), the artistic style is inconsistent with a "detailed diagram." Furthermore, the visual logic is flawed: the "evaporation" is depicted as a giant splash of bubbles shooting directly into a cloud, which is not factually accurate. The river flows into a waterfall that seems to disappear into the ground rat
FAIL Evaluator (Qwen 3.6 27B)
The image fails to depict a convergent boundary with subduction. Instead, it shows a divergent boundary or rift zone where plates are pulling apart, evidenced by the central gap and upward magma flow. There is no subducting plate, trench, or volcanic arc as requested. The "layered mantle" is stylized with rainbow colors and textures that resemble clay or cake rather than geological strata. The core subject matter is fundamentally incorrect. VERDICT: FAIL
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a human heart with recognizable major structures like the aorta, pulmonary arteries, and chambers. However, it contains significant anatomical inaccuracies. The coronary arteries are depicted as massive, thick blue and red vessels running superficially across the entire ventricular surface, which is incorrect; they are much smaller and follow specific grooves. Additionally, the internal valve structures appear simplified and somewhat distorted, and the spatial relationship of t
FAIL Evaluator (Qwen 3.6 27B)
The image depicts a butterfly with the characteristic iridescent blue dorsal wings and brown ventral wings with eyespots of the Morpho genus. However, the subject is anatomically impossible: it displays both the dorsal and ventral wing surfaces simultaneously in a flat, 2D projection. Real butterflies cannot show both sides at once; this is a composite or AI hallucination of anatomy. Therefore, it fails the "real-world subject" and "correct body morphology" criteria. VERDICT: FAIL
PASS Evaluator (Qwen 3.6 27B)
The image accurately depicts the Hagia Sophia’s interior, including the massive central dome, pendentives with angelic figures, semi-domes, and large Islamic calligraphy medallions. The architectural layout, lighting, and decorative elements align with known photographs of the site. Minor stylization in mosaic patterns does not detract from factual accuracy. VERDICT: PASS
PASS Evaluator (Qwen 3.6 27B)
The image depicts a bismuth crystal specimen with high fidelity. It accurately captures the characteristic geometric hopper crystal formation, displaying the distinct staircase-like terraces and sharp, angular edges typical of bismuth. The iridescent oxide layers are rendered with realistic coloration, showing the expected spectrum of yellows, blues, and purples caused by thin-film interference. The metallic luster and rough matrix base are also consistent with real-world samples. VERDICT: PASS
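The 70% shown for the Truthfulness category can be reproduced from its three subcategories if all verdicts are pooled: 3 of 3 (Photorealism), 10 of 12 (Physics & Reflections), and 6 of 12 (World Knowledge) give 19 of 27. The sketch below checks that arithmetic and contrasts it with an unweighted mean of the three subcategory percentages, which does not match. That the category score is a pooled pass rate is an inference from the numbers on this page, not a documented rule; the names `pooled_percent` and `mean_of_percents` are illustrative.

```python
# (passes, total) per Truthfulness subcategory, counted by hand
# from the PASS/FAIL labels on the entries above.
SUBCATEGORIES = {
    "Photorealism": (3, 3),
    "Physics & Reflections": (10, 12),
    "World Knowledge": (6, 12),
}

def pooled_percent(results):
    """Pass rate over all verdicts in the category, rounded to an integer."""
    passes = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    return round(100 * passes / total)

def mean_of_percents(results):
    """Unweighted mean of the per-subcategory pass rates, rounded."""
    rates = [100 * p / t for p, t in results.values()]
    return round(sum(rates) / len(rates))

print(pooled_percent(SUBCATEGORIES))    # 19 of 27 -> 70
print(mean_of_percents(SUBCATEGORIES))  # (100 + 83.3 + 50) / 3 -> 78
```

Pooling weights each subcategory by its number of prompts, so the three-prompt Photorealism set contributes less to the category score than the twelve-prompt sets.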
Professional Studio 82%
Camera & Lighting 83%
Camera & Lighting measures whether local/krea-2-turbo follows specific lens, angle, focus, exposure, and lighting instructions.
PASS Evaluator (Gemma 4 26B)
The image successfully implements all requested elements. It features a wide-angle perspective that captures the expansive living room. There is a clear and noticeable barrel distortion at the edges (visible in the darkening and curving of the corners), which matches the prompt perfectly. The lighting is layered, using both natural light from the window and warm artificial light from the recessed ceiling and chandelier to create a modern interior feel. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a close-up macro shot of a rose petal with multiple water droplets. The perspective is a macro view, and the depth of field is extremely shallow, with the central droplet in sharp focus and the surrounding areas heavily blurred (bokeh). The lighting is bright and naturalistic, highlighting the translucency of the droplets. All elements requested in the prompt are accurately represented. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's requirements. It features a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is top-down, with a deep depth of field that keeps the entire scene—from the center greenery to the surrounding roads and trees—in sharp focus. The lighting is even and consistent across the scene, typical of a high-altitude overhead shot. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image successfully implements the requested lighting. There is a clear light source from the side, creating a deep shadow on one side of the face. Crucially, a distinct triangle of light is visible on the shadowed cheek, which is the hallmark of Rembrandt lighting. The atmosphere is moody and dark, and the depth of field is appropriate for a portrait. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's requirements. The lighting is a classic golden hour scene, featuring a bright, low sun that creates a warm orange sky and a soft glow across the landscape. The long, dark shadows cast by the trees across the field are highly prominent and consistent with a low sun angle. The perspective and depth of field are appropriate for a landscape photograph. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt. It features a high-key portrait with a bright, white, seamless background. The lighting is soft, even, and airy, creating a luminous effect with minimal shadows on the subject's face. The perspective is a standard eye-level portrait, and the depth of field is shallow, keeping the focus entirely on the person's facial features. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's technical requirements. The depth of field is shallow, creating a creamy bokeh background with large, soft light orbs. The lighting features a distinct warm rim light (backlighting) that creates a glowing outline around the subject's hair and shoulders, effectively separating her from the blurred garden background. The perspective is a classic 85mm portrait shot. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image successfully implements the requested technical specifications. The subject is sharp and in focus, while the background features a shallow depth of field consistent with a 50mm f/1.8 lens. The background displays soft, circular bokeh from city lights (bokeh effect), and the lighting pattern accurately reflects a street scene at dusk or night with various light sources creating a pleasing depth. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image matches all the specified criteria. The perspective is a top-down shot (shot from above). The depth of field is shallow, consistent with an f/2.8 aperture, as the edges of the bowl and the wooden surface are slightly out of focus. The lighting clearly originates from the top-left (window side), creating a natural light pattern with a warm color temperature and a subtle vignette effect. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The image fails to capture the "tilt-shift" effect described. A true tilt-shift photograph uses a very shallow depth of field to create a selective focus strip, making the scene look like a miniature model. In this image, the focus is relatively deep; the foreground (crosswalk/pedestrians) and the mid-ground (cars) are both sharp, and there is no characteristic blur at the top and bottom of the frame to simulate a miniature scale. VERDICT: FAIL
FAIL Evaluator (Gemma 4 26B)
The image features a dark, moody scene with a cinematic feel. It successfully incorporates the requested blue and orange light flares. However, the flares are not strictly "anamorphic lens flares streaking horizontally." Instead, the light patterns are chaotic, featuring many diagonal rays, starburst effects, and non-horizontal streaks that obscure the subject's face. While the colors are correct, the specific horizontal anamorphic characteristic is not accurately captured. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the prompt's requirements. The lighting is a textbook example of dramatic split lighting: the left side of the subject's face is brightly illuminated, while the right side is cast in deep, high-contrast shadow. This creates a sharp vertical division down the center of the face, achieving the requested chiaroscuro effect. The perspective is a standard portrait, and the lighting pattern is executed precisely. VERDICT: PASS
Color Precision 92%
Color Precision measures whether local/krea-2-turbo renders requested colors clearly and consistently across subjects, products, and backgrounds.
PASS Evaluator (Gemma 4 26B)
The image features a bright, vivid red sports car. The red color is intense, saturated, and unambiguous, exactly as described. The car is parked on a gray asphalt road, which is also clearly visible. The colors match the prompt's specific requirements perfectly. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a woman wearing a deep, vibrant blue dress that accurately matches the description of "royal blue." The background is a solid, muted beige/tan color, which qualifies as a "neutral background." All color requirements specified in the prompt are met. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image perfectly matches the color descriptions provided in the prompt. The sunflowers exhibit a vibrant, "pure sunflower yellow" color, and the background features a "clear blue sky." There are no deviations from the specified color palette. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features a bedroom wall painted in a vibrant burnt orange color, which closely aligns with the hex code #FF6B35 and the description "burnt orange." The furniture (bedding and lamp) is white, and the scene is illuminated by natural daylight from a nearby window. All color requirements specified in the prompt are accurately represented. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The foliage in the image consists of various shades of green, ranging from light olive to a darker forest green. The hex code #2D5F2D represents a specific, deep, dark muted green. While the image contains dark green tones, it also includes much lighter, yellowish-green hues that deviate from the strict color requirement. However, the core instruction of "dark muted green" is generally represented in the darker areas of the canopy and ground. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image features walls in a medium blue color that closely matches the description of #4A90D9. The floor is a warm, natural wood tone, and the window trim and baseboards are white, as specified. All color requirements from the prompt are accurately represented in the generated image. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the requested colors. The walls are a warm cream/beige (#F5E6CC), the sofa is a dark navy blue (#2C3E50), and the accent pillows are a bright red (#E74C3C). All color requirements from the prompt are met precisely. VERDICT: PASS
FAIL Evaluator (Gemma 4 26B)
The background is a deep dark blue, consistent with #1A1A2E. The earbuds are pure white (#FFFFFF). However, the watch is not a "soft gold" (#F0C27F); instead, it is a pale, yellowish-beige or tan color. Furthermore, the AI incorrectly rendered the hex code "#F0C27F" as text directly onto the watch face, which is a failure to follow the color instruction and a hallucination of the text. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image accurately reflects the requested color palette. The background is a near-white color consistent with #FAFAFA. The large main shape is a vibrant orange-red, matching the #FF4500 specification. The text and the horizontal accent line are a dark charcoal color, consistent with #333333. All specified hex values and color descriptions are correctly represented in the composition. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a smooth gradient transitioning from a bright red on the left to a deep blue on the right. The middle section features a purple transition, as requested. The colors align with the specified hex values: the left side is a pure red (#FF0000) and the right side is a pure blue (#0000FF), with an even transition through purple. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image displays a vertical gradient. The bottom color is a vibrant orange, which aligns well with the specified #FF6B35 (orange). The top color is a very dark, deep indigo/navy, which matches the description of #1B1464 (deep indigo). The transition between the two colors is smooth and follows the requested color scheme accurately. VERDICT: PASS
PASS Evaluator (Gemma 4 26B)
The image shows a horizontal banner with a gradient. On the left side, the color is a bright cyan, which matches the hex code #00C9FF. On the right side, the color is a light mint green, which matches the hex code #92FE9D. The gradient transitions smoothly between these two specific colors as requested. VERDICT: PASS
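The Color Precision verdicts above compare rendered colors against hex codes by eye. As a rough illustration of a numeric cross-check, the sketch below converts a hex code to RGB and measures the Euclidean distance to a sampled pixel. This is an assumption-laden example, not the judges' method: the helper names are hypothetical, the sampled pixel value is invented for illustration, and plain RGB distance is not a perceptual color metric.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]


def hex_to_rgb(hex_code: str) -> RGB:
    """Convert a "#RRGGBB" string to an (R, G, B) tuple of 0-255 integers."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def rgb_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance in RGB space (0 means identical colors)."""
    return math.dist(a, b)


if __name__ == "__main__":
    target = hex_to_rgb("#F0C27F")  # the "soft gold" requested for the watch
    sampled = (222, 205, 160)       # hypothetical pixel, invented for this example
    print(target, round(rgb_distance(target, sampled), 1))
```

A distance threshold for "close enough" would have to be chosen separately. Lighting and shading shift pixel values, so a single sampled pixel is only a hint and not a replacement for inspecting the image.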
Photorealism 33%
Photorealism measures whether local/krea-2-turbo produces polished product and studio imagery with realistic materials, lighting, shadows, and finish.
FAIL Evaluator (Gemma 4 26B)
The image fails to pass as a real photograph due to several glaring AI artifacts. The laces are physically impossible, merging into the shoe's upper and lacking coherent structure. The eyelets are distorted and uneven. Additionally, the texture of the leather is inconsistent, and the shadow is unnaturally dark and sharp for "soft studio lighting." The overall geometry of the shoe is warped and nonsensical upon close inspection. VERDICT: FAIL
FAIL Evaluator (Gemma 4 26B)
The image fails to meet the prompt's requirements. The background is a light gray/white gradient rather than a studio setting consistent with the requested "reflective black surface." Most importantly, the glass and liquid physics are highly flawed: the straw/dip tube is nonsensical, bending through the liquid in an impossible way, and the internal reflections and refraction are inconsistent and distorted. It does not look like a real photograph. VERDICT: FAIL
PASS Evaluator (Gemma 4 26B)
The image successfully captures the requested studio aesthetic. The green apple features realistic textures, including natural-looking lenticels and a subtle sheen. The lighting is soft, creating a gentle highlight on the side and a realistic, soft shadow on the white surface. There are no obvious AI artifacts, such as warped geometry or unnatural textures. It looks like a high-quality, professional catalog photograph. VERDICT: PASS
Graphical design 54%
Data Visualisation 0%
Data Visualisation measures whether local/krea-2-turbo produces charts with correct labels, proportions, colors, and readable values.
FAIL Evaluator (Qwen 3 VL)
The chart shows a bar chart titled "Monthly Sales 2025" with color-coded bars. Values 12K, 18K, 22K, 19K, 25K are visible, but the x-axis labels are incorrect (shows "18K", "15K", "June" twice) and the y-axis lacks proper scale labeling. The chart type is correct, but critical labeling errors invalidate the visualization. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The chart is a pie chart with correct percentages and labels: Company A (35%), Company B (25%), Company C (20%), Others (20%). However, "Company B" is incorrectly labeled twice — once as 25% and once as 20%. The visual elements are readable and distinct. The chart type matches the request. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The chart shows a line graph titled "Weekly Traffic" with axes labeled "Mon-Sun" and "12on-S000". Data points match the requested values (1200, 1500, 1100, 1800, 2000, 1700, 1400), though the Y-axis label is garbled. Grid lines are present, but axis labels are misaligned and unreadable. The chart type is correct, but critical labeling errors prevent accurate interpretation. VERDICT: FAIL
Layout & Design 33%
Layout & Design measures whether local/krea-2-turbo organizes posters, menus, covers, and other designed compositions with the requested hierarchy and elements.
PASS Evaluator (Qwen 3 VL)
The image features a large popcorn bucket centered, with "MOVIE NIGHT" in bold at the top and "Every Friday at 8 PM" at the bottom in a banner. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the title prominent, popcorn central, and tagline anchored below. All required layout elements are present and accurately rendered. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The image features a guitar illustration, the headline "LIVE MUSIC FEST" in large, prominent text above it, and the tagline "Tickets available now" at the bottom. Text is readable and correctly placed with clear visual hierarchy. All required layout elements are present and properly positioned. VERDICT: PASS
PASS Evaluator (Qwen 3 VL)
The poster features an open book illustration centered below the title "JOIN THE BOOK CLUB," which is prominently displayed at the top in large, bold text. The tagline "Meets every Wednesday" is clearly visible at the bottom. Visual hierarchy is correct, with title > illustration > tagline. All text is readable and correctly placed per the prompt. VERDICT: PASS
FAIL Evaluator (Qwen 3 VL)
The image shows a hero header, headline/subtitle, “Sign Up” button, and footer with social icons — all present with correct visual hierarchy. However, the headline and subtitle text is gibberish and unreadable, failing the text readability requirement. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a banner header, a two-column layout (text left, image right), a quote block, and a footer with a link. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a menu with a logo header (“RESTANTERS”), three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address/hours. However, all text is gibberish, unreadable, and incorrectly placed. The layout structure is present but the content is nonsensical and unrecognizable. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The left page features a large hero photo as required. The right page shows a pull quote, two columns of text, and a sidebar with author bio, all with correct visual hierarchy. However, all text is gibberish and unreadable, violating the requirement for readable content. Layout structure is present but text fails functional criteria. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a mobile app UI with a top navigation bar (status bar + user profile), search field, 2x2 grid of feature cards with icons, “Recent Activity” list, and bottom tab bar. Text is present but heavily distorted and unreadable. Visual hierarchy is mostly correct, but illegible text violates core usability requirements. VERDICT: FAIL
FAIL Evaluator (Qwen 3 VL)
The image shows a full-bleed left-page photo and a right page with three-column layout, headline, body copy, and a smaller inset image. Page numbers “206” are visible. However, all text is gibberish, unreadable, and incorrectly placed, violating the prompt’s requirement for readable, correctly positioned text. Layout structure is present but text fails core criteria. VERDICT: FAIL
Style Diversity 83%
Style Diversity measures whether local/krea-2-turbo can shift between visual styles like oil painting, pixel art, watercolor, and technical illustration.
PASS Evaluator (Qwen 3.5 122B)
The image clearly depicts a golden retriever in a garden setting. The style is unmistakably that of an oil painting, characterized by thick, visible brushstrokes (impasto effect) throughout the fur, foliage, and background. The texture is rich and painterly, with distinct dabs of color creating form and light. All key visual characteristics requested are present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Japanese anime style characteristics: flat color application with minimal gradients, bold black outlines defining the subject, and large expressive eyes with simple highlights. The golden retriever is depicted with stylized features including simplified fur texture and exaggerated facial expression. The garden background uses cel-shading techniques typical of anime. All requested visual elements are present and recognizable. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image depicts a golden retriever in a garden setting, matching the subject matter. However, the requested style specifies a "32x32 grid" and "pixel art style." The image is clearly rendered as a mosaic or cross-stitch pattern with visible square tiles, not pixel art. Additionally, the grid resolution appears significantly higher than 32x32 pixels. The key visual characteristics of the requested style are absent. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image strongly exhibits Art Deco characteristics: geometric building facades with stepped silhouettes, ornamental gold-and-black patterns on foreground structures, symmetrical street layout, and a luxurious metallic palette. The stylized figures, vehicles, and glowing orbs reinforce the era’s aesthetic. While not perfectly symmetrical down the center, the overall composition and decorative motifs clearly align with the requested style. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits Impressionist characteristics: visible, soft brushstrokes throughout the composition, dappled light effects especially on the parasol and dress, and a predominantly pastel palette with blues, pinks, and yellows. The subject matter — a woman with a parasol — aligns with classic Impressionist themes. While the rendering is somewhat stylized or AI-interpreted, the core visual hallmarks of the requested style are unmistakably present. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested Pop Art style. It features bold black outlines defining the subject, flat vivid primary colors in the background (red and yellow) and clothing (blue), and the characteristic Ben-Day dot pattern used for shading on the face. All key visual characteristics specified in the prompt are present and clearly recognizable. VERDICT: PASS
FAIL Evaluator (Qwen 3.5 122B)
The image clearly exhibits watercolor characteristics: visible paper texture, pigment bleeds, and translucent layering. However, it fails the "photorealistic" criterion. The rendering is stylized and illustrative rather than photo-realistic; the lighting lacks photographic fidelity, and the details are painterly rather than sharp and true-to-life. While the watercolor style is present, the core requirement of photorealism is not met. VERDICT: FAIL
PASS Evaluator (Qwen 3.5 122B)
The image successfully combines cyberpunk and Art Nouveau elements. The futuristic city, neon colors, and tech elements (headphones, jacket patches) fulfill the cyberpunk aspect. The flowing organic lines and prominent floral borders framing the scene clearly represent the Art Nouveau style. The fusion is visually coherent and matches the prompt’s requirements. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image clearly exhibits the requested hybrid style. The castle, trees, and clouds feature the bold outlines, flat color areas, and textured patterns characteristic of Japanese ukiyo-e woodblock prints. Simultaneously, the sky displays a gradient and lighting effect that mimics a photorealistic sunset or twilight, with subtle noise/grain suggesting a photographic quality. The combination is distinct and successfully executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image perfectly embodies "soft brutalism." The raw, textured concrete walls and ceiling are prominent, establishing the brutalist foundation. This is contrasted by the rounded, pastel-colored furniture (pink, yellow, mint) which adds the required softness. The lighting is warm and diffused, casting gentle shadows that enhance the textures without harshness. All key visual characteristics requested are clearly present and well-executed. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image displays a still life with a sparse composition featuring a single fruit and a dark bottle against a textured, off-white background. The color palette is muted, dominated by grays, beiges, and the dark vessel. The brushwork is visible and somewhat agitated, contributing to a sense of visual tension. The negative space is prominent and creates an uneasy, minimalist atmosphere consistent with the "anxious minimalism" description. VERDICT: PASS
PASS Evaluator (Qwen 3.5 122B)
The image successfully captures the requested nostalgic futurism style. The color palette features a distinct warm, yellowish-green patina that evokes aged film or vintage prints, matching the "melancholic warm-toned" requirement. The architecture blends recognizable modern skyscrapers with the retro-futuristic Space Needle, creating a 1960s space-age aesthetic. The overall texture and lighting effectively convey a sense of retro nostalgia. VERDICT: PASS
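The category percentages shown on this page are consistent with a simple pass rate over the listed verdicts, rounded to the nearest whole percent. The page does not state the formula, so treat this as an assumption. The sketch below recomputes the figures from PASS/FAIL counts tallied by hand from the verdicts above. The name `pass_rate` and the `COUNTS` table are illustrative only.

```python
from typing import Dict, Tuple

# (passed, total) per category, tallied from the verdicts listed on this page.
COUNTS: Dict[str, Tuple[int, int]] = {
    "Color Precision": (11, 12),
    "Photorealism": (1, 3),
    "Data Visualisation": (0, 3),
    "Layout & Design": (3, 9),
    "Style Diversity": (10, 12),
}


def pass_rate(passed: int, total: int) -> int:
    """Percentage of PASS verdicts, rounded to the nearest whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total)


if __name__ == "__main__":
    for name, (passed, total) in COUNTS.items():
        print(f"{name}: {pass_rate(passed, total)}%")

    # Assumption: the "Graphical design" figure pools its three sub-categories.
    group = ["Data Visualisation", "Layout & Design", "Style Diversity"]
    group_passed = sum(COUNTS[name][0] for name in group)
    group_total = sum(COUNTS[name][1] for name in group)
    print(f"Graphical design: {pass_rate(group_passed, group_total)}%")
```

Pooling the three sub-categories gives 13 passes out of 24 evaluations, which rounds to the 54% shown for Graphical design. An unweighted mean of the three sub-category percentages would give about 39%, so pooling fits the page better.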