ImageBench V1

192 evaluations across 6 categories

Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself.
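Because the labels come from judge text, it can help to extract them mechanically and flag any output that carries no usable verdict. The following is a minimal sketch, assuming each judge response ends with a line of the form "VERDICT: PASS" or "VERDICT: FAIL", as the transcripts in this report do. The function names `extract_verdict` and `needs_manual_review` are hypothetical and are not part of the benchmark's tooling.

```python
import re
from typing import Optional

# Assumed format: the judge closes its reasoning with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def extract_verdict(judge_text: str) -> Optional[str]:
    """Return the last PASS/FAIL verdict found in the judge text, or None."""
    matches = VERDICT_RE.findall(judge_text)
    return matches[-1] if matches else None


def needs_manual_review(judge_text: str) -> bool:
    """Flag judge outputs with no parsable verdict (for example, truncated text)."""
    return extract_verdict(judge_text) is None


if __name__ == "__main__":
    complete = "Text reads CHAPTER ONE correctly. VERDICT: PASS"
    truncated = "The size descriptor is inaccurate. VERDICT:"
    print(extract_verdict(complete))       # PASS
    print(needs_manual_review(truncated))  # True
```

A response that stops after "VERDICT:" yields no label from its text alone, so it is flagged for a human look rather than guessed.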

138

Pass

Fail

72%

Pass Rate

493.0s

Avg Latency

487.9s

Min Latency

577.1s

Max Latency
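The headline pass rate can be checked from the two counts shown above. This is a small sketch of that arithmetic. It assumes every one of the 192 evaluations ends as either PASS or FAIL; the derived fail count is therefore an inference and not a figure printed in the report.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of evaluations that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


TOTAL_EVALUATIONS = 192  # from the report header
PASSED = 138             # from the summary card

rate = pass_rate(PASSED, TOTAL_EVALUATIONS)

# Assumption: no evaluation is skipped or errored, so the rest are failures.
implied_failed = TOTAL_EVALUATIONS - PASSED

if __name__ == "__main__":
    print(f"{rate:.1%}")      # 71.9%
    print(round(rate * 100))  # 72
    print(implied_failed)     # 54
```

The unrounded value is 71.875%, which rounds to the 72% shown in the summary.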


Text Rendering 80%

Typography Style 100%

Typography Style measures whether local/krea-2-raw preserves requested lettering styles, font mood, and text texture such as typewriter ink or neon signage.

Easy 100%

PASS

The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with uneven, slightly smudged ink strokes and distressed edges. Background shows aged paper texture. All elements align with prompt’s medium and spelling requirements. VERDICT: PASS

PASS

The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads "TODAY'S SPECIAL" correctly. Letters show convincing chalk texture and smudges on dark green board, matching the "white chalk on chalkboard" prompt. Chalk dust effect is implied by texture. No obvious artifacts or mismatched medium. VERDICT: PASS

PASS

The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is rendered in glowing red pixel-style letters, matching an LED display. Letters are clearly formed with dot-matrix precision, and spelling is correct. The style convincingly emulates a classic LED sign with bright, individual pixels and ambient glow. VERDICT: PASS

Writing accuracy 75%

Writing accuracy measures whether local/krea-2-raw renders the exact requested words, numbers, labels, and short phrases without spelling or layout errors.

Easy 100%

PASS

A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and is perfectly legible against the white background. All five letters are intact, clear, and properly formed. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

PASS

The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a plain cream background. All letters are distinct and legible. No misspellings or missing words. The visual matches the intended specification exactly. VERDICT: PASS

PASS

The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and is fully readable, centered on a solid light blue background. All letters are distinct and legible. No misspellings or missing parts. The text matches the intended specification exactly. VERDICT: PASS

Medium 100%

PASS

The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

Text reads: "The quick brown fox jumps high". All words are spelled correctly and fully readable against the white background. No misspellings or illegible parts. Matches the intended text exactly. VERDICT: PASS

PASS

'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

Text reads: "Every sunset brings a new dawn ahead". All words are spelled correctly and fully legible against the pale yellow background. No misspellings or missing words detected. Font is clear and contrast is sufficient for readability. VERDICT: PASS

PASS

'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

Text reads: "Stars shine brightest in the dark night" — matches intended phrase exactly. All words are spelled correctly and fully legible against the clean off-white background. No misspellings or missing words detected. VERDICT: PASS

Hard 100%

PASS

A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

All text is spelled correctly and fully readable. Headline: "SUMMER SALE". Body: "Up to 50% off all items this weekend only". Fine print: "Terms and conditions apply. See store for details." No misspellings or illegible parts detected. VERDICT: PASS

PASS

A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

All text matches the intended message exactly. "GRAND OPENING" is correct. Subtitle "Join us for food, music, and fun" is fully readable. Disclaimer "RSVP required. Limited seating available." is accurate. No misspellings, missing words, or illegible parts detected. Text is clear and correctly rendered. VERDICT: PASS

PASS

An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: "ANNUAL GALA 2025", "An evening of fine dining and live jazz", and "Black tie optional. Doors open at 6 PM." Every word is spelled correctly and fully readable in all three text blocks. No misspellings or illegible parts detected. VERDICT: PASS

Extreme 0%

FAIL

A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

The tagline and title are correct. Credits block has multiple misspellings and garbled text (e.g., “Ma aramouns GAETV V”, “Thiula GTrisicı”). “Music by Thomas Grant” is missing. “Paramount Pictures Release” is partially illegible. Core text is not fully readable or accurate. VERDICT: FAIL

FAIL

A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

All intended text is rendered as gibberish or distorted, not readable. “Cambbida” is misspelled. The barcode area shows “Issue 247 | March 2025 | $6.99” correctly, but the surrounding text is illegible nonsense. No actual words match the intended text. VERDICT: FAIL

FAIL

A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

“Unfliltered” and “Northweist” are misspellings. “preserivatives” is misspelled. “Northwest” and “preservatives” are correct in the description but misspelled in the text. All other text matches. Key errors prevent full accuracy. VERDICT: FAIL
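The 75% shown for Writing accuracy is consistent with pooling the four difficulty tiers listed above. The sketch below reproduces that arithmetic from the verdicts in this section (three prompts per tier). Whether the report pools raw counts or averages the tier percentages is an assumption; with equal tier sizes the two methods give the same result.

```python
from typing import Dict, Tuple

# (passed, total) per difficulty tier, counted from the PASS/FAIL labels in this section.
WRITING_ACCURACY_TIERS: Dict[str, Tuple[int, int]] = {
    "Easy": (3, 3),
    "Medium": (3, 3),
    "Hard": (3, 3),
    "Extreme": (0, 3),
}


def pooled_pass_rate(tiers: Dict[str, Tuple[int, int]]) -> float:
    """Pass rate over all prompts, ignoring tier boundaries."""
    passed = sum(p for p, _ in tiers.values())
    total = sum(t for _, t in tiers.values())
    return passed / total


def mean_tier_rate(tiers: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of the per-tier pass rates."""
    return sum(p / t for p, t in tiers.values()) / len(tiers)


if __name__ == "__main__":
    print(f"{pooled_pass_rate(WRITING_ACCURACY_TIERS):.0%}")  # 75%
    print(f"{mean_tier_rate(WRITING_ACCURACY_TIERS):.0%}")    # 75%
```

All three failures sit in the Extreme tier, so the subcategory score reflects long, multi-block text rather than short words or single sentences.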

Spatial Reasoning 77%

Attributes Binding 89%

Attributes Binding measures whether local/krea-2-raw attaches the right colors, materials, and properties to the correct requested objects.

Easy 100%

PASS

A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface: 1. A red sphere (ball) — correct color and shape. 2. A blue cube — correct color and shape. Both are side by side, as requested. No attribute swaps: red is on the ball, blue on the cube; shapes match descriptions. Size is not specified in prompt, so irrelevant. All core attributes correctly bound. VERDICT: PASS

PASS

A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is orange, round, and has a stem. All attributes (color, shape) are correctly bound to the respective objects with no swaps. VERDICT: PASS

PASS

A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows a yellow taxi and a black limousine parked side by side on a street. The taxi is correctly colored yellow with typical taxi markings, including a "T" logo and checkered patterns. The limousine is black, elongated, and has the characteristic shape of a limo. Both vehicles are appropriately sized and shaped for their types. All attributes are correctly bound to the respective objects. VERDICT: PASS

Medium 100%

PASS

A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image contains: - A large red circle (correct color, size, shape) - A small blue triangle (correct color, size, shape) - A medium green star (correct color, size, shape) All attributes are correctly bound to their respective objects with no swaps. The arrangement on a white canvas is also accurate. VERDICT: PASS

PASS

A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. The arrangement and visual properties align precisely with the prompt. VERDICT: PASS

PASS

A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a big brown horse, a small white rabbit, and a medium-sized black cat in a meadow. All attributes (color, size, shape) are correctly bound to the right objects with no swaps. The horse is indeed big and brown, the rabbit is small and white, and the cat is medium-sized and black. VERDICT: PASS

Hard 67%

PASS

A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains a pink fire truck with an extended ladder, a large blue banana positioned in front of it, and a green sun in the sky. All objects match their requested colors and shapes: the fire truck is pink, the banana is blue, and the sun is green. The surreal landscape is also present with orange grass and a light blue sky. No attribute swaps are observed. VERDICT: PASS

FAIL

A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains a purple carrot with green leaves, a red tree with red leaves, and a white round object (possibly a melon or pumpkin) next to a sliced watermelon that is red with black seeds and green rind. The watermelon is not white — it’s red inside, which contradicts the prompt’s “white watermelon” requirement. The white object is separate and not identified as a watermelon. Thus, attribute binding is incorrect for the watermelon. VERDICT: FAIL

PASS

An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image contains three objects on a white surface: an orange elephant-shaped object, a yellow penguin-shaped object, and a silver strawberry-shaped object. All colors and shapes match the prompt’s description exactly — orange for the elephant, yellow for the penguin, silver for the strawberry. No attributes are swapped or misassigned. The arrangement is on a table-like surface with background context suggesting a dining setting. VERDICT: PASS

Compositionality 78%

Compositionality measures whether local/krea-2-raw combines subjects, actions, outfits, props, and scene details into one coherent requested image.

Easy 67%

PASS

A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Tabby cat: Present (brown/black striped fur pattern visible) 2. Oversized red sunglasses: Present (large red frames covering eyes) 3. Coherent combination: Present (glasses sit naturally on cat's face) All components are accurately rendered and logically integrated. The cat's anatomy, fur texture, and accessory placement appear realistic and well-composed. VERDICT: PASS

FAIL

A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image shows a golden retriever with a straw cowboy hat on its head. The dog’s fur is golden, and it has a happy expression with tongue out. The hat is proportionally large relative to the dog’s head — not “tiny” as specified in the prompt. While all core elements (golden retriever, cowboy hat) are present and coherently combined, the size descriptor “tiny” is inaccurate. Since the prompt explicitly includes “tiny,” and that detail is not met, the full description is not fulfilled. VERDICT:

PASS

A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image contains: - A penguin: Present (black and white body, yellow beak/feet, purple eye) - Holding: Present (right flipper grips umbrella handle) - Colorful umbrella: Present (rainbow-colored panels with gradient effects) All elements are coherently combined — the penguin is upright, holding the umbrella naturally over its head. The colors are vivid and match the “colorful” descriptor. No missing or contradictory elements. VERDICT: PASS

Medium 100%

PASS

A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image shows a figure in full medieval armor, including helmet, chainmail, and plate armor, seated at a wooden desk. The knight’s gloved hands are positioned on a modern laptop keyboard as if typing. All described elements — medieval knight, full armor, desk, modern laptop, typing action — are present and coherently combined. The juxtaposition is clear and visually consistent. VERDICT: PASS

PASS

A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains all required elements: 1. Roman gladiator - Present (helmet with red crest, armor, cape) 2. Selfie with smartphone - Present (holding phone in selfie position) 3. Colosseum - Present (ancient amphitheater structure in background) All elements are coherently combined in a single scene. The gladiator is positioned in front of the Colosseum while taking a selfie with a modern smartphone, creating the requested anachronistic combination. VERDICT: PASS

PASS

An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image contains all required elements: 1. Astronaut in a spacesuit - Present (white suit with helmet, American flag patch, life support pack) 2. Riding a bicycle - Present (blue bicycle being ridden) 3. Through a park - Present (trees with autumn leaves, grass, paved path) All elements are coherently combined in a single scene. The astronaut is actively riding the bicycle on a park path with appropriate environmental context. VERDICT: PASS

Hard 67%

PASS

An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Octopus: Present (orange, with tentacles) - Top hat: Present (black, on octopus head) - Monocle: Present (on octopus eye) - Chess game: Present (board and pieces on table) - Moon surface: Present (cratered gray terrain) - Earth in background: Present (large blue planet) All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS

PASS

A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Dinosaur: Present (green T-Rex head/body) - Business suit: Present (blue suit with red tie) - PowerPoint presentation: Present (screen with text behind dinosaur) - Modern office: Present (glass windows, desk, laptop) - Drinking coffee: Present (holding white coffee cup) All elements are coherently combined in a single scene. VERDICT: PASS

FAIL

A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

Let’s examine each element from the prompt: 1. Robot dressed as a pirate — ✅ Present (wearing pirate hat with skull, red vest, striped shirt). 2. Sailing a paper boat — ✅ Present (white origami-style boat on water). 3. On a lake of lava — ❌ Not present — it’s a calm blue lake with green islands and trees; no lava. 4. Juggling three kittens — ⚠️ Partially present — robot is holding/tossing three kittens, but they’re floating in air around him, not clearly “juggling” in motion; still, visually ac

Counting 67%

Counting measures whether local/krea-2-raw creates the exact number of requested objects without adding or missing items.

Easy 100%

PASS

Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays exactly three red apples positioned on a light-colored wooden surface. The apples are arranged in a triangular formation, with one in the foreground left, one in the foreground right, and one slightly behind in the center. All visual elements match the prompt’s request precisely. VERDICT: PASS

PASS

Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows exactly five yellow rubber ducks arranged in a row on a wooden bathroom shelf. Each duck is identical in appearance, with red beaks and black eyes. The count matches the prompt’s request precisely: five ducks, no more, no less. All are positioned as described. VERDICT: PASS

PASS

Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays exactly four white candles, each with a lit flame, standing upright on a dark surface against a black background. The count matches the prompt’s request precisely: four candles, white in color, standing on a dark surface. No extra or missing objects are present. All visual elements align with the specified criteria. VERDICT: PASS

Medium 67%

FAIL

2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows 1 orange cat and 3 black dogs sitting on grass. The prompt requested 2 orange cats and 3 black dogs. The dog count matches, but the cat count is off by one. Since the core requirement of exact object counts is not met, this fails the specification. VERDICT: FAIL

PASS

4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows a glass vase on a table containing flowers. Counting the blooms: there are 4 red roses and 2 white daisies, which exactly matches the requested composition. The stems and leaves are visible in the clear vase filled with water. All specified objects are present in the correct quantities. VERDICT: PASS

PASS

1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows four butterflies total: one blue butterfly and three yellow butterflies. They are all resting on a dark brown garden fence with vertical slats. The count matches the prompt exactly — 1 blue and 3 yellow butterflies. All are clearly visible and correctly positioned on the fence structure. VERDICT: PASS

Hard 33%

FAIL

Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round fishbowl with multiple goldfish. Counting carefully: there are 8 clearly visible goldfish. One is partially obscured behind another near the center, but even counting that, it totals 8 — not 9. The prompt requested exactly 9 goldfish. Since the count does not match, the core criterion fails. VERDICT: FAIL

PASS

A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with 3 bananas, 4 oranges, and 2 apples — matching the requested counts exactly. All objects are clearly visible and correctly categorized. No extra or missing items. The composition aligns precisely with the prompt’s specifications. VERDICT: PASS

FAIL

Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image shows a cluster of colorful balloons against a blue sky. Counting carefully: purple (1), green (1), red (1), blue (2), yellow (1), pink (1), orange (1), and one partially hidden behind the red balloon — that’s 9 clearly visible, plus at least 1 more peeking out, totaling 10. The prompt requested exactly 11. Since the count does not match, this fails the requirement. VERDICT: FAIL

Negation 56%

Negation measures whether local/krea-2-raw respects what should be absent from the image, not only what should be present.

Easy 100%

PASS

A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image depicts a serene tropical beach scene with golden sand, turquoise water, and palm fronds framing the top against a clear blue sky. There are no people visible anywhere in the frame — no figures on the beach, in the water, or under the palms. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS

PASS

A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a white and gray cat lying on a beige couch. The cat has no collar around its neck — the fur is uninterrupted from head to body. No collar, leash, or accessory is visible. The exclusion criterion (“without a collar”) is fully satisfied. VERDICT: PASS

PASS

A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with rows of wooden desks and chairs facing a large green chalkboard at the front. There are no students visible anywhere in the room. The space is completely unoccupied by people, matching the exclusion criteria. Lighting fixtures hang from the ceiling, and there’s a podium centered before the board. All elements align with the prompt’s constraints. VERDICT: PASS

Medium 33%

PASS

A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach scene with two palm trees, a beach umbrella, and several seashells scattered on the sand. The sky is clear blue, and there is no visible water or ocean in the frame — the horizon fades into sky without any sea line. All excluded elements (water/ocean) are absent as required. VERDICT: PASS

FAIL

An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows an office workspace with a desk, two chairs, a bookshelf, and a lamp. However, on the desk, there is a black object that strongly resembles a computer monitor or screen — it has a rectangular shape, a stand, and appears to be electronic equipment typically associated with computers. This violates the explicit exclusion in the prompt. VERDICT: FAIL

FAIL

A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen scene with pots, pans, a stove, and a cutting board — all as described. However, on the left side of the counter, there is a glass bottle containing what appears to be dried beans or grains — which qualifies as “food.” Since the prompt explicitly excluded food, and this element is present, the exclusion criterion is violated. VERDICT: FAIL

Hard 33%

FAIL

A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil leaves, and a golden crust. However, there is clearly melted cheese visible beneath the pepperoni and around the edges — it’s orange-yellow, bubbly, and integrated into the sauce layer. Since the prompt explicitly forbade cheese, and cheese is present, the exclusion criterion is violated. VERDICT: FAIL

PASS

A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a 3D model of a car body with windows, doors, and headlights clearly visible. The wheels are absent — only the wheel wells are present, with no tires or rims attached. This matches the exclusion criteria: “no wheels” is satisfied. The car body is rendered in grayscale against a neutral background, consistent with a clay render or untextured 3D model. VERDICT: PASS

FAIL

A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image clearly shows a human face with eyes, nose, and mouth — as expected. However, the prompt explicitly excluded “eyebrows,” and in this image, well-defined, dark eyebrows are prominently visible above each eye. Since the excluded element (eyebrows) is present, the image fails to meet the exclusion criterion. VERDICT: FAIL

Relative Position 100%

Relative Position measures whether local/krea-2-raw places objects in the requested spatial relationship, such as above, below, inside, or next to another object.

Easy 100%

PASS

A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The cat is clearly positioned on top of the cardboard box, with its paws resting on the upper surface. The box supports the cat from below, and there are no conflicting spatial relationships. All described positional relationships match the visual content. VERDICT: PASS

PASS

A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on a rug between the chair legs. The spatial relationship “underneath” is accurately depicted — the ball is clearly below the chair’s structure and contained within its footprint. All described positional relationships are correct. VERDICT: PASS

PASS

A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a coffee mug positioned to the right of an open book, both resting on a flat surface that appears to be a desk. The book is in the foreground, slightly angled, with text visible but blurred. The mug sits behind and to the right of the book’s open pages. All spatial relationships described — “next to,” “on a desk” — are accurately represented. No positional errors detected. VERDICT: PASS

Medium 100%

PASS

A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a branch that extends from a tree. The tree is positioned next to a park bench, which is visible in the background to the left. All described spatial relationships are accurate: the bird is on the branch, the branch extends from the tree, and the tree is next to the bench. VERDICT: PASS

PASS

A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a cat sitting on a white pillow, which is placed on a wooden chair. The spatial relationships described in the prompt are accurately represented: the cat is on the pillow, and the pillow is on the chair. All positional relationships are correct. VERDICT: PASS

PASS

A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a candle placed directly on top of a stack of four books. To the left of the book stack is a potted plant with green leaves in a black pot. All specified spatial relationships — “candle on top of books” and “books next to potted plant” — are accurately represented. The arrangement matches the prompt’s description precisely. VERDICT: PASS

Hard 100%

PASS

Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five chess pieces arranged on a 3x3 grid. The black king is in the center square. To its left is a black queen. To its right is a white bishop. Above the king is a black rook. Below the king is a white knight. All spatial relationships described in the prompt — left, right, above, below, and center — are accurately represented in the image. VERDICT: PASS

PASS

A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a top-down view of a desk. The laptop is centered. A coffee mug is to the left of the laptop. A phone is to the right. A lamp is positioned above (top-right corner). A notebook is below (bottom center). All specified spatial relationships — left, right, above, below, and center — are accurately represented as described in the prompt. VERDICT: PASS

PASS

Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field arranged left to right: a horse, a cow, a sheep, a goat, and a dog. This matches the prompt’s specified spatial relationships exactly — horse on far left, cow to its right, sheep in center, goat to sheep’s right, dog on far right. All positional descriptors are accurate. VERDICT: PASS

Extreme 100%

PASS

A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn on the right side in the foreground, appearing large. A windmill stands to the left in the midground, smaller than the barn but clearly visible. Snow-capped mountains are in the distant background, appearing small due to perspective. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). All positional relationships are consistent with depth perspective. VERDICT: PASS

PASS

A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image correctly depicts a beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background. The sizes of the objects decrease with distance as specified. All described positional relationships are accurate. VERDICT: PASS

PASS

A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image correctly shows a park bench with pigeons in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial arrangement matches the prompt: bench and pigeons are closest to the viewer, fountain is behind them, and skyscrapers are furthest away. All positional relationships (foreground/midground/background) are accurate. VERDICT: PASS

Scale & Proportions 67%

Scale & Proportions measures whether local/krea-2-raw keeps people, vehicles, buildings, and everyday objects at believable relative sizes.

Easy 67%

PASS

A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The person’s height relative to the bus appears realistic — roughly 1/3 to 1/4 of the bus’s height, which is consistent with real-world proportions. The wheel size, window spacing, and body panels also scale appropriately. No obvious distortions or unnatural sizing are present. VERDICT: PASS

FAIL

A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears disproportionately large relative to the leaf it rests upon. In reality, a ladybug is small enough that multiple could fit on a leaf of this apparent size. Here, the insect occupies nearly half the visible leaf surface, violating realistic proportion expectations. The leaf’s curvature and texture also suggest it should be much larger than the bug shown. VERDICT: FAIL

PASS

A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child’s arm and hand are holding the basketball in a natural, proportionate way. The ball appears appropriately sized relative to the child’s torso and limb — consistent with real-world scale for a youth basketball. No anatomical or object-size distortions are visible. VERDICT: PASS

Medium 100%

PASS

A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a tiny person standing on the rim of a coffee cup, which aligns with the prompt’s description. The person is proportionally small relative to the cup, and their posture suggests they are looking down into it. The scale relationship between the person and the cup is consistent and visually coherent. VERDICT: PASS

PASS

A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a tabby cat that is disproportionately large compared to the surrounding skyscrapers, consistent with the "giant" and "kaiju" description. The cat towers over the buildings and fills the street, creating a clear and intentional size relationship where the feline is the dominant, massive figure. The proportions are internally consistent with the prompt's surreal premise. VERDICT: PASS

PASS

A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a miniature house resting on an open human hand. The house is appropriately scaled to fit within the palm, with its base spanning roughly from the thumb to the middle finger. The proportions between the hand and the house are realistic for a “miniature” representation — the house appears small enough to be held, yet detailed enough to show windows, doors, and roof features. No anatomical or scaling inconsistencies are visible. VERDICT: PASS

Hard 33%

FAIL

A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, horse, dog, cat, and mouse arranged in a line. However, the proportions are incorrect: the mouse is nearly as tall as the dog’s paw, the cat is disproportionately large relative to the horse’s leg, and the dog appears too small compared to the horse. Real-world size relationships are not preserved — the mouse should be tiny, the cat small, the dog medium, the horse large, and the elephant largest. These distortions violate the prompt’s requirement for correct proport

PASS

A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a dinner table setting with a plate, fork, knife, wine glass, salt shaker, and peppercorn shaker. The relative sizes appear realistic: the plate is appropriately large compared to the cutlery, the wine glass is proportionate to the plate, and the shakers are small but correctly sized for tabletop condiments. No obvious scaling errors are present. VERDICT: PASS

FAIL

A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image shows a farm scene with a barn, tractor, farmer, cow, chicken, and fence post. However, the proportions are inconsistent: the chicken in the foreground is nearly as tall as the fence post, and the farmer standing beside the tractor appears disproportionately small compared to the vehicle. The cow also seems oversized relative to the tractor and barn. These scale inconsistencies violate the prompt’s requirement for correct proportional relationships. VERDICT: FAIL
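The per-tier and overall percentages shown for Scale & Proportions can be reproduced from the PASS/FAIL labels alone. The sketch below assumes each percentage is simply passes divided by evaluations, rounded to a whole number; the page does not state its formula, and the names `verdicts` and `pass_rate` are illustrative only.

```python
# Verdict labels transcribed from the Scale & Proportions results above.
verdicts = {
    "Easy": ["PASS", "FAIL", "PASS"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["FAIL", "PASS", "FAIL"],
}


def pass_rate(labels):
    """Return the share of PASS labels as a whole-number percentage (assumed formula)."""
    return round(100 * labels.count("PASS") / len(labels))


# Pass rate for each difficulty tier.
per_tier = {tier: pass_rate(labels) for tier, labels in verdicts.items()}

# Overall rate pools all nine labels rather than averaging the tier rates.
all_labels = [label for labels in verdicts.values() for label in labels]
overall = pass_rate(all_labels)

print(per_tier)  # {'Easy': 67, 'Medium': 100, 'Hard': 33}
print(overall)   # 67
```

Under that assumption the numbers agree with the page: 6 of 9 evaluations pass, which rounds to 67%.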

Human realism 79%

Faces & Expressions 92%

Faces & Expressions measures whether local/krea-2-raw creates realistic faces with the requested age, emotion, expression, and identity cues.

Easy 100%

PASS

A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

Face is symmetric with natural, well-proportioned features. Smile is warm and clearly conveys happiness, matching the prompt. No obvious distortions; skin texture and lighting are realistic. Slight AI smoothing on cheeks and eyes is minimal and not distracting. VERDICT: PASS

PASS

A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural features, no obvious distortions. Expression is calm and serious, matching the prompt. Skin texture and hair details appear realistic. Lighting and composition support a professional portrait. All elements align with the requested depiction. VERDICT: PASS

PASS

A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with correctly proportioned features. Eyes, nose, and mouth are well-placed and natural. The expression is neutral and relaxed, matching the prompt. No significant distortions or artifacts are visible. Lighting and skin texture appear realistic. VERDICT: PASS

Medium 100%

PASS

A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

Face is largely symmetric with correctly rendered features. Disgust is clearly conveyed: nose is wrinkled, lips are curled, brows are furrowed. Slight asymmetry in eyebrow shape and minor texture artifacts are present but don’t obscure the emotion. Overall, the intended expression is unmistakable and well-executed. VERDICT: PASS

PASS

A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is symmetric with correctly rendered features. Eyes are wide, eyebrows raised, mouth slightly open — all clearly conveying genuine surprise. No noticeable distortions. Lighting and texture are natural. The emotion is unmistakably readable and matches the prompt precisely. VERDICT: PASS

PASS

A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

The face is symmetric with correctly rendered features: downcast eyes, slightly drawn mouth, and distant gaze. The melancholy emotion is clearly readable through facial expression and lighting. No significant distortions are present. The stylized, painterly rendering enhances the somber mood without compromising anatomical coherence. VERDICT: PASS

Hard 67%

PASS

An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

The face is symmetric with natural, aged features: deep wrinkles, white hair, and a warm, genuine smile. Eyes crinkle slightly, conveying kindness. No distortions or unnatural elements are present. The emotion is clearly readable as joyful and warm. All prompt requirements are accurately fulfilled. VERDICT: PASS

FAIL

A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with round cheeks and large eyes, matching the prompt. However, no tears are visible, and the expression appears surprised or curious, not crying. Facial features are well-defined without distortions. The emotion does not match the requested “crying” state. VERDICT: FAIL

PASS

An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with natural wrinkles, age spots, and thinning gray hair. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align well with prompt. VERDICT: PASS

Extreme 100%

PASS

A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces show natural asymmetry and correct anatomical features. Emotions are clearly readable: laughing (left), shocked (center), rolling eyes with smirk (right). No significant distortions. All three expressions match the prompt accurately. VERDICT: PASS

PASS

A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces show clear symmetry and correct anatomical features. Emotions are distinctly readable: father beams, mother smiles gently, teenager looks bored, child grins excitedly. No noticeable distortions. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS

PASS

Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces show clear, symmetric features with no distortions. The left person displays frustration (furrowed brows, hand on head), the center person shows pleasure (smiling, nodding), and the right person shows confusion (tilted head, furrowed brow). All emotions are clearly readable and match the prompt. VERDICT: PASS

Full Body 92%

Full Body measures whether local/krea-2-raw renders full-body people with natural anatomy, clothing, pose, and proportions.

Easy 100%

PASS

A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is recognizable and relaxed. No significant distortions observed. All elements match the prompt accurately. VERDICT: PASS

PASS

A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing forward, with both hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are visible. All elements match the prompt precisely. VERDICT: PASS

PASS

A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s limbs are correctly counted and proportioned. The neutral standing pose is anatomically accurate, with hands by sides and feet flat. No distortions are visible. The full body from head to feet is shown in a park setting. The pose matches the prompt description. VERDICT: PASS

Medium 100%

PASS

A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

Limb count and proportions are anatomically correct. Pose is a natural running stride with arms bent and hair flowing backward, matching the prompt. Motion is convincingly rendered in hair and clothing. No distortions or unnatural anatomy. The action is clearly recognizable as running through a park. VERDICT: PASS

PASS

A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s full body is captured mid-air, knees bent, arms out for balance, and feet above a puddle — matching the prompt. Limb count and proportions are anatomically correct. No distortions are visible. The pose is clearly recognizable as a jump over water. All criteria are met. VERDICT: PASS

PASS

A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s pose is dynamic, with one leg extended and the dress flowing outward, matching the prompt. Limb count and proportions appear anatomically correct. Motion blur on hair and dress enhances the spinning effect. No obvious distortions in body structure. The action is clearly recognizable as a spin. VERDICT: PASS

Hard 67%

PASS

A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person stands on one leg with the other foot pressed to the inner thigh, arms raised overhead in prayer position — classic tree pose. Limb count and proportions are accurate. No visible distortions. Pose is clearly recognizable and anatomically correct. VERDICT: PASS

PASS

A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has correct limb count and proportions. Pose is anatomically plausible for rock climbing, with arms and legs extended to holds. No obvious distortions. Climbing gear and rope are realistic. The action matches the prompt: clinging to a cliff face with limbs on holds. VERDICT: PASS

FAIL

A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. The instrument rests between his legs, and his bow arm is extended. However, his left leg appears unnaturally bent and positioned, failing to fully support the cello as described. The pose is recognizable but distorted in leg positioning. VERDICT: FAIL

Extreme 100%

PASS

Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly counted and proportioned. The man dips the woman backward with intertwined legs, matching the prompt. The pose is anatomically plausible and recognizable as a tango move. No significant distortions are visible. The composition captures the dynamic motion accurately. VERDICT: PASS

PASS

Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s pose is anatomically plausible, and the jumping player’s posture is dynamic and realistic. The ball is near the jumping player’s feet, matching the prompt. No significant distortions are visible. The action is clearly recognizable as a mid-tackle scenario. VERDICT: PASS

PASS

Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically plausible and recognizable. Legs are straight, feet together, arms supporting weight. Slight asymmetry in head/neck angle and minor leg separation visible. No major distortions. Core criteria met. VERDICT: PASS

Hands 42%

Hands measures whether local/krea-2-raw draws anatomically plausible hands, fingers, and gestures.

Easy 33%

FAIL

A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

Upon close examination, the left hand (viewer’s right) shows only 4 fingers — the thumb is missing or fused. The right hand (viewer’s left) appears to have 5 fingers, but the pinky is oddly shortened and merged with the ring finger. Neither hand displays naturally spread, relaxed fingers; both appear stiff and anatomically distorted. Anomalies include missing/merged digits and unnatural positioning. VERDICT: FAIL

PASS

A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The image shows a woman with both hands extended forward, palms facing the viewer. Each hand clearly displays five fingers — thumb, index, middle, ring, and pinky — totaling ten fingers. The thumbs are positioned naturally, slightly curled inward toward the center, which is anatomically plausible for this gesture. No extra or missing fingers; no fusion or distortion. Gesture appears intentional and correct. VERDICT: PASS

FAIL

A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The left hand (on the right arm) shows 5 fingers, but the right hand (on the left arm) has only 4 visible fingers — the thumb is missing or obscured unnaturally. The grip appears slightly stiff and the finger placement on the opposite arm is not fully anatomically natural, especially the way the right hand’s fingers lie flat without clear knuckle curvature. This violates the “exactly 5 per hand” criterion. VERDICT: FAIL

Medium 0%

FAIL

A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The visible right hand gripping the mug handle shows only four fingers clearly — thumb, index, middle, and ring finger — with the pinky either hidden or missing. The grip itself appears anatomically plausible for holding a mug, but the finger count is incomplete. Since the prompt requires exactly 5 fingers per hand and this is not met, the image fails the anatomical accuracy criterion. VERDICT: FAIL

FAIL

A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the pen shows 5 fingers, but the thumb and index finger appear fused or malformed at the tip. The left hand resting on the notebook is blurred and partially visible — only 3–4 fingers are discernible due to depth-of-field blur, not clear anatomical error. However, the right hand’s grip is anatomically questionable: the pen is held with an unnatural pinch, and finger positioning suggests AI distortion. Core criterion (anatomically correct grip + 5 fingers per hand) is not

FAIL

A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The left hand gripping the fretboard shows only four fingers clearly visible, with the thumb positioned correctly behind the neck. The right hand strumming area is blurred but appears to show a natural strumming motion; however, finger count is indistinct due to motion blur and angle. No obvious anatomical anomalies are present, but incomplete finger visibility prevents confirming “exactly 5 per hand.” Core criteria not fully met. VERDICT: FAIL

Hard 67%

PASS

A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand in the image displays five fingers total: two extended (index and middle) forming a peace sign, and three curled (thumb, ring, pinky). The anatomy appears correct — no extra or missing fingers, natural joint bends, and proper spatial arrangement. The gesture is clearly a right hand making a peace sign as described. No anomalies detected. VERDICT: PASS

PASS

A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows a single hand making a thumbs-up gesture. The thumb is extended upward, and four fingers are curled into the palm — this matches the prompt’s description. Anatomically, it is correct for a thumbs-up: only the thumb is extended; the other four fingers are curled. There are no anomalies or extra/missing fingers. The hand is stylized but structurally accurate for the gesture. VERDICT: PASS

FAIL

A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image shows a single hand with all five fingers extended — thumb, index, middle, ring, and pinky — in an open palm gesture. The prompt asked for “counting to three” with only index, middle, and ring fingers extended. This is not what’s shown. Additionally, the hand has 5 fingers (anatomically correct), but the gesture does not match the prompt. Core criteria failed. VERDICT: FAIL

Extreme 67%

PASS

Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

Upon close examination, the left hand (on the right side of the image) shows a thumb and four fingers — anatomically correct. The right hand (on the left side) also displays five fingers: thumb, index, middle, ring, and pinky — all visible and properly positioned in the handshake. No extra or missing fingers; grip is natural and professional. All criteria met. VERDICT: PASS

FAIL

Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

Upon close inspection, the left hand (from viewer’s perspective) appears to have 5 fingers, but the right hand shows only 4 clearly visible fingers — the thumb is present, but one finger seems missing or merged. The gesture resembles a high-five, but anatomical accuracy fails due to the missing digit on the right hand. This violates the prompt’s requirement of “all fingers extended on both hands.” VERDICT: FAIL

PASS

One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

The image shows two pairs of hands interacting with a small gift box. The person in red (giver) has both hands visible — left hand clearly shows 4 fingers and thumb gripping the box; right hand is partially obscured but appears to have 5 digits. The receiver’s hands: right hand (foreground) shows 4 fingers + thumb gripping correctly; left hand is mostly hidden. All visible fingers appear anatomically correct, no extra or missing digits. Grip is natural and consistent with handing/receiving a gif
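Each evaluator note normally ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker, but some notes on this page are cut off before the marker appears. The following is a minimal sketch of how one might read the verdict out of a note and detect the cut-off case. The function name `extract_verdict` and the regular expression are assumptions for illustration, not the benchmark's actual parser.

```python
import re

# Matches the closing marker used in the evaluator notes, e.g. "VERDICT: PASS".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def extract_verdict(evaluator_text):
    """Return 'PASS' or 'FAIL' from a note, or None if the marker is missing or cut off."""
    matches = VERDICT_RE.findall(evaluator_text)
    # Use the last marker in case the note quotes the word earlier on.
    return matches[-1] if matches else None


complete = "The gesture does not match the prompt. Core criteria failed. VERDICT: FAIL"
cut_off = "Grip is natural and consistent with handing/receiving a gif"

print(extract_verdict(complete))  # FAIL
print(extract_verdict(cut_off))   # None
```

When the function returns `None`, the separate PASS/FAIL label printed above the prompt is the only verdict available for that evaluation.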

Multi-Subject 100%

Multi-Subject measures whether local/krea-2-raw keeps multiple people distinct while preserving their requested attributes, outfits, and positions.

Easy 100%

PASS

Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, wearing red jacket. Right woman: long black hair, wearing blue dress. Both are visually distinct and match descriptions exactly. Features are clear and accurate. VERDICT: PASS

PASS

Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Left man: beard, glasses, green sweater — matches. Right man: clean-shaven, black hoodie — matches. Both visually distinct, in cafe setting. All described features accurately rendered. VERDICT: PASS

PASS

Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls over striped shirt. Short person: straight brown hair with bangs, yellow sundress. Both visually distinct and match descriptions. All features accurately rendered. VERDICT: PASS

Medium 100%

PASS

Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

All four individuals are present and visually distinct. Left: bald man in suit/tie. Next: woman with silver hair in red blouse. Center: young man with dreadlocks in denim jacket. Right: woman with hijab in green dress. Features match prompt exactly. VERDICT: PASS

PASS

Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Tall woman: pink short hair, playing bass — matches. Drummer: bearded, wearing bandana — matches. Slim man: glasses, playing keyboard — matches. All three are visually distinct and correctly positioned. Features align precisely with prompt description. VERDICT: PASS

PASS

Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat, backpack, holds hiking poles. Woman: braided hair, backpack, holds hiking poles. Teenager: baseball cap, backpack, holds water bottle. All three are visually distinct and match described features. VERDICT: PASS
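The Human realism figure of 79% appears to pool every evaluation in the category rather than average the four subcategory percentages. The sketch below counts the PASS/FAIL labels listed above and compares both calculations. This is an inference from the displayed numbers, not a documented formula, and the variable names are illustrative.

```python
# (passes, evaluations) per subcategory, counted from the PASS/FAIL labels above.
counts = {
    "Faces & Expressions": (11, 12),
    "Full Body": (11, 12),
    "Hands": (5, 12),
    "Multi-Subject": (6, 6),
}

total_passes = sum(passes for passes, _ in counts.values())
total_evals = sum(evals for _, evals in counts.values())

# Pooled rate: every evaluation carries equal weight.
pooled = round(100 * total_passes / total_evals)

# Mean of subcategory rates: every subcategory carries equal weight.
mean_of_rates = sum(100 * passes / evals for passes, evals in counts.values()) / len(counts)

print(total_passes, total_evals)  # 33 42
print(pooled)                     # 79
print(round(mean_of_rates))       # 81
```

Only the pooled calculation matches the 79% shown, because Multi-Subject contributes six evaluations while the other subcategories contribute twelve each.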

Truthfulness 63%

Photorealism 100%

Photorealism measures whether local/krea-2-raw makes materials, surfaces, and real-world objects look visually plausible and distinct.

Medium 100%

PASS

A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image demonstrates high-fidelity material rendering. The glass exhibits correct refraction and transparency, distorting the background wood grain. The spoon shows sharp, accurate specular reflections of the apple and environment. The apple’s skin has a realistic waxy sheen with subsurface scattering effects. The linen napkin displays a convincing woven texture with soft, diffuse lighting. The wooden table provides a consistent grain pattern and warm tone. All materials behave physically corr

PASS

A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum texture with appropriate metallic sheen. The notebook displays realistic leather grain and thickness. The mug has the correct matte ceramic finish, and the glass paperweight demonstrates accurate refraction, reflection, and translucency. The wood desk surface also appears natural. All materials are visually distinct and realistic. VERDICT: PASS

PASS

A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image successfully renders the distinct material properties requested. The stainless steel pot exhibits sharp, metallic reflections. The glass bottle shows appropriate translucency and refraction of the oil inside. The wooden cutting board displays a natural grain texture, and the cotton towel has a visible woven fabric texture. The lighting interacts correctly with each surface, creating realistic highlights and shadows. VERDICT: PASS

Physics & Reflections 83%

Physics & Reflections measures whether local/krea-2-raw follows basic physical cues such as shadows, reflections, gravity, transparency, and contact points.

Easy 100%

PASS

A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white surface. The shadow is cast to the right and slightly behind the sphere, which is consistent with a light source positioned to the upper left. The shadow's shape is an ellipse, correctly representing the perspective distortion of a sphere's shadow on a plane. The lighting on the sphere itself also matches this direction, with the brightest highlight on the upper-left side. The physics of the shadow and lighting are accurate. VERDICT: PASS

PASS

A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The shadow is cast to the left, consistent with sunlight coming from the right. The reflection in the window glass is physically plausible, showing the mug’s interior and liquid. However, the reflection appears slightly distorted and misaligned — the mug’s position in the reflection doesn’t perfectly mirror its real-world placement relative to the window frame. Also, the shadow’s edge is too soft for “sharp” as described. Minor inconsistencies exist but don’t break plausibility entirely. VERDIC

PASS

A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast directly beneath the subject, consistent with midday lighting. The shape accurately mirrors the person’s stance and clothing folds without distortion. The shadow’s darkness and soft edges are realistic for bright outdoor conditions. No directional inconsistencies or unnatural reflections are present. The scene’s lighting and shadow behavior align with physical expectations. VERDICT: PASS

Medium 67%

FAIL

A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically implausible. The woman is facing the mirror, so her reflection should show her front. However, the reflection displays her back, while the real woman’s back is visible to the viewer. This violates basic mirror physics. Additionally, the reflection’s pose and orientation do not match a true mirror image — it appears as a rotated or duplicated figure rather than a correct reflection. Shadows and lighting also lack consistency with the scene. VERDICT: FAIL

PASS

A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted and positioned directly beneath the cat, consistent with a polished floor. The distortion increases with distance from the contact point, which is physically accurate for such surfaces. Colors and lighting in the reflection match the subject. The orientation and direction are correct. VERDICT: PASS

PASS

A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted and positioned directly beneath the vase, consistent with a glass surface. The colors and shapes of the flowers and vase are mirrored accurately, with slight blurring and distortion typical of reflections on glass. The lighting and shadows align with the scene’s illumination, and no directional inconsistencies are present. The reflection’s clarity and orientation support physical plausibility. VERDICT: PASS

Hard 100%

PASS

A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are vertically inverted and correctly aligned with the tree trunks, maintaining proper orientation. The color palette accurately mirrors the sunset sky, appearing slightly darker in the water as expected due to light absorption. The gentle ripples distort the reflection naturally, creating a realistic texture on the water surface. The physics of the reflection, including direction and distortion, are consistent with a calm lake at sunset. VERDICT: PASS

PASS

A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted, which is physically correct for a puddle. The orientation matches the skyline above. There is a noticeable color shift toward teal/green in the water, consistent with light absorption and reflection physics. Mild distortion is present, especially near the edges and around objects like poles, which aligns with real-world puddle behavior. The reflection’s clarity and alignment with the scene are plausible. VERDICT: PASS

PASS

A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly inverted vertically, matching the mountain range's orientation. The color is subtly muted and darker than the actual mountains, consistent with light absorption in water. The surface shows slight ripples, creating natural distortion rather than a perfect mirror image. The reflection aligns perfectly with the shoreline, and the lighting direction is consistent across the scene. VERDICT: PASS

Extreme 67%

PASS

A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image displays a glass sphere on a reflective surface with a pool in the background. The refraction through the sphere correctly inverts the background scene, which is physically accurate. The reflection on the mirror surface is sharp and directly beneath the sphere, maintaining correct orientation and alignment. The lighting and shadows are consistent with an outdoor setting. The water reflection in the background is also plausible. All optical phenomena are rendered realistically. VERDICT

PASS

A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The glass floor accurately mirrors the sculpture and windows with correct orientation and slight transparency. The chrome surfaces exhibit realistic environmental mapping, distorting the room’s geometry consistent with their curved shapes. Lighting is coherent, with highlights matching the window positions. No directional inconsistencies or impossible distortions are present. VERDICT: PASS

FAIL

A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The reflection on the marble counter is physically plausible, showing a clear, inverted image of the glass with appropriate distortion and color matching. The lake’s surface also reflects the sky and trees realistically. However, the prompt specifically asks to check shadows/reflections for physical plausibility — and while reflections are well-rendered, there are no visible cast shadows from the glass onto the counter, which would be expected under natural lighting. This omission makes the scen
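Some evaluator notes on this page are cut off before their closing verdict marker, so the PASS/FAIL label above each prompt is the only reliable source for those entries. As a minimal sketch of how the marker could be read from a note when it is present, the snippet below uses a hypothetical helper named extract_verdict; it is an illustration only, not ImageBench's actual parser.

```python
import re
from typing import Optional

# Matches a trailing "VERDICT: PASS" or "VERDICT: FAIL" marker.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(evaluator_text: str) -> Optional[str]:
    """Return "PASS" or "FAIL", or None when the marker is missing or cut off.

    Hypothetical helper for illustration; the name and behavior are assumptions.
    """
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_verdict("No directional inconsistencies are present. VERDICT: PASS"))
    print(extract_verdict("All optical phenomena are rendered realistically. VERDICT"))
```

A None result signals a truncated note, in which case a reader would fall back to the displayed label rather than guess.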

World Knowledge 33%

World Knowledge measures whether local/krea-2-raw depicts recognizable places, landmarks, flags, and known objects with accurate real-world details.

Easy 67%

PASS

The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy, including the iron lattice structure, three levels, and surrounding trees. However, there is a noticeable anomaly: a tall, slender tower-like structure visible through the arch at the base, which does not exist in reality. This suggests an AI-generated artifact or hallucination. Despite this minor flaw, the core subject remains recognizable and largely accurate. VERDICT: PASS

PASS

The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high fidelity. The central onion dome, four surrounding minarets, and the symmetrical layout are architecturally accurate. The perspective from the reflecting pool is correct, capturing the iconic reflection. The intricate details, such as the arches and decorative elements, are well-rendered. The lighting and colors appear natural, enhancing the realism of the scene. Overall, the image is a faithful representation of the Taj Mahal. VERDICT: PASS

FAIL

The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable with accurate green patina, torch, crown, and tablet. However, the pedestal and base are architecturally inaccurate — they appear simplified, blocky, and lack the correct neoclassical details (e.g., missing columns, incorrect window placement, unnatural textures). The surrounding island and water also show AI artifacts (e.g., overly smooth surfaces, unnatural crowd lines). While the statue itself is mostly correct, the structural inaccuracies in the

Medium 33%

FAIL

A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a shamisen and a bachi, which are recognizable. However, the instrument is not factually accurate. The headstock features four tuning pegs arranged in a cross pattern, whereas a traditional shamisen has only three strings and three pegs. Additionally, the body shape is overly symmetrical and lacks the characteristic asymmetry and traditional materials (like cat or dog skin) of a real shamisen. VERDICT: FAIL

FAIL

A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts an object resembling an astrolabe, but it fails the accuracy test. The text and symbols are nonsensical gibberish rather than accurate Latin or Arabic inscriptions. The geometric structures of the rete and tympan are distorted and illogical, lacking the precise mathematical layout of a real instrument. The "engraved degree markings" are blurry and inconsistent. While the general shape is recognizable, the specific details are hallucinated, making it factually inaccurate. VERDI

PASS

A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with a network of cracks filled with gold, accurately representing the Japanese art of kintsugi. The subject is clearly recognizable as a bowl, and the repair technique is visually consistent with real-world examples—irregular fracture lines highlighted with metallic gold against an off-white glaze. Lighting and texture appear realistic. No architectural or factual inaccuracies are present. VERDICT: PASS

Hard 0%

FAIL

A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts the water cycle, but the text labels are nonsensical gibberish (e.g., "Syop," "Vannl wanle," "Eveporration"). This makes the diagram factually inaccurate and unreadable as an educational tool. The core requirement for a "detailed diagram" is not met because the information conveyed is corrupted. VERDICT: FAIL

FAIL

A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a geological cross-section with distinct layers and a subduction zone. However, it fails the "real-world subject" criterion due to its low-fidelity, cartoon-like rendering. The textures are flat and artificial, lacking the complexity of real geological formations. While the structural elements (subducting plate, magma chamber, volcanic arc) are present, the overall aesthetic is that of a basic educational diagram rather than a realistic or high-quality visualization. VERDICT:

FAIL

An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image is a stylized, cartoon-like illustration rather than a realistic anatomical drawing. While the four chambers and major vessels (aorta, pulmonary arteries/veins) are present and roughly positioned, key details are inaccurate: valves are not clearly depicted or labeled, coronary vessels are oversimplified and misplaced, and vessel connections lack anatomical precision. The artistic style sacrifices factual accuracy for visual simplicity. VERDICT: FAIL

Extreme 33%

FAIL

A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly that resembles a Morpho, but it is not factually accurate. The prompt specifies a Morpho menelaus, which has iridescent blue dorsal wings and brown ventral wings with eyespots. However, the image shows both dorsal and ventral patterns simultaneously on the same side, which is biologically impossible. Additionally, the wing patterns are overly vibrant and stylized, deviating from realistic morphology. VERDICT: FAIL

FAIL

The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image is not factually or architecturally accurate. While it captures the general grandeur of the Hagia Sophia, it contains significant hallucinations. The central dome’s calligraphy medallion is stylized and incorrect in form. The pendentives feature large, dark angelic figures that do not match the actual Byzantine mosaics (which depict Christ, the Virgin, and prophets). The lighting and structural details are overly dramatic and inconsistent with reality. VERDICT: FAIL

PASS

A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a bismuth crystal with high fidelity. It accurately portrays the characteristic geometric hopper crystal formation, featuring distinct staircase structures and sharp, angular edges. The iridescent oxide layers are rendered with realistic coloration, displaying the typical rainbow hues (blues, purples, yellows) caused by thin-film interference. The subject is clearly recognizable and factually accurate regarding its mineralogical properties. VERDICT: PASS
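The tier percentages and the category percentage for World Knowledge can be cross-checked by counting the labels listed above. The sketch below transcribes those labels by hand and assumes that the page rounds each rate to the nearest whole percent; the function name pass_rate is a placeholder, not part of the benchmark.

```python
# PASS/FAIL labels transcribed from the World Knowledge entries, in page order.
world_knowledge = {
    "Easy": ["PASS", "PASS", "FAIL"],
    "Medium": ["FAIL", "FAIL", "PASS"],
    "Hard": ["FAIL", "FAIL", "FAIL"],
    "Extreme": ["FAIL", "FAIL", "PASS"],
}


def pass_rate(verdicts):
    """Percentage of PASS labels, rounded to a whole percent (assumed rounding)."""
    return round(100 * verdicts.count("PASS") / len(verdicts))


# Per-tier rates, then the rate over all twelve prompts in the category.
tier_rates = {tier: pass_rate(v) for tier, v in world_knowledge.items()}
all_verdicts = [v for tier in world_knowledge.values() for v in tier]
category_rate = pass_rate(all_verdicts)

if __name__ == "__main__":
    print(tier_rates)
    print(category_rate)
```

Under that rounding assumption, the four tiers come out at 67, 33, 0, and 33, and the 4 passes out of 12 prompts give the 33% shown in the category heading.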

Professional Studio 74%

Camera & Lighting 83%

Camera & Lighting measures whether local/krea-2-raw follows specific lens, angle, focus, exposure, and lighting instructions.

Easy 100%

PASS

A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image successfully implements a wide-angle perspective that emphasizes the expansive space of the living room. There is a noticeable barrel distortion at the edges, particularly visible in the curvature of the walls and the ceiling edges, which matches the prompt's requirement. The lighting is bright and even, consistent with a modern interior shot. VERDICT: PASS

PASS

A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a rose petal. The depth of field is extremely shallow, with the dewdrop and the immediate edge of the petal in sharp focus, while the background and foreground petals are heavily blurred. The lighting is soft and natural, highlighting the transparency and refractive properties of the droplet. VERDICT: PASS

PASS

An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is flat and top-down, with no tilt or angle. The lighting is even and diffuse, consistent with a high-angle shot, and the depth of field is deep, keeping the entire scene from the center roundabout to the surrounding trees in sharp focus. VERDICT: PASS

Medium 67%

PASS

A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image successfully implements Rembrandt lighting. There is a clear, distinct triangle of light on the shadowed cheek (the subject's right cheek), which is the hallmark of this lighting style. The lighting creates a moody, high-contrast atmosphere with a dark background, and the depth of field is appropriate for a portrait, keeping the focus on the subject's face. VERDICT: PASS

PASS

A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly captures the requested elements. The sky is a vibrant orange, creating a strong golden hour atmosphere. The lighting pattern features long, dramatic shadows stretching across the foreground grass, consistent with a low sun angle. The perspective provides a clear sense of depth, moving from the textured foreground to the silhouetted trees and the distant, hazy mountains. VERDICT: PASS

FAIL

A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image fails to meet the prompt's requirements. While it attempts a bright aesthetic, the subject is a tiny, low-resolution square centered in a massive white void, which is not a standard portrait composition. The lighting is flat and lacks the "luminous" and "airy" quality described; instead, it looks like a poorly cropped, low-quality snapshot with washed-out colors. It lacks the professional high-key studio quality requested. VERDICT: FAIL

Hard 67%

PASS

A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The portrait features a shallow depth of field with a very creamy, blurred background (bokeh) characteristic of an 85mm f/1.4 lens. There is a clear, warm rim light effect on the subject's hair and shoulders, creating a distinct separation from the garden background, which fulfills the backlit requirement. VERDICT: PASS

FAIL

A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image fails to meet the prompt's core requirement. The prompt specifies a "subject sharp" with a shallow depth of field. However, in this image, the central subject (the person walking away) is completely out of focus and blurry. A 50mm f/1.8 lens should keep the subject sharp while blurring the background; here, both the subject and the background are blurry, resulting in a failed execution of the requested depth of field. VERDICT: FAIL

PASS

A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches the prompt's requirements. The perspective is a top-down shot (flat lay). The depth of field is shallow, with the edges of the plate and the background window slightly out of focus, consistent with f/2.8. The lighting is clearly coming from the window on the left, creating a natural light pattern. The color temperature is warm, and there is a subtle vignette around the edges of the plate. VERDICT: PASS

Extreme 100%

PASS

A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a distinct selective focus strip where the mid-ground is sharp, while the foreground and background are heavily blurred to create a miniature model appearance. The colors are highly saturated, and the perspective effectively mimics a high-angle miniature look. The lighting and depth of field perfectly match the prompt's requirements. VERDICT: PASS

PASS

A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a dark, moody scene with a shallow depth of field. Most importantly, it displays a prominent horizontal anamorphic lens flare with distinct blue and orange color streaks across the center of the frame, exactly as described. VERDICT: PASS

PASS

A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's description. It features a portrait with dramatic split lighting, where one side of the face is brightly illuminated and the other side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect against a black background. The perspective is a standard portrait view, and the lighting pattern is exactly as specified. VERDICT: PASS

Color Precision 75%

Color Precision measures whether local/krea-2-raw renders requested colors clearly and consistently across subjects, products, and backgrounds.

Easy 100%

PASS

A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is highly saturated and unambiguous, matching the prompt's description perfectly. The road is a consistent gray, and the yellow line is also visible. All color requirements specified in the prompt are accurately represented. VERDICT: PASS

PASS

A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The woman is wearing a vibrant, deep blue evening dress that accurately represents the "royal blue" color requested. The background is a solid, light beige/off-white color, which perfectly matches the description of a "neutral background." All color requirements from the prompt have been met. VERDICT: PASS

PASS

A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's color requirements. The flowers are a vibrant, pure sunflower yellow, and the sky is a clear, bright blue. There are no deviations from the specified color descriptions. VERDICT: PASS

Medium 67%

PASS

A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The wall color is a vibrant, saturated orange that closely aligns with the requested burnt orange (#FF6B35) description. The furniture, including the bed, lamp base, and nightstand, is white as specified. The scene also features natural daylight coming from the window. All color requirements from the prompt are accurately represented in the image. VERDICT: PASS

FAIL

A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The image fails to meet the color requirements. The prompt specifies a dark muted green (#2D5F2D) for the foliage. However, the image has a heavy yellowish-olive tint, making the foliage appear much more yellowish and bright than the requested dark muted green. Additionally, the text overlay in the image (#D5FFD) does not match the requested hex code (#2D5F2D), further indicating a color mismatch. VERDICT: FAIL

PASS

A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The wall color is a medium blue that matches the specified hex code #4A90D9 (as indicated by the text overlay in the image). The floor is a warm wooden color, and the trim is white, adhering to all color requirements in the prompt. VERDICT: PASS

Hard 33%

PASS

A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image matches the color requirements. The walls are a warm cream color, the sofa is a dark navy (#2C3E50), and the accent pillows are a bright red (#E74C3C). Although there is a text overlay of a different hex code in the corner, the actual colors of the objects in the scene correctly reflect the prompt's specific color instructions. VERDICT: PASS

FAIL

A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image fails to follow the color instructions. The background is a very dark blue, but the text overlays are incorrect and do not match the prompt's requirements. The watch color is a muted gold, but the text below it says "#F0C27F" (which is a different hex code than the prompt's gold). The earbuds are white, but the text below them says "#FAFFEF" instead of the requested "#FFFFFF". The prompt's specific hex values and color names are not correctly represented in the image's text or the actu

FAIL

A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image contains text labels that appear to be intended as hex codes, but they do not match the prompt's requirements. The prompt specifies the main shape as #FF4500 (orange-red), but the label in the image reads "#FF4500" while the background is labeled "#FAFAFA" and the accent line is labeled "#333333". However, the actual colors used in the graphic are slightly off from the requested hex values (the orange is more of a pure orange than the specified orange-red). Most importantly, the labels
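Several of the notes above compare hex codes rendered as text in the image against the codes in the prompt. A rough way to make that comparison explicit is to validate each code and measure how far apart two codes are. The sketch below uses plain Euclidean distance in RGB, which is an assumption chosen for simplicity and is not a perceptual color metric; the helper names are hypothetical.

```python
import math
import re

# A well-formed code is "#" followed by exactly six hex digits.
HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


def hex_to_rgb(code):
    """Convert "#RRGGBB" to an (R, G, B) tuple; reject malformed codes."""
    if not HEX_RE.match(code):
        raise ValueError(f"not a six-digit hex color: {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))


def rgb_distance(a, b):
    """Euclidean distance between two hex colors in RGB space (0 means identical)."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


if __name__ == "__main__":
    # Requested earbud color versus the label reported in the evaluator note.
    print(rgb_distance("#FFFFFF", "#FAFFEF"))
    # Requested watch color compared with itself.
    print(rgb_distance("#F0C27F", "#F0C27F"))
```

A five-character label such as the "#D5FFD" reported in the forest prompt fails validation outright, which separates a malformed label from a well-formed label that names a different color.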

Extreme 100%

PASS

A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image shows a smooth horizontal gradient. On the far left, the color is a pure red (#FF0000). On the far right, the color is a pure blue (#0000FF). The transition between these two colors passes through a purple/magenta middle section, exactly as described in the prompt. VERDICT: PASS

PASS

A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image displays a vertical gradient. The bottom color is a vibrant orange, which matches the hex code #FF6B35 provided in the prompt and visible in the bottom-left corner. The top color is a deep, dark indigo/navy, which matches the hex code #1B1464 provided in the prompt and visible in the bottom-right corner. The gradient transition accurately follows the specified color scheme. VERDICT: PASS

PASS

A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the hex code #00C9FF. On the right, the color is a light mint green, which matches the hex code #92FE9D. The gradient transitions smoothly between these two specific colors as requested. VERDICT: PASS
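For the gradient prompts, the expected color at any point between the two endpoints can be estimated by interpolating each channel. The sketch below assumes a straight linear blend of the 8-bit RGB values, which is only one plausible reading of "transitioning evenly"; lerp_hex is a hypothetical helper, not something the benchmark is known to use.

```python
def hex_to_rgb(code):
    """Convert "#RRGGBB" to an (R, G, B) tuple of ints."""
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))


def lerp_hex(start, end, t):
    """Blend two hex colors channel by channel; t=0 gives start, t=1 gives end.

    Assumes linear interpolation on raw 8-bit values, rounding halves up.
    """
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    mixed = tuple(int(x + (y - x) * t + 0.5) for x, y in zip(a, b))
    return "#{:02X}{:02X}{:02X}".format(*mixed)


if __name__ == "__main__":
    # Midpoint of the red-to-blue wall gradient.
    print(lerp_hex("#FF0000", "#0000FF", 0.5))
```

Under this assumption, the midpoint of the red-to-blue gradient is #800080, a purple, which agrees with the purple/magenta middle section described in the first evaluator note of this tier.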

Photorealism 33%

Photorealism measures whether local/krea-2-raw produces polished product and studio imagery with realistic materials, lighting, shadows, and finish.

Easy 33%

FAIL

A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image fails to look like a real photograph. The sneaker lacks realistic texture, appearing as a flat, monochromatic 3D render rather than a physical object made of fabric or leather. There is a complete absence of material detail, stitching, or subtle highlights expected in studio photography. The lighting is unnaturally uniform, and the shadow is a generic, soft blur that doesn't interact realistically with the shoe's base. VERDICT: FAIL

FAIL

A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a studio-quality product shot. While the subject is clear, there are significant AI artifacts. The glass bottle has unnatural, warped edges and inconsistent thickness. Most notably, the straw/dip tube is a single, thin, floating line that lacks physical realism. The reflection on the surface is also poorly rendered and lacks the complex light interactions expected in a high-end professional photograph. VERDICT: FAIL

PASS

A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt requirements. The green apple features realistic skin texture, subtle specular highlights, and a natural-looking stem. The lighting is soft and directional, creating a gentle highlight on the upper left and a realistic, soft contact shadow underneath. The seamless white background is clean and professional, characteristic of a high-quality studio product shot. There are no obvious AI artifacts or distortions. VERDICT: PASS

Graphical design 50%

Data Visualisation 0%

Data Visualisation measures whether local/krea-2-raw produces charts with correct labels, proportions, colors, and readable values.

Easy 0%

FAIL

A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart misrepresents data: y-axis is labeled "Januuary" (spelled wrong), x-axis labels are incorrect (e.g., "18K", "15K", "Junge" instead of months). Values shown (12K, 18K, 22K, 19K, 25K) are incomplete or mismatched. Title is correct, but axes and labels are inaccurate and unreadable. Chart type is bar chart, but data presentation is flawed. VERDICT: FAIL

FAIL

A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The pie chart uses correct colors and percentages visually, but labels are inaccurate: “Company C” is mislabeled as “Company C ur wet ker:” and “Company B 25%” is missing from the chart’s slice labels. The title also mislabels “Company B” as “Corhlans21%.” These errors violate the prompt’s requirements for accurate labels and percentages. VERDICT: FAIL

FAIL

A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart is a line graph titled "Weekly Traffic" with axes and grid lines. However, axis labels and tick values are garbled (e.g., "Won Soly Veiilor", "Wiv 50 Daly Traibe Sun"), and the data points do not align correctly with the requested values. The chart type matches, but critical textual elements are unreadable and inaccurate. VERDICT: FAIL

Layout & Design 22%

Layout & Design measures whether local/krea-2-raw organizes posters, menus, covers, and other designed compositions with the requested hierarchy and elements.

Easy 33%

PASS

A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image features a large popcorn bucket centered, with "MOVIE NIGHT" in bold white text at the top and "Every Friday at 8 PM" in black text at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the popcorn as the focal point and text anchoring top and bottom. All required layout elements are present. VERDICT: PASS

FAIL

An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The guitar illustration and "Tickets available now" tagline are present and correctly placed. The headline "LIVE MUSIC FEST" uses proper visual hierarchy with large, bold text. However, the top text is garbled nonsense, not matching the prompt. The core layout and readable elements are mostly correct, but the garbled text violates the prompt. VERDICT: FAIL

FAIL

A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster includes an open book illustration and the main title "JOIN THE BOOK CLUB" with correct visual hierarchy. The tagline "Meets every Wednesday" is present below the book. However, the title text is incorrect ("Moob s Wour Book Club:" instead of "JOIN THE BOOK CLUB"), and the website URL is garbled. The core layout elements are present but text is inaccurate. VERDICT: FAIL

Medium 0%

FAIL

A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a landing page with a hero header, headline/subtitle, "Sign Up" button, and footer with social icons — all present. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. The visual hierarchy is intact, but content failure renders it non-functional. VERDICT: FAIL

FAIL

A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with a banner header, two-column body (text left, image right), a centered quote block, and a footer with an unsubscribe link. However, all text is gibberish placeholder text, unreadable and incorrectly placed. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL

FAIL

A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address and hours. However, all text is gibberish or unreadable, appearing as random characters or reversed text. The layout structure is correct, but content is not legible or meaningful. VERDICT: FAIL

Hard 33%

PASS

A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The layout matches the prompt: left page has a large hero photo; right page has two columns, a pull quote, sidebar with author bio, and page number. Text is present but unreadable due to gibberish placeholder characters. Visual hierarchy is correct, but content is not legible. Core layout structure is intact. VERDICT: PASS

FAIL

A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI with a top navigation bar, search field, 2x2 grid of feature cards with icons, "Recent" activity list, and bottom tab bar — all present. However, all text is garbled and unreadable, violating the requirement for readable, correctly placed text. Layout hierarchy is visually intact but text illegibility undermines functionality. VERDICT: FAIL

FAIL

A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The image shows a magazine spread with a full-bleed left photo and a right page featuring three columns, a headline, body copy, a smaller inset image, and styled page numbers. Text is present but appears as gibberish, unreadable placeholder characters. The layout structure is visually correct, but the text fails the readability requirement. VERDICT: FAIL

Style Diversity 83%

Style Diversity measures whether local/krea-2-raw can shift between visual styles like oil painting, pixel art, watercolor, and technical illustration.

Easy 67%

PASS

A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3.5 122B)

The image clearly depicts a golden retriever in a garden setting. The artistic style strongly resembles an oil painting, with visible, directional brushstroke textures throughout the fur and background. The colors are rich and saturated, and the blending mimics traditional oil techniques. The key visual characteristics requested—oil painting style, visible brushstrokes, and rich texture—are all present and clearly recognizable. VERDICT: PASS

PASS

A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3.5 122B)

The image clearly exhibits the requested Japanese anime style. Key characteristics are present: the golden retriever has large, expressive eyes with prominent white highlights, simplified facial features, and flat, unshaded colors. The background elements (flowers, bushes) are also rendered in a consistent flat-color anime aesthetic. The overall composition matches the prompt’s stylistic requirements. VERDICT: PASS

FAIL

A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3.5 122B)

The image clearly depicts a golden retriever in a garden setting using pixel art. The grid structure is visible, and the resolution appears to be 16x16 or 32x32 (counting the squares reveals a 16x16 grid, not 32x32 as requested). The color palette is limited, primarily using greens and oranges. However, the prompt specifically requested a 32x32 grid, and the image is visibly 16x16. This is a clear deviation from the specified technical constraint. VERDICT: FAIL

Medium 100%

PASS

A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3.5 122B)

The image strongly exhibits Art Deco characteristics: strong geometric lines, vertical emphasis in skyscrapers, symmetrical composition, and a striking gold-and-black palette. The ornamental framing and stylized urban elements align with the requested aesthetic. While not historically precise, it clearly evokes the Art Deco style as interpreted for modern AI art. VERDICT: PASS

PASS

A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3.5 122B)

The image clearly exhibits the requested Impressionist style. Key characteristics are present: soft, visible brushstrokes create a painterly texture; dappled light filters through the parasol and background; and the palette is dominated by pastels—pale blues, pinks, yellows, and greens. The subject’s form is softly defined rather than sharply outlined, consistent with Impressionism. The overall effect matches the prompt’s artistic intent. VERDICT: PASS

PASS

A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3.5 122B)

The image clearly exhibits the requested Pop Art style. It features bold black outlines defining the figure and facial features. The entire image is rendered using Ben-Day dots for shading and texture, visible in the skin tones, hair, and clothing. The color palette consists of flat, vivid primary colors: a bright yellow background, blue suit, and red dots on the skin. All key visual characteristics are present and accurately executed. VERDICT: PASS

Hard 100%

PASS

A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3.5 122B)

The image successfully combines photorealistic forest details (tree bark, leaf litter, lighting) with visible watercolor characteristics. The edges show characteristic bleeds and blooms where pigment meets paper. The white background has a discernible paper texture. The green canopy shows translucent washes typical of watercolor. The core stylistic requirements are clearly met. VERDICT: PASS

PASS

A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3.5 122B)

The image successfully combines cyberpunk and Art Nouveau styles. The central figure is a woman in futuristic gear within a neon cityscape (cyberpunk). This is framed by an ornate border featuring flowing organic lines, intricate patterns, and prominent floral motifs at the corners (Art Nouveau). The color palette utilizes vibrant neons alongside the decorative style. The requested style is clearly recognizable and its key characteristics are present. VERDICT: PASS

PASS

A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3.5 122B)

The image successfully combines ukiyo-e woodblock print aesthetics (flat colors, bold outlines, stylized clouds, Japanese text box) with a photorealistic sky gradient and lighting effects on the castle. The castle itself is rendered in a semi-realistic style that blends with the traditional print elements. Key characteristics of both requested styles are clearly present and harmoniously integrated. VERDICT: PASS

Extreme 67%

PASS

An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3.5 122B)

The image clearly exhibits soft brutalism: raw concrete walls and ceiling with visible texture, paired with rounded, pastel-colored furniture (pink and cream). Warm diffused light enters from the left window, casting gentle shadows and enhancing the serene mood. The color palette and material contrast align precisely with the prompt. All key visual characteristics are present and well-executed. VERDICT: PASS

FAIL

A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3.5 122B)

The image exhibits a sparse composition with significant negative space, featuring only two small objects on a white surface against a large, muted beige background. The color palette is subdued and earthy. However, the rendering style appears to be photorealistic or hyper-realistic rather than a distinct "painting" style. While the composition fits "minimalism," the specific artistic texture and "anxious" quality requested are not clearly recognizable as a painted style. VERDICT: FAIL

PASS

A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3.5 122B)

The image displays a cityscape with retro-futuristic architecture, including spires and rounded towers reminiscent of 1960s space-age design. The color palette features warm, muted tones with a yellow-green patina that evokes nostalgia. While the architectural style aligns well with the prompt, the melancholic mood is somewhat subdued by the bright, hazy lighting. Overall, key visual characteristics are present. VERDICT: PASS
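
The Style Diversity percentages listed above can be reproduced from the individual verdicts. The sketch below is only an illustration, not the benchmark's own scoring code: the `verdicts` mapping and the `pass_rate` helper are hypothetical names, the labels are transcribed by hand from this section, and it assumes rates are pooled over prompts and rounded to the nearest whole percent.

```python
# Verdicts transcribed from the Style Diversity results in this section.
# The data structure and helper name are assumptions made for illustration.
verdicts = {
    "Easy": ["PASS", "PASS", "FAIL"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "PASS"],
    "Extreme": ["PASS", "FAIL", "PASS"],
}


def pass_rate(labels):
    """Return the share of PASS labels as a whole-number percentage."""
    return round(100 * labels.count("PASS") / len(labels))


# Pass rate for each difficulty tier.
per_tier = {tier: pass_rate(labels) for tier, labels in verdicts.items()}

# Category rate, assumed to be pooled over all 12 prompts.
all_labels = [label for labels in verdicts.values() for label in labels]
overall = pass_rate(all_labels)

for tier, rate in per_tier.items():
    print(f"{tier} {rate}%")
print(f"Style Diversity {overall}%")
```

Under these assumptions, 2 of 3 passes in a tier gives 67% and 10 of 12 passes overall gives 83%, which matches the figures shown for this category.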