local/flux-2-klein-4b vs local/krea-2-turbo

192 evaluations across 6 categories

Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself. Learn more about the methodology.
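Every complete evaluator response shown on this page ends with `VERDICT: PASS` or `VERDICT: FAIL`. The snippet below is a minimal sketch of how such a label could be read from the judge text when re-checking results by hand. The function name `parse_verdict` and the regular expression are illustrative assumptions, not part of the benchmark's documented tooling.

```python
import re

# Hypothetical helper: matches a trailing "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text):
    """Return "PASS" or "FAIL", or None when no complete verdict is found."""
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(parse_verdict("Medium and style match prompt. VERDICT: PASS"))
    # A response cut off before the label yields None and needs manual review.
    print(parse_verdict("except for piece identity errors. VERDICT:"))
```

Returning `None` for a truncated response, rather than guessing, keeps incomplete judge outputs visible for manual inspection.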

68% vs 76%

Pass Rate

4.5s vs 246.2s

Avg Latency

Text Rendering 40% vs 80%

Typography Style 100% vs 100%

Typography Style compares whether local/flux-2-klein-4b or local/krea-2-turbo better preserves requested lettering styles, font mood, and text texture.

Easy 100% vs 100%

local/flux-2-klein-4b: PASS

The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads “CHAPTER ONE.” with correct spelling and period. Font mimics vintage typewriter style with blocky, slightly uneven ink strokes. Paper shows aged texture, stains, and creases matching prompt. Ink appears slightly smudged and inconsistent, enhancing authenticity. All elements convincingly simulate typewritten text on aged paper. VERDICT: PASS

local/flux-2-klein-4b: PASS

The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads "TODAY'S SPECIAL" correctly. Letters show textured, chalky appearance on dark green surface with visible chalk dust and smudges, convincingly mimicking chalkboard writing. Medium and style match prompt. VERDICT: PASS

local/flux-2-klein-4b: PASS

The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is rendered in bright red, glowing pixel-style LED letters, matching the described medium precisely. The spelling is correct and the pixelated, dot-matrix appearance convincingly simulates an LED display. All visual elements align with the prompt. VERDICT: PASS

Writing accuracy 25% vs 75%

Writing accuracy compares whether local/flux-2-klein-4b or local/krea-2-turbo more reliably renders the exact requested words, numbers, labels, and short phrases.

Easy 100% vs 100%

local/flux-2-klein-4b: PASS

A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and is perfectly legible against the white background. All letters are intact, properly formed, and clearly visible. No misspellings, missing letters, or illegible parts are present. The text matches the intended phrase exactly. VERDICT: PASS

local/flux-2-klein-4b: PASS

The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. All letters are intact and clearly legible. No misspellings or illegible parts detected. The image matches the intended text exactly. VERDICT: PASS

local/flux-2-klein-4b: PASS

The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is fully spelled correctly and is large, centered, and clearly readable against the solid light blue background. No misspellings, missing letters, or illegible parts are present. All text elements match the intended specification exactly. VERDICT: PASS

Medium 0% vs 100%

local/flux-2-klein-4b: FAIL

The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

The word "quick" is misspelled as "quuick" with an extra 'u'. All other words are correctly spelled and fully readable. The text is black on white, matching the description. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

Text contains multiple errors: "sunset" is misspelled as "suunst", "brings" is duplicated as "brings brigs", "a" is repeated, and "dawn" is misspelled as "nown". All words are illegible or incorrect. The intended phrase is not accurately rendered. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

Text contains multiple errors: "shine" is misspelled as "shin", "brightest" as "beratst", and "night" as "ngiht". All words are fully visible but incorrect. The background is clean off-white as intended. Accuracy fails due to spelling errors. VERDICT: FAIL

Hard 0% vs 100%

local/flux-2-klein-4b: FAIL

A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

Text contains errors: "itms" (should be "items"), "keamoned" (should be "weekend"), and "this" is misaligned. The fine print is correct. Misspellings and illegible parts prevent full accuracy. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

Text "Limited" is misspelled as "Linited". "Available" is misspelled as "availble". All other text is correct and readable. The misspellings are clear and impactful errors. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

"ANNUAL" is misspelled as "ANNUALAL". "evening" → "ewening", "and" → "aind", "live" → "liv", "jazz" → "ja:zz". "optional" → "optinnal". Multiple words are misspelled or distorted. Text is readable but errors are present. VERDICT: FAIL

Extreme 0% vs 0%

local/flux-2-klein-4b: FAIL

A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

Text contains multiple errors: “SHIIP” (should be SHIP), “HAPFLOW” (should be HARLOW), “TURBNNING” (should be TURNING), “SO FFTA REYESS” (should be SOFIA REYES), “Thamianoss Grant” (should be Thomas Grant), “Pridetioe Relese” (should be Paramount Pictures Release), and “A Ponmiing Summer 2025” (should be Coming Summer 2025). Many words are misspelled or partially illegible. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

Text errors: "Campabe" (should be "Cambodia"), "D. To. María" (should be "Dr. Maria Santos"), "Hln Vant" (illegible), "Chimate Action" (misspelled), "Top 10t Musk" (misspelled), "Hiear" (misspelled). Multiple words are incorrect or unreadable. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

“ORGANC” should be “ORGANIC”; “Unfiluted” and “unffiluttet” are misspellings; “harpatced” instead of “harvested”; “susaainmable” instead of “sustainable”; “apiaviss” instead of “apiaries”; “Pacific Nostkent” instead of “Pacific Northwest”; “Noetwbrlent” instead of “Northwest”; “additistes” instead of “additives”; “resibmantes” instead of “preservatives”; “Best befini” instead of “Best before”. All text is illegible or incorrect. VERDICT: FAIL
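The percentages for local/flux-2-klein-4b in this category can be reproduced from the individual verdicts listed above. The sketch below assumes that every evaluation is weighted equally and that results are pooled across difficulty tiers. That pooling rule is inferred from the displayed numbers rather than documented, and the names `RESULTS`, `pass_rate`, `subcategory_rates`, and `category_rate` are illustrative.

```python
# Verdicts for local/flux-2-klein-4b, transcribed from the Text Rendering
# results listed above (three evaluations per difficulty tier).
RESULTS = {
    "Typography Style": {"Easy": ["PASS", "PASS", "PASS"]},
    "Writing accuracy": {
        "Easy": ["PASS", "PASS", "PASS"],
        "Medium": ["FAIL", "FAIL", "FAIL"],
        "Hard": ["FAIL", "FAIL", "FAIL"],
        "Extreme": ["FAIL", "FAIL", "FAIL"],
    },
}


def pass_rate(verdicts):
    """Percentage of PASS verdicts, rounded to a whole number."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return round(100 * sum(v == "PASS" for v in verdicts) / len(verdicts))


def subcategory_rates(results):
    """Pool all tiers of each subcategory (assumed equal weighting)."""
    return {
        name: pass_rate(v for tier in tiers.values() for v in tier)
        for name, tiers in results.items()
    }


def category_rate(results):
    """Pool every evaluation in the category (assumed equal weighting)."""
    return pass_rate(
        v for tiers in results.values() for tier in tiers.values() for v in tier
    )


if __name__ == "__main__":
    print(subcategory_rates(RESULTS))  # {'Typography Style': 100, 'Writing accuracy': 25}
    print(category_rate(RESULTS))      # 40
```

Under this assumption, 3 of 12 Writing accuracy evaluations pass (25%) and 6 of 15 Text Rendering evaluations pass (40%), which matches the figures shown for local/flux-2-klein-4b.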

Spatial Reasoning 67% vs 81%

Attributes Binding 67% vs 100%

Attributes Binding compares whether local/flux-2-klein-4b or local/krea-2-turbo better keeps each requested color, material, or property attached to the correct object.

Easy 100% vs 100%

local/flux-2-klein-4b: PASS

A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface. On the left is a red sphere (ball). On the right is a blue cube. Both objects are correctly colored and shaped as specified. They are positioned side by side. All attributes — color, shape, and spatial arrangement — match the prompt exactly. No swaps or errors observed. VERDICT: PASS

local/flux-2-klein-4b: PASS

A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin on a wooden cutting board. The apple is correctly green, round, and has a stem. The pumpkin is correctly orange, round with ridges, and has a stem. Both objects are appropriately sized relative to each other and placed on the board as described. No attribute swaps or misidentifications are present. VERDICT: PASS

local/flux-2-klein-4b: PASS

A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows two vehicles parked side-by-side on a city street. On the left is a yellow sedan with "4000" markings and a taxi roof sign — clearly a yellow taxi. On the right is a long, black luxury vehicle with a stretched body, chrome grille, and limousine-style proportions — clearly a black limousine. Colors, sizes, and shapes are correctly assigned: yellow taxi (smaller, standard sedan shape), black limousine (larger, elongated). No attribute swaps. VERDICT: PASS

Medium 67% vs 100%

local/flux-2-klein-4b: FAIL

A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image shows a large red circle and a medium green star on a white canvas, as requested. However, the third object is a blue triangle that is not small — it appears to be medium-sized or even large relative to the star. The prompt specified a "small blue triangle," but the triangle in the image is not small. Therefore, the size attribute for the blue triangle is incorrect. VERDICT: FAIL

local/flux-2-klein-4b: PASS

A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a wooden shelf: 1. Left: Tall, thin, white candle — matches description. 2. Middle: Short, fat, red candle — matches description. 3. Right: Medium-sized, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS

local/flux-2-klein-4b: PASS

A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a large brown horse with a white star on its forehead, standing in a grassy meadow. In front of it are a small white rabbit and a medium-sized black cat. All objects match the prompt’s descriptions: the horse is big and brown, the rabbit is small and white, and the cat is medium-sized and black. No attributes are swapped or misassigned. VERDICT: PASS

Hard 33% vs 100%

local/flux-2-klein-4b: PASS

A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image shows a pink fire truck with correct structural details (ladder, lights, wheels). A large blue banana is present in the foreground, matching the requested color and object type. The sun in the sky is green and labeled "LION" — though the label is extraneous, the color and shape match the prompt. All three core objects are correctly colored and shaped as requested, with no attribute swaps. VERDICT: PASS

local/flux-2-klein-4b: FAIL

A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains: 1. A purple carrot with green leafy top — correct color and shape. 2. A tree with red leaves and brown trunk — correct color and form. 3. A watermelon that is green-striped on the outside, not white; it has a slice removed but the interior is not visible as “white” — fails the “white watermelon” attribute. The prompt specified a “white watermelon,” which typically implies a rare variety with pale flesh or rind — neither is accurately depicted here. The watermelon shown is st

local/flux-2-klein-4b: FAIL

An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows three objects on a table: an orange elephant, a yellow penguin, and a silver penguin. The prompt requested an orange elephant, a yellow penguin, and a silver strawberry. The third object is incorrectly rendered as a silver penguin instead of a silver strawberry. Additionally, the yellow penguin has a small red dot (possibly meant to be a strawberry?) but it’s not shaped or positioned as a strawberry. Attributes are swapped or missing. VERDICT: FAIL

Compositionality 89% vs 89%

Compositionality compares whether local/flux-2-klein-4b or local/krea-2-turbo better combines the requested subjects, actions, props, and scene details into one coherent image.

Easy 100% vs 100%

local/flux-2-klein-4b: PASS

A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Tabby cat - Present (clearly visible with tabby markings) 2. Oversized red sunglasses - Present (large red frames covering the cat's eyes) 3. Coherent combination - Present (the sunglasses fit naturally on the cat's face) All elements are present and properly combined. The unusual concept is successfully realized in the image. VERDICT: PASS

local/flux-2-klein-4b: PASS

A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image shows a golden retriever wearing a brown cowboy hat. The hat is not tiny—it appears proportionate to the dog’s head, not comically small as “tiny” might imply. However, all core elements (golden retriever, cowboy hat) are present and coherently combined. Since “tiny” is subjective and the hat fits naturally, the spirit of the prompt is met. VERDICT: PASS

local/flux-2-klein-4b: PASS

A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image contains: - A penguin: Present and clearly visible. - A colorful umbrella: Present with rainbow-colored panels. - Penguin holding the umbrella: Coherently combined — the penguin’s flippers are gripping the umbrella handle. All described elements are present and logically integrated. The composition is coherent and matches the prompt’s unusual concept. VERDICT: PASS

Medium 100% vs 100%

local/flux-2-klein-4b: PASS

A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image shows a medieval knight in full armor sitting at a wooden desk, typing on a modern laptop. All described elements are present and coherently combined: - Medieval knight in full armor: Present - Sitting at a desk: Present - Typing on a modern laptop: Present The combination is logical and visually coherent. VERDICT: PASS

local/flux-2-klein-4b: PASS

A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Roman gladiator - Present (man in armor with helmet) 2. Selfie action - Present (holding phone up to take photo) 3. Smartphone - Present (modern black smartphone) 4. Colosseum - Present (ancient Roman amphitheater background) All elements are coherently combined in a single scene. The anachronistic combination of ancient gladiator with modern technology is clearly depicted as requested. VERDICT: PASS

local/flux-2-klein-4b: PASS

An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image shows an astronaut in a full spacesuit riding a bicycle on a paved path through a park with trees and benches. All described elements — astronaut, spacesuit, bicycle, park setting — are clearly present and coherently combined. The scene is visually consistent and logically assembled, despite the unusual concept. VERDICT: PASS

Hard 67% vs 67%

local/flux-2-klein-4b: PASS

An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Octopus: Present (large, reddish-brown) - Top hat: Present (black, on octopus head) - Monocle/Glasses: Present (round wire-rimmed glasses on octopus face) - Chess: Present (chessboard with pieces in front of octopus) - Moon surface: Present (gray, rocky terrain) - Earth in background: Present (visible in upper left) All elements are coherently combined in a surreal but visually consistent composition. VERDICT: PASS

local/flux-2-klein-4b: PASS

A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Dinosaur - Present (T-Rex head/body) 2. Business suit - Present (dark suit, white shirt, blue tie) 3. PowerPoint presentation - Present (projector screen with text in background) 4. Modern office - Present (desks, chairs, windows, office equipment) 5. Drinking coffee - Present (holding white coffee cup) All elements are coherently combined in a single scene. VERDICT: PASS

local/flux-2-klein-4b: FAIL

A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

Let me examine each element from the prompt: 1. Robot dressed as a pirate - PRESENT (metallic robot with pirate hat, coat, and accessories) 2. Paper boat - PRESENT (boat appears to be made of paper/cardboard material) 3. Lake of lava - PRESENT (glowing orange/red molten surface with volcanic background) 4. Juggling three kittens - FAIL (only two kittens are visible, one in each hand, not three, and they're being held rather than juggled) The core concept is mostly present, but the specific cou

Counting 33% vs 56%

Counting compares whether local/flux-2-klein-4b or local/krea-2-turbo better creates the exact number of requested objects without additions or omissions.

Easy 67% vs 100%

local/flux-2-klein-4b: PASS

Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays exactly three red apples placed side by side on a wooden table. The apples are whole, with stems visible, and show natural color variations including yellow streaks. The table surface is textured wood with visible grain and wear. The count matches the prompt’s request precisely: three red apples on a wooden table. No extra or missing objects. VERDICT: PASS

local/flux-2-klein-4b: PASS

Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows exactly five yellow rubber ducks arranged in a row on a white bathroom sink shelf. Each duck is identical in appearance, with orange beaks and black eyes. The count matches the requested number precisely. No other objects are miscounted or missing. The scene is clean, well-lit, and accurately reflects the prompt. VERDICT: PASS

local/flux-2-klein-4b: FAIL

Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image shows exactly two white candles standing on a dark surface. The prompt requested four candles. Since the count does not match (2 ≠ 4), the core criterion is not met. VERDICT: FAIL

Medium 33% vs 67%

local/flux-2-klein-4b: FAIL

2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows 2 orange cats and 2 black dogs sitting on a grassy lawn. The prompt requested 3 black dogs, but only 2 are present. Therefore, the count does not exactly match the request. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows a bouquet with more than 4 red roses — at least 6 are visible. There are also more than 2 white daisies — at least 3–4 can be seen. The vase is glass and on a table, which matches. However, the flower counts do not match the prompt’s request of exactly 4 roses and 2 daisies. VERDICT: FAIL

local/flux-2-klein-4b: PASS

1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows exactly one blue butterfly on the left and three yellow butterflies to its right, all resting on a wooden garden fence. The count matches the prompt’s request precisely: 1 blue + 3 yellow butterflies. No extra or missing butterflies are present. VERDICT: PASS

Hard 0% vs 0%

local/flux-2-klein-4b: FAIL

Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round fishbowl containing multiple goldfish. Counting carefully: there are 8 clearly visible goldfish (orange/gold with white tails) and 1 smaller dark-colored fish (possibly black or dark blue) in the center. That makes 9 total fish, but only 8 are goldfish. The prompt requested “Exactly 9 goldfish” — since one is not gold, the count does not match. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with: - 3 bananas (visible, clustered together) - 3 oranges (clearly visible; one partially hidden may exist but not confirmed) - 2 apples (one red, one yellow-red) The prompt requested exactly 4 oranges — only 3 are clearly visible. The banana and apple counts match. Since the orange count does not meet the exact requirement, the core criterion fails. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image displays colorful balloons against a clear blue sky. Counting them: there are 15 balloons visible — red, yellow, purple, cyan, pink, magenta, light blue, green, lime green, orange, and dark red/purple variants. The prompt requested exactly 11 balloons. Since 15 ≠ 11, the count does not match. VERDICT: FAIL

Negation 44% vs 56%

Negation compares whether local/flux-2-klein-4b or local/krea-2-turbo better respects what should be absent from the image.

Easy 67% vs 100%

local/flux-2-klein-4b: FAIL

A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image depicts a tropical beach with white sand, turquoise water, and palm trees under a blue sky. Upon close inspection of the shoreline in the distance, there appears to be a very small, indistinct figure standing near the water’s edge — possibly a person. While extremely faint and distant, its presence suggests the excluded element (people) may not be fully absent. Given the strict criterion (“no people”), even ambiguous distant figures should trigger failure. VERDICT: FAIL

local/flux-2-klein-4b: PASS

A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows two cats lying on a gray couch. The cat in the foreground is curled up with eyes closed, displaying calico markings (white, orange, and dark patches). The second cat, partially visible on the left, also has similar coloring. Critically, neither cat is wearing a collar — both necks are bare and unobstructed. Since the prompt explicitly excluded “a cat lying on a couch without a collar,” and no collars are present, the excluded element is indeed absent. VERDICT: PASS

local/flux-2-klein-4b: PASS

A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with wooden desks and chairs arranged in rows. A large green chalkboard is mounted on the wall, with faint chalk markings visible. There are windows on the right side letting in daylight. No students or people of any kind are present in the scene. The excluded element (students) is fully absent. VERDICT: PASS

Medium 67% vs 33%

local/flux-2-klein-4b: FAIL

A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with palm trees, seashells, and a beach umbrella. However, in the background on the left side, there is clearly visible water — specifically, the ocean meeting the horizon under a blue sky. Since the prompt explicitly excluded “no water or ocean visible,” and the ocean is present, this violates the constraint. VERDICT: FAIL

local/flux-2-klein-4b: PASS

An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows an office workspace with a wooden desk, black office chair, bookshelf filled with books and decor, and two lamps (one large desk lamp and one smaller one). There are no computers, monitors, tablets, or any type of screen visible in the scene. The excluded elements are truly absent. VERDICT: PASS

local/flux-2-klein-4b: PASS

A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen scene with pots, pans, a stove, and cutting boards as described. However, there is no visible food present — no ingredients, cooked meals, or raw items in any of the cookware or on surfaces. The excluded element (“food”) is truly absent. All other elements match the prompt’s inclusion criteria. VERDICT: PASS

Hard 0% vs 33%

local/flux-2-klein-4b: FAIL

A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image clearly shows a pepperoni pizza with tomato sauce, basil leaves, and a baked crust. However, there is abundant melted white cheese covering the surface beneath the pepperoni slices — this directly contradicts the prompt’s explicit exclusion of cheese. Since cheese is visibly present, the core constraint is violated. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a white sedan with a complete car body, including windows, doors, and headlights. However, the excluded element — wheels — is clearly present. Both front and rear wheels are visible, attached to the car’s body, with tires and rims fully rendered. Since the prompt explicitly excluded wheels and they are present, the core criterion is violated. VERDICT: FAIL

local/flux-2-klein-4b: FAIL

A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image shows a human face with clearly visible eyes, nose, and mouth. However, the face also has prominent, well-defined eyebrows — dark, arched, and natural-looking — positioned above each eye. Since the prompt explicitly excluded eyebrows, and they are clearly present in the image, the exclusion criterion is violated. VERDICT: FAIL

Relative Position 83% vs 100%

Relative Position compares whether local/flux-2-klein-4b or local/krea-2-turbo more faithfully places objects in the requested spatial relationship.

Easy 100% vs 100%

local/flux-2-klein-4b: PASS

A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a cat sitting directly on top of a cardboard box. The cat is centered on the box’s upper surface, with its paws resting near the front edge. The box sits on a rug on the floor. All spatial relationships described — “on top of” — are accurately represented. No contradictions or misplacements observed. VERDICT: PASS

local/flux-2-klein-4b: PASS

A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a wooden chair with a red ball positioned directly underneath it, between the legs. The spatial relationship described in the prompt — “a red ball underneath a wooden chair” — is accurately represented. The ball is clearly below the seat and within the footprint of the chair’s base. All positional descriptors are correct. VERDICT: PASS

local/flux-2-klein-4b: PASS

A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a coffee mug positioned to the left of an open book, both resting on a wooden desk surface. The spatial arrangement matches the prompt’s description: the mug is “next to” the book, and both are “on” the desk. No incorrect positional relationships (e.g., above, below, under) are present. All described spatial relations are accurate. VERDICT: PASS

Medium 100% vs 100%

local/flux-2-klein-4b: PASS

A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows two birds perched on a branch extending from a tree on the left. The branch is positioned above and to the left of a park bench situated in the lower right portion of the frame. The spatial relationships described — bird on branch, branch from tree, tree next to bench — are accurately represented. All positional descriptors (on, above, left, next to) align with the visual layout. VERDICT: PASS

local/flux-2-klein-4b: PASS

A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a cat sitting on a white pillow, which is placed on a wooden chair. The spatial relationships described in the prompt are accurately represented: the cat is on the pillow, and the pillow is on the chair. All positional relationships are correct. VERDICT: PASS

local/flux-2-klein-4b: PASS

A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a lit candle placed directly on top of a stack of three books. To the right of the book stack is a potted plant in a terracotta pot. The spatial relationships described — “candle on top of books” and “books next to potted plant” — are accurately represented. All positional descriptors (on, next to) match the visual layout. VERDICT: PASS

Hard 33% vs 100%

local/flux-2-klein-4b: FAIL

Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five black chess pieces arranged in a cross pattern on a white background. The central piece is a queen (with a visible gold insert), not a king. To the left is a king, to the right is a knight, above is a rook, and below is another knight. The prompt incorrectly identifies the center piece as a king and claims there’s a bishop to the right — but it’s a knight. Also, no bishop is present at all. Spatial relationships are mostly correct except for piece identity errors. VERDICT:

local/flux-2-klein-4b: PASS

A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a desk scene with a laptop centered on the wooden surface. To the left of the laptop is a white coffee mug filled with dark liquid. To the right is a black smartphone. Above the laptop, a black lamp hangs down, casting light on the scene. Below the laptop (in the foreground) lies an open spiral-bound notebook. All specified spatial relationships — left, right, above, below, and center — are accurately represented. VERDICT: PASS

local/flux-2-klein-4b: FAIL

Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows four animals in a field, not five as specified. From left to right: a horse, a sheep, a goat, and a dog. There is no cow present. The prompt incorrectly lists a cow to the right of the horse — instead, the sheep occupies that position. Also, the total count is wrong. Spatial relationships for existing animals are mostly accurate except for the missing/extra animal specification. VERDICT: FAIL

Extreme 100% vs 100%

local/flux-2-klein-4b: PASS

A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn in the foreground, a windmill in the midground, and snow-capped mountains in the background. The spatial arrangement matches the prompt: barn (large, front), windmill (medium, middle), mountains (small, back). Depth perspective is consistent. All positional relationships are correctly represented. VERDICT: PASS

local/flux-2-klein-4bPASS

A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image shows a sandcastle in the foreground, a surfer on a wave in the midground, and a cruise ship on the horizon in the background — all spatially arranged as described. The sandcastle is closest to the viewer, the surfer is further out in the water, and the ship is distant on the horizon. Sizes decrease with distance, matching the prompt. All positional relationships (foreground/midground/background) are correctly represented. VERDICT: PASS

local/flux-2-klein-4bPASS

A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image correctly depicts a city park scene with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial arrangement follows the prompt: objects are layered progressively smaller from front to back. Pigeons are on the ground near the bench; the fountain is behind them; skyscrapers rise above the trees in the distance. All positional relationships align with the description. VERDICT: PASS

Scale & Proportions 78%vs78%

Scale & Proportions compares whether local/flux-2-klein-4b or local/krea-2-turbo better keeps requested subjects and objects at believable relative sizes.
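The 78% shown for local/flux-2-klein-4b appears to be the pooled pass count over the individual verdicts in this subsection, not an average of the tier percentages. The sketch below checks that reading against the PASS/FAIL labels listed here (Easy 3 of 3, Medium 3 of 3, Hard 1 of 3). The function names and the whole-number rounding are assumptions for illustration, not the benchmark's documented method.

```python
from typing import Dict, Tuple

# (passed, total) per difficulty tier for local/flux-2-klein-4b,
# counted by hand from the PASS/FAIL labels in this subsection.
tiers: Dict[str, Tuple[int, int]] = {
    "Easy": (3, 3),
    "Medium": (3, 3),
    "Hard": (1, 3),
}


def pass_rate(passed: int, total: int) -> int:
    """Return the pass rate as a whole-number percentage (assumed rounding)."""
    if total == 0:
        raise ValueError("cannot compute a pass rate over zero evaluations")
    return round(100 * passed / total)


def pooled_rate(tier_counts: Dict[str, Tuple[int, int]]) -> int:
    """Pool passes and totals across tiers, then compute a single rate."""
    passed = sum(p for p, _ in tier_counts.values())
    total = sum(t for _, t in tier_counts.values())
    return pass_rate(passed, total)


per_tier = {name: pass_rate(p, t) for name, (p, t) in tiers.items()}
overall = pooled_rate(tiers)
print(per_tier, overall)
```

Pooling gives 7 of 9, which matches the 78% in the subsection header, and the Hard tier alone gives the 33% shown on its tier line.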

Easy 100%vs100%

local/flux-2-klein-4bPASS

A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The image shows a person standing next to a yellow school bus. The person’s height relative to the bus appears realistic — approximately half the height of the bus’s side, which is consistent with real-world proportions. The bus’s wheels, windows, and overall dimensions also align with standard school bus scales. No obvious distortions or scaling errors are present. VERDICT: PASS

local/flux-2-klein-4bPASS

A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears appropriately sized relative to the leaf it is resting on. The leaf shows detailed venation and texture consistent with a macro photograph, and the insect’s scale matches real-world proportions for a ladybug on a broad leaf. No unnatural size distortions are evident. VERDICT: PASS

local/flux-2-klein-4bPASS

A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child’s hands are holding the basketball in a natural, proportionate manner — the ball fits comfortably within his grip, neither too large nor too small for his age. The size of the basketball relative to the child’s torso and arms is realistic for a young boy. No anatomical or object-scale distortions are present. VERDICT: PASS

Medium 100%vs67%

local/flux-2-klein-4bPASS

A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a miniature figurine of a person standing on the rim of a coffee cup, with its reflection visible in the liquid below. The scale is intentionally surreal — the person is tiny relative to the cup, which aligns with the prompt’s description. Proportions are internally consistent within the scene’s logic. No physical impossibilities contradict the stated size relationship. VERDICT: PASS

local/flux-2-klein-4bPASS

A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a giant tabby cat walking down a city street, towering over cars and buildings. The cat’s size relative to the skyscrapers and vehicles is exaggerated in a way that matches the “kaiju” description — it looms over the urban environment as if it were a monster. While physically impossible, the proportions are internally consistent with the prompt’s intent: the cat is clearly meant to be colossal compared to its surroundings. VERDICT: PASS

local/flux-2-klein-4bPASS

A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, with its base spanning roughly the width of the palm and its height comparable to the length of the fingers. The proportions between the house features (windows, door, chimney) and the hand are consistent with a realistic miniature model held in a person’s hand. VERDICT: PASS

Hard 33%vs67%

local/flux-2-klein-4bFAIL

A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, horse, dog, and cat in a line with generally correct relative sizes — elephant largest, then horse, then dog, then cat. However, the prompt explicitly includes a “mouse,” which is completely absent from the image. Since one of the five specified animals is missing, the core requirement of depicting all listed animals with correct proportions is not met. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a table setting with a plate, fork, knives, wine glass, salt and pepper shakers. The relative sizes appear mostly realistic — the wine glass is appropriately taller than the shakers, the plate is larger than the shakers, and cutlery proportions are plausible. However, the peppercorns on the plate are unusually large relative to the mint leaf and plate surface, suggesting a scaling inconsistency. This violates the “realistic relative sizes” criterion for all objects mentioned. VERDICT: FAIL

local/flux-2-klein-4bPASS

A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image shows a farm scene with a barn, tractor, cow, chickens, and fence. The cow is roughly the same height as the tractor’s cabin, which is plausible. The chickens are appropriately smaller than the cow. The barn is large relative to the tractor and animals, which fits real-world proportions. The fence posts are scaled correctly relative to the animals. No obvious size distortions or inconsistencies are present. VERDICT: PASS

Human realism 81%vs79%

Faces & Expressions 100%vs92%

Faces & Expressions compares whether local/flux-2-klein-4b or local/krea-2-turbo creates more realistic faces with the requested age, emotion, and expression cues.

Easy 100%vs100%

local/flux-2-klein-4bPASS

A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

The face is symmetric with natural, correctly proportioned features. The warm, natural smile is clearly readable, showing genuine teeth and crinkled eyes. No distortions are visible. Lighting and texture appear realistic. The portrait successfully captures the requested emotion and aesthetic. VERDICT: PASS

local/flux-2-klein-4bPASS

A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly proportioned features. Expression is calm and serious, with subtle furrows and neutral mouth conveying intent. No visible distortions or artifacts. Lighting and texture are realistic. Meets all criteria for the prompt. VERDICT: PASS

local/flux-2-klein-4bPASS

A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with proportionate, correctly rendered features. The teenager’s expression is neutral and relaxed, matching the prompt. No distortions are visible; eyes, nose, and mouth are natural and well-defined. Lighting and focus are even, enhancing clarity. VERDICT: PASS

Medium 100%vs100%

local/flux-2-klein-4bPASS

A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

Face is largely symmetric with correct anatomical features. Disgust is clearly conveyed: nose wrinkled, lips curled, and brows furrowed. Minor distortions include slightly unnatural skin texture and exaggerated lower lip puckering. Overall, the emotion is unmistakable and the facial structure is plausible. VERDICT: PASS

local/flux-2-klein-4bPASS

A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is symmetric with correctly rendered features. Eyes are wide, eyebrows raised, mouth slightly open — all clearly conveying genuine surprise. No noticeable distortions. Lighting and detail are natural. The emotion is unmistakably readable and matches the prompt precisely. VERDICT: PASS

local/flux-2-klein-4bPASS

A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with correctly rendered features. Eyes are downcast, mouth slightly drawn, and gaze distant, conveying quiet melancholy. Slight skin texture artifacts and minor asymmetry in eyebrow shape are present but don’t obscure the emotion. Overall, the intended mood is clearly readable. VERDICT: PASS

Hard 100%vs67%

local/flux-2-klein-4bPASS

An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

The face is symmetric with natural, aged features: deep wrinkles, white hair, and warm, kind eyes. The smile is clearly readable and genuine, conveying warmth. No distortions are present; details like skin texture and lighting are realistic. The image accurately fulfills the prompt’s description. VERDICT: PASS

local/flux-2-klein-4bPASS

A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows, an open mouth, and visible tears. No distortions are present. Features align perfectly with the prompt. VERDICT: PASS

local/flux-2-klein-4bPASS

An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with realistic wrinkles, age spots, and thinning gray hair. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No distortions or unnatural features. Glasses fit naturally. All prompt elements are accurately rendered. VERDICT: PASS

Extreme 100%vs100%

local/flux-2-klein-4bPASS

A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces are symmetric with correct cartoon features. Emotions are clearly readable: left (laughing, head tilted), center (shocked, wide eyes), right (rolling eyes, smirk). No distortions. All three expressions match the prompt precisely. VERDICT: PASS

local/flux-2-klein-4bPASS

A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces show natural symmetry and correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks bored (eyes half-closed, slight smirk), child grins excitedly. No significant distortions. All four individuals match the prompt’s emotional descriptions accurately. VERDICT: PASS

local/flux-2-klein-4bPASS

Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. Emotions are clearly readable: left man shows frustration (furrowed brows), center woman shows pleased (smiling), right man shows confusion (tilted head, raised eyebrows). No distortions. All three expressions match the prompt precisely. VERDICT: PASS

Full Body 100%vs92%

Full Body compares whether local/flux-2-klein-4b or local/krea-2-turbo renders full-body people with more natural anatomy, clothing, pose, and proportions.

Easy 100%vs100%

local/flux-2-klein-4bPASS

A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing a white t-shirt and jeans as described. Limb count and proportions are anatomically correct. Pose is relaxed and recognizable. No distortions or anomalies are visible. All elements match the prompt precisely. VERDICT: PASS

local/flux-2-klein-4bPASS

A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing forward, with hands in pockets. Limb count and proportions are anatomically correct. Pose is clearly recognizable as described. No distortions or abnormalities are visible. All elements align with the prompt. VERDICT: PASS

local/flux-2-klein-4bPASS

A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet, standing in a neutral pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still, hands relaxed at sides. No significant distortions are present. The scene matches the prompt accurately. VERDICT: PASS

Medium 100%vs100%

local/flux-2-klein-4bPASS

A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs are correctly counted and proportioned. Her running pose is anatomically plausible, with dynamic motion in hair and clothing. No distortions are visible. The action matches the prompt: mid-stride running through a park. All elements are coherent and realistic. VERDICT: PASS

local/flux-2-klein-4bPASS

A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms outstretched mid-air—is anatomically plausible for a jump. The action is clearly recognizable: he’s leaping over a puddle, with water splashing and ripples visible. No significant distortions are present. The scene is coherent and realistic. VERDICT: PASS

local/flux-2-klein-4bPASS

A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s limbs are correctly counted and proportioned. The pose—dynamic spin with one leg extended—is anatomically plausible and recognizable. The dress flows naturally with the motion. No significant distortions in anatomy or pose are present. The action matches the prompt precisely. VERDICT: PASS

Hard 100%vs67%

local/flux-2-klein-4bPASS

A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person stands on one leg with the other foot pressed to the inner thigh, arms raised overhead in prayer position — correctly depicting Tree Pose. Limb count and proportions are accurate. No anatomical distortions are visible. The pose is clearly recognizable and executed properly. VERDICT: PASS

local/flux-2-klein-4bPASS

A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has two arms and two legs, with proportions and pose anatomically plausible for rock climbing. His limbs are extended to different holds as described. No significant distortions are visible. The action is clearly recognizable as dynamic rock climbing. All elements align with the prompt. VERDICT: PASS

local/flux-2-klein-4bPASS

A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, bow arm is extended, and the cello is properly supported. Proportions are natural, and the playing pose is clearly recognizable. No significant distortions are present. VERDICT: PASS

Extreme 100%vs100%

local/flux-2-klein-4bPASS

Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly proportioned and anatomically plausible. The woman is dipped backward with intertwined legs, matching the prompt. Her left leg is lifted, and her right foot is planted. The man supports her securely. No distortions are visible. The pose is recognizable as a tango dip. VERDICT: PASS

local/flux-2-klein-4bPASS

Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s mid-air pose are anatomically plausible and recognizable as a tackle. The ball is near the jumping player’s feet as described. No significant distortions are visible. The action is clearly identifiable as a soccer tackle. VERDICT: PASS

local/flux-2-klein-4bPASS

Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstands are anatomically accurate, with straight legs and aligned bodies. The pose is recognizable as a classic handstand with feet touching. No distortions are visible. The symmetry and alignment match the prompt precisely. VERDICT: PASS

Hands 33%vs42%

Hands compares whether local/flux-2-klein-4b or local/krea-2-turbo draws more anatomically plausible hands, fingers, and gestures.

Easy 33%vs67%

local/flux-2-klein-4bFAIL

A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The left hand (viewer’s right) shows 4 fingers clearly; the thumb is partially obscured but likely present. The right hand (viewer’s left) also appears to show 4 fingers, with the thumb tucked or hidden. Neither hand displays 5 fully visible, naturally spread fingers as requested. Fingers are not relaxed/spread — they’re curled inward slightly. Anatomical correctness is compromised by missing/hidden digits and unnatural positioning. VERDICT: FAIL

local/flux-2-klein-4bPASS

A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The woman’s hands are held palms-up with fingers spread. Left hand: 5 fingers clearly visible (thumb + 4 fingers). Right hand: 5 fingers clearly visible (thumb + 4 fingers). All ten fingers are present and anatomically correct — no extra, missing, or fused digits. Gesture is natural and matches the prompt. No anomalies detected. VERDICT: PASS

local/flux-2-klein-4bFAIL

A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The left hand (on the right side of image) shows 4 fingers clearly, with thumb possibly hidden or merged. The right hand (on left side of image) also appears to show only 4 distinct fingers — the pinky is missing or fused. Anatomical correctness fails: both hands lack a full set of 5 visible fingers. Grip gesture is plausible but finger count is incorrect. VERDICT: FAIL

Medium 0%vs33%

local/flux-2-klein-4bFAIL

A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The visible right hand gripping the mug handle shows four fingers wrapped around it, with the thumb positioned on top — anatomically plausible for a grip. However, only four fingers are clearly visible; the fifth (pinky) is obscured or missing from view. No extra fingers or gross deformities are present. Since the prompt requires “exactly 5 per hand” and we cannot confirm all five are present due to occlusion, this fails strict anatomical verification. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the pen shows 5 fingers, but the thumb and index finger appear fused or malformed at the joint, and the pen grip is awkward. The left hand resting on the page also shows 5 fingers, but they are unnaturally splayed and slightly distorted in shape. While the count is technically five per hand, the anatomical structure and gesture are incorrect. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The right hand (strumming) shows a thumb and three fingers clearly, but the pinky appears fused or missing — only 4 digits visible. The left hand (fretting) has 5 fingers, but the index finger is unnaturally bent and partially merged with the fretboard. Grip is plausible but anatomically flawed. Not exactly 5 per hand; anomalies present. VERDICT: FAIL

Hard 33%vs67%

local/flux-2-klein-4bFAIL

A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand in the image shows a peace sign gesture with two fingers extended (index and middle) and three fingers curled (thumb, ring, pinky). However, the thumb is positioned unusually — it’s extended outward rather than curled inward as typically seen in a standard peace sign. Additionally, the hand appears to have only four visible digits due to the thumb’s placement and angle, which may be misinterpreted as missing or malformed. Anatomically, this is not fully correct for a typical peace sign.

local/flux-2-klein-4bPASS

A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt’s description. Anatomically, the hand appears correct: five fingers total (thumb + four curled), proper skin texture, natural creases, and no extra or missing digits. No anomalies detected. VERDICT: PASS

local/flux-2-klein-4bFAIL

A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The left hand shows only four fingers (thumb, index, middle, and a fused/missing ring finger). The right hand also appears to have only four fingers with an unnatural curling of the pinky and index. Neither hand has five distinct, anatomically correct fingers. The gesture does not match the prompt’s requirement of counting to three with extended index, middle, and ring fingers — the anatomy is distorted and incomplete. VERDICT: FAIL

Extreme 67%vs0%

local/flux-2-klein-4bFAIL

Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

Upon close examination, the handshake shows anatomical inconsistencies. The left hand (on the right side of the image) appears to have only 4 visible fingers gripping the other hand — the thumb is obscured or missing, and one finger seems fused or absent. The right hand (left side) also shows ambiguous finger count due to overlap and distortion. Grip mechanics are unnatural — fingers don’t wrap correctly around the opposing hand. Anatomical correctness fails. VERDICT: FAIL

local/flux-2-klein-4bPASS

Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two hands meeting in a high-five gesture. The left hand (on the left side of the image) clearly displays 5 fingers. The right hand (on the right side) also shows 5 fingers. The gesture is anatomically correct — palms are facing each other, fingers are extended and aligned appropriately for a high-five. No anomalies detected. VERDICT: PASS

local/flux-2-klein-4bPASS

One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

Upon close examination, the left hand (receiving) shows 4 fingers clearly, with the thumb partially visible — totaling 5. The right hand (giving) also displays 5 fingers: thumb, index, middle, ring, and pinky are all present and anatomically positioned. Grip is natural — fingers wrap around box edges, thumbs support from top/side. No extra or missing digits; no distortions. Anatomically correct. VERDICT: PASS
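Some evaluator outputs in this report are cut off before the final verdict token, so the PASS/FAIL label on the card is the only reliable signal for those entries. A minimal sketch of how a reader might recover verdicts from the raw evaluator text is given below. It assumes the judge ends each complete response with `VERDICT: PASS` or `VERDICT: FAIL`, as the complete examples here do. The helper names and the fallback to the card label are hypothetical, not part of the benchmark's tooling.

```python
import re
from typing import Optional

# Matches a complete verdict token; a truncated "VERDICT" or "VE" will not match.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def extract_verdict(evaluator_text: str) -> Optional[str]:
    """Return the last complete verdict in the text, or None if it was cut off."""
    matches = VERDICT_RE.findall(evaluator_text)
    return matches[-1] if matches else None


def resolve_verdict(evaluator_text: str, card_label: str) -> str:
    """Prefer the verdict in the text; fall back to the card's PASS/FAIL label."""
    return extract_verdict(evaluator_text) or card_label


complete = "No anomalies detected. VERDICT: PASS"
truncated = "This violates the criterion for all objects mentioned. VE"

print(extract_verdict(complete))           # complete text carries its own verdict
print(extract_verdict(truncated))          # truncated text yields None
print(resolve_verdict(truncated, "FAIL"))  # falls back to the card label
```

Taking the last match rather than the first guards against an evaluator quoting the word "VERDICT" earlier in its reasoning.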

Multi-Subject 100%vs100%

Multi-Subject compares whether local/flux-2-klein-4b or local/krea-2-turbo better keeps multiple people distinct with the requested attributes, outfits, and positions.

Easy 100%vs100%

local/flux-2-klein-4bPASS

Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, wearing a red zip-up jacket over a black top. Right woman: long black hair, wearing a blue off-the-shoulder dress. Both are visually distinct and match the prompt’s descriptions exactly. VERDICT: PASS

local/flux-2-klein-4bPASS

Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Left man: beard, glasses, green sweater — matches description. Right man: clean-shaven, black hoodie — matches description. Both are visually distinct and clearly identifiable. All prompt features are accurately rendered. VERDICT: PASS

local/flux-2-klein-4bPASS

Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls. Short person: straight brown hair, yellow sundress. Both visually distinct and match descriptions. Features are accurately rendered. VERDICT: PASS

Medium 100%vs100%

local/flux-2-klein-4bPASS

Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

All four individuals are present and visually distinct. Left: bald man in suit. Next: woman with silver hair in red blouse. Center: young man with dreadlocks in denim jacket. Right: woman with hijab in green dress. All match prompt descriptions precisely. VERDICT: PASS

local/flux-2-klein-4bPASS

Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Tall woman: pink short hair, playing bass — matches. Man: beard, bandana, on drums — matches. Slim man: glasses, playing keyboard — matches. All three are visually distinct and correctly positioned. Features align precisely with prompt description. VERDICT: PASS

local/flux-2-klein-4bPASS

Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Left: Man with sunhat and backpack. Center: Woman with braided hair and hiking poles. Right: Teenager with baseball cap and water bottle. All three are visually distinct and match described features. VERDICT: PASS

Truthfulness 70%vs70%

Photorealism 67%vs100%

Photorealism compares whether local/flux-2-klein-4b or local/krea-2-turbo makes materials, surfaces, and real-world objects look more visually plausible and distinct.

Medium 67%vs100%

local/flux-2-klein-4bFAIL

A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image displays high-fidelity material rendering. The glass shows accurate refraction and transparency, while the wooden table exhibits realistic grain and texture. The linen napkin captures fabric weave and soft folds effectively. The apple has a convincing waxy sheen. However, the metal object is structurally flawed; it appears as a single spoon with two handles merging into one bowl, which is physically impossible. While the metallic reflection is rendered correctly, the fundamental object

local/flux-2-klein-4bPASS

A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The brushed aluminum laptop shows appropriate metallic sheen and texture. The leather notebook displays realistic grain and stitching. The ceramic mug has a matte, speckled finish, while the glass paperweight exhibits correct transparency, refraction, and reflections. The wooden table surface shows natural grain and wear. All materials are visually distinct and realistic. VERDICT: PASS

local/flux-2-klein-4bPASS

A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image successfully renders the distinct material properties requested. The stainless steel pot exhibits appropriate metallic reflections and brushed texture. The wooden cutting board shows natural grain and matte finish. The glass bottles demonstrate correct translucency and refraction of the oil inside, with realistic specular highlights. The cotton towel displays soft fabric folds and diffuse light absorption. All surfaces interact with light in a physically plausible manner, maintaining c

Physics & Reflections 92%vs83%

Physics & Reflections compares whether local/flux-2-klein-4b or local/krea-2-turbo better follows physical cues such as shadows, reflections, gravity, transparency, and contact points.

Easy 67%vs67%

local/flux-2-klein-4bPASS

A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The shadow is cast to the right and slightly downward, which is consistent with a light source from the upper left. The shadow’s shape and softness are physically plausible for a sphere on a flat surface under directional lighting. The red sphere shows appropriate specular highlights and shading gradients. No distortions or color anomalies are present in the shadow or reflections. VERDICT: PASS

local/flux-2-klein-4bPASS

A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The lighting originates from the right side, consistent with the bright window area. Consequently, the mug casts a shadow to the left, which is physically correct. The shadow's shape accurately mimics the mug's form, including the handle, and its sharpness aligns with the direct sunlight. The reflections on the mug's surface are also consistent with the light source. VERDICT: PASS

local/flux-2-klein-4bFAIL

A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast to the left and slightly behind the subject, which contradicts the prompt’s requirement of a “short shadow directly below them.” At midday, shadows should be minimal and nearly directly under the person. Here, the shadow is elongated and angled, suggesting late morning or afternoon lighting rather than midday. This misalignment in direction and length makes the shadow physically implausible for the stated condition. VERDICT: FAIL

Medium 100%vs100%

local/flux-2-klein-4bPASS

A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection in the mirror is physically plausible. It correctly shows the back of the woman’s outfit, matching her actual pose and clothing. The orientation is reversed as expected in a mirror reflection — for example, her left hand appears on the right side in the reflection. Shadows and lighting are consistent with indoor ambient light, and there’s no distortion or color mismatch between the real subject and her reflection. The mirror frame and floor alignment also support realism. VERDICT

local/flux-2-klein-4bPASS

A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is vertically inverted and correctly positioned beneath the cat, consistent with a polished floor. The distortion increases with distance from the contact point, which is physically accurate for such surfaces. Colors and lighting in the reflection match the subject. The shadow under the cat is soft and appropriately placed. Overall, the shadows and reflections are physically plausible. VERDICT: PASS

local/flux-2-klein-4bPASS

A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection on the glass table is physically plausible. It correctly mirrors the vase and flower stems with appropriate vertical inversion. The reflection is slightly dimmer and less sharp than the object itself, consistent with real-world glass reflections. The position aligns perfectly with the vase’s base, and there are no distortions or color mismatches. Shadows are soft and ambient, fitting the bright indoor lighting. VERDICT: PASS

Hard 100%vs67%

local/flux-2-klein-4bPASS

A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are vertically inverted and aligned with the tree line, which is physically correct. The water surface shows gentle ripples that distort the reflection appropriately, creating a slightly darker and softer appearance compared to the actual trees. The colors in the reflection match the warm sunset tones of the sky and the dark silhouettes of the trees. The overall effect is realistic and consistent with how light behaves on calm water at sunset. VERDICT: PASS

local/flux-2-klein-4bPASS

A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflections in the puddle are physically plausible. The orientation is correct, mirroring the buildings and sky above. The distortion is consistent with a rippled water surface, creating a wavy, slightly blurred effect. The color shift is subtle but present, with the reflection appearing slightly darker and more saturated than the actual scene, which is typical for reflections on wet surfaces. The direction of the reflection aligns with the perspective of the street and buildings. VERDICT:

local/flux-2-klein-4bPASS

A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly inverted vertically, matching the mountain range and sky above. The colors are subtly muted and darker than the real scene, consistent with light absorption in water. The surface is calm, creating a mirror-like effect with minimal distortion, which aligns with the "still alpine lake" description. The lighting direction on the peaks matches the reflected highlights. VERDICT: PASS

Extreme 100%vs100%

local/flux-2-klein-4bPASS

A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image depicts a glass sphere resting on a water surface, not a mirror. The reflection is inverted and distorted by ripples, which is physically accurate for water. The refraction through the sphere correctly inverts the background scene. The lighting and shadows are consistent with an outdoor setting. The interaction between the sphere, water, and light is realistic. VERDICT: PASS

local/flux-2-klein-4bPASS

A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. The chrome sculpture correctly mirrors the surrounding environment, including the window frames and the sculpture itself, with appropriate distortion based on its curved geometry. The glass floor provides a clear, inverted reflection of the sculpture and the room above, consistent with a transparent surface. The lighting and shadows align with the natural light coming from the large windows, enhancing the realism of the scene. VERDICT: PASS

local/flux-2-klein-4bPASS

A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image displays physically plausible optical effects. The wine glass exhibits correct refraction, distorting the background lake and trees through its curved surface. The marble countertop shows a soft, accurate reflection of the glass base and stem, consistent with a polished surface. The lake in the background reflects the sky and distant treeline with appropriate horizontal distortion. Lighting is coherent, with shadows and highlights aligning with a natural light source from the window.
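Several evaluator notes in this listing are cut off before their closing marker, so the label beside the model name is the only verdict visible for those entries. Below is a minimal sketch of reading the marker from a note, assuming the `VERDICT: PASS` / `VERDICT: FAIL` suffix seen in the notes above is the only format in use. `parse_verdict` is a hypothetical helper and not part of the benchmark's tooling.

```python
import re
from typing import Optional

# Matches the closing marker used in the evaluator notes, e.g. "VERDICT: PASS".
_VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")

def parse_verdict(note: str) -> Optional[str]:
    """Return "PASS" or "FAIL", or None if the marker is missing or truncated."""
    matches = _VERDICT_RE.findall(note)
    # Use the last match in case the note quotes the marker earlier in its text.
    return matches[-1] if matches else None

complete = parse_verdict("Medium and style match prompt. VERDICT: PASS")
truncated = parse_verdict("The mirror frame and floor alignment also support realism. VERDICT")
```

Returning `None` for a truncated note keeps an unreadable verdict distinct from a FAIL.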

World Knowledge 50%vs50%

World Knowledge compares whether local/flux-2-klein-4b or local/krea-2-turbo depicts recognizable places, landmarks, flags, and known objects with more accurate real-world details.

Easy 100%vs100%

local/flux-2-klein-4bPASS

The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The Eiffel Tower is clearly recognizable and architecturally accurate, featuring the correct iron lattice structure, three distinct levels (ground arch, first platform, second platform, and summit), and proper proportions. The surrounding environment—trimmed hedges, clear blue sky, and distant cityscape—is consistent with a real-world view from the Champ de Mars. No significant distortions or AI artifacts are visible. VERDICT: PASS

local/flux-2-klein-4bPASS

The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The subject is clearly recognizable as the Taj Mahal. The composition correctly captures the iconic view from the reflecting pool, showcasing the central dome, four minarets, and the symmetrical layout. Architectural details like the arched entrances (iwans), smaller domes, and intricate facade patterns are rendered with high fidelity. The reflection in the water is accurate, and the lighting suggests a realistic time of day. No significant architectural distortions or hallucinations are present

local/flux-2-klein-4bPASS

The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, torch, crown, and tablet. The pedestal and island base are architecturally plausible, though minor details like flag placement or distant trees may not match reality exactly. Overall, the core subject meets factual and visual accuracy standards for a real-world landmark. VERDICT: PASS

Medium 33%vs33%

local/flux-2-klein-4bFAIL

A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a stringed instrument that resembles a shamisen in body shape, but it is factually inaccurate. It has four strings instead of the required three, and the tuning pegs are arranged like a violin or mandolin rather than the traditional shamisen configuration. Additionally, no bachi plectrum is visible. These deviations make it not factually accurate to the prompt’s specifications. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass instrument resembling an astrolabe, but it fails on factual accuracy. The text and symbols around the rim are nonsensical gibberish rather than accurate zodiac signs or degree markings. The internal structure (rete and tympan) is overly simplified and lacks the intricate star pointers and coordinate grids characteristic of real medieval astrolabes. It appears to be a generic fantasy prop rather than a historically accurate scientific instrument. VERDICT: FAIL

local/flux-2-klein-4bPASS

A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The subject is clearly recognizable as a Japanese kintsugi bowl. The texture, lighting, and structural details (such as the foot ring and glaze) appear realistic and consistent with real-world examples. There are no factual or architectural inaccuracies. VERDICT: PASS

Hard 0%vs0%

local/flux-2-klein-4bFAIL

A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts the water cycle, but the text labels are completely nonsensical gibberish (e.g., "Comdensationn," "Evepstanon," "Elvah Ocean"). While the visual flow of arrows and water movement is somewhat recognizable, the lack of legible, accurate text makes it impossible to verify factual accuracy or understand the diagram's specific claims. The text generation failure renders the diagram useless for its intended educational purpose. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image fails to meet the criteria for a recognizable and factually accurate depiction of a tectonic cross-section. While the visual structure vaguely resembles a subduction zone, all labels are nonsensical gibberish (e.g., "Pelitrestae," "Stbecbujr," "Fliumolle Cessae"). This makes the subject unrecognizable as a scientific diagram. Furthermore, the geological features are poorly defined and lack the specific architectural accuracy required for such a technical subject. VERDICT: FAIL

local/flux-2-klein-4bFAIL

An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a heart, but it is not factually accurate. The labels are gibberish (e.g., "RIGK", "MAGBI"), indicating a failure to render text correctly. Anatomically, the structure is distorted: the aorta and pulmonary arteries are misshapen and incorrectly positioned relative to the ventricles. The coronary vessels appear as arbitrary branching lines rather than accurate anatomical structures. The "four chambers" are not clearly defined or correctly represented in this view. Therefore, it

Extreme 67%vs67%

local/flux-2-klein-4bFAIL

A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly that combines features of different species rather than accurately representing a Morpho menelaus. While the dorsal wings show iridescent blue, the ventral side displays prominent eyespots typical of Satyrinae (like Morpho’s relatives), but Morpho menelaus does not have such large, distinct eyespots on its underside. The body morphology is also inconsistent with real Morpho anatomy. Thus, it’s not factually accurate. VERDICT: FAIL

local/flux-2-klein-4bPASS

The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image accurately depicts the interior of the Hagia Sophia, showcasing its massive central dome, pendentives, semi-domes, and the distinctive Islamic calligraphy medallions alongside Byzantine mosaics. The architectural details, including the columns, arches, and intricate mosaics, are consistent with the historical and architectural features of the Hagia Sophia. The lighting and perspective enhance the grandeur and complexity of the space. VERDICT: PASS

local/flux-2-klein-4bPASS

A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a bismuth crystal with high fidelity. It accurately portrays the characteristic geometric hopper crystal formation, featuring the distinct "staircase" or stepped structure typical of bismuth growth. The iridescent oxide layers are rendered realistically, displaying the expected rainbow spectrum of colors (purple, blue, green, yellow) caused by thin-film interference. The texture and lighting appear natural and consistent with a real-world mineral specimen. VERDICT: PASS
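The World Knowledge figures for local/flux-2-klein-4b can be reproduced from the twelve verdicts listed above. This sketch assumes each rate is an unweighted share of PASS labels rounded to the nearest percent, which matches the numbers shown but is not confirmed as the benchmark's own method.

```python
# Verdicts for local/flux-2-klein-4b, copied from the World Knowledge cards above.
verdicts = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["FAIL", "FAIL", "PASS"],
    "Hard": ["FAIL", "FAIL", "FAIL"],
    "Extreme": ["FAIL", "PASS", "PASS"],
}

def pass_rate(labels: list[str]) -> int:
    """Share of PASS labels as a whole-number percentage."""
    return round(100 * labels.count("PASS") / len(labels))

# Per-tier rates, then the category rate over all twelve verdicts pooled.
tier_rates = {tier: pass_rate(labels) for tier, labels in verdicts.items()}
category_rate = pass_rate([label for labels in verdicts.values() for label in labels])
```

Pooling the verdicts and averaging the four tier rates give the same result here because every tier has three prompts.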

Professional Studio 89%vs82%

Camera & Lighting 100%vs83%

Camera & Lighting compares whether local/flux-2-klein-4b or local/krea-2-turbo more faithfully follows lens, angle, focus, exposure, and lighting instructions.

Easy 100%vs100%

local/flux-2-klein-4bPASS

A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image features a wide-angle perspective of a modern living room with large windows providing ample natural light. Most importantly, the image successfully implements the "slight barrel distortion at the edges" by using a circular vignette effect that mimics a fisheye or wide-angle lens distortion, creating a curved, warped frame around the central room. The depth of field is deep, keeping the interior in sharp focus. VERDICT: PASS

local/flux-2-klein-4bPASS

A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, clear dewdrop resting on a pink rose petal. The depth of field is extremely shallow, with the dewdrop in sharp focus while the surrounding petal and background are heavily blurred. The lighting creates realistic reflections and refractions within the droplet, consistent with a macro photography setup. VERDICT: PASS

local/flux-2-klein-4bPASS

An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is flat and perpendicular to the ground, with a deep depth of field that keeps the entire scene—from the center of the roundabout to the surrounding trees and cars—in sharp focus. The lighting is consistent with a bright, sunny day, creating clear shadows from the trees and vehicles. VERDICT: PASS

Medium 100%vs100%

local/flux-2-klein-4bPASS

A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image successfully implements the requested lighting. There is a clear, distinct triangle of light on the shadowed cheek (the subject's right cheek), which is the hallmark of Rembrandt lighting. The overall atmosphere is dark and moody with a shallow depth of field, focusing sharply on the facial features while the background remains black. The light placement and the resulting shadows perfectly match the technical description. VERDICT: PASS

local/flux-2-klein-4bPASS

A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The sky is a vibrant orange, and the sun is low on the horizon, creating a strong golden hour effect. The lighting produces long, warm shadows across the grassy terrain. The perspective is a wide landscape shot with a shallow depth of field, focusing on the foreground grass while the distant trees and sun create a soft, glowing atmosphere. VERDICT: PASS

local/flux-2-klein-4bPASS

A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright, seamless white background. The lighting is soft and even, creating a luminous, airy feel with minimal shadows on the subject's face and neck. The perspective is a centered, eye-level portrait, and the depth of field is shallow, keeping the focus sharply on the person's features while maintaining the bright, high-key aesthetic. VERDICT: PASS

Hard 100%vs100%

local/flux-2-klein-4bPASS

A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a portrait shot with a shallow depth of field (creamy bokeh) characteristic of an 85mm f/1.4 lens. The subject is clearly backlit, creating a distinct warm rim light around her hair and shoulders, which separates her from the blurred garden background. The lighting pattern and perspective are highly accurate to the technical description. VERDICT: PASS

local/flux-2-klein-4bPASS

A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image successfully implements the requested camera settings. The subject (the woman's back/hair) is in sharp focus, while the background features a shallow depth of field with heavy bokeh. The city lights are rendered as soft, out-of-focus orbs, accurately simulating a 50mm f/1.8 lens. The perspective and lighting patterns perfectly match the description of a street photograph with a shallow depth of field. VERDICT: PASS

local/flux-2-klein-4bPASS

A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches all specified criteria. The perspective is a top-down shot (shot from above). The depth of field is shallow, with the center of the dish in sharp focus and the edges of the plate/table slightly blurred, consistent with f/2.8. The lighting is natural, coming from the top-left, creating soft shadows. The color temperature is warm, and a subtle vignette is present around the edges. VERDICT: PASS

Extreme 100%vs33%

local/flux-2-klein-4bPASS

A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a distinct selective focus strip, with the sharpest area centered on the mid-ground street and cars, while the foreground and background are heavily blurred to create a miniature model appearance. The colors are highly saturated, and the perspective effectively mimics the look of a diorama. The lighting and depth of field perfectly match the prompt's requirements. VERDICT: PASS

local/flux-2-klein-4bPASS

A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's requirements. It features a dark, moody scene with a prominent horizontal light streak. The lighting pattern consists of a blue hue on the left transitioning into an orange/amber hue on the right, creating the requested blue and orange anamorphic lens flare effect. The horizontal streaking is sharp and central to the composition. VERDICT: PASS

local/flux-2-klein-4bPASS

A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly captures the requested "dramatic split lighting." One side of the subject's face is brightly illuminated, while the other side is cast in deep, heavy shadow, creating a high-contrast chiaroscuro effect. The perspective is a close-up portrait, and the lighting pattern strictly adheres to the description of half the face being lit and the other half in shadow. VERDICT: PASS

Color Precision 83%vs92%

Color Precision compares whether local/flux-2-klein-4b or local/krea-2-turbo renders requested colors more clearly and consistently across subjects, products, and backgrounds.

Easy 100%vs100%

local/flux-2-klein-4bPASS

A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is intense, saturated, and unambiguous, perfectly matching the prompt's description. The road is a standard gray asphalt, and the surrounding environment is a green forest, which provides a high contrast to the red car. VERDICT: PASS

local/flux-2-klein-4bPASS

A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The woman is wearing a vibrant, deep blue evening dress that perfectly matches the description of "royal blue." The background is a solid, light gray, which qualifies as a "neutral background." All color requirements specified in the prompt are accurately reflected in the image. VERDICT: PASS

local/flux-2-klein-4bPASS

A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a field of bright yellow sunflowers with dark centers. The sky above is a clear, solid blue. The colors perfectly match the prompt's description of "pure sunflower yellow flowers" and a "clear blue sky." VERDICT: PASS

Medium 67%vs100%

local/flux-2-klein-4bPASS

A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The wall is a vibrant burnt orange, which aligns well with the hex code #FF6B35 and the description. The furniture, including the bed frame, headboard, and nightstand, is white as requested. The room is filled with bright, natural daylight coming from the window. All color and lighting requirements from the prompt are accurately represented in the image. VERDICT: PASS

local/flux-2-klein-4bFAIL

A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage in the image consists of various shades of green, ranging from bright, vibrant greens in the sunlight to darker, more shadowed areas. The hex code #2D5F2D describes a very specific, dark, and muted forest green. While the image contains dark green tones, it also features much lighter and more saturated greens that do not match the specific, uniform muted tone requested. VERDICT: FAIL

local/flux-2-klein-4bPASS

A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image accurately reflects the prompt's color requirements. The walls are a medium blue shade consistent with #4A90D9, the floor is a warm wood tone, and the baseboards/trim are white. All specified color elements are present and correctly represented. VERDICT: PASS

Hard 67%vs67%

local/flux-2-klein-4bPASS

A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The walls are a warm cream/off-white color, the sofa is a dark navy blue, and the accent pillows are a bright, vibrant red. All three primary color elements match the descriptions and hex-coded color profiles provided in the prompt. VERDICT: PASS

local/flux-2-klein-4bFAIL

A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The background is a deep dark blue, consistent with #1A1A2E. The earbuds are pure white (#FFFFFF). However, the watch is not a "soft gold" (#F0C27F); while the watch face is gold, the strap is a pale peach/nude color, which deviates significantly from the requested gold color for the watch. VERDICT: FAIL

local/flux-2-klein-4bPASS

A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested color palette. The background is a near-white (#FAFAFA), the main organic shape is a vibrant orange-red (#FF4500), and the accent lines/strokes are a dark charcoal color (#333333). The colors are consistent with the specified hex values and descriptions. VERDICT: PASS

Extreme 100%vs100%

local/flux-2-klein-4bPASS

A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image displays a smooth horizontal gradient. On the far left, the color is a vibrant red (#FF0000). On the far right, the color is a bright blue (#0000FF). The transition through the middle features a clear purple/magenta transition, matching the prompt's requirement for an even transition between the two specified colors. VERDICT: PASS

local/flux-2-klein-4bPASS

A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image features a vertical gradient. The top color is a very dark, deep indigo/navy, which aligns well with #1B1464. The bottom color is a vibrant orange, which matches the description of #FF6B35. The transition through the middle shows a purple/magenta hue, consistent with a sunset sky gradient. The colors accurately reflect the specified hex values and names. VERDICT: PASS

local/flux-2-klein-4bPASS

A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image shows a horizontal banner with a smooth gradient. On the left, the color is a vibrant cyan, which matches the specified #00C9FF. On the right, the color transitions into a light, pale mint green, which aligns with the specified #92FE9D. The gradient direction and the color accuracy are consistent with the prompt's requirements. VERDICT: PASS
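The hex codes in these prompts map to concrete RGB values, so a requested colour or gradient can be compared numerically against sampled pixels. The sketch below is one possible check and is not how the VLM evaluator works. The helper names are hypothetical, and plain Euclidean distance in RGB is only a rough stand-in for perceived colour difference.

```python
import math

RGB = tuple[int, int, int]

def hex_to_rgb(code: str) -> RGB:
    """Convert a "#RRGGBB" string to an (R, G, B) tuple of 0-255 integers."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a 6-digit hex colour")
    return (int(code[0:2], 16), int(code[2:4], 16), int(code[4:6], 16))

def lerp_rgb(start: RGB, end: RGB, t: float) -> RGB:
    """Linearly interpolate between two colours; t=0 gives start, t=1 gives end."""
    r, g, b = (round(a + (z - a) * t) for a, z in zip(start, end))
    return (r, g, b)

def rgb_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two colours in RGB space."""
    return math.dist(a, b)

# Midpoint of the red-to-blue gradient prompt: equal red and blue, no green.
midpoint = lerp_rgb(hex_to_rgb("#FF0000"), hex_to_rgb("#0000FF"), 0.5)

# Target foliage colour from the forest prompt.
foliage_target = hex_to_rgb("#2D5F2D")
```

A pixel sampled from a generated image could then be passed to `rgb_distance` with the target. Any pass threshold would need to be chosen separately, and none is implied here.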

Photorealism 67%vs33%

Photorealism compares whether local/flux-2-klein-4b or local/krea-2-turbo produces more polished product and studio imagery with realistic materials, lighting, shadows, and finish.

Easy 67%vs33%

local/flux-2-klein-4bPASS

A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully captures the requested studio aesthetic. The lighting is soft and even, and the shadow beneath the shoe is realistic and gentle. The texture of the material looks convincing, and the white background is clean. There are no obvious AI artifacts, such as warped laces or nonsensical stitching patterns, that would break the illusion of a professional product photograph. VERDICT: PASS

local/flux-2-klein-4bFAIL

A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the quality of a real studio photograph due to significant AI artifacts. The geometry of the glass bottle is warped and inconsistent, particularly where the liquid meets the glass and around the sprayer. The straw/dip tube is unnaturally curved and lacks physical realism. Furthermore, the reflection on the surface is disconnected from the base of the bottle, and the lighting lacks the sophisticated caustic properties expected in a high-end product shot. VERDICT: FAIL

local/flux-2-klein-4bPASS

A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt requirements. The green apple features realistic textures, including natural-looking lenticels (small white spots) and a believable specular highlight. The lighting is soft, creating a gentle gradient across the fruit and a subtle, realistic shadow on the seamless white background. The stem appears organic and well-integrated. There are no obvious AI artifacts, blurring, or unnatural distortions. It looks like a high-quality studio product photograph. VER

Graphical design 42%vs54%

Data Visualisation 0%vs0%

Data Visualisation compares whether local/flux-2-klein-4b or local/krea-2-turbo produces charts with more correct labels, proportions, colors, and readable values.

Easy 0%vs0%

local/flux-2-klein-4bFAIL

A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart lacks proper month labels on the x-axis (shows gibberish like "Joannhmary" and "Sune"). Values are mislabeled (e.g., "121" instead of "15K", "29K" instead of "19K"). The y-axis is mislabeled and inconsistent. The title is correct, but the chart type is a bar chart. Core data and labels are incorrect and unreadable. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The chart is a pie chart with correct percentages (35%, 25%, 20%, 20%) and distinct colors. However, labels are incorrect: “Company A” is mislabeled as “Marrkreny A” and “Company B” as “Companity”. “Company C” is mislabeled as “C0%” and “Others” is correctly labeled. The chart type matches the request. VERDICT: FAIL

local/flux-2-klein-4bFAIL

A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart is a line graph titled "Weekly Traffic" with grid lines and axis labels. However, the y-axis label is illegible ("BnileY (m)"), and the x-axis labels (0, 1900, 1300, 1720, 140n) are incorrect and unlabelled days. The data points do not match the requested values (1200, 1500, 1100, 1800, 2000, 1700, 1400). The chart type is correct, but critical data and labels are wrong. VERDICT: FAIL

Layout & Design 22%vs33%

Layout & Design compares whether local/flux-2-klein-4b or local/krea-2-turbo produces the cleaner requested composition with the right hierarchy and required elements.

Easy 33%vs100%

local/flux-2-klein-4bFAIL

A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image shows "MOVIE NIGHT" at the top, a large popcorn bucket centered, and "Every Fridiay at 8 PM" at the bottom. The text is readable but contains a typo ("Fridiay"). The layout and visual hierarchy match the prompt, though the error in the tagline slightly undermines accuracy. VERDICT: FAIL

local/flux-2-klein-4bPASS

An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The poster includes a guitar illustration, headline "LIVE MUSIC FFEST" with correct visual hierarchy (large white "LIVE MUSIC" above orange "FFEST"), and tagline "Tickets available now" at the bottom. Text is readable and correctly placed. Minor typo in tagline ("availiabl" instead of "available") is present but does not affect core layout or readability. VERDICT: PASS

local/flux-2-klein-4b FAIL

A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster includes an open book illustration and the title “JOIN THE BOK CLUB” in large, prominent text. However, the tagline “Meets every Whedesaay” contains spelling errors and is misaligned below the book. The visual hierarchy is correct, but critical text inaccuracies violate the prompt’s requirements. VERDICT: FAIL

Medium 0%vs0%

local/flux-2-klein-4b FAIL

A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a hero header, centered headline/subtitle, "Sign Up" button, and footer with three social icons — matching the layout. However, the headline and subtitle text is gibberish (placeholder text), making it unreadable and violating the requirement for correct, readable text placement. The visual hierarchy is structurally correct. VERDICT: FAIL

local/flux-2-klein-4b FAIL

A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a banner header, two-column layout (text left, image right), a highlighted quote block, and a footer with “unsubscribe” text. However, all text is gibberish placeholder (e.g., “SECS ETK LUMOSITINS”), making it unreadable. The layout structure is visually present but functionally fails due to illegible content. VERDICT: FAIL

local/flux-2-klein-4b FAIL

A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with address/hours. However, text is garbled and unreadable, prices are nonsensical, and layout hierarchy is inconsistent (e.g., “Mains” repeated). The footer text is also gibberish. Core elements are present but unusable due to severe text corruption. VERDICT: FAIL

Hard 33%vs0%

local/flux-2-klein-4b PASS

A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The left page features a large hero photo as required. The right page has two columns of text, a pull quote with large quotation marks, and a sidebar labeled “Autobr.ior” with author bio text. Page numbers (29, 30) are visible. All layout zones are present with correct visual hierarchy. Text is readable and correctly placed, though content appears garbled (likely placeholder or AI artifact). VERDICT: PASS

local/flux-2-klein-4b FAIL

A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI with a top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar — all present. However, text is heavily distorted and unreadable, violating the requirement for correct text placement and readability. Layout hierarchy is mostly intact, but illegible text renders the UI functionally flawed. VERDICT: FAIL

local/flux-2-klein-4b FAIL

A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The left page features a full-bleed photo as required. The right page has a three-column layout with a headline, body copy, and two inset images. Page numbers (18, 15) are present. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. VERDICT: FAIL
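The Layout & Design figures for local/flux-2-klein-4b can be reproduced from the nine verdicts listed above. The following is a minimal sketch, assuming each rate is simply passes divided by evaluations, rounded to the nearest whole percent. The page does not show its own aggregation code, and `pass_rate` is a hypothetical helper introduced only for illustration.

```python
# Verdicts for local/flux-2-klein-4b, copied from the Layout & Design listing.
verdicts = {
    "Easy": ["FAIL", "PASS", "FAIL"],
    "Medium": ["FAIL", "FAIL", "FAIL"],
    "Hard": ["PASS", "FAIL", "FAIL"],
}


def pass_rate(labels):
    """Share of PASS labels as a whole-number percentage (assumed rounding)."""
    passes = sum(label == "PASS" for label in labels)
    return round(100 * passes / len(labels))


# Per-tier rates, then the category rate over all nine verdicts pooled together.
per_tier = {tier: pass_rate(labels) for tier, labels in verdicts.items()}
overall = pass_rate([label for labels in verdicts.values() for label in labels])

print(per_tier)  # {'Easy': 33, 'Medium': 0, 'Hard': 33}
print(overall)   # 22
```

Under that assumption the tier rates (33%, 0%, 33%) and the category rate (2 of 9, or 22%) agree with the headers shown for this model.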

Style Diversity 67%vs83%

Style Diversity compares whether local/flux-2-klein-4b or local/krea-2-turbo better shifts between requested visual styles like oil painting, pixel art, watercolor, and technical illustration.

Easy 67%vs67%

local/flux-2-klein-4b PASS

A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3.5 122B)

The image clearly exhibits an oil painting style with visible, textured brushstrokes throughout the dog’s fur, foliage, and flowers. The color blending, impasto-like texture, and painterly lighting all align with traditional oil painting techniques. The golden retriever is centered in a lush garden setting as requested. All key visual characteristics of the specified art style are present and well-executed. VERDICT: PASS

local/flux-2-klein-4b PASS

A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3.5 122B)

The image clearly exhibits the requested Japanese anime style. Key characteristics are present: the golden retriever has large, expressive eyes with prominent highlights, simplified facial features, and clean line art. The colors are vibrant yet relatively flat with minimal shading gradients, consistent with anime aesthetics. The overall composition and rendering match the prompt’s stylistic requirements. VERDICT: PASS

local/flux-2-klein-4b FAIL

A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3.5 122B)

The image clearly depicts a golden retriever in a garden setting. The art style is unmistakably pixel art, with visible blocky pixels forming the dog, flowers, and background. However, the resolution appears significantly higher than a 32x32 grid — the image is much larger and more detailed than what a true 32x32 grid would allow. While the limited color palette is somewhat present, the grid constraint is not met. VERDICT: FAIL

Medium 100%vs100%

local/flux-2-klein-4b PASS

A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3.5 122B)

The image clearly exhibits Art Deco characteristics: strong geometric shapes (zigzags, chevrons, stepped forms), a limited gold-and-black palette, and ornamental symmetry along the central axis. The buildings, street markings, and decorative borders all reflect the stylized, streamlined aesthetic typical of Art Deco. The composition is balanced and visually cohesive with the requested style. VERDICT: PASS

local/flux-2-klein-4b PASS

A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3.5 122B)

The image exhibits clear Impressionist characteristics: soft, visible brushstrokes throughout the foliage and dress; dappled sunlight filtering through trees onto the subject and ground; and a pastel palette dominated by pinks, blues, yellows, and greens. The woman’s parasol and garden setting align with classic Impressionist themes. Lighting is diffused and atmospheric, not sharply defined. All requested stylistic elements are present and recognizable. VERDICT: PASS

local/flux-2-klein-4b PASS

A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3.5 122B)

The image clearly exhibits Pop Art style characteristics. Bold black outlines define the subject and features. Ben-Day dots are visibly used for shading and texture across the face and background. The color palette consists of flat, vivid primary and secondary colors (bright yellow, magenta, cyan, green) rather than realistic tones. The composition is a stylized portrait consistent with the requested aesthetic. VERDICT: PASS

Hard 33%vs67%

local/flux-2-klein-4b FAIL

A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3.5 122B)

The image clearly exhibits watercolor characteristics: visible paper texture, soft color bleeds, and translucent layering typical of the medium. However, it fails the "photorealistic" criterion. The rendering is stylized and illustrative rather than photo-realistic; the lighting lacks photographic precision, and the details are painterly rather than sharp and lifelike. While the watercolor style is recognizable, the core requirement of combining it with photorealistic detail and lighting is not

local/flux-2-klein-4b PASS

A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3.5 122B)

The image successfully combines cyberpunk and Art Nouveau elements. The central figure is set against a neon-lit futuristic city with glowing signs and tech details. This is framed by an ornate, flowing border featuring floral motifs, vines, and decorative curves typical of Art Nouveau. The color palette uses vibrant neons against darker tones, matching the cyberpunk aesthetic. All key visual characteristics from the prompt are clearly present and well-integrated. VERDICT: PASS

local/flux-2-klein-4b FAIL

A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3.5 122B)

The image successfully combines medieval castle architecture with Japanese ukiyo-e woodblock print aesthetics. Key characteristics include the stylized clouds, flat color planes, visible line work, and traditional Japanese text seals. However, the sky and lighting are not photorealistic — they retain the illustrative, non-photographic quality typical of ukiyo-e. Since the prompt explicitly required “photorealistic sky and lighting,” this core criterion is unmet. VERDICT: FAIL

Extreme 67%vs100%

local/flux-2-klein-4b PASS

An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3.5 122B)

The image clearly exhibits soft brutalism: raw concrete walls and floor with visible panel lines and texture are prominent. Rounded, pastel-colored furniture (cream, pink, mint) contrasts the hardness. Warm diffused sunlight streams in from the right, casting soft shadows and illuminating the scene gently. All key visual characteristics of the requested style are present and well-executed. VERDICT: PASS

local/flux-2-klein-4b FAIL

A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3.5 122B)

The image depicts a wall with a dark baseboard, rendered in a realistic or photographic style rather than a distinct art style. While the composition is sparse and colors are muted, it lacks the specific brushwork, texture, or intentional "anxious" aesthetic of "anxious minimalism" as a defined art movement. It appears to be a photo of a wall, not a painting in that specific style. VERDICT: FAIL

local/flux-2-klein-4b PASS

A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3.5 122B)

The image successfully embodies the requested "nostalgic futurism" style. Key characteristics are clearly present: the color palette features a warm, sepia-toned patina with a vintage film grain texture. The architecture blends modern skyscrapers with retro-futuristic elements like the lattice tower and dome structures. The inclusion of large planets and flying saucers in the sky reinforces the 1960s space-age aesthetic. The overall mood is melancholic and dreamlike. VERDICT: PASS
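Most evaluator responses in this listing end with a `VERDICT: PASS` or `VERDICT: FAIL` marker, but the photorealistic watercolor response is cut off before its marker. The sketch below shows one way a reader might pull the label out of a response and notice a missing one. It is an illustration under stated assumptions, not the benchmark's actual parser: `extract_verdict` and the regular expression are hypothetical, and it assumes the marker is the last thing in a complete response.

```python
import re

# Assumed format: the marker closes the response, e.g. "... VERDICT: PASS".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(response):
    """Return 'PASS' or 'FAIL', or None when the closing marker is absent."""
    match = VERDICT_RE.search(response.strip())
    return match.group(1) if match else None


complete = "Medium and style match prompt. VERDICT: PASS"
truncated = "the core requirement of combining it with photorealistic detail and lighting is not"

print(extract_verdict(complete))   # PASS
print(extract_verdict(truncated))  # None
```

A `None` result marks a response worth checking by hand, in line with the page's own advice to inspect the images rather than rely on the labels alone.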