ImageBench

local/flux-2-klein-9b vs local/krea-2-turbo

192 evaluations across 6 categories

Benchmark V1 verdicts are produced by VLM judges and can contain mistakes. Treat PASS/FAIL labels as machine-assisted assessments, and inspect the images yourself.

Pass Rate: 77% vs 76%
Avg Latency: 8.5s vs 246.2s

Text Rendering: 47% vs 80%
Spatial Reasoning: 77% vs 81%
Human realism: 88% vs 79%
Truthfulness: 74% vs 70%
Professional Studio: 93% vs 82%
Graphical design: 58% vs 54%
Latency: 37% vs 0%

Text Rendering: 47% vs 80%

Typography Style: 100% vs 100%

Typography Style compares whether local/flux-2-klein-9b or local/krea-2-turbo better preserves requested lettering styles, font mood, and text texture.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

Text reads “CHAPTER ONE” correctly. Font mimics vintage typewriter style with blocky, slightly uneven ink strokes and visible paper texture. Aged paper background shows creases, stains, and wear. Ink appears imperfectly saturated, matching “slightly uneven ink” description. All elements convincingly simulate a typewritten document on aged paper. VERDICT: PASS

local/flux-2-klein-9b: PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads “TODAY’S SPECIAL” correctly. Letters show white chalk texture, smudges, and dust particles on dark green board, convincingly mimicking chalk writing. Spelling and medium match prompt precisely. VERDICT: PASS

local/flux-2-klein-9b: PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is correctly spelled and rendered in a glowing, pixel-style LED matrix with distinct red dots forming each letter. The style convincingly mimics a classic LED sign, with uniform brightness and clear pixel structure. The red glow and reflection on the surface enhance realism. VERDICT: PASS
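
Each evaluator response above ends with a "VERDICT: PASS" or "VERDICT: FAIL" marker. As a minimal sketch of how that label could be pulled out of the judge's free text for manual review, assuming the marker always uses this exact wording (the function name `extract_verdict` is hypothetical and not part of the benchmark):

```python
import re
from typing import Optional

# Assumption: the judge writes its decision as "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def extract_verdict(evaluator_text: str) -> Optional[str]:
    """Return "PASS" or "FAIL", or None when no verdict marker is found."""
    matches = VERDICT_RE.findall(evaluator_text)
    # Use the last marker in case the judge mentions a verdict earlier in its reasoning.
    return matches[-1] if matches else None


sample = (
    "Text reads “TODAY’S SPECIAL” correctly. Letters show white chalk texture, "
    "smudges, and dust particles on dark green board, convincingly mimicking "
    "chalk writing. Spelling and medium match prompt precisely. VERDICT: PASS"
)
print(extract_verdict(sample))  # PASS
```

Returning None instead of guessing keeps responses without a clear marker visible, so they can be inspected by hand.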

Writing accuracy: 33% vs 75%

Writing accuracy compares whether local/flux-2-klein-9b or local/krea-2-turbo more reliably renders the exact requested words, numbers, labels, and short phrases.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and is perfectly legible against the white background. All letters are intact and clearly formed. No misspellings, missing letters, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

local/flux-2-klein-9b: PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a cream background. All letters are intact and clearly legible. No misspellings or missing parts. The design matches the intended specification exactly. VERDICT: PASS

local/flux-2-klein-9b: PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and fully readable, centered on a solid light blue background. No misspellings, missing words, or illegible parts are present. The text matches the intended specification exactly. VERDICT: PASS

Medium: 33% vs 100%
local/flux-2-klein-9b: FAIL
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

The word "brown" is misspelled as "boøøx" with distorted characters. "jumps high" is correct. Text is readable but contains a clear spelling error. The core requirement of correct spelling is not met. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

The text reads "Every sunset brings a new down ahead". The word "down" incorrectly replaces "dawn". This is a misspelling/incorrect word. The intended phrase "Every sunset brings a new dawn ahead" is not fully accurate. VERDICT: FAIL

local/flux-2-klein-9b: PASS
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The text "Stars shine brightest in the dark night" is fully spelled correctly and clearly legible against the off-white background. No words are missing or misspelled. The font is consistent and high-contrast, ensuring readability. All characters are intact and properly formed. VERDICT: PASS
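
The Writing accuracy failures above come down to single-word differences between the requested phrase and what the evaluator read in the image. The sketch below shows one simple way to locate such differences. It is an illustration only, not the benchmark's method: the helper `word_mismatches` is hypothetical, and it compares words by position, so a dropped or inserted word would shift every later comparison.

```python
from typing import List, Tuple


def word_mismatches(requested: str, rendered: str) -> List[Tuple[int, str, str]]:
    """Return (position, requested word, rendered word) for each differing position."""
    req_words = requested.split()
    ren_words = rendered.split()
    # Pad the shorter list with empty strings so missing or extra words are reported too.
    length = max(len(req_words), len(ren_words))
    req_words += [""] * (length - len(req_words))
    ren_words += [""] * (length - len(ren_words))
    return [
        (i, want, got)
        for i, (want, got) in enumerate(zip(req_words, ren_words))
        if want != got
    ]


requested = "Every sunset brings a new dawn ahead"
rendered = "Every sunset brings a new down ahead"  # text as transcribed by the evaluator
print(word_mismatches(requested, rendered))  # [(5, 'dawn', 'down')]
```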

Hard: 0% vs 100%
local/flux-2-klein-9b: FAIL
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

The word "tems" is misspelled (should be "items"). All other text is correctly spelled and readable. The fine print is present but contains a minor error. The core requirement is met only if all text is correct; since "tems" is wrong, the accuracy fails. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

The text "seetting" is misspelled; it should be "seating". All other words are correctly spelled and fully readable. The intended disclaimer contains an error that affects accuracy. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

Text contains errors: "fm" instead of "fine", "fne" instead of "fine". "GALA" has a decorative accent mark, but "GALA" is correct. "2025" is correctly displayed. Footnote is accurate. Misspellings and illegible parts prevent full correctness. VERDICT: FAIL

Extreme: 0% vs 0%
local/flux-2-klein-9b: FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

“MARUS CHEN” is misspelled (should be “Marcus”). “PARMOUT PICTURES RELEASE” is misspelled (should be “PARAMOUNT PICTURES RELEASE”). All other text is correct and readable. Minor errors in credits block prevent full accuracy. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

Text errors: “Cambodia” misspelled as “Camboodia”; “Maria Santos” as “Marria Santos”; “Climate Action” as “Climate Acize”; “This Sa Yeah” instead of “This Year”. “Issue 247” is correct. Multiple words are misspelled or garbled. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

“Unflirteted” is misspelled (should be “Unfiltered”). “harserted” is misspelled (should be “harvested”). “Northwewst” is misspelled (should be “Northwest”). “aditives” is misspelled (should be “additives”). “presertaives” is misspelled (should be “preservatives”). Multiple critical errors in spelling and content. VERDICT: FAIL
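
The percentages for local/flux-2-klein-9b in this category are consistent with pooling every item and dividing passes by the total. The sketch below reproduces them from the item verdicts listed above. The pooling rule is an assumption inferred from the displayed numbers, not a documented formula, and `pass_rate` is a hypothetical helper.

```python
from typing import Iterable, Tuple


def pass_rate(tiers: Iterable[Tuple[int, int]]) -> int:
    """Pool (passed, total) pairs and return the pass percentage, rounded to an integer."""
    tiers = list(tiers)
    passed = sum(p for p, _ in tiers)
    total = sum(t for _, t in tiers)
    return round(100 * passed / total)


# (passed, total) per difficulty tier for local/flux-2-klein-9b, read off the verdicts above.
typography_style = [(3, 3)]                          # Easy
writing_accuracy = [(3, 3), (1, 3), (0, 3), (0, 3)]  # Easy, Medium, Hard, Extreme

print(pass_rate(typography_style))                     # 100
print(pass_rate(writing_accuracy))                     # 33
print(pass_rate(typography_style + writing_accuracy))  # 47
```

Under this assumption, 4 of 12 Writing accuracy items and 7 of 15 Text Rendering items pass, which matches the 33% and 47% shown for this model.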

Spatial Reasoning: 77% vs 81%

Attributes Binding: 67% vs 100%

Attributes Binding compares whether local/flux-2-klein-9b or local/krea-2-turbo better keeps each requested color, material, or property attached to the correct object.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image shows two objects on a white surface. On the left is a red sphere (ball). On the right is a blue cube. Both objects are correctly colored and shaped as specified. They are positioned side by side. All attributes (color, shape) are correctly bound to their respective objects with no swaps. VERDICT: PASS

local/flux-2-klein-9b: PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image shows a green apple and an orange pumpkin placed on a wooden cutting board. The apple is correctly colored green and has the typical round shape with a stem. The pumpkin is orange, round, and has a stem as expected. Both objects are appropriately sized relative to each other and the board. No attributes are swapped or misassigned. VERDICT: PASS

local/flux-2-klein-9b: PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image shows a yellow taxi and a black limousine parked next to each other on a street. The taxi is correctly yellow and has the typical taxi shape with a roof sign. The limousine is correctly black and has the elongated shape characteristic of a limousine. Both vehicles are positioned side-by-side as requested. All attributes (color, size, shape) are correctly bound to the right objects with no swaps. VERDICT: PASS

Medium: 67% vs 100%
local/flux-2-klein-9b: FAIL
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image shows three distinct shapes on a white background. The large red shape is a semicircle (half-circle), not a full circle as requested. The blue triangle is small and correctly shaped. The green star is medium-sized and correctly shaped. Since the prompt specifically asked for a "large red circle" and the image contains a semicircle, the shape attribute is incorrectly bound. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image shows three candles on a wooden shelf: 1. Left: Tall, thin, white candle — matches description. 2. Center: Short, fat, red candle — matches description. 3. Right: Medium-height, spiral-shaped, yellow candle — matches description. All attributes (color, size, shape) are correctly bound to each object with no swaps. VERDICT: PASS

local/flux-2-klein-9b: PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image shows a big brown horse with a white star on its forehead, standing in a meadow. A small white rabbit is sitting beside it. A medium-sized black cat is also present, sitting next to the rabbit. All objects match their described attributes: color, size, and shape are correctly bound. No swaps or mismatches observed. VERDICT: PASS

Hard: 33% vs 100%
local/flux-2-klein-9b: PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains a pink fire truck with correct shape and features (ladder, lights). A large blue banana is present, correctly colored and shaped. A green sun with a face and rays is visible, matching the description. All attributes are correctly bound to their respective objects with no swaps. The surreal landscape context is also maintained. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains: 1. A purple carrot with green/purple stems — matches “purple carrot”. 2. A tree with red leaves and brown trunk — matches “red tree with red leaves”. 3. A watermelon sliced open, but it has green rind and pink/red flesh — does NOT match “white watermelon”. The watermelon’s color is incorrect per the prompt. Attributes are mostly bound correctly except for the watermelon’s color. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows three objects on a wooden table: 1. Orange elephant – correct color and shape. 2. Yellow penguin – correct color, but the object is clearly a chick or duckling (not a penguin). 3. Silver strawberry – correct color and shape, with green leaves. The penguin is misidentified; it’s not a penguin. This violates the “specific attributes” requirement. VERDICT: FAIL

Compositionality: 100% vs 89%

Compositionality compares whether local/flux-2-klein-9b or local/krea-2-turbo better combines the requested subjects, actions, props, and scene details into one coherent image.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

The image contains all required elements: - Tabby cat: Present (brown/black striped fur with white muzzle/chest) - Oversized red sunglasses: Present (large red frames with dark lenses covering eyes) - Coherent combination: Yes — sunglasses fit naturally on the cat’s face, no anatomical distortions or blending errors. All prompt elements are accurately and coherently rendered. VERDICT: PASS

local/flux-2-klein-9b: PASS
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image contains: - Golden retriever: Present and clearly depicted. - Tiny cowboy hat: Present, though not particularly "tiny" — it fits proportionally on the dog’s head. - Coherent combination: Yes, the hat sits naturally on the dog’s head without visual glitches. All prompt elements are present and coherently combined. VERDICT: PASS

local/flux-2-klein-9b: PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image contains: 1. A penguin - Present (cartoon-style, black and white with orange beak/feet) 2. Holding - Present (penguin’s flipper grips umbrella handle) 3. Colorful umbrella - Present (rainbow-colored panels: red, blue, yellow, green, orange, purple) All elements are clearly depicted and coherently combined. The penguin is centrally positioned under the open umbrella, which it holds upright. No contradictions or missing components. VERDICT: PASS

Medium: 100% vs 100%
local/flux-2-klein-9b: PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image shows a medieval knight in full armor sitting at a wooden desk, typing on a modern laptop. All described elements are present: the knight’s armor is detailed and complete, the desk is rustic and period-appropriate, and the laptop is clearly modern (Apple logo visible). The combination is coherent and visually integrated. VERDICT: PASS

local/flux-2-klein-9b: PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image contains all required elements: 1. Roman gladiator - Present (wearing armor, helmet with red crest, shield) 2. Taking a selfie - Present (holding smartphone up, smiling at camera) 3. Smartphone - Present (black phone in hand) 4. Colosseum - Present (ancient Roman amphitheater structure in background) All elements are coherently combined in a single scene. The gladiator is actively taking a selfie with a modern smartphone while standing inside the Colosseum. VERDICT: PASS

local/flux-2-klein-9b: PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Astronaut in a spacesuit - Present (white suit with helmet, American flag patch) 2. Riding a bicycle - Present (mountain bike being ridden) 3. Through a park - Present (trees, grass, benches, paved path) The unusual concept is coherently combined - the astronaut appears to be genuinely riding through a realistic park setting with proper lighting and shadows. VERDICT: PASS

Hard: 100% vs 67%
local/flux-2-klein-9b: PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains all described elements: - Octopus: Present, central figure. - Top hat: Present, worn by the octopus. - Monocle: Present, worn by the octopus. - Chess: Present, being played on a board. - Moon surface: Present, depicted as the ground. - Earth in background: Present, two Earth-like planets visible. All elements are coherently combined. VERDICT: PASS

local/flux-2-klein-9b: PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image contains all requested elements: 1. Dinosaur: Present (T-Rex head). 2. Business suit: Present (dark suit, white shirt, blue tie). 3. PowerPoint presentation: Present (screen with slides and pointer). 4. Modern office: Present (desks, chairs, windows, plants). 5. Drinking coffee: Present (holding a white cup). All elements are coherently combined in a single scene. VERDICT: PASS

local/flux-2-klein-9b: PASS
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

The image contains all described elements: 1. Robot dressed as a pirate - Present (metallic body, pirate hat, coat) 2. Paper boat - Present (white origami-style boat) 3. Lake of lava - Present (glowing orange liquid with cracks) 4. Three kittens - Present (one on each hand, one floating above) 5. Juggling action - Present (kittens positioned as if being juggled) All elements are coherently combined in a single scene. VERDICT: PASS

Counting: 44% vs 56%

Counting compares whether local/flux-2-klein-9b or local/krea-2-turbo better creates the exact number of requested objects without additions or omissions.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays exactly three red apples placed on a wooden surface. The apples are arranged in a row, and the count matches the prompt’s request precisely. No other objects are present that contradict the description. The visual content aligns perfectly with the specified criteria. VERDICT: PASS

local/flux-2-klein-9b: PASS
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image displays exactly five yellow rubber ducks arranged in a row on a white bathroom shelf. The count matches the prompt’s request precisely. All ducks are identical in appearance, facing forward, with orange beaks and black eyes. No extra or missing objects are present. The setting (bathroom shelf, tiled wall, mirror) aligns with context. Core criteria fully met. VERDICT: PASS

local/flux-2-klein-9b: PASS
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays exactly four white pillar candles arranged in a row on a dark, textured surface. Each candle is unlit and upright, with visible wicks. The count matches the prompt’s request precisely: four white candles, standing on a dark surface. No extra or missing objects are present. VERDICT: PASS

Medium: 33% vs 67%
local/flux-2-klein-9b: FAIL
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows 1 orange cat and 2 black dogs sitting on a grassy lawn. The prompt requested 2 orange cats and 3 black dogs. The counts do not match — there is one fewer cat and one fewer dog than specified. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows a bouquet with multiple red roses — at least 5–6 visible, not 4. There are also 2–3 white daisies (one fully visible, one partially behind it, and possibly a third in the back). The vase is glass and on a wooden table, which matches. However, the rose count exceeds the requested “4,” and daisy count may be slightly off. Since the prompt specified exact counts and they don’t match precisely, this fails the core criterion. VERDICT: FAIL

local/flux-2-klein-9b: PASS
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows a wooden garden fence with butterflies resting on it. There is exactly one blue butterfly on the left side of the fence. There are three yellow butterflies: one large in the center, one smaller to its right, and another small one further right near the fence post. The count matches the prompt’s request of “1 blue butterfly and 3 yellow butterflies.” All objects are clearly visible and correctly counted. VERDICT: PASS

Hard: 0% vs 0%
local/flux-2-klein-9b: FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image shows a round glass fishbowl containing goldfish and decorative stones at the bottom. Counting the goldfish: there are 8 clearly visible goldfish swimming in various orientations. The prompt requested exactly 9 goldfish. Since only 8 are present, the count does not match the requirement. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl with: - 2 bananas (not 3) - 4 oranges (correct) - 1 apple (not 2) The counts do not match the requested quantities. Specifically, there is one fewer banana and one fewer apple than specified. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image shows 10 colorful balloons floating against a clear blue sky. The prompt requested exactly 11 balloons. Since the count is off by one, the core criterion is not met. VERDICT: FAIL
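
The Counting verdicts above apply an exact-match rule: any object type whose observed count differs from the request fails the item. The sketch below illustrates that rule on the counts the evaluator reported for the fruit bowl prompt. It only illustrates the pass criterion, not how the VLM judge counts objects, and `count_errors` is a hypothetical helper.

```python
from typing import Dict


def count_errors(requested: Dict[str, int], observed: Dict[str, int]) -> Dict[str, int]:
    """Return observed minus requested for every object type whose count differs."""
    names = set(requested) | set(observed)
    return {
        name: observed.get(name, 0) - requested.get(name, 0)
        for name in sorted(names)
        if observed.get(name, 0) != requested.get(name, 0)
    }


requested = {"bananas": 3, "oranges": 4, "apples": 2}
observed = {"bananas": 2, "oranges": 4, "apples": 1}  # counts reported by the evaluator

errors = count_errors(requested, observed)
print(errors)                            # {'apples': -1, 'bananas': -1}
print("PASS" if not errors else "FAIL")  # FAIL
```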

Negation: 56% vs 56%

Negation compares whether local/flux-2-klein-9b or local/krea-2-turbo better respects what should be absent from the image.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image shows a tropical beach scene with white sand, turquoise water, and palm trees leaning over the shore. The sky is blue with scattered clouds. There are footprints in the sand and some seaweed washed up near the waterline. No people are visible anywhere in the frame — no bodies, limbs, clothing, or silhouettes. The exclusion criterion (“no people”) is fully satisfied. VERDICT: PASS

local/flux-2-klein-9b: PASS
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows an orange and white cat lying on a gray couch. The cat is looking at the camera with its front paws extended. There is no collar visible around the cat’s neck — the fur appears uninterrupted and clean in that area. The excluded element (collar) is truly absent. VERDICT: PASS

local/flux-2-klein-9b: PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with wooden desks and chairs arranged in rows facing a green chalkboard. There are no students visible in the room. The excluded element (students) is truly absent. VERDICT: PASS

Medium: 67% vs 33%
local/flux-2-klein-9b: PASS
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with palm trees, seashells in the foreground, and a green beach umbrella with lounge chairs. The background is a clear blue sky meeting the horizon line where sand ends — there is no visible water, ocean, sea, or any body of liquid. The excluded element (water/ocean) is truly absent. VERDICT: PASS

local/flux-2-klein-9b: FAIL
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows an office workspace with a wooden desk, a beige office chair, a tall bookshelf filled with books, and a black desk lamp. On the desk, there is a closed laptop (which could be considered a computer/screen when open) and some notebooks. Since a laptop is present — even if closed — it technically violates the exclusion of “no computers or screens.” VERDICT: FAIL

local/flux-2-klein-9b: PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen scene with a gas stove, several stainless steel pots and pans, wooden cutting boards, and a folded cloth. The burners are lit with blue flames. There is no visible food in any of the pots, pans, or on the cutting boards. The excluded element — food — is indeed absent. VERDICT: PASS

Hard: 0% vs 33%
local/flux-2-klein-9b: FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil leaves, and a golden-brown crust. However, it clearly contains melted cheese covering the sauce and beneath the pepperoni slices — visible as the white/yellowish layer with browned spots. Since the prompt explicitly excluded cheese, and cheese is prominently present, the core criterion is violated. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a silver car with windows, doors, and headlights as described. However, the excluded element — wheels — is NOT truly absent. The rear wheel is clearly visible. The front wheel area is missing or obscured (possibly due to rendering error), but since at least one wheel is present, the exclusion criterion is violated. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image shows a human face with eyes, nose, and mouth. However, the excluded element — eyebrows — are clearly present. Both eyebrows are well-defined, dark, and naturally shaped above each eye. Since the prompt explicitly required “no eyebrows,” their presence violates the exclusion criterion. VERDICT: FAIL

Relative Position: 92% vs 100%

Relative Position compares whether local/flux-2-klein-9b or local/krea-2-turbo more faithfully places objects in the requested spatial relationship.

Easy: 100% vs 100%
local/flux-2-klein-9b: PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a cat sitting directly on the top surface of a cardboard box. The cat is positioned centrally on the box, with its body oriented slightly to the right. The box is on the floor, and the cat is above it. All spatial relationships described in the prompt are accurately represented. VERDICT: PASS

local/flux-2-klein-9b: PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball positioned directly beneath the seat of a wooden chair, resting on the floor between the chair legs. The spatial relationship described — “a red ball underneath a wooden chair” — is accurately represented. All positional descriptors (underneath) are correct and visually confirmed. VERDICT: PASS

local/flux-2-klein-9b: PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a coffee mug positioned to the right of an open book on a wooden desk. The book is open and lies flat, with the mug placed beside it. The spatial relationship described in the prompt — “a coffee mug next to an open book on a desk” — is accurately represented. All positional relationships (on, next to) are correct. VERDICT: PASS

Medium: 100% vs 100%
local/flux-2-klein-9b: PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on a branch that extends from a large tree trunk. The tree is positioned next to a park bench, which sits on grass near a paved path. The spatial relationships described — bird “on” branch, branch extending “from” tree, tree “next to” bench — are all accurately represented in the image. No positional errors detected. VERDICT: PASS

local/flux-2-klein-9b: PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a cat sitting on a pillow, which is placed on a wooden chair. The spatial relationships described in the prompt are accurately represented: the cat is on the pillow, and the pillow is on the chair. All positional relationships (on, under) are correct. VERDICT: PASS

local/flux-2-klein-9b: PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a lit candle placed directly on top of a stack of books. The stack of books is positioned to the left of a potted plant. All specified spatial relationships — “candle on top of books” and “books next to potted plant” — are accurately represented. The arrangement matches the prompt’s description precisely. VERDICT: PASS

Hard 67%vs100%
local/flux-2-klein-9b: FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five chess pieces on a board, but the spatial arrangement does not match the prompt. The prompt specifies: king in center, queen left, bishop right, rook above, knight below. In the image: there is no queen or bishop visible — instead, we see a rook (left), king (center), knight (right), pawn (below), and another king (above). The pieces and their positions are incorrect relative to the prompt’s description. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image shows a desk scene with a laptop centered, a coffee mug to its left, and a smartphone to its right — all correct. A desk lamp is positioned above and to the right of the laptop, casting light downward — consistent with “lamp above.” Below the laptop, an open notebook lies flat on the desk — matching “notebook below.” All specified spatial relationships are accurately represented. VERDICT: PASS

local/flux-2-klein-9b: PASS
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field arranged left to right as follows: a brown horse, a brown-and-white cow, a white sheep, a black-and-brown goat, and a tan dog. The spatial relationships match the prompt exactly — horse on far left, cow to its right, sheep in center, goat to sheep’s right, dog on far right. All positional descriptors are accurate. VERDICT: PASS

Extreme 100%vs100%
local/flux-2-klein-9b: PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image shows a red barn in the foreground on the right, a windmill in the midground to the left, and snow-capped mountains in the background spanning the horizon. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). Depth perspective is consistent. All positional relationships are correctly represented. VERDICT: PASS

local/flux-2-klein-9b: PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image correctly shows a sandcastle in the foreground, a surfer riding waves in the midground, and a cruise ship on the horizon in the background. All objects are positioned as described: sandcastle closest to viewer, surfer further out on water, ship farthest away on horizon line. Sizes decrease appropriately with distance. Spatial relationships match the prompt exactly. VERDICT: PASS

local/flux-2-klein-9b: PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image shows a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background. The spatial arrangement matches the prompt: foreground elements are largest and closest, midground fountain is centered and smaller, background buildings are smallest and farthest. All positional relationships (on, under, left, right, above, below) are correctly represented. VERDICT: PASS

Scale & Proportions 100%vs78%

Scale & Proportions compares whether local/flux-2-klein-9b or local/krea-2-turbo better keeps requested subjects and objects at believable relative sizes.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The image shows a man standing next to a yellow school bus. The relative scale appears realistic — the man is appropriately sized compared to the bus, neither too large nor too small. His height relative to the bus’s door and windows matches real-world proportions. No obvious distortions or AI artifacts affect the size relationship. VERDICT: PASS

local/flux-2-klein-9b: PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug appears appropriately sized relative to the leaf it is resting on. The leaf shows detailed venation and texture consistent with a macro photograph, and the insect’s scale matches real-world proportions for a ladybug on a typical plant leaf. No anatomical or dimensional inconsistencies are visible. VERDICT: PASS

local/flux-2-klein-9b: PASS
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child’s hands are holding the basketball in a natural, proportionate manner. The ball appears appropriately sized relative to the child’s torso and arms — not unnaturally large or small. Fingers wrap around the ball realistically, and the overall scale matches real-world expectations for a young child holding a standard basketball. No anatomical or object-size distortions are evident. VERDICT: PASS

Medium 100%vs67%
local/flux-2-klein-9b: PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a tiny figurine of a person standing on the rim of a coffee cup, looking down into the liquid. The size relationship is consistent with the prompt — the person is miniature relative to the cup, which is itself a normal-sized coffee cup on a saucer. The proportions are visually coherent and match the described scenario. VERDICT: PASS

local/flux-2-klein-9b: PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a giant tabby cat walking down a city street, towering over cars and buildings. The cat’s head reaches several stories high, consistent with the "kaiju" description. Its paws are larger than vehicles, and its body spans the width of the road. The surrounding skyscrapers appear appropriately scaled relative to the cat’s exaggerated size. All proportions align with the prompt’s intent. VERDICT: PASS

local/flux-2-klein-9b: PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image shows a miniature house resting on an open human palm. The house is appropriately scaled to fit comfortably within the hand, with proportions that suggest it is a small model or dollhouse. The fingers and palm appear realistically sized relative to the house, maintaining believable spatial relationships. No obvious distortions or scale inconsistencies are present. VERDICT: PASS

Hard 100%vs67%
local/flux-2-klein-9b: PASS
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, horse, dog, cat, and mouse arranged in a line from largest to smallest, which aligns with real-world proportional sizes. The elephant is largest, followed by the horse, then the dog, cat, and mouse — each progressively smaller as expected. While the scene is fantastical (these animals wouldn’t naturally stand together), the relative scaling between them is accurate. VERDICT: PASS

local/flux-2-klein-9b: PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a table setting with plates, cutlery, wine glasses, and shakers. The relative sizes appear mostly consistent — the wine glass is appropriately taller than the plates, cutlery is proportional to the plates, and salt/pepper shakers are smaller than the plates but larger than scattered peppercorns. However, the two knives placed side-by-side next to the foreground plate are nearly identical in size and shape, which is unusual for a standard place setting (typically one dinner knife

local/flux-2-klein-9b: PASS
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image displays a farm scene with a barn, tractor, farmer, cow, and chickens. The relative sizes appear consistent: the barn is the largest structure, the tractor is appropriately sized relative to the farmer and cow, and the chickens are small in comparison to the other elements. The fence post is also proportionate to the farmer and animals. All objects maintain realistic scale relationships. VERDICT: PASS

Human realism 88%vs79%

Faces & Expressions 100%vs92%

Faces & Expressions compares whether local/flux-2-klein-9b or local/krea-2-turbo creates more realistic faces with the requested age, emotion, and expression cues.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

Face is symmetric with natural features: eyes, nose, and mouth are well-proportioned. Warm, genuine smile is clearly readable, with crinkled eyes and relaxed cheeks. No obvious distortions. Lighting and hair detail enhance realism. Meets prompt criteria for a natural, warm smile in a close-up portrait. VERDICT: PASS

local/flux-2-klein-9b: PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly proportioned features. Expression conveys calm seriousness through neutral mouth, direct gaze, and subtle brow lines. No visible distortions. Lighting and detail are realistic. Meets prompt requirements. VERDICT: PASS

local/flux-2-klein-9b: PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with correctly proportioned features. The teenager’s expression is neutral and relaxed, matching the prompt. No significant distortions are visible. Eyes, nose, and mouth are natural and well-defined. Lighting and focus are appropriate for a headshot. VERDICT: PASS

Medium 100%vs100%
local/flux-2-klein-9b: PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face shows clear disgust: nose wrinkled, lips curled, and brows furrowed. Features are symmetric and anatomically correct. No distortions; skin texture and fine details are natural. Emotion is unmistakably readable. Minor blemishes are present but don’t affect facial integrity. VERDICT: PASS

local/flux-2-klein-9b: PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

The face is symmetric with correctly rendered features. The expression clearly conveys surprise: eyebrows are raised, eyes are wide, and mouth is slightly open. No distortions are present. The emotion is vivid and readable. VERDICT: PASS

local/flux-2-klein-9b: PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly rendered features. The downcast eyes, slightly drawn mouth, and distant gaze clearly convey quiet melancholy. No significant distortions or artifacts. Lighting and texture are realistic. The emotion is unmistakably readable and well-executed. VERDICT: PASS

Hard 100%vs67%
local/flux-2-klein-9b: PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

Face is symmetric with natural, aged features including deep wrinkles and white hair. Warm, genuine smile with crinkled eyes clearly conveys kindness. No distortions; lighting and texture are realistic. Emotion and prompt details are accurately rendered. VERDICT: PASS

local/flux-2-klein-9b: PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows, parted lips, and visible tears. No distortions are present. Features align well with the prompt. VERDICT: PASS

local/flux-2-klein-9b: PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with natural wrinkles and age spots. Thinning gray hair and reading glasses are accurate. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Features align well with prompt. VERDICT: PASS

Extreme 100%vs100%
local/flux-2-klein-9b: PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces are symmetric with clear, stylized features. Emotions are distinctly readable: left person laughs with head back, center looks shocked with wide eyes, right person smirks while rolling eyes. No significant distortions. Art style is consistent and expressive. All three expressions match the prompt accurately. VERDICT: PASS

local/flux-2-klein-9b: PASS
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces show natural symmetry and correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, child grins excitedly. Teenager’s expression is ambiguous but not visibly distorted. All faces are well-defined without AI artifacts. VERDICT: PASS

local/flux-2-klein-9b: PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct anatomical features. Emotions are clearly readable: left man shows frustration (furrowed brow), center woman shows pleased (smiling, chin resting), right man shows confusion (tilted head, puzzled gaze). No distortions or unnatural artifacts. All three expressions match the prompt accurately. VERDICT: PASS

Full Body 100%vs92%

Full Body compares whether local/flux-2-klein-9b or local/krea-2-turbo renders full-body people with more natural anatomy, clothing, pose, and proportions.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is clearly recognizable as relaxed standing. No distortions or anomalies are present. All elements match the prompt precisely. VERDICT: PASS

local/flux-2-klein-9b: PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, facing forward, with hands in pockets. Limb count and proportions are anatomically correct. The pose is clearly recognizable as described. No distortions are present. All elements match the prompt precisely. VERDICT: PASS

local/flux-2-klein-9b: PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet, standing in a neutral pose on a park path. Limb count and proportions are anatomically correct. The pose is recognizable as standing still, hands slightly curled at sides. No significant distortions are present. VERDICT: PASS

Medium 100%vs100%
local/flux-2-klein-9b: PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs are correctly proportioned and count is accurate. Her running pose is anatomically plausible, with one leg forward, the other back, arms bent and swinging. Hair and clothing show motion blur consistent with running. No distortions are visible. The action is clearly recognizable as mid-stride running. VERDICT: PASS

local/flux-2-klein-9b: PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms out for balance—matches the described mid-air jump over a puddle. The action is clearly recognizable, with realistic motion blur and splash effects. No anatomical distortions are visible. The reflection in the puddle adds authenticity. VERDICT: PASS

local/flux-2-klein-9b: PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s limbs are correctly proportioned and anatomically plausible. One leg is extended in a dynamic spin, the other supports weight en pointe. The dress flows outward as described. No distortions in body structure or pose are evident. The action is clearly recognizable as a spinning dance move. VERDICT: PASS

Hard 100%vs67%
local/flux-2-klein-9b: PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The person has two legs, correct proportions, and performs the tree pose accurately: standing on one leg, the other foot pressed to the inner thigh, arms raised overhead in prayer position. No distortions or anatomical errors are visible. The pose is clearly recognizable and matches the prompt. VERDICT: PASS

local/flux-2-klein-9b: PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has two arms and two legs, with proportions and pose consistent with realistic rock climbing. Limbs are extended to different holds as described. No anatomical distortions are visible. The action is clearly recognizable as dynamic climbing. All elements align with the prompt. VERDICT: PASS

local/flux-2-klein-9b: PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. Legs are wrapped around the instrument, bow arm is extended, and the cello is properly supported. Proportions are natural, and the pose is clearly recognizable as a seated cello performance. No distortions are visible. VERDICT: PASS

Extreme 100%vs100%
local/flux-2-klein-9b: PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The dancers’ limbs are correctly counted and proportioned. The woman is dipped backward with intertwined legs, matching the prompt. The pose is anatomically plausible and recognizable as a tango move. No significant distortions are visible. The composition and lighting are natural. VERDICT: PASS

local/flux-2-klein-9b: PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The sliding player’s body posture and the jumping player’s airborne pose are anatomically plausible and match the described action. The ball is at the sliding player’s feet, and the dynamic interaction is recognizable. No distortions are present. VERDICT: PASS

local/flux-2-klein-9b: PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically accurate, with legs straight, feet together, and hands supporting weight. Alignment is nearly identical, though slight variations in head tilt and arm positioning are natural. No distortions are visible. The action is clearly recognizable as a synchronized handstand. VERDICT: PASS

Hands 58%vs42%

Hands compares whether local/flux-2-klein-9b or local/krea-2-turbo draws more anatomically plausible hands, fingers, and gestures.

Easy 67%vs67%
local/flux-2-klein-9b: PASS
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The left hand (viewer’s left) shows 4 fingers clearly; the thumb is partially obscured but likely present. The right hand (viewer’s right) also shows 4 fingers clearly; thumb is visible and separate. Both hands appear to have 5 digits anatomically, though one finger per hand is slightly hidden or blended due to angle/pose. No gross anomalies — grip is relaxed, fingers naturally spread as described. Core criteria met. VERDICT: PASS

local/flux-2-klein-9b: PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The woman’s hands are extended palms-up with all ten fingers clearly visible. Each hand has exactly five fingers: thumb, index, middle, ring, and pinky — all anatomically proportioned and correctly positioned. No extra, missing, or malformed digits. Gesture is natural and consistent with the prompt. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The left hand (on the right side of image) shows 4 fingers clearly — thumb is tucked or obscured. The right hand (on left side of image) also shows 4 fingers; thumb is not visible. Neither hand displays exactly 5 visible fingers. Grip is anatomically plausible for crossed arms, but finger count fails the prompt’s requirement. Anomaly: missing visible thumbs on both hands. VERDICT: FAIL

Medium 33%vs33%
local/flux-2-klein-9b: PASS
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the mug shows four fingers and a thumb, totaling five digits — anatomically correct. The left hand is not visible in the image. Since only one hand is shown and it meets the criteria of 5 fingers with proper grip, the core requirement is satisfied. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the pen shows 5 fingers, but the thumb and index finger appear fused or malformed at the tip, and the pen grip is awkward. The left hand resting on the page also shows 5 fingers, but they are slightly distorted with unnatural spacing and curvature. While the count is technically 5 per hand, the anatomy and grip are not fully correct. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The right hand (strumming) shows only 4 fingers clearly, with the thumb partially visible — but the pinky appears fused or missing. The left hand (fretting) has 5 fingers, but the index finger is oddly bent and the ring/pinky fingers appear merged or distorted. Anatomical correctness fails on both hands due to unnatural finger positioning and possible fusion/omission. VERDICT: FAIL

Hard 33%vs67%
local/flux-2-klein-9b: FAIL
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand making the peace sign shows two fingers extended (index and middle) and three curled (thumb, ring, pinky) — anatomically correct for a right-hand peace gesture. However, the thumb is visibly curled inward but appears to be fused or malformed with the index finger base, lacking clear separation. The other fingers are proportionate and correctly positioned. No extra or missing fingers overall (5 total), but the thumb-index junction is anomalous. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows one hand making a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist — this matches the prompt exactly. Anatomically, the hand appears correct: five fingers total (thumb + four curled), proper knuckle alignment, natural skin texture, and no extra or missing digits. No anomalies detected. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image shows one hand with four fingers extended (thumb, index, middle, ring) and the pinky curled — not matching the prompt’s description of “index, middle, and ring fingers extended” for counting to three. The thumb is also extended, making it a four-finger gesture. Anatomically, the hand structure appears correct, but the gesture does not match the prompt. Only one hand is visible; no second hand to evaluate. Core criteria (counting to three with specified fingers) is not met. VERDICT: FA

Extreme 100%vs0%
local/flux-2-klein-9b: PASS
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

Upon close examination of the handshake:
- The left hand (from viewer’s perspective) shows 5 fingers clearly visible and correctly positioned.
- The right hand also displays 5 fingers, with natural curvature and grip alignment consistent with a firm handshake.
- No extra, missing, or malformed fingers are present.
- Thumb placement and finger interlock appear anatomically plausible for a professional handshake.

VERDICT: PASS

local/flux-2-klein-9b: PASS
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two hands meeting in a high-five gesture. The left hand (from viewer’s perspective) clearly has 5 fingers extended and anatomically correct. The right hand also displays 5 fingers, but the thumb is partially obscured behind the other hand — however, it is still visible at the edge. All fingers are extended as required. No anatomical anomalies or extra/missing digits are present. Gesture matches prompt. VERDICT: PASS

local/flux-2-klein-9b: PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

The image shows four hands interacting with a gift box. The left person’s hands (two visible) each show 5 fingers — thumbs and fingers are clearly defined and anatomically correct. The right person’s hands (two visible) also each show 5 fingers; grip is natural, no extra or missing digits. No anomalies detected. All hands appear realistic and correctly formed. VERDICT: PASS
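
The tier percentages in the Hands section can be checked against the per-card labels. The sketch below is only an illustration, not the benchmark's own scoring code: the `hands` dictionary and the `pass_rate` helper are hypothetical names, the verdicts are transcribed by hand from the local/flux-2-klein-9b card labels above (the label is used where the evaluator text is cut off), and rounding to the nearest whole percent is an assumption that happens to reproduce the displayed figures.

```python
# Verdicts for local/flux-2-klein-9b, transcribed from the Hands cards above.
# (Hypothetical structure; the page itself does not expose its data format.)
hands = {
    "Easy": ["PASS", "PASS", "FAIL"],
    "Medium": ["PASS", "FAIL", "FAIL"],
    "Hard": ["FAIL", "PASS", "FAIL"],
    "Extreme": ["PASS", "PASS", "PASS"],
}


def pass_rate(verdicts):
    """Share of PASS verdicts as a whole-number percentage (assumed rounding)."""
    passed = sum(v == "PASS" for v in verdicts)
    return round(100 * passed / len(verdicts))


# Per-tier rates, then the rate over all twelve cards pooled together.
tier_rates = {tier: pass_rate(v) for tier, v in hands.items()}
overall = pass_rate([v for tier in hands.values() for v in tier])

print(tier_rates)
print(overall)
```

Under these assumptions the tier rates come out as 67, 33, 33, and 100, and the pooled rate as 58, matching the first figure on each Hands tier header and on the Hands subcategory header.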

Multi-Subject 100%vs100%

Multi-Subject compares whether local/flux-2-klein-9b or local/krea-2-turbo better keeps multiple people distinct with the requested attributes, outfits, and positions.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, wearing red jacket over dark top. Right woman: long black hair, wearing blue dress. Both visually distinct and match prompt descriptions precisely. No missing or mismatched elements. VERDICT: PASS

local/flux-2-klein-9b: PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Both men are present and visually distinct. Left: bearded, wearing glasses and a green sweater. Right: clean-shaven, wearing a black hoodie. Features match prompt exactly. No visual artifacts or mismatches. VERDICT: PASS

local/flux-2-klein-9b: PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls (over tank top). Short person: straight brown hair, yellow floral sundress. Both visually distinct and match descriptions precisely. No missing elements or mismatches. VERDICT: PASS

Medium 100%vs100%
local/flux-2-klein-9b: PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

All four individuals are present and visually distinct. Left: bald man in suit. Next: woman with silver hair in red blouse. Center: young man with dreadlocks in denim jacket. Right: woman with hijab in green dress. Features match prompt exactly. VERDICT: PASS

local/flux-2-klein-9b: PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Left: Tall woman with short pink hair, playing bass guitar. Center: Man with beard and bandana, seated at drum kit. Right: Slim man with glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS

local/flux-2-klein-9b: PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat and backpack. Woman: has braided hair, uses hiking poles, wears backpack. Teenager: wears baseball cap, carries water bottle, wears backpack. All three are visually distinct and match described features precisely. VERDICT: PASS

Truthfulness 74%vs70%

Photorealism 100%vs100%

Photorealism compares whether local/flux-2-klein-9b or local/krea-2-turbo makes materials, surfaces, and real-world objects look more visually plausible and distinct.

Medium 100%vs100%
local/flux-2-klein-9b: PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The glass exhibits correct refraction and transparency, distorting the wood grain behind it. The silver spoon shows sharp, accurate specular reflections. The wooden table displays a convincing grain texture, and the linen napkin has a realistic woven fabric appearance. The apple’s skin reflects light naturally. All visual properties align with the prompt's requirements for accurate texture and light behavior. VERDICT: PASS

local/flux-2-klein-9b: PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The laptop exhibits a convincing brushed aluminum finish with appropriate metallic reflections. The notebook displays realistic leather grain and texture. The ceramic mug has a matte, speckled surface consistent with pottery, while the glass paperweight demonstrates accurate refraction and transparency. The wooden desk shows natural grain patterns. All materials are visually distinct and realistic. VERDICT: PASS

local/flux-2-klein-9b: PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image accurately renders the distinct material properties requested. The stainless steel pot exhibits sharp, high-contrast reflections consistent with polished metal. The glass bottle shows correct translucency and refraction of the olive oil within. The wooden cutting board displays natural grain and matte texture, while the cotton towel features soft folds and diffuse light absorption typical of fabric. All surfaces interact with light realistically. VERDICT: PASS

Physics & Reflections 92%vs83%

Physics & Reflections compares whether local/flux-2-klein-9b or local/krea-2-turbo better follows physical cues such as shadows, reflections, gravity, transparency, and contact points.

Easy 100%vs67%
local/flux-2-klein-9b: PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white surface. The lighting originates from the upper left, evidenced by the bright specular highlight on the sphere's upper-left quadrant. Consequently, the shadow is cast to the lower right, which is physically consistent with the light source direction. The shadow’s shape and softness are realistic for a spherical object under directional lighting. The color and orientation are accurate. VERDICT: PASS

local/flux-2-klein-9b: PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The lighting is consistent with a strong light source from the upper right, casting a sharp, dark shadow to the left of the mug. The shadow’s shape accurately mirrors the mug’s form, including the handle. Reflections on the coffee surface and the glossy ceramic are natural. The window frame and outdoor scene show appropriate depth and lighting. No distortions or implausible elements detected. VERDICT: PASS

local/flux-2-klein-9b: PASS
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The shadow is cast directly beneath the subject, consistent with a midday sun position. The shadow’s shape accurately mirrors the person’s stance and clothing, with no distortion or unnatural stretching. The darkness and soft edges are appropriate for bright outdoor lighting. No reflections are expected in this scene, and none appear. Overall, the shadow behavior is physically plausible and aligns with the prompt’s description. VERDICT: PASS
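
The midday check rests on simple geometry: on flat ground, a shadow's length is the object's height divided by the tangent of the sun's elevation, so a high sun gives a short shadow. A small sketch under that flat-ground assumption; `shadow_length` and the 1.8 m height are illustrative and not part of the benchmark:

```python
import math

def shadow_length(height_m, sun_elevation_deg):
    """Shadow length on flat ground for a vertical object of the given height."""
    return height_m / math.tan(math.radians(sun_elevation_deg))

# Hypothetical 1.8 m person at low, medium, and near-overhead sun.
for elevation in (20, 45, 80):
    print(elevation, round(shadow_length(1.8, elevation), 2))
```

The printed lengths shrink as the elevation rises, which is the behavior the prompt's "short shadow directly below them" describes.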

Medium 100%vs100%
local/flux-2-klein-9b: PASS
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection in the mirror is physically plausible. It accurately depicts the woman’s back, matching her posture and outfit (beige top, dark jeans). The orientation is correctly reversed relative to the room’s layout — for example, the bed and window in the reflection align with expected spatial inversion. Lighting and shadows are consistent with ambient indoor light, and no distortion or color mismatch is evident. The mirror frame and floor perspective also support realism. VERDICT: PASS

local/flux-2-klein-9b: PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly oriented vertically below the cat, matching its pose and position. The slight blurring and reduced contrast in the reflection accurately simulate the optical properties of a polished surface. The lighting direction is consistent, with highlights on the floor aligning with the light source. The colors are appropriately muted in the reflection compared to the real subject. VERDICT: PASS

local/flux-2-klein-9b: PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection on the glass table is physically plausible. It correctly mirrors the vase and flowers with appropriate vertical inversion and slight distortion due to the glass surface. The reflection’s brightness and color match the original objects, and its position aligns with the light source and table geometry. No obvious artifacts or inconsistencies are present. VERDICT: PASS

Hard 100%vs67%
local/flux-2-klein-9b: PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are vertically inverted, matching the trees' orientation. The color palette accurately mirrors the sunset sky and dark silhouettes. The water surface shows gentle ripples that distort the reflection slightly, consistent with a calm lake. The lighting and shadow directions are coherent with a sunset scene. No physical anomalies are present. VERDICT: PASS

local/flux-2-klein-9b: PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection in the puddle is physically plausible. It correctly inverts the scene vertically, showing the buildings and sky upside down. The distortion is consistent with a shallow water surface, exhibiting slight waviness and blurring that mimics real-world refraction. The color shift is subtle but present, with the reflection appearing slightly darker and more muted than the direct view, which aligns with how light behaves when reflecting off water. The orientation matches the perspective o

local/flux-2-klein-9b: PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible, showing a clear inverted mirror image of the mountains. The orientation is correct, and the subtle muting of colors in the water compared to the sky and peaks aligns with realistic light absorption and scattering in water. The stillness of the lake allows for a sharp reflection, which is consistent with the calm surface depicted. VERDICT: PASS

Extreme 67%vs100%
local/flux-2-klein-9b: PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image displays a glass sphere on a reflective surface with a clear, inverted reflection below it, consistent with optical physics. The refraction within the sphere correctly inverts the background scene. The lighting and shadows align logically with the light source. The water reflection in the foreground is coherent with the environment. All optical phenomena—refraction, reflection, and shadow—are physically plausible and accurately rendered. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections on the chrome sculpture are physically plausible, showing distorted environmental mapping of the windows and room. The reflection on the glass floor is correctly oriented and aligned with the sculpture above. However, the shadows are missing; there is no cast shadow from the sculpture onto the floor or surrounding surfaces, which is physically impossible given the strong lighting from the windows. This lack of grounding shadows makes the scene physically implausible. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The reflections and refractions are physically plausible. The marble countertop shows a clear, slightly distorted reflection of the glass stem and base, consistent with a polished surface. The glass itself exhibits realistic refraction, bending the background view through its curved shape and cut patterns. The lake’s surface reflects the sky and clouds naturally, with appropriate shimmering highlights. Shadows align with the light source from the window. No directional or color inconsistencies a

World Knowledge 50%vs50%

World Knowledge compares whether local/flux-2-klein-9b or local/krea-2-turbo depicts recognizable places, landmarks, flags, and known objects with more accurate real-world details.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy. The iron lattice structure, the three distinct levels (including the top platform and antenna), and the arched base are all correctly rendered. The perspective is realistic, showing the tower against a clear blue sky with surrounding trees and buildings in the background. There are no visible distortions or structural anomalies. The subject is clearly recognizable and factually accurate. VERDICT: PASS

local/flux-2-klein-9b: PASS
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high fidelity. The central dome, four minarets, and the symmetrical layout are architecturally accurate. The perspective from the reflecting pool is correct, showing the iconic reflection. The surrounding red sandstone structures and cypress trees are also present and correctly placed. The lighting and textures appear realistic, contributing to the overall authenticity of the scene. VERDICT: PASS

local/flux-2-klein-9b: PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The Statue of Liberty is clearly recognizable and accurately depicted with correct green patina, raised torch, spiked crown, and tablet. The pedestal and base structure align with real-world architecture. Minor details like the surrounding water, island vegetation, and visitor crowds enhance realism. No significant factual or architectural inaccuracies are present. VERDICT: PASS

Medium 33%vs33%
local/flux-2-klein-9b: FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a shamisen, but it contains significant factual inaccuracies. The instrument shown has four tuning pegs and four strings, whereas a traditional shamisen is strictly a three-stringed instrument. Additionally, the bridge and tailpiece assembly appear structurally confused, with the bridge floating oddly and the tailpiece lacking the traditional frog-shaped design. These deviations from the standard three-stringed design make it factually inaccurate. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass instrument that resembles an astrolabe but contains significant factual inaccuracies. The text and numbers are garbled, nonsensical, and inconsistent (e.g., mixed Roman/Arabic numerals, random letters like "S", "N", "W" placed illogically). The rete and tympan structures are visually confused and do not match historical astrolabe designs. These errors indicate AI generation rather than a real-world accurate depiction. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with visible cracks repaired using gold lacquer, which is the defining characteristic of kintsugi. The subject is clearly recognizable as a Japanese kintsugi bowl. The lighting, texture, and perspective are realistic, and there are no architectural or factual inaccuracies in the representation of the object. VERDICT: PASS

Hard 0%vs0%
local/flux-2-klein-9b: FAIL
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts the water cycle, but the text labels are nonsensical gibberish (e.g., "Evapiration fexr," "Contrtiad," "Rennn"). This makes the diagram factually inaccurate and fails to meet the requirement for a "detailed diagram" with recognizable information. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts the correct geological structure of a convergent boundary, including subduction, a trench, and a volcanic arc. However, the text labels are nonsensical gibberish (e.g., "Wuame Nats," "Convergent Buiderly") rather than accurate scientific terminology. This indicates a failure in text rendering, making the diagram factually inaccurate as a labeled educational resource. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a human heart, but the labels are nonsensical (e.g., "Aotaa," "Pminulke," "Maidto"), indicating a failure in text rendering. Anatomically, the illustration is also flawed: the aorta is incorrectly shown as a large, bulbous chamber on the left side rather than the central arching vessel. The pulmonary artery and veins are mislabeled and misplaced. The overall structure lacks the precision required for an accurate anatomical illustration. VERDICT: FAIL

Extreme 67%vs67%
local/flux-2-klein-9b: FAIL
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly with dorsal wings showing iridescent blue and brown edges, and ventral wings with brown patterns and eyespots. However, the Morpho menelaus is known for its entirely iridescent blue dorsal wings without the prominent brown borders seen here. Additionally, the ventral side of Morpho menelaus typically has a more uniform brown pattern with fewer distinct eyespots compared to what is shown. The body morphology also appears slightly off, with an unusual texture and colo

local/flux-2-klein-9b: PASS
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image accurately depicts the interior of the Hagia Sophia, featuring the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics. The architectural elements are correctly represented, and the combination of Islamic and Byzantine art is evident. The details such as the mosaics, calligraphy, and structural features align with the real-world subject. VERDICT: PASS

local/flux-2-klein-9b: PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image depicts a bismuth crystal with high fidelity. It accurately portrays the characteristic geometric hopper crystal formation, featuring the distinct "staircase" structure caused by faster growth at the edges. The iridescent oxide layers are rendered with realistic coloration and banding patterns typical of this mineral. The lighting and texture appear natural, making the subject easily recognizable and factually accurate. VERDICT: PASS

Professional Studio 93%vs82%

Camera & Lighting 92%vs83%

Camera & Lighting compares whether local/flux-2-klein-9b or local/krea-2-turbo more faithfully follows lens, angle, focus, exposure, and lighting instructions.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image successfully implements the requested camera effects. It features a wide-angle perspective that emphasizes the expansive space of the modern living room. Most notably, it includes a heavy circular vignette/distortion effect at the edges, which mimics the "slight barrel distortion" requested (though the effect is quite pronounced, it fulfills the intent). The lighting is bright and even, consistent with a modern interior. VERDICT: PASS

local/flux-2-klein-9b: PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, prominent dewdrop on a red rose petal. The depth of field is extremely shallow, with the central droplet in sharp focus while the surrounding petal and other smaller droplets are softly blurred. The lighting is natural and highlights the transparency and refractive properties of the water. VERDICT: PASS

local/flux-2-klein-9b: PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is flat and top-down, with no vanishing point distortion. The depth of field is consistent with a high-altitude drone shot, keeping the entire scene in sharp focus. The lighting is even and natural, consistent with daylight. VERDICT: PASS

Medium 67%vs100%
local/flux-2-klein-9b: FAIL
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image features a portrait with a moody atmosphere and a shallow depth of field. However, it fails the specific lighting requirement. Rembrandt lighting is characterized by a distinct triangle of light on the shadowed cheek. In this image, the light is a broad, harsh side-light that illuminates a large portion of the face, lacking the characteristic small triangle of light on the dark side. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The lighting is a rich golden hour glow with a vibrant orange sky. The sun is low on the horizon, creating long, soft shadows across the rolling hills and foreground vegetation. The perspective provides a deep sense of scale, and the lighting pattern is consistent with a sunset/sunrise scenario, featuring a soft glow and warm atmospheric haze. VERDICT: PASS

local/flux-2-klein-9b: PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright white background and a very bright, luminous quality. The lighting is soft and even, resulting in minimal shadows on the subject's face. The perspective is a standard portrait shot, and the overall aesthetic is airy and bright as requested. VERDICT: PASS

Hard 100%vs100%
local/flux-2-klein-9b: PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The perspective is a classic portrait shot. The depth of field is shallow, creating a creamy bokeh background with large, soft light orbs. Most importantly, the lighting features a strong, warm rim light (backlighting) that creates a glowing halo around the subject's hair and shoulders, effectively separating her from the blurred garden background. VERDICT: PASS

local/flux-2-klein-9b: PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's technical requirements. The subject is sharp and in focus, while the background features a shallow depth of field with beautiful, soft bokeh from the city lights. The perspective is consistent with a 50mm lens, and the lighting creates a realistic street photography atmosphere with a clear distinction between the sharp subject and the blurred urban environment. VERDICT: PASS

local/flux-2-klein-9b: PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches all specified criteria. The perspective is a top-down shot (shot from above). The lighting is natural and directional, coming from the left side, creating soft shadows on the right. The color temperature is warm, and there is a subtle vignette around the edges. The depth of field is consistent with a 35mm f/2.8 setting, with a gentle blur on the edges of the bowl and the wooden background. VERDICT: PASS

Extreme 100%vs33%
local/flux-2-klein-9b: PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image perfectly captures the tilt-shift effect. It features a distinct selective focus strip where the middle ground is sharp, while the foreground and background are heavily blurred, creating a miniature model aesthetic. The colors are highly saturated, and the high-angle perspective enhances the "toy-like" appearance of the city street. VERDICT: PASS

local/flux-2-klein-9b: PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a dark, moody scene with a cinematic quality. Most importantly, it displays prominent horizontal anamorphic lens flares in both blue and orange, streaking across the center of the frame. The lighting and perspective align with the requested cinematic aesthetic. VERDICT: PASS

local/flux-2-klein-9b: PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's requirements. It features a close-up portrait with dramatic split lighting. One side of the face is brightly illuminated, highlighting skin texture and detail, while the other side is cast in deep, heavy shadow. This creates a high-contrast chiaroscuro effect against a black background. The perspective is a direct, intimate portrait, and the depth of field is shallow, focusing sharply on the eye and facial features. VERDICT: PASS
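
The tier and category percentages for Camera & Lighting follow directly from the individual verdicts. A minimal sketch of that arithmetic, assuming three prompts per tier and rounding to the nearest whole percent (both inferred from the listing, not stated by the benchmark); the verdicts are transcribed from the local/flux-2-klein-9b results above, and `pass_rate` is a hypothetical helper name:

```python
# Verdicts for local/flux-2-klein-9b in Camera & Lighting, transcribed from the listing.
verdicts = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["FAIL", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "PASS"],
    "Extreme": ["PASS", "PASS", "PASS"],
}

def pass_rate(labels):
    """Return the share of PASS labels as a whole-number percentage."""
    return round(100 * labels.count("PASS") / len(labels))

per_tier = {tier: pass_rate(labels) for tier, labels in verdicts.items()}
overall = pass_rate([label for labels in verdicts.values() for label in labels])

print(per_tier)
print(overall)
```

With 11 of 12 verdicts passing, the overall figure rounds to 92, matching the category header, and the single Rembrandt-lighting failure accounts for the 67% Medium tier.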

Color Precision 92%vs92%

Color Precision compares whether local/flux-2-klein-9b or local/krea-2-turbo renders requested colors more clearly and consistently across subjects, products, and backgrounds.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is highly saturated, intense, and unambiguous, exactly as described in the prompt. The gray of the road is also clearly visible. All color requirements are perfectly met. VERDICT: PASS

local/flux-2-klein-9b: PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image features a woman wearing a vibrant, deep blue evening dress. The color of the dress is a true royal blue, and the background is a solid, neutral beige/tan color. Both the dress color and the background color strictly adhere to the prompt's specifications. VERDICT: PASS

local/flux-2-klein-9b: PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image perfectly matches the color requirements specified in the prompt. The flowers are a vibrant, pure sunflower yellow, and the sky is a clear, bright blue. There are no deviations in the color palette from the requested "sunflower yellow" and "clear blue." VERDICT: PASS

Medium 67%vs100%
local/flux-2-klein-9b: PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The image features a bedroom with a prominent accent wall in a vibrant burnt orange color. The color closely matches the hex code #FF6B35, which is a saturated, warm orange-red. The furniture, including the bed frame, nightstands, and lampshades, is white as requested. The lighting is bright and natural, consistent with the "natural daylight" requirement. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage in the image consists of various shades of green, ranging from very light, pale mint/teal in the background to darker, more saturated greens in the foreground. The hex code #2D5F2D represents a specific, dark, muted forest green. While the image contains dark green elements, it also features many lighter, brighter, and more vibrant green tones that do not match the specific, singular color requirement of the prompt. VERDICT: FAIL
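
The judge's color assessment here is visual. As a rough numeric complement, a hex target can be compared against sampled pixels. The sketch below is an illustration only and not part of the benchmark: `hex_to_rgb` and `rgb_distance` are hypothetical helpers, both sample pixels are made up, and plain Euclidean distance in RGB is a crude stand-in for a perceptual color-difference metric.

```python
import math

def hex_to_rgb(code):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def rgb_distance(a, b):
    """Euclidean distance between two RGB triples (0 means identical)."""
    return math.dist(a, b)

target = hex_to_rgb("#2D5F2D")   # (45, 95, 45)
dark_sample = (48, 92, 50)       # made-up pixel close to the target
pale_sample = (170, 225, 200)    # made-up pale mint pixel

print(rgb_distance(target, dark_sample))
print(rgb_distance(target, pale_sample))
```

A pale mint pixel lands far from the target while a dark green one lands close, which is the kind of spread the evaluator describes in words.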

local/flux-2-klein-9b: PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image accurately reflects all color specifications. The walls are a medium blue that closely matches the hex code #4A90D9. The floor is a warm-toned wood, and the baseboard/ceiling trim is white. All elements of the color prompt are present and correctly rendered. VERDICT: PASS

Hard 100%vs67%
local/flux-2-klein-9b: PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the specified colors:
- The walls are a warm cream/off-white, matching #F5E6CC.
- The sofa is a dark navy blue, matching #2C3E50.
- The accent pillows are a bright, vibrant red, matching #E74C3C.
All color requirements from the prompt are met precisely. VERDICT: PASS

local/flux-2-klein-9b: PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image matches the color requirements. The background is a deep dark blue consistent with #1A1A2E. The watch features a soft gold/champagne color (#F0C27F) on the face and a tan strap. The earbuds and their charging case are pure white (#FFFFFF). All specified colors are accurately represented in the flat-lay composition. VERDICT: PASS

local/flux-2-klein-9b: PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image accurately follows the color specifications. The background is a near-white (#FAFAFA), the main central shape is a vibrant orange-red (#FF4500), and the thin accent lines are a dark charcoal color (#333333). All three requested colors are clearly visible and match the prompt's descriptions. VERDICT: PASS

Extreme 100%vs100%
local/flux-2-klein-9b: PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image displays a smooth horizontal gradient. On the far left, the color is a vibrant red (#FF0000). As it moves toward the right, it transitions through a deep purple/magenta middle section and ends in a bright blue (#0000FF) on the far right. The transition is even and matches the specified color sequence and hex values perfectly. VERDICT: PASS
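
For reference, an even red-to-blue transition can be sketched as a per-channel linear interpolation in RGB. The function names below are hypothetical and the snippet makes no claim about how the image model or the judge handles gradients; it only shows which colors a linear blend between the two requested endpoints passes through.

```python
def hex_to_rgb(code):
    """Decode a '#RRGGBB' string into an (R, G, B) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb):
    """Encode an (R, G, B) tuple as an uppercase '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)

def lerp_color(start, end, t):
    """Blend two hex colors channel by channel; t=0 gives start, t=1 gives end."""
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    return rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))

# Sample five evenly spaced stops from left (red) to right (blue).
stops = [lerp_color("#FF0000", "#0000FF", i / 4) for i in range(5)]
print(stops)
```

The midpoint of this blend is #800080, a purple, which is consistent with the prompt's "transitioning evenly through purple".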

local/flux-2-klein-9b: PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image features a vertical gradient. The bottom color is a bright, saturated orange, which aligns well with the requested #FF6B35. The top of the image transitions into a very dark, near-black shade. While the prompt requested a deep indigo (#1B1464), the top of the image appears almost black, which is a common way for AI to interpret very dark indigo in a gradient. The overall color progression matches the intent. VERDICT: PASS

local/flux-2-klein-9b: PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image shows a horizontal banner with a gradient. On the left, the color is a bright cyan, which matches the description of #00C9FF. On the right, the color is a light, pale mint green, which matches the description of #92FE9D. The transition between the two colors is a smooth gradient as requested. VERDICT: PASS

Photorealism 100%vs33%

Photorealism compares whether local/flux-2-klein-9b or local/krea-2-turbo produces more polished product and studio imagery with realistic materials, lighting, shadows, and finish.

Easy 100%vs33%
local/flux-2-klein-9b: PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image successfully achieves the studio-quality look requested. The lighting is soft and consistent, creating a natural, gentle shadow beneath the shoe. The textures of the leather and laces are sharp and realistic. There are no obvious AI artifacts, such as warped logos, nonsensical stitching, or melting shapes. The white background is clean, and the overall composition is professional and high-quality. VERDICT: PASS

local/flux-2-klein-9b: PASS
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image successfully captures the requested aesthetic. The glass textures, liquid transparency, and reflections on the black surface are highly realistic. There are no obvious AI artifacts; the geometry of the bottle and the way the straw/tube behaves within the liquid look physically accurate. The lighting is clean and professional, meeting the "studio-quality" requirement perfectly. VERDICT: PASS

local/flux-2-klein-9b: PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt requirements. The green apple features realistic textures, including subtle lenticels and a natural-looking stem. The lighting is soft and directional, creating a realistic highlight on the skin and a soft, natural shadow on the white background. There are no obvious AI artifacts, such as warping or unnatural textures, and it successfully mimics a high-quality studio product photograph. VERDICT: PASS

Graphical design 58%vs54%

Data Visualisation 0%vs0%

Data Visualisation compares whether local/flux-2-klein-9b or local/krea-2-turbo produces charts with more correct labels, proportions, colors, and readable values.

Easy 0%vs0%
local/flux-2-klein-9b: FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart mislabels months (e.g., “Jarry” for January, “Jule” for June) and shows incorrect values (29K instead of 25K). The y-axis label is garbled (“Aaunegauty”), and the legend (“Salles”) doesn’t match the chart’s color scheme. The requested values and labels are not correctly represented. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The chart is a pie chart with correct percentages (35%, 25%, 20%, 20%) and labels. However, “Company B” is mislabeled as “Ompany C” in its slice, and “Company C” is incorrectly labeled as “Company C 20%” in a slice that visually belongs to “Others.” The chart type matches the request, but critical labeling errors invalidate accuracy. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart shows a line graph titled "Weekly Traffic" with grid lines and axis labels. However, the x-axis labels are incorrect (showing "Daik" and repeated values like 1800, 2000), and the y-axis values are misaligned with the data points. The requested data sequence (1200, 1500, 1100, 1800, 2000, 1700, 1400) is not accurately represented. VERDICT: FAIL

Layout & Design 44%vs33%

Layout & Design compares whether local/flux-2-klein-9b or local/krea-2-turbo produces the cleaner requested composition with the right hierarchy and required elements.

Easy 100%vs100%
local/flux-2-klein-9b: PASS
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image features a large popcorn bucket centered, with "MOVIE NIGHT" in bold yellow at the top and "Every Friday at 8PM" in white at the bottom. Text is readable and correctly placed per the prompt. Visual hierarchy is clear, with the popcorn as the focal point and text anchoring top and bottom. All required layout elements are present and accurately rendered. VERDICT: PASS

local/flux-2-klein-9b: PASS
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The poster features a guitar illustration integrated into the headline "LIVE MUSIC FEST," with "Tickets available now" correctly placed below. Text is readable and properly sized, maintaining visual hierarchy. All required elements are present and correctly positioned. VERDICT: PASS

local/flux-2-klein-9b: PASS
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster features an open book illustration centered below the title "JOIN THE BOOK CLUB," which is prominently displayed in large, bold text. The tagline "Meets every Wednesday" is clearly visible and correctly placed beneath the book. Visual hierarchy is maintained with appropriate sizing and positioning. All required elements are present and legible. VERDICT: PASS

Medium 0%vs0%
local/flux-2-klein-9b: FAIL
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a hero header with a person using a laptop, followed by a headline, subtitle, and “Sign Up” button — all correctly placed. The footer contains three social media icons. However, the text is gibberish (placeholder), making it unreadable and violating the prompt’s requirement for readable text. Layout structure is correct, but text fails readability. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with a header banner, two-column body (text left, image right), a highlighted quote block, and a footer unsubscribe link. However, all text is gibberish placeholder text, making it unreadable. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image shows a menu with a logo header, three category sections (Appetizers, Mains, Desserts), and a footer with address/hours. However, all text is gibberish, unreadable placeholder text, not actual menu items or correct information. The layout structure is present but content is invalid. VERDICT: FAIL

Hard 33%vs0%
local/flux-2-klein-9b: PASS
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The left page features a large hero photo of a woman. The right page has a two-column article, a pull quote (“Pullbäckan tänd...”), a sidebar with “Ramme för identitet” and author bio, and page numbers (13, 14). Text is readable and correctly placed per layout. All described zones are present with appropriate visual hierarchy. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The UI includes a top navigation bar, search field, 2x2 grid of feature cards with icons, recent activity list, and bottom tab bar. Text is present but mostly garbled and unreadable. Layout hierarchy is structurally correct, but illegible text violates core usability requirements. VERDICT: FAIL

local/flux-2-klein-9b: FAIL
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

Left page has full-bleed photo; right page shows three-column layout with headline, body copy, inset image, and page numbers. Text is present but illegible due to gibberish placeholder characters. Visual hierarchy is structurally correct, but readability fails. Layout elements are present as described. VERDICT: FAIL
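
The Layout & Design percentages for local/flux-2-klein-9b can be reproduced from the nine verdicts listed in this section. The sketch below assumes each tier rate is the share of PASS labels in that tier and that the subcategory rate pools all prompts equally; the page does not state its aggregation rule, so treat this as one reading that happens to match the displayed numbers.

```python
# PASS/FAIL labels for local/flux-2-klein-9b, copied from the entries above.
verdicts = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["FAIL", "FAIL", "FAIL"],
    "Hard": ["PASS", "FAIL", "FAIL"],
}

def pass_rate(labels):
    """Percentage of PASS labels, rounded to the nearest whole percent."""
    return round(100 * labels.count("PASS") / len(labels))

# Rate per difficulty tier, then one pooled rate over every prompt.
per_tier = {tier: pass_rate(labels) for tier, labels in verdicts.items()}
overall = pass_rate([label for labels in verdicts.values() for label in labels])

print(per_tier)
print(overall)
```

Under these assumptions the tiers come out at 100, 0, and 33 percent and the pooled rate at 44 percent (4 of 9), matching the headers shown for this subcategory.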

Style Diversity 83%vs83%

Style Diversity compares whether local/flux-2-klein-9b or local/krea-2-turbo better shifts between requested visual styles like oil painting, pixel art, watercolor, and technical illustration.

Easy 67%vs67%
local/flux-2-klein-9b: PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3.5 122B)

The image clearly depicts a golden retriever in a garden setting. The style is unmistakably that of an oil painting, featuring thick, visible brushstrokes (impasto technique), rich color blending, and textured surfaces that mimic canvas work. The lighting and composition further enhance the painterly aesthetic. All key visual characteristics requested are present and well-executed. VERDICT: PASS

local/flux-2-klein-9b: PASS
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3.5 122B)

The image clearly exhibits the requested Japanese anime style. Key characteristics are present: the golden retriever has large, expressive eyes with prominent highlights, simplified facial features, and clean line art. The colors are flat and vibrant without complex shading gradients, consistent with anime aesthetics. The overall composition and rendering match the prompt’s stylistic requirements. VERDICT: PASS

local/flux-2-klein-9b: FAIL
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3.5 122B)

The image clearly depicts a golden retriever in a garden setting. The pixel art style is unmistakable, with visible pixelation and blocky forms throughout. However, the grid appears much finer than 32x32 — likely 64x64 or higher — and the color palette is not particularly limited, showing many shades of green, pink, yellow, and brown. While the core style is recognizable, the specific technical constraints (32x32 grid, limited palette) are not accurately met. VERDICT: FAIL

Medium 100%vs100%
local/flux-2-klein-9b: PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3.5 122B)

The image clearly exhibits Art Deco characteristics: strong geometric shapes, ornamental symmetry, and a gold/black palette. The buildings feature stepped forms and stylized patterns typical of the style. The central emblem, decorative border, and streamlined vehicles reinforce the aesthetic. While some text elements appear nonsensical (common in AI art), the core visual style is accurately rendered. VERDICT: PASS

local/flux-2-klein-9b: PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3.5 122B)

The image clearly exhibits Impressionist characteristics: soft, visible brushstrokes throughout the foliage and dress; dappled sunlight filtering through trees onto the subject and ground; and a pastel palette dominated by pinks, yellows, and muted greens. The overall effect is painterly and atmospheric, matching the requested style precisely. VERDICT: PASS

local/flux-2-klein-9b: PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3.5 122B)

The image clearly exhibits Pop Art characteristics: bold black outlines define the figure, flat vivid colors (yellow skin, blue eyes, magenta/purple hair highlights) dominate, and Ben-Day dots are prominently used for shading and texture across the face and background. The composition is stylized and graphic, matching the requested aesthetic precisely. All key visual elements of the specified style are present and well-executed. VERDICT: PASS

Hard 67%vs67%
local/flux-2-klein-9b: FAIL
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3.5 122B)

The image clearly exhibits watercolor characteristics: visible paper texture, soft color bleeds, and translucent layering typical of the medium. However, it lacks the "photorealistic" detail and lighting requested. The rendering is stylized and illustrative rather than photo-realistic. While the watercolor style is present, the core requirement of combining it with photorealism is not met. VERDICT: FAIL

local/flux-2-klein-9b: PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3.5 122B)

The image successfully combines cyberpunk and Art Nouveau styles. The central figure and background feature neon colors, futuristic tech elements, and a cyberpunk cityscape. The composition is framed by distinct flowing organic lines and prominent floral borders, fulfilling the Art Nouveau requirement. The requested style is clearly recognizable with all key visual characteristics present. VERDICT: PASS

local/flux-2-klein-9b: PASS
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3.5 122B)

The image successfully combines ukiyo-e woodblock print aesthetics (flat colors, stylized clouds, Japanese text, textured paper effect) with a photorealistic sky gradient and lighting. The castle architecture blends medieval European elements with Japanese roof styles, fitting the hybrid prompt. Key visual characteristics of both styles are clearly present and well-integrated. VERDICT: PASS

Extreme 100%vs100%
local/flux-2-klein-9b: PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3.5 122B)

The image clearly exhibits soft brutalism: raw concrete walls and ceiling with visible formwork marks, paired with rounded, pastel-colored furniture in muted pink and green tones. Warm diffused lighting enhances the cozy yet industrial atmosphere. All key visual characteristics from the prompt are present and well-executed. VERDICT: PASS

local/flux-2-klein-9b: PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3.5 122B)

The image successfully embodies the requested style of "anxious minimalism." The composition is sparse, featuring only a cup, a small fruit, and a paper on a table against a vast, empty wall. The negative space dominates the upper two-thirds, creating the requested sense of unease or tension. The color palette is muted (grays, beiges, soft browns), and the brushwork appears deliberate yet slightly raw, enhancing the subtle visual tension. The key characteristics are clearly present. VERDICT: PASS

local/flux-2-klein-9b: PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3.5 122B)

The image successfully embodies the requested "nostalgic futurism" style. Key characteristics are clearly present: the architecture features classic 1960s space-age elements like saucer-shaped structures and sleek spires. The color palette is dominated by a warm, sepia-toned patina that evokes nostalgia. Additionally, the inclusion of multiple moons, flying saucers, and the vintage paper texture overlay perfectly aligns with the retro-futuristic aesthetic. VERDICT: PASS