ImageBench

vs

192 evaluations across 6 categories

Pass Rate 74%vs95%
Avg Latency 3.8s vs 28.1s
Text Rendering 40%vs87%
Spatial Reasoning 68%vs97%
Human realism 95%vs98%
Truthfulness 82%vs93%
Professional Studio 93%vs100%
Graphical design 42%vs92%
Latency 61%vs2%

Text Rendering 40%vs87%

Typography Style 100%vs100%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

The text “CHAPTER ONE” is spelled correctly. The font style resembles a vintage typewriter font — blocky, with slightly uneven, textured ink that mimics the look of an old typewriter. The paper has visible creases, stains, and aging effects, matching the “aged paper” description. The ink appears slightly smudged and uneven in places, consistent with the prompt’s request for “slightly uneven ink.” All elements of the prompt are convincingly rendered. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

The text “TODAY'S SPECIAL” is rendered in a style that convincingly mimics white chalk on a dark green chalkboard. The lettering shows the characteristic uneven, textured appearance of chalk writing, with visible smudges and dust particles around the edges, which aligns with the prompt’s description of “chalk dust.” The background is a dark green surface with faint scratches and wear, consistent with a chalkboard. The spelling is correct: “TODAY'S SPECIAL” (with proper apostrophe). VERDICT: PAS

bfl/flux-2-klein-4b: PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The text "OPEN" is spelled correctly. The style matches the described medium: it is rendered in glowing, pixel-style letters composed of individual red LED lights, consistent with a classic LED sign. The pixelated, dot-matrix appearance and bright red glow convincingly simulate an LED display. VERDICT: PASS
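Each evaluator response above ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker. As a minimal sketch of how such a marker could be pulled out of the free-text response, the snippet below uses a regular expression. The function name `parse_verdict` is hypothetical, and nothing here is confirmed as the benchmark's own parsing logic; it only illustrates the idea, including the fact that a response cut off mid-marker yields no verdict.

```python
import re
from typing import Optional

# Matches the verdict marker used at the end of the evaluator responses.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def parse_verdict(evaluator_text: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL' from an evaluator response, or None if absent.

    Hypothetical helper: the last marker wins, in case the evaluator
    quotes the word VERDICT earlier in its reasoning.
    """
    matches = VERDICT_RE.findall(evaluator_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    complete = "The text is spelled correctly. VERDICT: PASS"
    truncated = "The spelling is correct. VERDICT: PAS"  # cut off mid-word
    print(parse_verdict(complete))
    print(parse_verdict(truncated))
```

Returning `None` for a truncated response keeps an incomplete evaluation from being silently counted as either a pass or a fail.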

Writing accuracy 25%vs83%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word displayed is "HELLO", which is spelled correctly and is fully readable. All letters are clearly formed and legible against the white background. There are no misspellings, missing letters, or illegible parts. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and is fully readable. All letters are clearly visible and properly formed. There are no misspellings, missing words, or illegible parts. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word displayed is "SERENITY", which is spelled correctly and is fully readable. There are no misspellings, missing words, or illegible parts. The text is large and centered on a solid light blue background, matching the intended description. VERDICT: PASS

Medium 0%vs100%
bfl/flux-2-klein-4b: FAIL
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

The intended text was: "The quick brown fox jumps high" Actual text in image: - Line 1: "The" — correct. - Line 2: "aruckbon fox" — "aruckbon" is a misspelling of "quick brown". - Line 3: "yumps high" — "yumps" is a misspelling of "jumps". All words are fully readable, but two words are misspelled. VERDICT: FAIL

bfl/flux-2-klein-4b: FAIL
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

The intended text is: “Every sunset brings a new dawn ahead”. In the image, the text reads: - “Every sanst” → “sanst” is a misspelling of “sunset”. - “brings rrigs a” → “rrigs” is a misspelling of “rings”, but “rings a” is incorrect; it should be “brings a”. - “new down” → “down” is a misspelling of “dawn”. - “ahead” → correct. Misspellings and errors: - “sanst” → should be “sunset” - “rrigs” → should be “rings” (but contextually, it should be “brings a”, so “rrigs” is incorrect) - “down” → sh

bfl/flux-2-klein-4b: FAIL
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The intended text is: "Stars shine brightest in the dark night". In the image, the following errors are present: - "brigtst" is misspelled — should be "brightest". - "th" is missing the letter "e" — should be "the". - "niglht" is misspelled — should be "night". Additionally, the word "in" appears twice — once on the second line and again on the third line — which is redundant and incorrect. VERDICT: FAIL
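The Writing accuracy checks above come down to comparing the intended text with the text that actually appears in the image. The sketch below illustrates that comparison at word level, using the first Medium example as input. It assumes the rendered text has already been transcribed (the evaluator here is a vision-language model, not this code), and `missing_words` is a hypothetical helper, not part of the benchmark.

```python
import re
from collections import Counter
from typing import List


def words(text: str) -> List[str]:
    """Lowercase the text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def missing_words(intended: str, rendered: str) -> List[str]:
    """List intended words with no exact match in the rendered text.

    Each rendered word can satisfy only one intended word, so repeated
    words in the prompt must also be repeated in the image.
    """
    available = Counter(words(rendered))
    missing = []
    for word in words(intended):
        if available[word] > 0:
            available[word] -= 1
        else:
            missing.append(word)
    return missing


if __name__ == "__main__":
    intended = "The quick brown fox jumps high"
    rendered = "The aruckbon fox yumps high"  # transcription reported by the evaluator
    print(missing_words(intended, rendered))
```

An exact-match rule like this is stricter than a human reader would be, which is consistent with how the evaluator treats a single misspelled word as a failure.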

Hard 0%vs33%
bfl/flux-2-klein-4b: FAIL
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

The text in the image contains several errors: - “itms” is a misspelling of “items”. - “thms” is a misspelling of “items” (same word, repeated error). - “wemmond” is a misspelling of “weekend”. - The word “this” before “weekend” is correct, but the following word is misspelled. The fine print “Terms and conditions apply. See store for details.” is correct. VERDICT: FAIL

bfl/flux-2-klein-4b: FAIL
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

The text in the image contains several errors: - "Join usr food, music, and fun" → "usr" is a misspelling of "us". - "RSVP required. Limited seetting available." → "seetting" is a misspelling of "seating". These are clear misspellings and are fully legible, so the errors are not due to illegibility but incorrect spelling. VERDICT: FAIL

bfl/flux-2-klein-4b: FAIL
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

The text in the image contains several misspellings and errors compared to the intended text: 1. "GALA" is incorrectly rendered as "G-LA" — the "A" is missing, and the "L" is misaligned or malformed. 2. "optional" is misspelled as "optninal" — missing the "o" and "a", and the "l" is misshapen. 3. "open" is misspelled as "opea" — the "n" is missing, and the "a" is misshapen. 4. The word "optional" is also missing the final "l" in "optninal", making it incomplete. These errors significantly affe

Extreme 0%vs100%
bfl/flux-2-klein-4b: FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

The intended text was: "A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pictures Release. Coming Summer 2025.' integrated realistically into the poster design" Actual text in image: - Title: “THE LAST VOYAGIE” — “VOYAGE” is misspelled (missing ‘E’). - Tagline: “One ship. No truhiing back.” — “truh

bfl/flux-2-klein-4b: FAIL
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

The text on the magazine cover contains multiple errors: 1. “NATIOJNAL” — misspelled; should be “NATIONAL” 2. “Campanire” — misspelled; should be “Cambodia” 3. “Sanslatoss” — misspelled; should be “Maria Santos” 4. “Climatt” — misspelled; should be “Climate” 5. “Toop 10” — misspelled; should be “Top 10” 6. “Ttaals” — misspelled; should be “Trails” 7. “Top O Ma” — misspelled; should be “Top 10” 8. “Hiik” — misspelled; should be “Hike” 9. “Issue A X 47” — incorrect; should be “Issue 247” 10. “/€6

bfl/flux-2-klein-4b: FAIL
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

The text on the label contains multiple misspellings and errors compared to the intended text: - "GOLD EiNN" — "EiNN" should be "EN" (missing "N" in "EINN" → "GOLDEN") - "Unfiluted" — should be "Unfiltered" - "12aw & Unz" — should be "12 oz" (aw → oz, Unz → Unz is incorrect; should be "oz") - "wildflowe honny" — should be "wildflower honey" - "habircsed" — should be "harvested" - "sustainmable" — should be "sustainable" - "Pafficu Notskennt" — should be "Pacific Northwest" - "No aduatites or pe
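The Text Rendering figures for bfl/flux-2-klein-4b can be reproduced from the individual verdicts listed above. The sketch below pools every item equally, which is an assumption about how the dashboard aggregates; it happens to be consistent with the reported 100%, 25%, and 40%, whereas averaging the two subcategory rates would give 62.5%. The data structure and function name are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

# Verdicts for bfl/flux-2-klein-4b, copied from the Text Rendering listing:
# (subcategory, difficulty tier) -> verdict per prompt.
VERDICTS: Dict[Tuple[str, str], List[str]] = {
    ("Typography Style", "Easy"): ["PASS", "PASS", "PASS"],
    ("Writing accuracy", "Easy"): ["PASS", "PASS", "PASS"],
    ("Writing accuracy", "Medium"): ["FAIL", "FAIL", "FAIL"],
    ("Writing accuracy", "Hard"): ["FAIL", "FAIL", "FAIL"],
    ("Writing accuracy", "Extreme"): ["FAIL", "FAIL", "FAIL"],
}


def pass_rate(results: Iterable[str]) -> float:
    """Percentage of PASS verdicts, with every item weighted equally."""
    results = list(results)
    return 100 * sum(r == "PASS" for r in results) / len(results)


def pooled(subcategory: str = "") -> List[str]:
    """Flatten verdicts, optionally restricted to one subcategory."""
    return [
        verdict
        for (sub, _tier), tier_verdicts in VERDICTS.items()
        if not subcategory or sub == subcategory
        for verdict in tier_verdicts
    ]


if __name__ == "__main__":
    print(pass_rate(pooled("Typography Style")))  # 3 of 3
    print(pass_rate(pooled("Writing accuracy")))  # 3 of 12
    print(pass_rate(pooled()))                    # 6 of 15
```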

Spatial Reasoning 68%vs97%

Attributes Binding 67%vs100%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

Object 1: Red ball — observed attributes: color = red, shape = sphere, size = small (consistent with prompt). Object 2: Blue cube — observed attributes: color = blue, shape = cube, size = small (consistent with prompt). All attributes (color, shape) are correctly bound to the right object with no swaps. The objects are sitting side by side on a white surface as specified. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

Object 1: Green apple - Color: Green (correct) - Shape: Round (correct) - Size: Medium (correct) Object 2: Orange pumpkin - Color: Orange (correct) - Shape: Ribbed, pumpkin-shaped (correct) - Size: Small (correct) All attributes are correctly bound to the right object with no swaps. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

Object 1: Yellow taxi - Color: Yellow (correct) - Shape: Sedan (standard taxi shape, not limousine) - Size: Standard size for a taxi (not a limousine) Object 2: Black limousine - Color: Black (correct) - Shape: Limousine (elongated body, correct) - Size: Larger than standard sedan (correct for limousine) All attributes are correctly bound to the right object. The yellow taxi is indeed yellow and has a sedan shape, and the black vehicle is a limousine with the correct color and elongated size. VE

Medium 67%vs100%
bfl/flux-2-klein-4b: FAIL
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

Observed objects and their attributes: - Large red circle: ✅ Present. Correct color (red), correct shape (circle), correct size (large relative to the other objects). - Small blue triangle: ❌ Not present. There is no triangle at all. The prompt required a small blue triangle, but none exists. - Medium green star: ✅ Present. Correct color (green), correct shape (star), correct size (medium relative to the circle). The prompt required three specific objects: a large red circle, a small blue triang

bfl/flux-2-klein-4b: PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

Object 1: Tall thin white candle — Observed: White color, tall and thin shape. ✅ Matches prompt. Object 2: Short fat red candle — Observed: Red color, short and fat (cylindrical) shape. ✅ Matches prompt. Object 3: Medium spiral yellow candle — Observed: Yellow color, spiral (coiled) shape. ✅ Matches prompt. All three objects have their specified color, size, and shape correctly assigned with no swaps. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

Object: Horse - Color: Brown (matches "big brown horse") - Size: Big (matches "big brown horse") - Shape: Horse-shaped (correct animal form) Object: Rabbit - Color: White (matches "small white rabbit") - Size: Small (matches "small white rabbit") - Shape: Rabbit-shaped (correct animal form) Object: Cat - Color: Black (matches "medium-sized black cat") - Size: Medium-sized (matches "medium-sized black cat") - Shape: Cat-shaped (correct animal form) All attributes (color, size, shape) are correctl

Hard 33%vs100%
bfl/flux-2-klein-4b: FAIL
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

Object: Fire truck → Color: Pink → Shape: Standard fire truck shape → Correctly bound. Object: Banana (floating in sky) → Color: Yellow → Shape: Curved banana → Incorrectly bound (should be blue, not yellow). Object: Sun → Color: Green → Shape: Circular with yellow spikes → Correctly bound. The prompt required a blue banana, but the floating banana is yellow. This is a swap of color attribute. VERDICT: FAIL

bfl/flux-2-klein-4b: PASS
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

Object: Purple carrot — Observed: Purple root, green leafy top. Color: Purple (correct). Shape: Conical/elongated (correct). Size: Not specified, but visually plausible. Object: Red tree — Observed: Red leaves, brown trunk. Color: Red leaves (correct). Shape: Tree form (correct). Size: Not specified, but visually plausible. Object: White watermelon — Observed: Green-striped rind, white flesh, sliced open. Color: Rind is green, flesh is white (partially correct — rind is not white, but flesh

bfl/flux-2-klein-4b: FAIL
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The prompt requested: "An orange elephant, a yellow penguin, and a silver strawberry arranged on a table". Observation: - The elephant on the left is orange — correct. - The penguin in the middle is yellow — correct. - The penguin on the right is silver (head and back) — correct. - However, the “strawberry” is not silver. It is red, and it is placed next to the silver penguin, not as a silver object. The prompt explicitly asked for a “silver strawberry”, but the strawberry in the image is red. T

Compositionality 89%vs100%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

- Tabby cat: Present ✅ - Oversized red sunglasses: Present ✅ - Coherent combination: The sunglasses are indeed oversized relative to the cat’s face and are red, fitting the prompt’s description. The elements are visually combined without contradiction. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

Element: Golden retriever — Present. Element: Tiny cowboy hat — Present. (The hat is proportionally small relative to the dog’s head, fitting the “tiny” descriptor.) Both elements are clearly and coherently combined in the image — the dog is wearing the hat, and the hat is appropriately scaled for the dog’s head. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

Element: Penguin — Present. Element: Colorful umbrella — Present. Element: Penguin holding the umbrella — Present. All described elements are present and coherently combined in the image. The penguin is clearly holding the colorful umbrella, and the scene is logically composed despite the unusual concept. VERDICT: PASS

Medium 100%vs100%
bfl/flux-2-klein-4b: PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

Element 1: Medieval knight in full armor — Present. The figure is clad in detailed plate armor with chainmail underneath, including a helmet with a visor and articulated gauntlets. Element 2: Sitting at a desk — Present. The knight is seated on an ornate wooden chair at a wooden desk. Element 3: Typing on a modern laptop — Present. The knight’s hands are positioned over the keyboard of a silver laptop with an Apple logo, suggesting active use. All described elements are present and coherently co

bfl/flux-2-klein-4b: PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

Element 1: Roman gladiator — Present. The central figure is dressed in Roman-style armor, including a metal cuirass, shoulder guards, belt, and gladiator helmet (though simplified), and is posing as if in a combat or performance role. Element 2: Taking a selfie — Present. The man is holding a smartphone in his right hand, angled toward his face, with his arm extended as if capturing a self-portrait. Element 3: Smartphone — Present. A modern black smartphone with a visible triple-camera module is

bfl/flux-2-klein-4b: PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

Element: Astronaut in a spacesuit — Present. Element: Riding a bicycle — Present. Element: Through a park — Present (trees, grass, paved path, benches in background). All described elements are present and coherently combined in the image. The scene is logically composed — an astronaut (in full suit, helmet, gloves, backpack) is riding a bicycle on a paved path in a park setting with trees and greenery. VERDICT: PASS

Hard 67%vs100%
bfl/flux-2-klein-4b: PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

Element: Octopus — Present. A large, reddish-brown octopus is the central figure in the image. Element: Top hat — Present. The octopus is wearing a black top hat. Element: Monocle — Present. The octopus is wearing round, gold-rimmed spectacles (glasses), which function as a monocle in this context. Element: Playing chess — Present. A wooden chessboard with chess pieces is set up in front of the octopus, and its tentacles are positioned as if engaged in the game. Element: Surface of the moon — Pr

bfl/flux-2-klein-4b: PASS
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

Element checklist: - Dinosaur: ✅ Present — A large, realistic-looking theropod (Tyrannosaurus-like) head and upper body are visible. - Business suit: ✅ Present — The dinosaur is wearing a dark suit, white shirt, and blue tie. - PowerPoint presentation: ✅ Present — A large screen behind the dinosaur displays a presentation slide with text and images (though text is garbled, the layout is clearly PowerPoint-style). - Modern office: ✅ Present — Office furniture (desk, chairs, laptop), large windows

bfl/flux-2-klein-4b: FAIL
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

Element 1: Robot dressed as a pirate — Present. The figure has a robotic, skull-like face with glowing green eyes, wears a pirate hat and coat, and has mechanical details (e.g., visible joints, metallic skin). Element 2: Sailing a paper boat — Present. The robot is seated in a clearly visible paper boat, with visible creases and folds consistent with paper construction. Element 3: Lake of lava — Present. The boat is floating on a surface of glowing orange lava, with lava flows and rocky shores v

Counting 56%vs100%
Easy 67%vs100%
bfl/flux-2-klein-4b: PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

I see exactly 3 red apples arranged on a wooden table. The prompt requested: "Exactly 3 red apples arranged on a wooden table". The image shows: - 3 red apples (count matches) - Arranged on a wooden table (matches) All criteria are satisfied. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

I count exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf. This matches the requested count of "Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf". VERDICT: PASS

bfl/flux-2-klein-4b: FAIL
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

I see 2 white candles standing on a dark surface. The prompt requested exactly 4 white candles. I see only 2. VERDICT: FAIL

Medium 33%vs100%
bfl/flux-2-klein-4b: FAIL
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

I see: - 1 orange cat - 2 black dogs The prompt requested: - 2 orange cats - 3 black dogs The image does not match the requested count. There is only 1 orange cat (not 2) and only 2 black dogs (not 3). VERDICT: FAIL

bfl/flux-2-klein-4b: PASS
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

Counting the objects in the image: - Red roses: There are clearly 4 distinct red roses visible — one in the foreground, one to the left, one to the right, and one at the top center. - White daisies: There are 2 white daisies visible — one prominently in the front, and a second one slightly behind it, partially obscured. The prompt requested: "4 red roses and 2 white daisies in a glass vase on a table". All elements are present and match the count: - 4 red roses ✅ - 2 white daisies ✅ - Glass vase

bfl/flux-2-klein-4b: FAIL
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

I count: - 1 blue butterfly (on the left) - 2 yellow butterflies (on the right) The prompt requested: "1 blue butterfly and 3 yellow butterflies". The count does not match — there are only 2 yellow butterflies, not 3. VERDICT: FAIL

Hard 67%vs100%
bfl/flux-2-klein-4b: PASS
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

I count exactly 9 goldfish in the round glass fishbowl. All appear to be goldfish, swimming in water, and the bowl is large and round as described. The prompt requested: "Exactly 9 goldfish swimming in a large round fishbowl". The image matches this request perfectly in both quantity and description. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

Counting the objects in the image: - Bananas: There are 3 bananas visible — one long banana with a stem, and two smaller ones partially visible behind it. - Oranges: There are 4 oranges clearly visible — three in the front and one partially obscured behind them. - Apples: There are 2 apples — one red and yellow striped apple on the right, and another red apple below it. The requested count was: 3 bananas, 4 oranges, and 2 apples. The image exactly matches this count. VERDICT: PASS

bfl/flux-2-klein-4b: FAIL
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

Counting the balloons in the image: - Top row: 1 green, 1 purple, 1 blue (3) - Second row: 1 yellow, 1 red, 1 blue, 1 yellow (4) - Middle cluster: 1 pink, 1 orange, 1 green, 1 purple, 1 blue (5) - Bottom row: 1 purple, 1 yellow, 1 red, 1 green, 1 yellow (5) Total: 3 + 4 + 5 + 5 = 17 balloons. The prompt requested “Exactly 11 colorful balloons”. The image contains 17 balloons, which is not 11. VERDICT: FAIL
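The Counting verdicts above follow a simple rule: every requested object type must appear in exactly the requested quantity. The sketch below expresses that rule over counts that have already been extracted, using two of the failing examples as input. The function `count_mismatches` and the dictionary layout are hypothetical; in the benchmark itself the counting is done by the vision-language evaluator.

```python
from typing import Dict, Tuple


def count_mismatches(
    requested: Dict[str, int], observed: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Map each object type whose observed count differs to (requested, observed).

    Object types missing from `observed` are treated as a count of zero.
    An empty result means the image satisfies the counting prompt.
    """
    return {
        name: (want, observed.get(name, 0))
        for name, want in requested.items()
        if observed.get(name, 0) != want
    }


if __name__ == "__main__":
    # "2 orange cats and 3 black dogs": the evaluator saw 1 cat and 2 dogs.
    print(count_mismatches(
        {"orange cat": 2, "black dog": 3},
        {"orange cat": 1, "black dog": 2},
    ))
    # "Exactly 11 colorful balloons": the evaluator counted 17.
    print(count_mismatches({"balloon": 11}, {"balloon": 17}))
    # "Exactly 3 red apples": counts match, so nothing is reported.
    print(count_mismatches({"red apple": 3}, {"red apple": 3}))
```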

Negation 44%vs89%
Easy 67%vs100%
bfl/flux-2-klein-4b: FAIL
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

Upon careful examination of the image: - The scene is a tropical beach with white sand, turquoise water, and palm trees, matching the requested setting. - The prompt explicitly excluded “people”. - On the left side of the image, near the shoreline and behind some vegetation, a small human figure is visible — appearing to be a person walking or standing near a thatched-roof structure. This presence of a person violates the exclusion criterion. VERDICT: FAIL

bfl/flux-2-klein-4b: PASS
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a ginger tabby cat lying on a gray fabric couch. A thorough inspection of the cat’s neck and upper body reveals no collar, tag, or any other form of neckwear. The cat’s fur is visible uninterrupted around the neck area, confirming the absence of a collar. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image shows an empty classroom with desks and a chalkboard. There are no students visible in the room. The desks are arranged in rows, and the chalkboard (with a whiteboard to its left) is mounted on the wall. The room is otherwise unoccupied. The excluded element — students — is truly absent from the image. VERDICT: PASS

Medium 67%vs100%
bfl/flux-2-klein-4b: PASS
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image shows a sandy beach with numerous palm trees and seashells scattered across the sand. A beach umbrella is visible in the midground, along with lounge chairs. However, there is no water or ocean visible in the image — the horizon line is not present, and the background consists only of palm trees and sky. The prompt explicitly excluded “water or ocean,” and this element is indeed absent. VERDICT: PASS

bfl/flux-2-klein-4b: FAIL
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image shows an office workspace with a desk, chair, bookshelf, and lamp — all elements explicitly included in the prompt. However, it also contains a white computer monitor on the right side of the desk, which is a screen and thus violates the exclusion criterion. VERDICT: FAIL

bfl/flux-2-klein-4b: PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image shows a kitchen with a stove, pots, pans, and a cutting board — all elements explicitly included in the prompt. However, the prompt specifically excluded “food.” Upon careful inspection, there is no food visible in the image. The pots are empty, the pan is empty, and the cutting board is clean with no food residue or items. No food is present anywhere in the scene. VERDICT: PASS

Hard 0%vs67%
bfl/flux-2-klein-4b: FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil, and crust. However, it also clearly features a generous layer of melted, golden-brown cheese covering the entire surface, beneath the pepperoni and sauce. The prompt explicitly excluded cheese. Since cheese is visibly present, the exclusion is violated. VERDICT: FAIL

bfl/flux-2-klein-4b: FAIL
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image shows a complete silver sedan with clearly visible wheels on both the front and rear axles. The wheels are fully rendered, including tires and rims. The prompt explicitly excluded "wheels," yet they are present in the image. VERDICT: FAIL

bfl/flux-2-klein-4b: FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image clearly shows a human face with eyes, nose, and mouth. Additionally, the person has well-defined, natural-looking eyebrows above both eyes. The prompt explicitly excluded “a human face with eyes, nose, and mouth but no eyebrows.” Since eyebrows are visibly present in the image, the excluded element is not absent. VERDICT: FAIL

Relative Position 83%vs92%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The prompt specifies: "A cat sitting on top of a cardboard box." In the image: - The cat is clearly sitting on the top surface of a cardboard box. - The spatial relationship “on top of” is accurately depicted. - No other positional relationships (left, right, above, below) are specified in the prompt, so there is no requirement to verify them. - The core positional relationship described — “on top of” — is correct. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The prompt specifies: "A red ball underneath a wooden chair". In the image: - There is a red ball (appearing as a smooth, glossy sphere) resting on a light-colored wooden floor. - A wooden chair is positioned above the ball, with its legs extending downward to support it. - The ball is clearly located directly beneath the chair’s seat and legs — specifically, under the chair. The spatial relationship “underneath” is accurately depicted. The ball is not merely “next to” or “beside” the chair — it

bfl/flux-2-klein-4b: PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The prompt specified: “A coffee mug next to an open book on a desk.” In the image: - There are two coffee mugs, both resting directly on the pages of an open book. - The book is lying flat on a wooden desk. - The mugs are positioned side by side on the book — one on the left, one on the right — so they are “next to” each other, and both are “on” the book. - The book is “on” the desk. The prompt’s key spatial relationship — “a coffee mug next to an open book” — is satisfied. The mugs are next to

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The bird is perched on a branch — correct. The branch extends from a tree — correct. The tree is next to a park bench — correct. Spatial relationships:
- “on”: bird is on branch — ✅
- “from”: branch extends from tree — ✅
- “next to”: tree is next to bench — ✅
All positional relationships described in the prompt are visually accurate and correctly represented in the image. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The prompt specifies: "A cat sitting on a pillow, the pillow placed on a wooden chair." In the image: - The cat is clearly sitting on top of a light-colored pillow → satisfies "cat sitting on a pillow". - The pillow is placed directly on the seat of a wooden chair → satisfies "pillow placed on a wooden chair". All described spatial relationships are correctly depicted: - “on” is used correctly for both “cat on pillow” and “pillow on chair”. No other positional relationships (left/right/above/bel

bfl/flux-2-klein-4b: PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The candle is positioned on top of a stack of books — this matches the “on” relationship. The stack of books is to the left of the potted plant — this matches the “next to” relationship (specifically, left of). All positional relationships specified in the prompt are correctly depicted. VERDICT: PASS

Hard 33% vs 67%
bfl/flux-2-klein-4b: FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image displays five chess pieces arranged in a loose cross pattern, with one piece at the center and others positioned around it. However, the spatial relationships described in the prompt are not correctly represented: - The prompt specifies: “a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king.” In the image: - The central piece is a black king. - To its left is a silver piece with a cross-shaped crown — this is a queen (corr

bfl/flux-2-klein-4b: PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The prompt specified: "laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below".
- Laptop is centered on the desk — ✅ Correct.
- Coffee mug is to the left of the laptop — ✅ Correct.
- Phone is to the right of the laptop — ✅ Correct.
- Lamp is above the laptop — ✅ Correct.
- Notebook is below the laptop — ✅ Correct.
All specified spatial relationships are accurately depicted in the image. VERDICT: PASS

bfl/flux-2-klein-4b: FAIL
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image contains only four animals: a horse on the far left, a sheep in the center, a cow to the right of the sheep, and a dog on the far right. There is no goat visible in the image. The prompt specified five animals with a specific spatial arrangement including a goat to the right of the sheep. Since the goat is missing, the described spatial relationships are not fully correct. VERDICT: FAIL
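
Each evaluator response on this page ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker, and several responses are cut off before that marker completes. The sketch below shows one way such a marker could be read out of the response text. It is an illustration under the assumption that the marker always takes this exact form; the function name `extract_verdict` is hypothetical and is not taken from ImageBench.

```python
import re
from typing import Optional

# Assumption: the evaluator closes its answer with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")


def extract_verdict(evaluator_text: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL' from the last complete verdict marker.

    Returns None when no complete marker is present, for example when the
    response was truncated in the middle of the word.
    """
    matches = VERDICT_RE.findall(evaluator_text)
    return matches[-1] if matches else None


complete = "There is no goat visible in the image. VERDICT: FAIL"
truncated = "The house matches the prompt description. VERDICT: P"

print(extract_verdict(complete))
print(extract_verdict(truncated))
```

Returning `None` for a truncated marker, instead of guessing, keeps incomplete responses distinguishable from real verdicts.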

Extreme 100% vs 100%
bfl/flux-2-klein-4b: PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image correctly depicts the spatial relationships as specified in the prompt: - The red barn is in the foreground at a large scale, positioned on the left side of the frame. - The windmill is in the midground at a medium scale, positioned to the right of the barn and slightly behind it, maintaining a consistent depth perspective. - The snow-capped mountains are in the background at a small scale, stretching across the entire horizon behind both the barn and the windmill. All positional relat

bfl/flux-2-klein-4b: PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image correctly depicts the spatial relationships as described in the prompt: - The sandcastle is in the foreground, clearly closest to the viewer, occupying the lower portion of the frame. - The surfer is in the midground, positioned on the waves closer to the shore than the cruise ships, and appears smaller than the sandcastle. - The cruise ships are on the horizon in the background, appearing smaller than both the sandcastle and the surfer, as expected with distance. All positional relati

bfl/flux-2-klein-4b: PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image correctly depicts the spatial relationships described in the prompt: - **Foreground**: A park bench is prominently placed in the foreground, with pigeons scattered around and on the bench itself. - **Midground**: A fountain is clearly visible behind the bench, centered in the midground, with water spraying upward. - **Background**: Skyscrapers form the distant background, rising behind the trees and the fountain, appearing progressively smaller due to perspective. All positional relati
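
The Relative Position percentages for bfl/flux-2-klein-4b (83% overall, with 100%, 100%, 33% and 100% per tier) are consistent with the twelve PASS/FAIL badges listed in this subsection. The sketch below recomputes them, assuming each rate is the number of passes divided by the number of prompts, rounded to the nearest percent. That formula is an assumption (the page does not state it), and all names in the code are illustrative.

```python
from typing import Dict, List


def pass_rate(verdicts: List[str]) -> int:
    """Percentage of 'PASS' verdicts, rounded to the nearest whole percent."""
    passes = sum(v == "PASS" for v in verdicts)
    return round(100 * passes / len(verdicts))


# Verdict badges for bfl/flux-2-klein-4b, copied from the Relative Position cards.
relative_position: Dict[str, List[str]] = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["FAIL", "PASS", "FAIL"],
    "Extreme": ["PASS", "PASS", "PASS"],
}

# Per-tier rates, then the subcategory rate over all twelve prompts.
tier_rates = {tier: pass_rate(v) for tier, v in relative_position.items()}
all_verdicts = [v for tier in relative_position.values() for v in tier]
overall = pass_rate(all_verdicts)

print(tier_rates)  # {'Easy': 100, 'Medium': 100, 'Hard': 33, 'Extreme': 100}
print(overall)     # 83
```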

Scale & Proportions 67% vs 100%
Easy 67% vs 100%
bfl/flux-2-klein-4b: PASS
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The person is standing next to the yellow school bus, and their relative scale is realistic. The bus is clearly much larger than the person — its length, height, and width are proportionally accurate to real-world dimensions. The person’s height is consistent with the bus’s front grille, headlights, and side windows, and the bus’s wheels and body are appropriately scaled to the human figure. The perspective and proportions are consistent with a real photograph. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The ladybug is shown sitting on a large green leaf, and the proportions are realistic. The leaf is significantly larger than the ladybug, which is appropriately scaled relative to the leaf’s size — a common and accurate depiction in nature. The ladybug’s body is proportionally correct for its species, and its size relative to the leaf is consistent with real-world observations. VERDICT: PASS

bfl/flux-2-klein-4b: FAIL
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The child in the image is holding a basketball that is disproportionately large relative to his body. A standard basketball is about 24 cm (9.4 inches) in diameter, while the child appears to be around 3–4 years old, with a torso and arm span that would make holding a full-sized basketball awkward and unrealistic. The ball appears to be nearly as wide as the child’s torso, which is not proportionally accurate for a child of that age. This violates the prompt’s requirement for “correct relative s

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a miniature figurine of a person standing on the rim of a coffee cup, looking down into the liquid inside. The figurine is clearly scaled down relative to the cup — its height is comparable to the rim’s thickness, and its proportions (head, torso, limbs) are consistent with a human-like figure, albeit tiny. The cup is a standard-sized coffee cup, and the figurine’s placement on the rim is physically plausible and visually coherent. The perspective and lighting support the illusio

bfl/flux-2-klein-4b: PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a tabby cat walking between city skyscrapers, and the cat is rendered as disproportionately large — appearing to be several stories tall, towering over cars and pedestrians, which aligns with the “giant” and “kaiju” descriptors in the prompt. The relative scale is consistent: the cat’s body and limbs are proportionally intact (though exaggerated in size), and its size is clearly dwarfing the surrounding urban environment, including vehicles and buildings. The perspective (low-a

bfl/flux-2-klein-4b: PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The miniature house is clearly scaled to fit comfortably within the palm of a human hand. Its proportions are consistent with real-world scale — the windows, door, roof tiles, and even the tiny potted plants are all appropriately sized relative to the hand. The house is not disproportionately large or small compared to the palm, and the perspective and depth of field reinforce the sense of scale. The visual relationship between the hand and the house matches the prompt’s description. VERDICT: P

Hard 33% vs 100%
bfl/flux-2-klein-4b: FAIL
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image shows an elephant, two horses, a cat, and notably, no mouse. The prompt specifically required a mouse to be included in the lineup with correct real-world proportional sizes. Since the mouse is absent, the requirement for all five animals (elephant, horse, dog, cat, mouse) to be present with correct relative sizes is not met. Additionally, while the elephant, horses, and cat are depicted with roughly correct relative sizes (elephant largest, then horses, then cat), the absence of the m

bfl/flux-2-klein-4b: PASS
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image depicts a realistic dinner table setting with a plate, fork, knife, two wine glasses, a salt shaker, and a peppercorn container (which holds peppercorns). The relative sizes of these objects are consistent with real-world dining tableware: - The plate is appropriately sized for a plated meal. - The fork and knife are standard dinner cutlery, placed correctly beside the plate. - The wine glasses are full-bodied, typical of red wine glasses, and proportionally sized relative to the plate

bfl/flux-2-klein-4b: FAIL
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image depicts a farm scene with a barn, tractor, cow, chickens, and a fence post. While the objects are rendered in a stylized, cartoonish manner, their relative sizes are generally consistent with real-world proportions. - The barn is large and serves as the dominant structure, appropriately larger than the tractor. - The tractor is realistically sized relative to the barn and the cow — it’s a standard-sized farm tractor, not exaggerated. - The cow is proportionally larger than the chickens

Human realism 95% vs 98%

Faces & Expressions 92% vs 100%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

The face in the image is largely symmetrical with natural, correctly proportioned features. The eyes, eyebrows, nose, and mouth are well-aligned and appear anatomically plausible. The smile is warm and natural, with visible crinkles around the eyes and a relaxed mouth, clearly conveying genuine happiness. There are no obvious distortions or unnatural artifacts; the lighting and skin texture look realistic. The prompt requested a “warm natural smile,” which is clearly and accurately rendered. VER

bfl/flux-2-klein-4b: PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

The face in the image is largely symmetrical with correctly rendered features: the eyes, nose, and mouth are well-proportioned and balanced, and the hairline and beard are natural. There are no obvious distortions or anatomical anomalies. The expression is calm and serious — the brow is slightly furrowed, the lips are neutral and closed, and the gaze is direct and composed, conveying the intended emotion clearly. The lighting and composition support a professional portrait style appropriate for

bfl/flux-2-klein-4b: PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face in the image is symmetric and features are anatomically correct: eyes, nose, mouth, and ears are proportionally aligned and natural. The teenager’s expression is relaxed and neutral — lips are closed, no smile or frown, gaze is steady — matching the prompt. There are no visible distortions or unnatural artifacts. The lighting and composition support a clear, readable portrait. VERDICT: PASS

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face is largely symmetric with correctly rendered features: eyes, nose, lips, and eyebrows are proportionally aligned and anatomically plausible. The emotion of disgust is clearly readable — the nose is visibly wrinkled, the lips are curled downward and pressed together, and the forehead is furrowed with tension. There are no major distortions; the skin texture and facial musculature are consistent with a genuine expression of disgust. The image successfully captures the requested emotion wi

bfl/flux-2-klein-4b: PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

The face is largely symmetric with correctly rendered features: both eyes are wide and alert, eyebrows are raised, and the mouth is slightly open — all aligning with the prompt’s request for “genuine surprise.” The expression is clear and emotionally readable. There are no significant distortions; the facial structure, skin texture, and lighting appear natural and consistent. Minor artifacts (like slight asymmetry in the eyelid creases or subtle skin texture variations) are present but do not de

bfl/flux-2-klein-4b: PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

The face is largely symmetric with correctly rendered features: eyes, nose, mouth, and ears are proportionally aligned and anatomically plausible. There are no major distortions or unnatural artifacts. The expression conveys quiet melancholy effectively — the eyes are downcast, the mouth is slightly drawn, and the gaze is distant, with subtle furrowing of the brow adding to the introspective sadness. Lighting and texture are natural, enhancing the emotional realism. VERDICT: PASS

Hard 100% vs 100%
bfl/flux-2-klein-4b: PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

The face is largely symmetrical with natural, correct anatomical features. The deep wrinkles around the eyes and mouth are consistent with an elderly woman in her 80s. The white hair is accurately rendered. The smile is warm and genuine, with crinkled eyes and visible teeth, clearly conveying kindness and warmth. There are no significant distortions; the facial structure, proportions, and expression are realistic and well-executed. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The face in the image is largely symmetric with correctly proportioned features: the eyes are large and evenly spaced, the cheeks appear round, and the tear tracks are clearly visible streaming down from each eye. The child’s expression—furrowed brow, wide-open mouth, and tearful eyes—is unmistakably conveying distress or crying, matching the prompt’s emotional intent. There are no significant distortions; the facial structure is natural and anatomically plausible for a young child. The image su

bfl/flux-2-klein-4b: PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

The face is largely symmetrical with natural, correct anatomical features. The elderly man’s skin shows realistic age spots, thinning gray hair, and deep wrinkles consistent with someone in his 70s. His expression — a skeptical, slightly downturned gaze with furrowed brows and a tight mouth — is clearly readable and aligns with the prompt. The reading glasses are properly positioned and do not distort the face. There are no significant distortions or unnatural artifacts. VERDICT: PASS

Extreme 67% vs 100%
bfl/flux-2-klein-4b: PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

The image shows three individuals at a table, each displaying distinct facial expressions: - The person on the left is laughing with head tilted back, eyes closed, and mouth open — this matches the “laughing with head tilted back” prompt. - The person in the center has wide eyes, raised eyebrows, and an open mouth — this clearly conveys shock, matching the “looking shocked with wide eyes” prompt. - The person on the right has narrowed eyes (suggesting a smirk or eye-roll) and a slight smile — th

bfl/flux-2-klein-4b: FAIL
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

The image shows four people: a father on the left, a mother in the center, and two children in front. The father is beaming with pride, the mother is smiling gently, the child on the right is grinning excitedly, and the child in front is also grinning excitedly — but there is no teenager in the image. The prompt specifically requested a “teenager looking bored,” which is absent. Regarding facial symmetry and features: All faces are symmetrical and have correctly rendered features — eyes, noses,

bfl/flux-2-klein-4b: PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

All three faces are symmetric with correct anatomical features — no obvious distortions, asymmetries, or unnatural deformities. - Left man: Frustrated expression is clearly readable — furrowed brows, downturned mouth, and tense facial muscles. - Center woman: Pleased expression is clearly readable — wide smile, bright eyes, and relaxed posture. - Right man: Confused expression is clearly readable — tilted head, raised eyebrows, and slightly open mouth. The emotions are distinct, well-rendered, a

Full Body 100% vs 100%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The image shows a full-body view of a woman standing upright with her arms at her sides, wearing a white t-shirt and blue jeans — exactly matching the prompt. - Limb count: Correct — two arms, two legs, both visible and intact. - Proportions: Anatomically accurate — head, torso, limbs are proportionally balanced; no exaggerated or missing body parts. - Pose: The woman is standing upright in a relaxed, neutral pose with arms at her sides — fully recognizable and consistent with the prompt. - Dist

bfl/flux-2-klein-4b: PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The image shows a full-body shot of a man standing straight, facing the camera, with both hands in his pockets. His posture is upright and symmetrical, and his proportions appear normal — head, torso, arms, and legs are in proportion to each other with no obvious distortions. Both arms are visible, each with a hand placed in a pocket, and the limbs are anatomically correct in number and placement. The pose is clearly recognizable as “standing straight, hands in pockets,” which matches the prompt

bfl/flux-2-klein-4b: PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The image shows a full-body shot of a person standing in a park, as requested. The person is positioned centrally, facing the camera with a neutral expression and posture. The limbs are clearly visible: two arms hanging naturally at the sides, two legs straight and together, and all limbs appear to be present and proportionate. The proportions of the body — head, torso, arms, and legs — are anatomically consistent with a human figure. The pose is a standard neutral standing pose, recognizable an

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The image shows a woman in mid-stride running through a park, with her hair and clothing visibly affected by motion — consistent with the prompt. Her limbs are correctly proportioned and anatomically accurate: both arms are bent at the elbows, hands are in a natural running fist position, and her legs are positioned in a dynamic running gait with one foot off the ground and the other pushing forward. The posture is recognizable as a standard running stride. There are no obvious distortions in li

bfl/flux-2-klein-4b: PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man in the image is captured mid-air, jumping over a puddle, with his body in a dynamic pose featuring bent knees and arms extended for balance — consistent with the prompt. Limb count is correct: two arms, two legs, and the torso are all intact. Proportions appear anatomically accurate — the limbs are proportionally sized relative to the torso, and the posture reflects a natural, athletic jump. The pose is recognizable as a mid-air leap, and the splash in the puddle below confirms the actio

bfl/flux-2-klein-4b: PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer in the image is captured mid-spin, with her red dress flowing outward as described. One leg is extended upward in a graceful arabesque-like pose, while the other supports her weight on the ball of the foot — a common balletic stance. Her arms are extended, enhancing the sense of motion. The limb count is correct (two arms, two legs), and proportions appear anatomically accurate — no obvious limb duplication or gross distortion. The pose is recognizable as a dynamic, classical dance mo

Hard 100% vs 100%
bfl/flux-2-klein-4b: PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The image shows a person performing the yoga tree pose (Vrksasana). The pose is recognizable: she is standing on one leg, with the other foot pressed against the inner thigh of the standing leg, and her arms are raised overhead with palms together. Limb count is correct — two arms, two legs, two feet, two hands. Proportions appear anatomically accurate — the body is balanced, the torso upright, and the pose is held with natural joint alignment. There are no obvious distortions such as extra lim

bfl/flux-2-klein-4b: PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber in the image has two arms and two legs — limb count is correct. Proportions appear anatomically plausible: the torso, limbs, and joints are proportioned in a way consistent with a human body in a strenuous climbing pose. The pose is recognizable as a dynamic climbing stance — arms extended to grip holds, legs positioned for leverage, body angled against the wall — typical of sport or trad climbing. There are no obvious distortions in limb structure or joint placement; muscles and pos

bfl/flux-2-klein-4b: PASS
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist in the image is seated with correct posture: spine aligned, shoulders relaxed, and head level. His legs are positioned around the cello — one leg supporting the instrument’s body while the other is bent, with the cello resting on his left thigh and supported by the endpin. His bow arm is extended, holding the bow across the strings in a playing position, and his left hand is positioned on the fingerboard, consistent with cello technique. Limb count is accurate (two arms, two legs), p

Extreme 100% vs 100%
bfl/flux-2-klein-4b: PASS
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The image shows two tango dancers in a dynamic, close embrace. The man is holding the woman in a dip, with her body arched backward and her legs intertwined with his — one of her legs is wrapped around his waist/back, and the other is extended backward, lifted off the floor. The pose is classic tango, and the dancers’ bodies are positioned in a way that is both dramatic and physically plausible. Anatomical analysis: - Limb count: All limbs are present and correctly counted — both dancers have tw

bfl/flux-2-klein-4b: PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

The image shows two soccer players in a dynamic, realistic mid-tackle moment. The player in red (back to camera, jersey #2) is sliding on the grass, with his body low to the ground and one leg extended toward the ball. The player in blue is airborne, jumping over the sliding player, with his body arched and one leg lifted — his foot is near the ball, which is on the ground between them. The ball is clearly visible at the feet of the sliding player, though the jumping player’s foot is close to it

bfl/flux-2-klein-4b: PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

The image shows two gymnasts performing synchronized handstands side by side. Their body alignment is nearly identical, with both legs extended vertically and feet pointed, and both supporting themselves on their hands with arms straight and shoulders aligned. Limb count is correct: each has two arms, two legs, and the proportions appear anatomically accurate — no extra limbs or missing parts. The pose is a recognizable handstand, a common gymnastic skill. There are no visible distortions; the m

Hands 92% vs 92%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3 VL)

Both hands are visible at the person’s sides, fingers relaxed and naturally spread, as per the prompt. Counting the fingers on each hand: - Left hand: Thumb, index, middle, ring, pinky — 5 fingers. - Right hand: Thumb, index, middle, ring, pinky — 5 fingers. Anatomically, the hands are correctly proportioned. The fingers are relaxed, slightly curved downward, and the thumbs are naturally positioned. There are no extra fingers, missing fingers, or anatomical anomalies. The gesture is consistent w

bfl/flux-2-klein-4b: PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3 VL)

Both hands are clearly visible, palms facing forward, and all ten fingers are distinctly shown. Each hand displays five fingers: thumb, index, middle, ring, and pinky — no fingers are missing or duplicated. The gesture is anatomically correct; the hands are spread apart with palms up, fingers extended naturally, and wrists slightly angled outward — consistent with a standard “showing ten fingers” pose. No anomalies such as fused fingers, extra digits, or unnatural deformities are present. VERDIC

bfl/flux-2-klein-4b: PASS
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3 VL)

Examination of the hands in the image: - The man has his arms crossed, with both hands resting on opposite upper arms — matching the prompt. - The right hand (on the viewer’s left) is resting on the left upper arm. Fingers are visible and clearly count to 5: thumb, index, middle, ring, pinky. - The left hand (on the viewer’s right) is resting on the right upper arm. Fingers are also clearly visible and count to 5. - The grip is anatomically plausible: the hands are positioned with palms facing i

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3 VL)

The image shows a person holding a white ceramic coffee mug with their right hand. The hand is gripping the handle in a natural, functional manner. Counting the fingers on the visible right hand: - Thumb: visible - Index finger: visible - Middle finger: visible - Ring finger: visible - Little finger: visible All five fingers are clearly visible and anatomically positioned. The grip is natural and correct for holding a mug handle — the thumb wraps around the handle from the front, while the other

bfl/flux-2-klein-4b: PASS
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3 VL)

The right hand (writing hand) clearly shows five fingers: thumb, index, middle, ring, and pinky. The left hand, resting on the notebook, also shows five fingers clearly. The grip on the pen in the right hand is anatomically correct — the pen is held between the thumb and index finger, with the other fingers supporting the hand, forming a natural, relaxed writing posture. There are no visible anomalies such as extra fingers, missing fingers, or unnatural joint positions. The gesture is consistent

bfl/flux-2-klein-4b: PASS
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3 VL)

Left hand (on fretboard): 5 fingers visible — thumb, index, middle, ring, pinky — all anatomically positioned to press strings on the fretboard. No anomalies. Right hand (strumming): 5 fingers visible — thumb, index, middle, ring, pinky — thumb is near the soundhole, other fingers are positioned for strumming. Anatomically correct for strumming motion. Both hands show exactly 5 fingers each. The grip and gesture are consistent with standard guitar playing technique. VERDICT: PASS
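
Each complete evaluator response in this listing closes with a marker of the form `VERDICT: PASS` or `VERDICT: FAIL`. As a rough sketch of how such a verdict could be read out of free-form evaluator text, assuming that trailing marker format holds, the Python below takes the last marker found. The `extract_verdict` helper is hypothetical and is not taken from ImageBench, which does not document its parsing here.

```python
import re

# Assumed marker format, based on the evaluator transcripts in this listing.
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\b")

def extract_verdict(text):
    """Return 'PASS' or 'FAIL' from the last verdict marker, or None if absent."""
    matches = VERDICT_RE.findall(text)
    return matches[-1] if matches else None

sample = "Both hands show exactly 5 fingers each. VERDICT: PASS"
print(extract_verdict(sample))
```

Taking the last match guards against an evaluator quoting the word earlier in its reasoning. A response cut off before the marker is complete yields `None` instead of a guessed verdict.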

Hard 67% vs 67%
bfl/flux-2-klein-4b: FAIL
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3 VL)

The image shows only one hand — the person’s right hand — making a gesture. The gesture is not a “peace sign” (which typically involves the index and middle fingers extended, with the other three fingers curled). Instead, the person is holding up only the index finger, with the other fingers curled into a fist. This is anatomically correct for a single finger raised, but it does not match the prompt’s description. Counting fingers on the visible hand: There are five fingers — thumb, index, middl

bfl/flux-2-klein-4b: PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3 VL)

The image shows a single hand performing a thumbs-up gesture. The thumb is extended upward, and the other four fingers are curled into a fist. Counting the fingers: - The thumb is clearly visible and extended. - The four fingers (index, middle, ring, pinky) are curled inward, forming a fist. - All five fingers are present and visible — no fingers are missing or duplicated. Anatomically, the gesture is correct: the thumb is extended upward, and the other four fingers are curled, which matches

bfl/flux-2-klein-4b: PASS
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3 VL)

The image shows a person making a gesture with both hands. The prompt asks for “a person counting to three on their fingers, with index, middle, and ring fingers extended.” - The right hand (closer to the viewer) has the index, middle, and ring fingers extended — that’s three fingers. The thumb is curled inward, and the pinky is also curled inward, not extended. So, only 3 fingers are extended on this hand. - The left hand (partially visible behind the right hand) also has the index, middle, and

Extreme 100% vs 100%
bfl/flux-2-klein-4b: PASS
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3 VL)

Both hands are visible in the handshake, and each hand clearly shows five fingers: thumb, index, middle, ring, and pinky. The grip is a standard professional handshake — palms facing slightly inward, fingers interlaced, thumbs touching or nearly touching — which is anatomically correct and typical for this gesture. No anomalies are present; the fingers are proportionate, the knuckles and joints appear natural, and there are no extra or missing digits. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3 VL)

Both hands are visible in the high-five gesture. Each hand clearly shows five fingers: thumb, index, middle, ring, and pinky. All fingers are extended as requested in the prompt. The palms are meeting squarely, and the gesture is anatomically correct — no fingers are bent unnaturally, no extra or missing fingers, and the thumbs are positioned appropriately. The hands are symmetrically aligned, and the gesture is natural and typical for a high-five. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3 VL)

Examining the hands in the image: - The hand on the left (receiving) clearly shows five fingers: thumb, index, middle, ring, and pinky. All are visible and anatomically positioned. - The hand on the right (giving) also clearly shows five fingers: thumb, index, middle, ring, and pinky. All are visible and anatomically positioned. Grip/Gesture Analysis: - Both hands are gripping the gift box in a natural, functional manner. - The thumbs are on the sides of the box, and the other fingers are wrappe

Multi-Subject 100% vs 100%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Person 1: - Hair: Short, blonde, styled in a bob cut. - Clothing: Wearing a red zip-up jacket over a black top. - Position: Standing on the left side of the image. Person 2: - Hair: Long, straight, black hair. - Clothing: Wearing a sleeveless blue dress. - Position: Standing on the right side of the image, side by side with Person 1. Both individuals are visually distinct and match the described features exactly. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Person 1 (left): - Beard: Present, full and well-groomed, with some gray hairs visible. - Glasses: Wearing rectangular-framed glasses. - Sweater: Wearing a textured green cable-knit sweater over a collared shirt. - Matches description: ✅ Yes. Person 2 (right): - Beard: Clean-shaven, no facial hair. - Hoodie: Wearing a black hooded sweatshirt with drawstrings. - Matches description: ✅ Yes. Both individuals are visually distinct and clearly identifiable as separate people. All described features a

bfl/flux-2-klein-4b: PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Person 1 (tall): - Hair: Curly, red (appears as reddish-orange, consistent with curly red hair) - Clothing: Denim overalls (visible, sleeveless, with straps and pockets) - Height: Appears taller than the other person, matching “tall” descriptor Person 2 (short): - Hair: Straight, brown (medium-length, straight brown hair) - Clothing: Yellow sundress (visible, sleeveless, with floral pattern) - Height: Appears shorter than the other person, matching “short” descriptor Both individuals are visuall

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

- Bald man in a suit: Present. He is on the far left, bald, wearing a dark suit with a white shirt and purple tie. - Woman with silver hair in a red blouse: Present. She is second from the left, has silver shoulder-length hair, and is wearing a red blouse. - Young man with dreadlocks in a denim jacket: Present. He is third from the left, has black dreadlocks, and is wearing a denim jacket over a white shirt. - Woman with a hijab in a green dress: Present. She is on the far right, wearing a beige

bfl/flux-2-klein-4b: PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

- Tall woman with short pink hair playing bass: Present. She has bright pink short hair, is tall, and is clearly playing a bass guitar. - Man with a beard and bandana on drums: Present. He has a full beard and is wearing a black bandana. He is positioned behind the drum kit, though the drums are partially obscured by the keyboardist and the angle. - Slim man with glasses playing keyboard: Present. He is on the right, wearing glasses, and is playing a red keyboard. All three individuals are visua

bfl/flux-2-klein-4b: PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Person 1 (Man): - Wearing a tan sunhat and a green backpack — matches description. - Appears to be middle-aged, with a beard and casual hiking attire. - Holding a hiking pole — consistent with hiker role. Person 2 (Woman): - Has long, dark braided hair — matches description. - Wearing a pink patterned shirt, shorts, and a gray harness-style belt. - Holding two hiking poles — matches description. - Backpack is visible but not fully detailed — still consistent with hiker gear. Person 3 (Teenager

Truthfulness 82% vs 93%

Photorealism 100% vs 100%
Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image depicts a still life on a wooden table with a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple. Let’s evaluate each material’s visual properties: - **Glass (water glass)**: The glass shows accurate translucency and refraction — you can see the water level and the distortion of background elements through the glass. Reflections on the glass surface are realistic, mirroring the table and ambient light. - **Metal (silver spoon)**: The spoon exhi

bfl/flux-2-klein-4b: PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image contains the following materials as requested: - **Brushed aluminum laptop**: The laptop’s body shows a matte, brushed metal texture with realistic reflections and subtle highlights consistent with aluminum. - **Leather-bound notebooks**: Both notebooks display convincing leather textures — grain, creasing, stitching — with appropriate matte finish and slight sheen. - **Ceramic coffee mug**: The mug has a matte, porous ceramic texture with soft, non-reflective surface and natural

bfl/flux-2-klein-4b: PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image shows a kitchen counter with the requested items: a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a white cotton dish towel. Each material renders with appropriate visual properties: - **Stainless steel pot**: Reflects light with a smooth, metallic sheen, showing clear highlights and reflections consistent with polished metal. - **Wooden cutting board**: Displays natural wood grain, matte finish, and subtle surface texture with realistic shadows and hig

Physics & Reflections 100% vs 100%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The shadow is cast to the right of the red sphere, which is consistent with a light source positioned to the upper left — as specified in the prompt. The shadow’s shape is soft and elliptical, matching the projection of a sphere under directional lighting. The shadow’s direction and orientation are physically plausible. There is no reflection visible on the sphere’s surface, which is expected for a matte red object under diffuse lighting — the prompt did not require reflections, only a shadow.

bfl/flux-2-klein-4b: PASS
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The prompt specifies: “A coffee mug on a sunny windowsill casting a sharp shadow to the left.” Analysis: - The mug is positioned on a windowsill with bright, direct sunlight coming from the left (as seen from the viewer’s perspective), which is consistent with a “sunny” scene. - The shadow is cast to the right of the mug — opposite the direction of the light source — which is physically correct for a light source on the left. - The shadow is sharp and well-defined, consistent with direct sunlig

bfl/flux-2-klein-4b: PASS
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The image shows a person standing outdoors on a paved surface under bright, direct sunlight. The shadow cast by the person is located directly beneath their feet, consistent with a midday sun position (when the sun is high in the sky). The shadow is relatively short and well-defined, matching the expected behavior of a shadow under strong overhead lighting. Direction: The shadow points directly backward (away from the sun), which is physically correct for a midday sun. Orientation: The shadow

Medium 100% vs 100%
bfl/flux-2-klein-4b: PASS
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection in the mirror accurately shows the reversed orientation of the woman, displaying the back of her outfit as expected. The mirror’s reflection is symmetrical and undistorted, with correct proportions and details (e.g., the back pockets of the jeans, the t-shirt’s hem, and the position of her arms and feet). The shadow cast on the wall to the right of the woman is consistent with a light source coming from the front-left (as the shadow falls behind her and to the right), which matche

bfl/flux-2-klein-4b: PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection of the cat on the polished marble floor is mostly physically plausible. The reflection is inverted as expected (upside down), and its shape and proportions match the cat above it, though slightly softened and distorted by the floor’s surface and perspective. The cat’s eyes appear as expected — not perfectly mirrored (which would be physically impossible due to the nature of reflection and eye structure), but the reflection’s orientation and general form are correct. The direction

bfl/flux-2-klein-4b: PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection on the glass table is physically plausible. The vase and flowers are mirrored below with correct orientation and proportion, though slightly distorted by the glass’s curvature and angle. The reflection’s brightness and color match the real objects, with natural falloff and softness due to ambient lighting. Shadows are not strongly visible, but the glass’s transparency and the reflection’s clarity imply consistent lighting direction. No obvious artifacts or impossible distortions a

Hard 100% vs 100%
bfl/flux-2-klein-4b: PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflection in the image is physically plausible. The trees are mirrored across the water’s surface with correct orientation and approximate symmetry, consistent with a calm lake. The reflection is slightly darker than the actual trees, which is expected due to absorption and scattering in water. The gentle ripples distort the reflection slightly, which matches the “gently rippled” description. The color of the reflection accurately captures the warm sunset tones from the sky, with the darker

bfl/flux-2-klein-4b: PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection in the puddle shows a city skyline with buildings inverted as expected in a water mirror, and the distortion is mild — caused by the puddle’s irregular surface and raindrops, which slightly warp the image. The color shift is subtle, with cooler tones dominating due to the overcast sky and wet surface, which is physically plausible. Shadows from buildings and cars are visible in the reflection and align with the direction of ambient light (likely from a high, diffused source). The

bfl/flux-2-klein-4b: PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection in the lake is a near-perfect, inverted mirror image of the mountain range, which is physically accurate for a still, calm body of water. The orientation is correctly inverted — the peaks appear upside down as they should in a reflection. The color is subtly muted, as expected due to the absorption and scattering of light in water, and the reflection retains the same lighting direction (warm tones on the sunlit faces of the mountains are mirrored accordingly). There is no distorti

Extreme 100% vs 100%
bfl/flux-2-klein-4b: PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image depicts a clear glass sphere resting on a wet, reflective surface near a pool of water. The sphere shows multiple optical phenomena: - **Refraction through the sphere**: The sphere acts as a fisheye lens, inverting and distorting the scene inside it — trees, sky, and structures are visible upside-down and warped, which is physically correct for refraction through a spherical lens. - **Reflection on the mirror surface**: The wet surface beneath the sphere creates a clear, symmetrical r

bfl/flux-2-klein-4b: PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections and shadows in the image are physically plausible. - The chrome sculpture reflects the surrounding environment accurately: windows, buildings, and sky are mirrored with correct orientation and perspective. - The glass floor shows a clear, undistorted reflection of the sculpture, with slight distortion only due to the glass’s curvature and thickness — consistent with real-world optics. - Shadows are consistent with the direction of ambient light (likely from large windows), and t

bfl/flux-2-klein-4b: PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image shows a crystal wine glass on a polished marble countertop, with a window behind it overlooking a lake. The scene includes: - **Reflection on the marble**: The polished marble surface clearly reflects the glass and the window frame, with natural distortion and brightness consistent with a glossy surface. The reflection is oriented correctly relative to the glass’s position and the light source (indirect daylight from the window). - **Refraction in the glass**: The glass distorts the

World Knowledge 58% vs 83%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The depicted subject is clearly recognizable as the Eiffel Tower in Paris. The image shows its iconic iron lattice structure, which is architecturally accurate. The three levels are visible: the base with its arched supports, the second level with its platform and railings, and the top section leading to the antenna. The tower is shown against a clear blue sky, matching the prompt’s description. There are no significant architectural inaccuracies — the proportions, trusswork, and overall form ar

bfl/flux-2-klein-4b: PASS
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The depicted subject is the Taj Mahal in Agra, viewed from the reflecting pool, as requested. The image clearly shows the main dome, four minarets (two on each side), and the symmetrical layout of the structure, including the reflection in the pool. Architecturally, the dome is accurately rendered with its characteristic onion shape and finial, and the minarets are correctly proportioned and positioned. The symmetry of the building and its reflection is well preserved. The scene is recognizable

bfl/flux-2-klein-4b: PASS
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The depicted subject is the Statue of Liberty, clearly recognizable and presented in a real-world setting on her island pedestal. The key elements from the prompt are all accurately represented: - Torch: Present and correctly positioned in her raised right hand. - Crown: The seven-pointed crown is clearly visible on her head. - Tablet: She is holding a tablet in her left hand, inscribed with the date “JULY IV MDCCLXXVI” (July 4, 1776). - Green patina: The statue’s surface exhibits the character

Medium 33% vs 33%
bfl/flux-2-klein-4b: FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The depicted instrument is a shamisen, a traditional Japanese stringed instrument. It features: - A round wooden body (correct) - Three strings (correct) - A long neck with tuning pegs at the headstock (correct) - A bachi (plectrum) is not visible in this image — it is typically held separately by the player, so its absence does not invalidate the instrument’s identity. However, the headstock design — specifically the arrangement of the tuning pegs — is not typical of a traditional shamisen. Tra

bfl/flux-2-klein-4b: FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The depicted object is a medieval-style astrolabe, and it is recognizable as such. It features a central alidade (the movable rule with sights), a rete (the star map with radial lines), and a tympan (the graduated disk with degree markings). The brass construction is evident, and the engraved degree markings and zodiacal signs are clearly visible, consistent with historical astrolabes. However, there are notable inaccuracies: 1. The rete appears to have a simplified or stylized star pattern, lac

bfl/flux-2-klein-4b: PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with visible cracks repaired using gold lacquer lines — a hallmark of kintsugi, the traditional Japanese art of repairing broken pottery with gold. The bowl’s form, material, and repair technique are consistent with authentic kintsugi practice. The bowl is made of ceramic with a matte, speckled glaze, and the gold lines follow the cracks precisely, highlighting the repair rather than hiding it — which is the core philosophy of kintsugi. The bowl’s shape is typica

Hard 0% vs 100%
bfl/flux-2-klein-4b: FAIL
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts a diagram of the water cycle, but it is not factually or architecturally accurate. The labels are misspelled or nonsensical (e.g., “Evpetetion,” “Condement,” “Praite,” “Lronak,” “Korniak,” “Prete Far,” “Coppert in Ybrneins,” “Tatbr Eraina,” “N16tambord”). These appear to be garbled or invented terms, not standard scientific terminology. The diagram attempts to show evaporation, condensation, precipitation, and collection, but the labeling fails to correctly identify these proce

bfl/flux-2-klein-4b: FAIL
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a cross-section of a convergent tectonic boundary with subduction, a trench, a volcanic arc, and layered mantle — all of which are the correct geological elements for such a setting. However, the labels are gibberish (e.g., “RAO’EPIRA,” “LOIDI COSH QUEINJI,” “RUDM EZ MANTEA,” “KOU I REBS IANCIAO”), which are not real geological terms. This indicates a failure in labeling accuracy, even though the visual structure of the diagram is conceptually correct. Since the prompt require

bfl/flux-2-klein-4b: FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a stylized anatomical illustration of the human heart, but the labels are completely nonsensical and appear to be random strings of letters (e.g., “Foatondiaon,” “Mekerotion,” “Guappacharics,” etc.). These are not real anatomical terms, and none correspond to the correct structures such as the four chambers, valves, aorta, pulmonary arteries, or veins. Even though the visual structure of the heart (shape, major vessels, chambers) is roughly recognizable, the labeling is entire

Extreme 100% vs 100%
bfl/flux-2-klein-4b: PASS
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The depicted subject is a Morpho menelaus butterfly, and it is recognizable as such. The dorsal (upper) wings display a vivid, iridescent blue — a hallmark of this species — with dark brown borders and small yellowish spots. The ventral (lower) wings are brown, featuring prominent eyespots with orange rings and black centers, which is accurate. The body morphology, including the segmented thorax, abdomen, and antennae, is correctly rendered. There are no significant factual or architectural ina

bfl/flux-2-klein-4b: PASS
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image depicts the interior of the Hagia Sophia in Istanbul, which is factually and architecturally accurate for the specified elements: - The massive central dome is clearly visible, supported by pendentives — the curved transitional structures that allow the dome to rest on the square base — which are correctly rendered. - Semi-domes are visible on the sides of the central dome, as expected in Byzantine architecture. - Islamic calligraphy medallions (in Arabic script) are present on the wa

bfl/flux-2-klein-4b: PASS
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The depicted subject is a bismuth crystal specimen. It exhibits: - A clear geometric hopper crystal formation, with stepped, layered growth that tapers toward the top — characteristic of bismuth’s crystallization. - A staircase-like structure, where each level is composed of smaller, cube-like crystals stacked in a stepped pyramid, which is typical of bismuth’s growth pattern. - Iridescent oxide layers: The surface displays vivid, rainbow-like iridescence caused by thin oxide films on the cryst
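
The World Knowledge tier scores for bfl/flux-2-klein-4b are consistent with a plain pass fraction over prompts. With three prompts per tier, 100%, 33%, 0%, and 100% correspond to 3 + 1 + 0 + 3 = 7 passes out of 12, which rounds to the 58% shown in the category header. The Python sketch below reproduces that arithmetic under the assumption that every prompt carries equal weight. The page does not state its aggregation rule, and the `pass_rate` helper and the dictionary layout are illustrative, not part of ImageBench.

```python
# Verdicts for bfl/flux-2-klein-4b in World Knowledge, transcribed from this listing.
world_knowledge = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["FAIL", "FAIL", "PASS"],
    "Hard": ["FAIL", "FAIL", "FAIL"],
    "Extreme": ["PASS", "PASS", "PASS"],
}

def pass_rate(verdicts):
    """Percentage of PASS verdicts, rounded to the nearest whole percent."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    passes = sum(1 for v in verdicts if v == "PASS")
    return round(100 * passes / len(verdicts))

# Per-tier rates, then the category rate over all prompts pooled together
# (assumption: equal weight per prompt, not per tier).
tier_rates = {tier: pass_rate(v) for tier, v in world_knowledge.items()}
overall = pass_rate([v for tier in world_knowledge.values() for v in tier])

print(tier_rates)
print(overall)
```

Pooling prompts and averaging the four tier percentages give the same figure here only because every tier has three prompts. With uneven tier sizes the two methods would differ.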

Professional Studio 93% vs 100%

Camera & Lighting 92% vs 100%
Easy 100% vs 100%
bfl/flux-2-klein-4b: PASS
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image presents a wide-angle shot of a modern living room, and the perspective clearly exhibits barrel distortion at the edges — most notably in the curved distortion of the floor, walls, and furniture near the periphery, which is characteristic of a fisheye or ultra-wide lens. The composition successfully conveys an expansive space, and the lighting appears even and ambient, consistent with a modern interior, though not dramatically stylized. The depth of field is shallow in the sense that t

bfl/flux-2-klein-4b: PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image presents a close-up macro shot of a single dewdrop resting on a rose petal. The dewdrop is sharply in focus, while the surrounding petal and background are softly blurred, indicating a shallow depth of field — a hallmark of macro photography. The perspective is intimate and centered on the dewdrop, fulfilling the “close-up” requirement. The lighting is soft and diffused, creating gentle reflections and highlights on the droplet’s surface without harsh shadows, which is appropriate for

bfl/flux-2-klein-4b: PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image presents a clear aerial bird’s-eye view looking straight down at a traffic roundabout, with cars circulating and trees surrounding the area. The perspective is top-down and centered on the roundabout, matching the requested viewpoint. The depth of field appears uniform across the scene, consistent with a high-altitude drone or satellite shot, and there are no noticeable blur or focus falloff. Lighting is even and natural, with consistent shadows cast by trees and vehicles, suggesting a

Medium 100%vs100%
bfl/flux-2-klein-4b: PASS
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image displays a portrait with a strong, directional light source coming from the side, casting a distinct triangular highlight across the cheek and nose — a hallmark of Rembrandt lighting. The shadowed side of the face is deep and moody, contributing to the dramatic, atmospheric tone requested. The depth of field is shallow, with the subject’s face in sharp focus while the background is softly blurred, enhancing the portrait’s intimacy and directing attention to the lighting pattern. The pe

bfl/flux-2-klein-4b: PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image depicts a landscape bathed in warm, golden-hour lighting, with an orange sky and soft glow — matching the prompt’s description. Long, directional shadows are visible across the shrubs and terrain, indicating low-angle sunlight. The perspective is wide and immersive, capturing depth from foreground vegetation to distant hills. The depth of field is shallow, with foreground elements slightly blurred while midground and background retain detail, creating a natural sense of distance. The l

bfl/flux-2-klein-4b: PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image presents a high-key portrait with a bright white background, as specified. The lighting is soft and diffused, resulting in minimal shadows across the subject’s face and neck, which aligns with the “minimal shadows” requirement. The overall effect is airy and luminous, matching the described aesthetic. The perspective is a centered, close-up portrait, and the depth of field is shallow, keeping the subject’s face in focus while the background is uniformly bright and out of focus — consis

Hard 100%vs100%
bfl/flux-2-klein-4b: PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image shows a portrait with a shallow depth of field, producing a creamy bokeh background that matches the description. The subject is backlit with a warm rim light that separates her hair and shoulders from the background, creating a soft, glowing outline — consistent with the prompt. The composition is a tight portrait, which is typical of an 85mm lens, and the lighting and background blur suggest a wide aperture (f/1.4 or similar), though the exact lens specification cannot be verified fr

bfl/flux-2-klein-4b: PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image matches the described camera/lighting prompt well. - Perspective: The shot is taken from a street-level angle, looking down a city street, which fits the “street photograph” description. - Depth of field: The subject (the woman) is sharp and in focus, while the background city lights and buildings are softly blurred — a classic shallow depth of field effect. This matches “subject sharp with soft city lights in the background.” - Lighting pattern: The background features bokeh from stre

bfl/flux-2-klein-4b: PASS
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image is shot from a high angle, consistent with an overhead perspective, which matches the “shot from above” requirement. The depth of field appears shallow, with the food in focus and the background (window, table, cup) softly blurred — this is plausible for a 35mm f/2.8 lens, which typically produces a moderate to shallow depth of field depending on distance and composition. The lighting comes from the left, as indicated by the bright window on the left side of the frame and the correspon

Extreme 67%vs100%
bfl/flux-2-klein-4b: PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully emulates a tilt-shift photograph of a city street with a miniature model effect. The perspective is a low-angle, straight-on view down the street, which enhances the illusion of scale. The depth of field is narrow and selective, with only a thin central strip in focus — the cars and crosswalk — while the foreground and background are blurred, mimicking the classic tilt-shift look. The lighting appears to be bright daylight with saturated colors, particularly in the buildin

bfl/flux-2-klein-4b: FAIL
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image displays horizontal lens flare streaks — one prominent blue and one orange — across a dark, moody background, which aligns with the prompt’s description of “blue and orange anamorphic lens flare streaking horizontally across a dark moody scene.” However, there is no discernible perspective or depth of field, as the image is essentially a flat abstract composition of light streaks against a dark background with no foreground, midground, or background elements to establish spatial depth

bfl/flux-2-klein-4b: PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image displays a portrait with a strong, dramatic split lighting effect: one side of the face is brightly illuminated, revealing skin texture and facial features, while the other side is cast in deep shadow, creating high contrast. This chiaroscuro lighting is clearly intentional and matches the prompt. The perspective is a tight close-up, focusing on the face, which is appropriate for a portrait. The depth of field is shallow, with the subject sharply in focus against a completely black, fe
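
The tier percentages in this category follow from the per-prompt verdicts listed above. As a minimal sketch (the dictionary layout and the helper name `pass_rate` are assumptions for illustration, and rounding to the nearest whole percent is assumed rather than documented), the Camera & Lighting figures for bfl/flux-2-klein-4b can be reproduced like this:

```python
# Verdicts for bfl/flux-2-klein-4b in Camera & Lighting, transcribed from the listing above.
verdicts = {
    "Easy": ["PASS", "PASS", "PASS"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "PASS"],
    "Extreme": ["PASS", "FAIL", "PASS"],
}


def pass_rate(results):
    """Return the share of PASS verdicts as a whole-number percentage."""
    return round(100 * sum(r == "PASS" for r in results) / len(results))


# Per-tier rates, then the rate over all twelve prompts in the category.
tier_rates = {tier: pass_rate(results) for tier, results in verdicts.items()}
overall = pass_rate([r for results in verdicts.values() for r in results])
print(tier_rates, overall)
```

Two of three Extreme prompts passing gives 67%, and eleven of twelve overall gives 92%, which agrees with the figures shown for this category.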

Color Precision 92%vs100%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The car in the image is a bright, vivid red — matching the description “vivid and unambiguous” red. The road beneath it is gray asphalt, as specified. While the prompt did not specify a hex value, the visual color of the car is clearly and unmistakably red, and the asphalt is gray. No color deviations from the prompt are evident. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image shows a woman wearing a long evening dress that is a deep, rich royal blue, matching the description “royal blue.” The background is a neutral, light gray, which aligns with “neutral background.” There are no other colors present that contradict the prompt. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image displays a field of sunflowers under a clear blue sky, matching the prompt’s description. The sunflowers are a vibrant, bright yellow — consistent with “pure sunflower yellow” — and the sky is a clear, vivid blue. While no specific hex values were provided in the prompt, the colors visually align with the named colors as expected in a natural scene. The yellow is saturated and true to the sunflower hue, and the blue is a clean, cloudless sky blue. VERDICT: PASS

Medium 100%vs100%
bfl/flux-2-klein-4b: PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The wall in the image is painted in a warm, burnt orange hue that closely matches the hex value #FF6B35. The furniture — including the bed frame, nightstands, and lampshade — is white, as specified. The lighting appears to be natural daylight, consistent with the prompt. While the bedding and rug have beige/cream tones, these are neutral and complementary to the burnt orange wall and white furniture, and do not contradict the prompt’s requirements. All specified elements — the exact wall color (

bfl/flux-2-klein-4b: PASS
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The foliage in the image is a dark, muted green, closely matching the hex color #2D5F2D. The tree trunks are a consistent brown, as specified. The colors are accurate to the prompt. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The walls in the image are a medium blue that closely matches the hex color #4A90D9. The floor is a warm wooden tone, and the trim (baseboards and crown molding) is white, as specified in the prompt. VERDICT: PASS

Hard 67%vs100%
bfl/flux-2-klein-4b: PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The walls in the image are a warm cream color that closely matches #F5E6CC. The sofa is a dark navy that closely matches #2C3E50. The accent pillows are a bright red that closely matches #E74C3C. All specified colors are accurately represented in the image. VERDICT: PASS
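
The evaluator here judges "closely matches" by eye. For readers who want a numeric cross-check, the sketch below converts the three hex codes from the prompt to RGB and compares them against sampled pixels using Euclidean distance. The sample values, the `tolerance` threshold, and the function names are all hypothetical; nothing here is part of the benchmark's own pipeline.

```python
import math


def hex_to_rgb(hex_code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints in 0..255."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.dist(a, b)


def closely_matches(sampled_rgb, target_hex, tolerance=30.0):
    # tolerance is an arbitrary illustrative threshold, not a benchmark setting
    return rgb_distance(sampled_rgb, hex_to_rgb(target_hex)) <= tolerance


targets = {"walls": "#F5E6CC", "sofa": "#2C3E50", "pillows": "#E74C3C"}
# Hypothetical pixel samples; real values would be read from the generated image.
samples = {"walls": (240, 228, 205), "sofa": (40, 60, 85), "pillows": (225, 80, 62)}

for name, hex_code in targets.items():
    print(name, hex_to_rgb(hex_code), closely_matches(samples[name], hex_code))
```

Plain RGB distance is only a rough proxy for perceived difference; a perceptual color space would be a more defensible choice if such a check were used for scoring.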

bfl/flux-2-klein-4b: FAIL
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The prompt specified: - Background: #1A1A2E (deep dark blue) — ✅ The background is a solid, deep dark blue, matching the hex value closely. - Watch: #F0C27F (soft gold) — ❌ The watch face is a soft gold, which matches. However, the watch case and buckle are also gold-toned, and the strap is a light pinkish-beige, not gold. The strap is not #F0C27F. The prompt says “a #F0C27F watch,” which likely refers to the watch’s primary color (face), which is correct. But the strap is not gold — it’s a pale

bfl/flux-2-klein-4b: PASS
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The background is #FAFAFA (near-white) — matches. The main shape is #FF4500 (orange-red) — matches. The accent lines are #333333 (dark charcoal) — matches. VERDICT: PASS

Extreme 100%vs100%
bfl/flux-2-klein-4b: PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image displays a smooth horizontal gradient transitioning from red on the left to blue on the right, with a clear purple band in the center. The left edge is a vivid red (#FF0000), the right edge is a vivid blue (#0000FF), and the transition through purple is even and consistent with the prompt. The colors match the specified hex values exactly at the endpoints and transition correctly through the intermediate purple hues. VERDICT: PASS
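
To make "transitioning evenly through purple" concrete, the following sketch computes reference colors along the gradient by interpolating each RGB channel linearly between the two endpoints. Per-channel interpolation in sRGB is an assumption about what "evenly" means, and `lerp_hex` is a hypothetical helper, not something the benchmark defines.

```python
def hex_to_rgb(hex_code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints in 0..255."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def lerp_hex(start_hex, end_hex, t):
    """Linearly interpolate each RGB channel; t=0 gives start, t=1 gives end."""
    start, end = hex_to_rgb(start_hex), hex_to_rgb(end_hex)
    # Adding 0.5 before truncating rounds each channel to the nearest integer.
    mixed = (int(s + (e - s) * t + 0.5) for s, e in zip(start, end))
    return "#{:02X}{:02X}{:02X}".format(*mixed)


# Reference colors at five evenly spaced positions from left to right.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, lerp_hex("#FF0000", "#0000FF", t))
```

Under this assumption the midpoint is `#800080`, a mid purple, which is the band the evaluator describes at the center of the image.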

bfl/flux-2-klein-4b: PASS
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image shows a smooth vertical gradient transitioning from a bright orange at the bottom to a deep indigo at the top, which visually matches the specified hex values: #FF6B35 (orange) at the bottom and #1B1464 (deep indigo) at the top. The gradient is clean and follows the exact color progression described. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image displays a horizontal banner with a smooth gradient transitioning from a bright cyan on the left to a mint green on the right. The leftmost color matches #00C9FF (a vivid cyan), and the rightmost color matches #92FE9D (a light mint green). The gradient is continuous and accurately follows the specified hex values without noticeable deviation or color mismatch. VERDICT: PASS

Photorealism 100%vs100%
Easy 100%vs100%
bfl/flux-2-klein-4b: PASS
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image presents a clean, studio-style product shot of a red sneaker on a white background with soft lighting and a gentle shadow — exactly matching the prompt. The textures of the fabric and sole appear natural, the stitching and laces are consistent and detailed, and there are no visible AI artifacts such as warped geometry, inconsistent shadows, or unnatural blending. The lighting is even and professional, typical of real product photography. VERDICT: PASS

bfl/flux-2-klein-4b: PASS
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image presents a clear glass perfume bottle on a reflective black surface, with studio lighting and no background distractions — exactly matching the prompt. The reflections are crisp and physically accurate, the glass shows realistic refraction and highlights, and the metallic nozzle has natural specular highlights. There are no obvious AI artifacts: no warped geometry, inconsistent reflections, unnatural textures, or compositional glitches. The lighting and shadows are consistent with a co

bfl/flux-2-klein-4b: PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image presents a clean, studio-style product shot of a green apple on a seamless white background, with soft side lighting and a subtle shadow — matching the prompt’s description. The apple’s surface has natural-looking highlights and speckles, and the stem appears organic and detailed. There are no obvious AI artifacts such as warped geometry, inconsistent textures, or unnatural blending. The lighting and shadow are physically plausible, and the composition feels authentic. VERDICT: PASS

Graphical design 42%vs92%

Data Visualisation 0%vs100%
Easy 0%vs100%
bfl/flux-2-klein-4b: FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart fails to meet the requested specifications in multiple critical areas: 1. **Title Error**: The title reads “Motthly Sales 2025” — it has a typo (“Motthly” instead of “Monthly”). 2. **X-Axis Labels Incorrect**: The x-axis labels are not months. Instead of “January, February, March, April, May, June”, it shows “January”, “15K”, “20K”, “25K” — which is invalid. The months are missing for February through June, and the numerical values are incorrectly placed on the x-axis. 3. **Y-Axis Labe

bfl/flux-2-klein-4b: FAIL
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The requested pie chart should show: - Company A: 35% - Company B: 25% - Company C: 20% - Others: 20% In the image: - One slice is labeled “Company A 35%” — correct. - One slice is labeled “Company B 25%” — correct. - One slice is labeled “Company C 20%” — correct. - Two slices are labeled “Others 20%” — incorrect (duplicate). - One slice is labeled “Others 20%” — correct. - One slice is labeled “Company 25%” — incorrect (mislabeling). - One slice is labeled “Mnnpny 3%” — incorrect (typo and wro

bfl/flux-2-klein-4b: FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The requested visualization is a line graph showing daily website visitors over one week (Mon-Sun) with specific values: 1200, 1500, 1100, 1800, 2000, 1700, 1400. However, the generated image has several critical errors: 1. The x-axis labels are not days of the week (Mon-Sun) — they are mislabeled with the data values themselves (0, 1500, 1100, 2100, 1700, 1700, 100, 1400), which is incorrect and misleading. 2. The y-axis label is garbled and unreadable (“MexhACnSrela”). 3. The x-axis label is “

Layout & Design 11%vs89%
Easy 33%vs100%
bfl/flux-2-klein-4b: PASS
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The image features a large, centrally placed popcorn bucket, which fulfills the “large popcorn image in the center” requirement. The title “MOVIE NIGHT” is prominently displayed at the top in bold, stylized lettering, matching the prompt. The tagline “Every Friday at 8 PM” is clearly visible at the bottom in readable white text. The visual hierarchy is appropriate: the title commands attention at the top, the popcorn is the focal point in the center, and the tagline provides supporting informati

bfl/flux-2-klein-4b: FAIL
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The poster includes a guitar illustration, the headline “LIWE MUSIC FEST” (note: “LIWE” appears to be a typo for “LIVE”), and the tagline “Tickets available now” at the bottom. The visual hierarchy is generally correct: the headline is large and centered at the top, the guitar is a central visual element, and the tagline is placed below in smaller text. However, the headline contains a clear error — “LIWE” instead of “LIVE” — which misrepresents the intended message and may confuse viewers. Sinc

bfl/flux-2-klein-4b: FAIL
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The image includes an open book illustration, which is present and centered below the main title. The title “JOIN THE BOCK CLUB” is prominently displayed at the top in large, bold text — though “BOCK” is misspelled (should be “BOOK”). The tagline “Mets every Wedessaay” is present at the bottom, but it is misspelled (“Mets” instead of “Meets,” “Wedessaay” instead of “Wednesday”). The visual hierarchy is correct: title at top, book in center, tagline at bottom. However, the text contains critical

Medium 0%vs67%
bfl/flux-2-klein-4b: FAIL
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a landing page layout that includes: - A hero header image at the top (with a blurred photo of people working). - A headline and subtitle in the middle (though the text is gibberish and unreadable). - A “Sign Up” call-to-action button (present and centered). - A footer with three social media icons (Twitter, Twitter, and LinkedIn — though the second icon is misaligned and the third is slightly distorted). However, the headline and subtitle are not readable — they appear to be pla

bfl/flux-2-klein-4b: FAIL
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with the following elements: - A banner image header (top portion with a smiling woman). - A two-column body section: text on the left, image on the right. - A highlighted quote block (yellow box in the center). - A footer with an “Unsubscribe” link (blue bar at bottom with button). However, the text in all sections is gibberish — it appears to be placeholder or corrupted text, not readable content. The prompt requires text to be readable and correctly placed,

bfl/flux-2-klein-4b: FAIL
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The image displays a restaurant menu with a logo header at the top, three category sections (Appetizers, Mains, Desserts), and a footer with contact information. The visual hierarchy is generally clear: the logo is prominent at the top, categories are clearly separated with headers, and the footer is distinct at the bottom. However, all text — including item names, prices, and footer information — is gibberish or nonsensical, making it unreadable and functionally useless for a real menu. The pro

Hard 0%vs100%
bfl/flux-2-klein-4b: FAIL
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The image shows a magazine double-page spread. The left page contains a large hero photo of a woman, which matches the prompt. The right page contains a two-column article layout, a pull quote (“Torn. ahi Jo uia.Is chit gey.”) positioned beneath the headline, and a sidebar with an author bio (a small inset image and text block at the bottom left of the right column). The page number “8” is visible in the bottom right corner. However, all text is rendered in gibberish — it appears to be placehold

bfl/flux-2-klein-4b: FAIL
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI mockup that attempts to follow the described layout, but several critical elements are missing or incorrectly rendered: 1. **Top Navigation Bar**: Present — includes a profile picture, username “Reseci”, and icons for search and more options. ✅ 2. **Search Field**: Present — appears as “Sesacts” with a magnifying glass icon. However, it’s positioned below the top bar, not integrated into it as a typical search field. ❌ (Misplaced) 3. **2x2 Grid of Feature Cards**:

bfl/flux-2-klein-4b: FAIL
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The image depicts a magazine spread with a full-bleed photo on the left page and a three-column layout on the right page, which matches the prompt’s description. However, all text is gibberish — it appears to be placeholder or randomly generated text, not readable or meaningful content. The headline, body copy, and inset image are visually present, but the text is not legible or correctly placed in terms of content. The page numbers (18 and 20) are visible at the bottom corners, but the text con

Style Diversity 75%vs92%
Easy 67%vs100%
bfl/flux-2-klein-4b: PASS
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image depicts a golden retriever sitting in a garden, surrounded by colorful flowers and foliage. The visual style strongly resembles an oil painting, with visible brushstrokes—especially in the background foliage, sky, and flower petals—and rich, textured surfaces that convey depth and materiality. The fur of the dog has a painterly quality, with layered strokes and blending that mimic traditional oil techniques. The lighting and color palette are consistent with impressionistic or realisti

bfl/flux-2-klein-4b: FAIL
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image depicts a golden retriever sitting in a garden, which matches the subject matter of the prompt. The style is somewhat anime-influenced, with large, expressive eyes and simplified forms. However, the rendering lacks the hallmark flat colors and minimal shading typical of Japanese anime style — instead, it uses soft gradients and some shading to suggest form and depth, especially in the fur and background. The background is also more detailed and painterly than typical anime, with blende

bfl/flux-2-klein-4b: PASS
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image depicts a golden retriever sitting in a garden, rendered in a pixel art style. The visual characteristics of pixel art — such as blocky, low-resolution blocks of color, visible pixelation, and simplified forms — are clearly present. The scene uses a limited color palette, with distinct blocks of color for the dog, grass, flowers, trees, and sky, consistent with the 32x32 grid aesthetic implied by the prompt. While the exact 32x32 grid resolution cannot be verified visually without a pi

Medium 100%vs100%
bfl/flux-2-klein-4b: PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies the requested Art Deco style. It features: - **Geometric shapes**: Strong use of straight lines, angular forms, and stylized patterns in the buildings, street layout, and decorative borders. - **Gold and black palette**: Dominated by a high-contrast black-and-gold color scheme, with gold accents outlining architectural elements and creating visual rhythm. - **Ornamental symmetry**: The composition is highly symmetrical, with buildings mirroring each other down the cent

bfl/flux-2-klein-4b: PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image depicts a woman holding a parasol in a garden setting. The visual style is painterly, with visible brushstroke textures, a soft and luminous quality, and a pastel color palette dominated by pinks, yellows, and light blues — all consistent with Impressionism. The dappled light filtering through foliage and the focus on capturing the fleeting effects of light and atmosphere are also characteristic of the style. While the rendering is somewhat polished and idealized (perhaps slightly more

bfl/flux-2-klein-4b: PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly embodies the requested Pop Art style. It features: - **Bold outlines**: Strong, clean black lines define the facial features, hair, clothing, and background shapes. - **Ben-Day dots**: The entire image, including skin tones and clothing, is rendered with a halftone dot pattern — a hallmark of Pop Art and comic book aesthetics. - **Flat vivid primary colors**: The palette uses bright, saturated primary colors (yellow, blue, purple) for the background and vivid magenta/pink for t

Hard 33%vs67%
bfl/flux-2-klein-4b: FAIL
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image depicts a forest scene rendered in a watercolor style, with visible paper texture and soft, blended color transitions characteristic of watercolor painting. However, the prompt requested a “photorealistic watercolor” style — meaning it should combine the high detail and lighting fidelity of a photograph with the watercolor medium. While the image has watercolor qualities (bleeds, texture), it lacks photorealistic detail: the trees, foliage, and ground are stylized and simplified, not r

bfl/flux-2-klein-4b: PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image successfully embodies the requested "cyberpunk Art Nouveau" style. Key visual characteristics are clearly present: - **Cyberpunk elements**: Neon signage, futuristic architecture, glowing tech interfaces, a rain-slicked urban environment, and a woman in a high-tech outfit. - **Art Nouveau elements**: The ornate, flowing organic lines framing the image, floral motifs (lotus-like flowers), curvilinear borders, and decorative swirls. - **Color palette**: Dominated by neon pinks, purples,

bfl/flux-2-klein-4b: FAIL
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image successfully incorporates the key visual characteristics of Japanese ukiyo-e woodblock prints: flat areas of color, bold outlines, stylized clouds, vertical and horizontal composition, and the inclusion of Japanese text blocks and seals in the corners. The castle’s architecture, while medieval in form, is rendered with the flat perspective and decorative elements typical of ukiyo-e. However, the prompt specifically requested “photorealistic sky and lighting.” The sky in the image is st

Extreme 100% vs 100%
bfl/flux-2-klein-4b: PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image clearly depicts an interior scene with raw, exposed concrete walls, ceiling, and floor — a hallmark of brutalism. The furniture is distinctly rounded and upholstered in soft pastel tones (pink, mint, light blue), contrasting with the starkness of the concrete. Warm, diffused light enters through a window, casting gentle shadows and enhancing the cozy, muted atmosphere. All key elements of the requested style — raw concrete textures, rounded pastel furniture, and warm diffused lighting

bfl/flux-2-klein-4b: PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image presents a minimalist still life featuring a single geometric form — a cube-like object — set against a muted, monochromatic background. The composition is sparse, with the object occupying only a portion of the frame, leaving large areas of negative space. The color palette is subdued, consisting of desaturated grays and soft earth tones, which aligns with the “muted colors” requirement. The brushwork is visible and textured, suggesting a painterly, tactile quality. The visual tension

bfl/flux-2-klein-4b: PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image successfully embodies the requested “nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina.” Key visual characteristics are clearly present:

- **Retro-futurism / 1960s Space-Age Aesthetics**: The architecture features sleek, stylized skyscrapers with pointed spires, domed structures, and futuristic signage (e.g., the “ARANDO” neon sign), evoking mid-century visions of the future. The overall design language is reminiscent of 1960s sci-fi