ImageBench

vs

192 evaluations across 6 categories

Pass Rate 68%vs84%
Text Rendering 67%vs93%
Spatial Reasoning 67%vs79%
Human realism 74%vs91%
Truthfulness 59%vs82%
Professional Studio 70%vs78%
Graphical design 71%vs92%
Latency 32%vs22%
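
The per-category gap between the two columns can be read off the summary scores with simple subtraction. The sketch below is only an illustration: the dictionary and the function name `gaps_by_category` are hypothetical, the numbers are copied from the summary, and Latency is left out because the page does not say how that figure is scored.

```python
# Category pass rates copied from the summary: (first model, second model).
# Latency is omitted on purpose; its scoring direction is not stated on the page.
CATEGORY_SCORES = {
    "Text Rendering": (67, 93),
    "Spatial Reasoning": (67, 79),
    "Human realism": (74, 91),
    "Truthfulness": (59, 82),
    "Professional Studio": (70, 78),
    "Graphical design": (71, 92),
}


def gaps_by_category(scores):
    """Return (category, gap in percentage points) pairs, largest gap first."""
    gaps = [(name, second - first) for name, (first, second) in scores.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


for name, gap in gaps_by_category(CATEGORY_SCORES):
    print(f"{name}: {gap:+d} points")
```

Under these numbers the widest gap is in Text Rendering and the narrowest in Professional Studio.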

Text Rendering 67%vs93%

Typography Style 33%vs100%
Easy 33%vs100%
fal/ideogram/v3: FAIL
The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

The text “Chapter one” appears in a bold, slightly uneven serif font on aged paper, resembling typewriter output. However, the ink lacks the characteristic slight smudging or irregular pressure of a vintage typewriter. Spelling is correct. The medium’s authenticity is compromised by the overly clean, digitally rendered ink strokes. VERDICT: FAIL

fal/ideogram/v3: FAIL
The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text is rendered in a stylized chalk-like font on a dark board, with visible chalk dust on the frame. However, the word “TODAY’S” is misspelled as “TODDAPY” in the banner above. The core medium (chalkboard style) is convincing, but the spelling error disqualifies it. VERDICT: FAIL

fal/ideogram/v3: PASS
The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is rendered in glowing red pixel-style letters on a digital LED sign, matching the described medium. The spelling is correct. The pixelated, blocky font and bright red illumination convincingly simulate an LED display. Reflections on wet pavement enhance realism. VERDICT: PASS
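
Each complete evaluator response on this page ends with a line of the form `VERDICT: PASS` or `VERDICT: FAIL`. A minimal sketch of how such a trailing verdict could be extracted from the free text is shown here. The function name `parse_verdict` is hypothetical and not part of ImageBench, and the page does not document how the benchmark itself reads the verdict. Responses that are cut off before the verdict yield `None` rather than a guess.

```python
import re
from typing import Optional

# Matches a trailing "VERDICT: PASS" or "VERDICT: FAIL" at the very end of the text.
_VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str) -> Optional[bool]:
    """Return True for PASS, False for FAIL, None if no trailing verdict is present."""
    match = _VERDICT_RE.search(evaluator_text.strip())
    if match is None:
        return None  # e.g. a response truncated before its verdict
    return match.group(1) == "PASS"
```

Anchoring the pattern to the end of the text avoids picking up the words PASS or FAIL if they appear earlier in the reasoning.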

Writing accuracy 75%vs92%
Easy 100%vs100%
fal/ideogram/v3: PASS
A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is fully spelled correctly and is clearly readable despite stylized design elements. All letters are intact and legible. No misspellings or missing parts. The text is centered on a light background as intended. VERDICT: PASS

fal/ideogram/v3: PASS
The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable, centered prominently. The background is cream-colored, though stylized with texture and a hot air balloon illustration behind the text. No misspellings or illegible parts exist. All text elements match the intended specification. VERDICT: PASS

fal/ideogram/v3: PASS
The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and fully readable, centered on a light blue background with a radiant effect. No misspellings or illegible parts are present. The text meets the specified criteria exactly. VERDICT: PASS

Medium 100%vs100%
fal/ideogram/v3: PASS
The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

All words are spelled correctly and fully readable: "THE", "QUICK", "BROWN", "FOX", "JUMPS", "HIGH". Text is in clean black on a light beige background, matching the intent. No misspellings or illegible parts. VERDICT: PASS

fal/ideogram/v3: PASS
'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

The text "Every sunset brings a new dawn ahead" is fully legible and spelled correctly. All words match the intended phrase exactly. No misspellings or missing words are present. The script is clear and readable against the pale yellow background. VERDICT: PASS

fal/ideogram/v3: PASS
'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The text "Stars shine brightest in the dark night" is fully legible and spelled correctly. All words match the intended phrase exactly. The background is dark blue, not off-white, but the text is clear and complete. No misspellings or missing words are present. VERDICT: PASS

Hard 100%vs100%
fal/ideogram/v3: PASS
A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

All text is spelled correctly and fully readable. Headline: "SUMMER SALE". Body: "UP TO 50% OFF ALL ITEMS THIS WEEKEND ONLY". Fine print: "TERMS AND CONDITIONS APPLY. SEE STORE FOR DETAILS." No misspellings or illegible parts detected. VERDICT: PASS

fal/ideogram/v3: PASS
A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

All text is spelled correctly and fully readable: "GRAND OPENING", "JOIN US FOR FOOD, MUSIC, AND FUN'", "RSVP REQUIRED. LIMITED SEATING AVAILABLE.". No misspellings or illegible parts detected. Text matches intended content exactly. VERDICT: PASS

fal/ideogram/v3: PASS
An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

All text matches the intended content exactly: “Annual Gala 2025”, “AN EVENING OF FINE DINING AND LIVE JAZZ”, and “BLACK TIE OPTIONAL. DOORS OPEN AT 6 PM.”. Font sizes and styles are appropriate. No misspellings, missing words, or illegible text detected. Text is fully readable and accurate. VERDICT: PASS

Extreme 0%vs67%
fal/ideogram/v3: FAIL
A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict

Evaluator (Qwen 3 VL)

"SHIPP" is misspelled (should be "SHIP"). "MUSIC" is misspelled as "MISE GHOUDI". "PICTURES" is misspelled as "PICTFRES". Several names are misspelled or distorted. Text is not fully readable or accurate per the intended prompt. VERDICT: FAIL

fal/ideogram/v3: FAIL
A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99

Evaluator (Qwen 3 VL)

Text errors: “HONIRE OF CAOCPANLUES YORU TTER” (garbled, not “The Hidden Temples of Cambodia”). “ISSUE 27” (not 247). “$6.99” (not $6.99). “DR. MARIA SANTOS” is correct. “TOP 10 TRAILS YOU MUST HIKE THIS YEAR” is correct. “page 34” is correct. Multiple critical errors in cover lines and issue details. VERDICT: FAIL

fal/ideogram/v3: FAIL
A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te

Evaluator (Qwen 3 VL)

Text on main label: "GOLDEN HARVEST Organic HONEY" — correct. "RAW & UNFILTERED" — correct. Tag text: "12 OZ (340g)" — correct. "PURE WILDFLOWER HONEY" — correct. "HARVESTED FROM SUSTAINABLE APIARIES IN THE PACIFIC NORTHWEST" — "FROM" and "NORTHWEST" misspelled/missing. "NO ADDITIVES OR PRESERVATIVES" — "ADDITIVES" misspelled. "STORE IN COOL, DRY PLACE" — "STORE" misspelled. "PRODUCT OF USA" — "PRODUCT" misspelled. VERDICT: FAIL
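
The Text Rendering percentages for fal/ideogram/v3 are consistent with a plain pass count over the entries listed above: 1 of 3 in Typography Style, 9 of 12 in Writing accuracy, and 10 of 15 overall. The sketch below reproduces that arithmetic. The data layout and the names `RESULTS` and `pass_rate` are hypothetical, and rounding to the nearest whole percent is an assumption about how the page displays its figures.

```python
# (subcategory, difficulty, verdict) for the fal/ideogram/v3 Text Rendering entries.
RESULTS = (
    [("Typography Style", "Easy", v) for v in ("FAIL", "FAIL", "PASS")]
    + [("Writing accuracy", "Easy", "PASS")] * 3
    + [("Writing accuracy", "Medium", "PASS")] * 3
    + [("Writing accuracy", "Hard", "PASS")] * 3
    + [("Writing accuracy", "Extreme", "FAIL")] * 3
)


def pass_rate(rows, subcategory=None, difficulty=None):
    """Percentage of PASS verdicts among rows matching the optional filters."""
    selected = [
        verdict
        for sub, diff, verdict in rows
        if (subcategory is None or sub == subcategory)
        and (difficulty is None or diff == difficulty)
    ]
    passed = sum(1 for verdict in selected if verdict == "PASS")
    # Assumed rounding rule: nearest whole percent.
    return round(100 * passed / len(selected))


print(pass_rate(RESULTS))                                   # whole category
print(pass_rate(RESULTS, subcategory="Typography Style"))   # one subcategory
print(pass_rate(RESULTS, subcategory="Writing accuracy"))
print(pass_rate(RESULTS, difficulty="Extreme"))             # one difficulty tier
```

With the verdicts as listed, these calls give 67, 33, 75, and 0, matching the figures shown for the first model.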

Spatial Reasoning 67%vs79%

Attributes Binding 78%vs100%
Easy 100%vs100%
fal/ideogram/v3: PASS
A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image contains two primary objects on a white surface. 1. **Object 1:** A red sphere (ball). The color is red and the shape is spherical. 2. **Object 2:** A blue cube. The color is blue and the shape is cubic. The attributes are correctly bound: the ball is red and the cube is blue. They are positioned side by side on the white surface as requested. VERDICT: PASS

fal/ideogram/v3: PASS
A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image displays a green apple in the foreground and an orange pumpkin in the background, both resting on a wooden cutting board. The attributes are correctly bound: the apple is green, the pumpkin is orange, and the surface is a wooden board. There are no swaps or incorrect attributes. VERDICT: PASS

fal/ideogram/v3: PASS
A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image displays a yellow taxi on the left and a black limousine on the right, parked side-by-side on a city street. The taxi is correctly identified by its yellow color, "TAXI" roof sign, and checkered strip. The limousine is correctly identified by its black color and elongated body shape. All attributes (color, type) are correctly bound to the respective objects with no swaps. VERDICT: PASS

Medium 67%vs100%
fal/ideogram/v3: FAIL
A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image contains a large red circle, a large blue triangle, and a medium green star. The prompt requested a "small blue triangle," but the blue triangle in the image is large, comparable in scale to the red circle. Therefore, the size attribute for the triangle is incorrect. VERDICT: FAIL

fal/ideogram/v3: PASS
A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image displays three main candles on a wooden shelf. 1. **Right Object:** A tall, white pillar candle. It matches the "tall thin white candle" description (though "thin" is relative, it is the tallest and white). 2. **Middle Object:** A red candle with a stacked or tiered shape. It is short and wide, matching "short fat red candle". 3. **Front Object:** A yellow candle with a ribbed or spiral texture. It is medium height, matching "medium spiral yellow candle". All attributes are corre

fal/ideogram/v3: PASS
A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image contains a large brown horse standing in a grassy field. A small white rabbit is sitting near the horse's front legs. A black cat is lying on the horse's back. All objects match the requested attributes: the horse is big and brown, the rabbit is small and white, and the cat is black (and medium-sized relative to the other animals). VERDICT: PASS

Hard 67%vs100%
fal/ideogram/v3: PASS
A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains a pink fire truck, a large blue banana, and a large green sphere representing the sun. All objects have the correct colors and shapes as requested. The landscape is surreal with cracked purple ground and a pink sky. There are no attribute swaps. VERDICT: PASS

fal/ideogram/v3: FAIL
A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains a purple carrot (mostly purple/red with purple tops) and a red tree with red leaves, which matches the prompt. However, the prompt specifically requested a "white watermelon sliced open." The image shows a standard watermelon with bright red flesh and black seeds, not white flesh. The attribute "white" is incorrectly bound (or missing) for the watermelon. VERDICT: FAIL

fal/ideogram/v3: PASS
An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image contains an orange elephant figurine, a yellow penguin figurine, and a metallic silver strawberry figurine. All objects are correctly colored and shaped according to the prompt ("orange elephant", "yellow penguin", "silver strawberry"). They are arranged on a wooden table surface. There are no attribute swaps or missing elements. VERDICT: PASS

Compositionality 78%vs89%
Easy 100%vs67%
fal/ideogram/v3: PASS
A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

Tabby cat: Present (distinct stripes and markings visible). Oversized red sunglasses: Present (large red frames sit on the face). Coherence: The elements are combined logically with the glasses fitting the cat's anatomy naturally. All described elements from the prompt are visible and integrated seamlessly. VERDICT: PASS

fal/ideogram/v3: PASS
A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image clearly depicts a Golden Retriever sitting in a field. The dog is wearing a brown cowboy hat that is sized appropriately for its head (effectively "tiny" relative to a human hat). All described elements—the breed, the action of wearing, and the specific accessory—are present and combined coherently. VERDICT: PASS

fal/ideogram/v3: PASS
A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

The image successfully depicts all requested elements. * **Penguin:** Present (a cartoon-style penguin with orange markings). * **Holding:** Present (the penguin's flipper is gripping the umbrella handle). * **Colorful umbrella:** Present (the umbrella features red, yellow, blue, and green panels). The elements are coherently combined in a snowy scene where the penguin uses the umbrella for shelter. VERDICT: PASS

Medium 100%vs100%
fal/ideogram/v3: PASS
A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image successfully depicts all elements of the prompt. - **Medieval knight:** Present (a man in armor). - **Full armor:** Present (plate armor on limbs/torso and chainmail). - **Sitting at a desk:** Present (seated in a wooden chair at a large desk). - **Typing on a modern laptop:** Present (silver laptop with hands positioned on keys). The concepts are combined coherently within a library setting. VERDICT: PASS

fal/ideogram/v3: PASS
A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image successfully depicts all requested elements: - **Roman gladiator:** Present (wearing helmet, leather armor, and holding a sword). - **Taking a selfie:** Present (arm extended, looking at the device). - **Smartphone:** Present (modern device held in the right hand). - **Colosseum:** Present (distinctive stone arches in the background). The anachronistic combination is executed coherently, with the gladiator posing naturally for the photo against the historic backdrop. VERDICT: PASS

fal/ideogram/v3: PASS
An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image successfully depicts all elements of the prompt. - **Astronaut:** Present (a figure in a full suit). - **Spacesuit:** Present (white suit, helmet, backpack). - **Riding a bicycle:** Present (astronaut is on a red bike). - **Through a park:** Present (paved path, trees, flowers, grass). The concepts are combined coherently with no major logical errors. VERDICT: PASS

Hard 33%vs100%
fal/ideogram/v3: PASS
An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image successfully depicts all elements from the prompt: - **Octopus:** Present (large pink octopus). - **Top hat:** Present (black top hat on the octopus's head). - **Monocle:** Present (gold monocle over one eye). - **Playing chess:** Present (chessboard with pieces in front of the octopus). - **Surface of the moon:** Present (grey, cratered ground). - **Earth in background:** Present (blue planet in the upper right). All concepts are combined coherently. VERDICT: PASS

fal/ideogram/v3: FAIL
A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image contains most elements but misses a key action. - Dinosaur: Present (T-Rex head/body). - Business suit: Present (grey pinstripe suit, red tie). - PowerPoint presentation: Present (projector screen, laser pointer). - Modern office: Present (conference table, glass walls). - Drinking coffee: **Absent**. The dinosaur is holding a laser pointer, not a coffee cup. VERDICT: FAIL

fal/ideogram/v3: FAIL
A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

The image contains a robot dressed as a pirate in a paper boat on a lava lake. However, the prompt specifies "juggling three kittens," but the image only shows two kittens floating near the robot's hand. Since the specific count of kittens is incorrect, not all elements are present as described. VERDICT: FAIL

Counting 56%vs56%
Easy 67%vs100%
fal/ideogram/v3: PASS
Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays exactly three apples sitting on a wooden surface near a window. Two are predominantly deep red, and one has significant yellow mottling but is still largely red. The count of apples is exactly three, matching the prompt's request perfectly. VERDICT: PASS

fal/ideogram/v3: FAIL
Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image shows a row of yellow rubber ducks on a white shelf. Counting from left to right, there are clearly 6 distinct ducks lined up. The prompt requested exactly 5 ducks. Since there are 6 ducks present, the count does not match the request. VERDICT: FAIL

fal/ideogram/v3: PASS
Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays exactly four white pillar candles. They are arranged on a dark, reflective wooden surface. The count matches the request perfectly. VERDICT: PASS

Medium 33%vs67%
fal/ideogram/v3: FAIL
2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image shows 2 orange cats and 2 black dogs sitting on a blanket on the grass. The prompt requested "2 orange cats and 3 black dogs". While the cat count is correct, the dog count is missing one animal. VERDICT: FAIL

fal/ideogram/v3: FAIL
4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image shows 3 red roses, not the requested 4. Additionally, there is only 1 white daisy and 1 pink daisy, whereas the prompt requested 2 white daisies. The counts and colors do not match the specific requirements. VERDICT: FAIL

fal/ideogram/v3: PASS
1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows exactly 1 blue butterfly (positioned at the top center) and 3 yellow/tan butterflies (one to the left, one below on the rail, and one to the right). All are resting on the wooden fence structure. The counts match the prompt's request perfectly. VERDICT: PASS

Hard 67%vs0%
fal/ideogram/v3: FAIL
Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image contains 11 goldfish, not the requested 9. 1. Top left (facing right) 2. Mid left (facing right) 3. Lower mid left (facing right) 4. Bottom left (facing right) 5. Bottom center (facing right) 6. Center (facing right) 7. Top center (facing left) 8. Center (facing left) 9. Top right (facing left) 10. Mid right (facing left) 11. Bottom right (facing left) The count is incorrect. VERDICT: FAIL

fal/ideogram/v3: PASS
A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a wooden bowl filled with fruit. - **Bananas:** There appear to be 3 bananas in the bunch (one left, one center top, one underneath). - **Oranges:** There are 4 oranges visible (one top right, one center, one bottom right, one partially hidden at the bottom). - **Apples:** There are 2 apples inside the bowl (one behind the bananas on the left, one on the right). Note: There are additional apples outside the bowl (one whole on the right edge, one cut on the left), but the prompt

fal/ideogram/v3: PASS
Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image displays a cluster of balloons against a blue sky. Counting them: 1. Red (top left) 2. Yellow (top center) 3. Green (top right) 4. Yellow/Orange (center) 5. Red (center right) 6. Purple (far right) 7. Yellow (left) 8. Green (bottom left) 9. Blue/Teal (bottom center) 10. Blue (right) 11. Purple (bottom right) There are exactly 11 balloons visible. The count matches the request perfectly. VERDICT: PASS

Negation 56%vs56%
Easy 100%vs67%
fal/ideogram/v3: PASS
A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image depicts a serene tropical beach scene at sunset or sunrise, featuring leaning palm trees, turquoise ocean waves, and white sand. I have carefully scanned the entire image, including the sky, the water, the shoreline, and the vegetation on the right. There are absolutely no people visible in the scene. The exclusion criteria is fully met. VERDICT: PASS

fal/ideogram/v3: PASS
A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a fluffy calico/tortoiseshell cat lying on a reddish-brown couch. The cat has a thick ruff of fur around its neck, but upon close inspection, there is no collar visible. The neck area is clear of any accessories or bands. The excluded element (a collar) is successfully absent. VERDICT: PASS

fal/ideogram/v3: PASS
A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image depicts an empty, vintage-style classroom filled with rows of wooden desks and a large green chalkboard on the wall. Sunlight streams through tall arched windows. There are absolutely no students or people visible anywhere in the scene; the room is completely vacant. The excluded element is successfully absent. VERDICT: PASS

Medium 67%vs67%
fal/ideogram/v3: FAIL
A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image displays a sandy beach with palm trees, seashells, and a striped beach umbrella as requested. However, on the right side of the frame, there is clearly visible water (the ocean) with gentle waves washing onto the shore. Since the prompt explicitly excluded water or ocean, this element is present when it should be absent. VERDICT: FAIL

fal/ideogram/v3: PASS
An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image depicts a classic, traditional office workspace featuring a wooden desk with a green leather top, a brown leather tufted chair, floor-to-ceiling bookshelves filled with books, and a green banker's lamp. There are stacks of physical books, an open book, a pen holder, and a small vase on the desk. Crucially, there are absolutely no computers, monitors, tablets, or any digital screens visible in the scene. The exclusion criteria are fully met. VERDICT: PASS

fal/ideogram/v3: PASS
A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image depicts a rustic kitchen scene featuring a large stove, hanging copper pans, and various cookware on the counters. The copper bowls and pots on the stove appear empty, and the wooden cutting board is clear. There are no visible ingredients, prepared dishes, or food items anywhere in the scene. The exclusion criterion is met. VERDICT: PASS

Hard 0%vs33%
fal/ideogram/v3: FAIL
A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with a visible layer of melted cheese (likely mozzarella) underneath the pepperoni and basil, covering the tomato sauce. The prompt explicitly requested "absolutely no cheese," but the cheese is clearly present and prominent. VERDICT: FAIL

fal/ideogram/v3: FAIL
A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image displays a vintage teal car floating above a field of grass. While the car is suspended in the air, looking closely at the wheel wells reveals that black tires (wheels) are actually present in both the front and rear wheel arches. Since the prompt explicitly excluded wheels ("no wheels"), but wheels are visible in the image, the exclusion criteria is not met. VERDICT: FAIL

fal/ideogram/v3: FAIL
A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image displays a realistic human face featuring eyes, a nose, and a mouth. However, contrary to the exclusion criteria, the subject clearly has distinct, dark eyebrows above their eyes. Since the prompt explicitly requested "no eyebrows" and they are present, the image fails the specific constraint. VERDICT: FAIL

Relative Position 67%vs92%
Easy 33%vs100%
fal/ideogram/v3: FAIL
A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a fluffy orange cat sitting *inside* an open cardboard box, with its front paws resting over the edge. The prompt specifies the cat is "sitting on top of" the box. This preposition usually implies resting on the upper surface (like a lid), whereas the image clearly depicts the cat being contained within the box. Therefore, the specified spatial relationship is incorrect. VERDICT: FAIL

fal/ideogram/v3: FAIL
A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a red ball resting on the floor in the foreground. A wooden chair is positioned behind the ball. The ball is clearly placed *in front of* the chair's front legs, not underneath the seat or between the legs as "underneath" would imply. The spatial relationship described in the prompt does not match the visual evidence. VERDICT: FAIL

fal/ideogram/v3: PASS
A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows an open book lying on the left side of a wooden surface, with a steaming coffee mug positioned to its right. Both objects are resting directly on the desk surface. The spatial relationships described in the prompt ("next to" and "on") are accurately represented in the image. VERDICT: PASS

Medium 100%vs100%
fal/ideogram/v3: PASS
A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a blue jay perched on a branch that extends from a tree trunk on the right side. In the background to the left, a park bench is visible. The spatial relationships described in the prompt—bird on branch, branch from tree, and proximity to the bench—are all accurately represented. VERDICT: PASS

fal/ideogram/v3: PASS
A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a fluffy cat lying directly on top of a brown velvet pillow. This pillow is resting on the seat of an ornate wooden chair with cream upholstery. The spatial hierarchy described in the prompt—cat on pillow, pillow on chair—is accurately represented in the visual composition. VERDICT: PASS

fal/ideogram/v3: PASS
A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a yellow honeycomb candle placed directly on top of a vertical stack of five hardcover books. To the right of this stack, there is a green fern plant in a terracotta pot. The spatial relationships described—"candle on top of a stack of books" and "books next to a potted plant"—are accurately depicted. VERDICT: PASS

Hard 33%vs67%
fal/ideogram/v3: FAIL
Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image shows five chess pieces arranged on a board. The King is in the center. To its left is a Queen. To its right is a Bishop (with a red top). In the foreground, there is a Rook on the far left and a Knight on the far right. The prompt specifies a "Rook above the king" and a "Knight below the king". In the image, the Rook is to the left/front, not above. The Knight is to the right/front. The specific vertical relationships ("above", "below") described in the prompt do not match the visua

fal/ideogram/v3: PASS
A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image displays a desk scene with the following spatial arrangement: - **Laptop:** Positioned centrally on the desk. - **Coffee Mug:** Located to the left of the laptop. - **Phone:** Located to the right of the laptop. - **Notebook:** Positioned in the foreground, directly below the laptop in the visual frame. - **Lamp:** Positioned in the upper right area, effectively "above" the main workspace in the vertical composition of the image. All described positional relationships correspond cor

fal/ideogram/v3: FAIL
Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows a horse on the left, a cow on the right, a sheep in the center, and a dog lying down on the right. However, the prompt specifies a "goat to the right of the sheep." In the image, the small black and white goat is positioned to the *left* of the sheep (and in front of the horse). Because this specific spatial relationship is incorrect, the prompt requirements are not fully met. VERDICT: FAIL
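
Each complete evaluator note in this listing ends with a `VERDICT: PASS` or `VERDICT: FAIL` marker, and the Hard tier above shows 33% for fal/ideogram/v3 against badges of FAIL, PASS, FAIL. The following is a minimal Python sketch of how such a marker could be read and a tier rate derived. The function names are hypothetical and the rounding rule is an assumption, not ImageBench's documented method.

```python
import re
from typing import List, Optional

# Matches a trailing "VERDICT: PASS" or "VERDICT: FAIL" marker (assumed format).
_VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def extract_verdict(note: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL', or None when the note is cut off before the marker."""
    match = _VERDICT_RE.search(note)
    return match.group(1) if match else None


def pass_rate(verdicts: List[str]) -> int:
    """Percentage of PASS verdicts, rounded to the nearest whole number."""
    return round(100 * verdicts.count("PASS") / len(verdicts))


print(extract_verdict("... requirements are not fully met. VERDICT: FAIL"))
print(extract_verdict("... do not match the visua"))  # truncated note
print(pass_rate(["FAIL", "PASS", "FAIL"]))
```

Under these assumptions a truncated note yields `None`, so the verdict would have to come from the PASS/FAIL badge instead, and the three Hard badges give 33, in line with the figure shown for that tier.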

Extreme 100% vs 100%
fal/ideogram/v3: PASS
A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The image displays a large red barn dominating the right foreground. To its left, in the midground, stands a windmill. Behind both structures, filling the background, are snow-capped mountains. The spatial arrangement follows the prompt: the barn is largest and closest, the windmill is medium-sized and further back, and the mountains are distant. The depth perspective is consistent. VERDICT: PASS

fal/ideogram/v3: PASS
A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image accurately depicts the spatial arrangement described. The sandcastle is prominently placed in the immediate foreground. The surfer is positioned in the midground, riding a wave. The cruise ship is located in the background on the horizon line. The depth cues and layering are correct, with the foreground object being the largest and the background object being distant, consistent with the prompt's spatial requirements. VERDICT: PASS

fal/ideogram/v3: PASS
A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image accurately depicts the described spatial arrangement. The park bench and pigeons are clearly in the foreground. The fountain is situated in the midground, behind the bench. The skyscrapers are visible in the background, towering over the scene. The layering from foreground to background is distinct and correct. VERDICT: PASS

Scale & Proportions 67% vs 78%
Easy 67% vs 33%
fal/ideogram/v3: FAIL
A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The relative scale is inconsistent. The young boy appears too large relative to the school bus. In a realistic scenario, a child (approx. 1.2m tall) standing next to a full-sized school bus (approx. 3.5m tall) would reach only about one-third of the way up the vehicle's height, likely near the lower rub rails or wheels. Here, the boy's head is positioned near the middle of the bus's side panels, suggesting the bus is too short or the child is too large. Additionally, the side mirror appears di
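
The evaluator's one-third estimate follows from the two approximate heights it quotes. A quick check in Python, using only those figures (the variable names are illustrative):

```python
# Approximate heights quoted in the evaluator note above.
child_height_m = 1.2
bus_height_m = 3.5

# Fraction of the bus height that the child would reach.
ratio = child_height_m / bus_height_m
print(f"{ratio:.2f}")  # prints 0.34
```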

fal/ideogram/v3: PASS
A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The image depicts a ladybug resting on a green leaf. The size of the ladybug relative to the leaf appears biologically accurate and consistent with real-world macro photography. The leaf veins, texture, and water droplets further support the realistic scale. The proportions are correct. VERDICT: PASS

fal/ideogram/v3: PASS
A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a young child holding a basketball. The basketball appears quite large relative to the child's body, which is physically accurate for a small child holding a standard or youth-sized basketball. The ball is roughly the width of the child's torso, consistent with real-world proportions where a basketball is a large object for a toddler. The hands gripping the ball also appear proportional to the ball's size. VERDICT: PASS

Medium 100% vs 100%
fal/ideogram/v3: PASS
A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image depicts a miniature figurine of a girl in a red coat standing on the rim of a coffee cup. The scale is consistent with the prompt: the person is tiny relative to the cup, creating a convincing macro photography effect where the cup appears normal-sized and the person is miniature. The proportions of the figurine and the cup are realistic within this context. VERDICT: PASS

fal/ideogram/v3: PASS
A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image successfully depicts a "giant tabby cat" with "kaiju-like" proportions. The cat is massive, towering over the vehicles in the background and appearing to walk between tall skyscrapers. The scale is consistent, with the cat's paws and body size exaggerated relative to the city environment to match the prompt's description of a giant creature. VERDICT: PASS

fal/ideogram/v3: PASS
A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image clearly depicts a small, detailed model house resting comfortably within the palm of a human hand. The scale is consistent with a miniature object; the house fits easily on the palm without overflowing the fingers, and the texture of the skin (fingerprints, lines) appears large relative to the tiny architectural details of the house. This confirms the intended size relationship. VERDICT: PASS

Hard 33% vs 100%
fal/ideogram/v3: FAIL
A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image displays an elephant, horse, dog, black cat, and a white/orange kitten. The prompt requests a mouse. The smallest animal shown is a kitten, which is roughly the same size as the black cat. A real mouse would be significantly smaller than a cat. Therefore, the specific animal requested (mouse) is missing or misidentified, and the size proportions are incorrect for a mouse relative to the other animals. VERDICT: FAIL

fal/ideogram/v3: FAIL
A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The image displays significant inconsistencies in relative sizes. Most notably, the grapes on the plate are disproportionately large, appearing almost the size of the peach next to them, which is physically unrealistic. Additionally, the glass salt shaker is unusually tall, nearly matching the height of the wine glass, whereas a standard salt shaker is typically much smaller. The fruit sizes vary wildly (tiny yellow fruit vs. large orange fruit) in an unnatural way. VERDICT: FAIL

fal/ideogram/v3: PASS
A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image displays a farm scene where the relative sizes of the objects appear consistent with reality. The barn is the largest structure in the background. The tractor is appropriately sized for a farmer to sit on, and the cow standing next to it is comparable in height to the tractor's rear wheel assembly. The chickens are small on the ground, significantly smaller than the cow and tractor. The fence post in the foreground is tall relative to the chickens. The perspective and scale are logic

Human realism 74% vs 91%

Faces & Expressions 100% vs 100%
Easy 100% vs 100%
fal/ideogram/v3: PASS
A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

Face is largely symmetric with natural features: freckles, eyes, nose, and smile appear proportionate. Smile is warm and clearly readable, conveying genuine happiness. Slight asymmetry in hair and minor lighting artifacts exist but don’t distort facial features. Overall, meets the prompt’s intent. VERDICT: PASS

fal/ideogram/v3: PASS
A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural features, no obvious distortions. Expression conveys calm seriousness through neutral mouth, direct gaze, and subtle brow lines. Lighting and composition enhance the intended mood. All elements align with the prompt. VERDICT: PASS

fal/ideogram/v3: PASS
A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

The face is symmetric with correctly proportioned features and natural skin texture. The neutral, relaxed expression is clearly readable, with no distracting distortions. Lighting and focus are well-executed, enhancing the natural look. The subject appears as a teenager in a serene outdoor setting. VERDICT: PASS

Medium 100% vs 100%
fal/ideogram/v3: PASS
A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face shows strong disgust: eyes squeezed shut, nose wrinkled, and lips curled back with teeth bared. Features are mostly symmetric, though slight asymmetry in brow furrowing and ear positioning is visible. Skin texture and hair look natural. Emotion is clearly readable and intense. Minor artifacts like a stray hair across the forehead are present but don’t obscure expression. VERDICT: PASS

fal/ideogram/v3: PASS
A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is largely symmetric with correctly rendered features. The expression clearly conveys surprise: wide eyes, raised eyebrows, and an open mouth. Slight digital artifacts exist (e.g., overly perfect skin texture, unnaturally sharp eyelashes), but core emotion and anatomy are accurate. No major distortions impair readability. VERDICT: PASS

fal/ideogram/v3: PASS
A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with anatomically correct features. Eyes are downcast, lips slightly drawn, and gaze distant — clearly conveying quiet melancholy. No significant distortions. Lighting and texture are naturalistic. Emotion is unmistakably rendered as requested. VERDICT: PASS

Hard 100% vs 100%
fal/ideogram/v3: PASS
An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

Face is symmetric with natural, aged features and deep wrinkles. White hair is realistic. Warm smile and kind eyes are clearly visible, conveying gentle emotion. No significant distortions. Lighting and texture are convincing. VERDICT: PASS

fal/ideogram/v3: PASS
A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with natural features: round cheeks, large eyes, and tears streaming down. The crying expression is clearly readable, with furrowed brows and parted lips conveying distress. No significant distortions are present. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

fal/ideogram/v3: PASS
An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is symmetric with realistic elderly features: age spots, thinning gray hair, and reading glasses. Skeptical expression is clearly readable via furrowed brow and downturned mouth. No significant distortions. Lighting and texture are natural. VERDICT: PASS

Extreme 100% vs 100%
fal/ideogram/v3: PASS
A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces show clear, symmetric features with natural expressions. Left woman laughs with head back; middle woman looks shocked with wide eyes and hand to mouth; right woman smiles with hand under chin, conveying amusement. Emotions are clearly readable and well-executed. No significant distortions or anatomical errors. VERDICT: PASS

fal/ideogram/v3: PASS
A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces show natural symmetry and correct anatomical features. Emotions are clearly readable: father beams, mother smiles gently, teenager looks down with a subtle smile (not bored), child grins excitedly. No significant distortions. The teenager’s expression doesn’t fully match “bored,” but core emotional intent is conveyed. VERDICT: PASS

fal/ideogram/v3: PASS
Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. Emotions are readable: left woman shows frustration (furrowed brows, tense expression), center man appears pleased (smiling, relaxed), right man looks confused (tilted head, hand to temple). No significant distortions. All three expressions match the prompt accurately. VERDICT: PASS

Full Body 83% vs 100%
Easy 100% vs 100%
fal/ideogram/v3: PASS
A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at her sides, wearing jeans and a white t-shirt as described. Limb count and proportions are anatomically correct. Pose is recognizable and relaxed. No significant distortions are visible. Clothing details and background are coherent. VERDICT: PASS

fal/ideogram/v3: PASS
A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, hands in pockets, with normal limb count and proportions. His pose is clearly recognizable as described. No anatomical distortions are visible. The image accurately fulfills the prompt’s requirements. VERDICT: PASS

fal/ideogram/v3: PASS
A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet, standing in a natural, neutral pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing, slightly turned, with arms relaxed. No significant distortions are present. The scene matches the prompt’s description of a park setting. VERDICT: PASS

Medium 100% vs 100%
fal/ideogram/v3: PASS
A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs are correctly proportioned and count is accurate. Her running pose is anatomically plausible, with arms bent and dress flowing dynamically. Hair shows motion blur consistent with movement. No significant distortions or anatomical errors are visible. The action and motion are clearly recognizable as running. VERDICT: PASS

fal/ideogram/v3: PASS
A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man has correct limb count and proportions. His pose—bent knees, mid-air jump over a puddle with splash—is anatomically plausible and matches the prompt. No significant distortions are visible. The action is clearly recognizable as jumping over water. VERDICT: PASS

fal/ideogram/v3: PASS
A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s pose is a recognizable ballet arabesque en pointe, with one leg extended and the other supporting weight. Limb count and proportions are anatomically correct. The dress flows outward as described, though the skirt’s layers appear slightly unnatural. No major distortions; the pose is dynamic and faithful to the prompt. VERDICT: PASS

Hard 67% vs 100%
fal/ideogram/v3: PASS
A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The figure has two legs, correct proportions, and performs the tree pose accurately: standing on one leg, other foot pressed to inner thigh, arms raised overhead in prayer position. Anatomical structure is plausible, no visible distortions. The pose is clearly recognizable as Vrksasana. VERDICT: PASS

fal/ideogram/v3: PASS
A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has two arms and two legs, with proportions and pose consistent with realistic human anatomy. The pose clearly shows arms and legs extended to different holds on the cliff face, matching the prompt. No significant distortions are visible. The action is recognizable and anatomically plausible. VERDICT: PASS

fal/ideogram/v3: FAIL
A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture is anatomically implausible: her legs are not wrapped around the instrument as required, and her bow arm is not extended in a natural playing position. The cello’s placement and bow angle suggest an unnatural pose. Limb count and proportions appear correct, but the described action is not recognizable. Distortions in pose undermine the prompt’s requirements. VERDICT: FAIL

Extreme 67% vs 100%
fal/ideogram/v3: FAIL
Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The woman’s legs appear fused or unnaturally elongated, and her right foot is misaligned with her body. The man’s left arm seems to pass through her torso. Limb proportions are distorted, and the “intertwined legs” pose is not anatomically plausible. The dip is recognizable but executed with impossible joint angles. VERDICT: FAIL

fal/ideogram/v3: PASS
Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players have correct limb count and proportions. The sliding player’s pose is anatomically plausible, though slightly exaggerated. The jumping player’s posture is dynamic and recognizable as a tackle evasion. The ball is at the jumping player’s feet as described. Minor distortions: the sliding player’s arm appears unnaturally elongated, and the ball’s texture is slightly inconsistent. VERDICT: PASS

fal/ideogram/v3: PASS
Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their synchronized handstand pose is anatomically plausible and recognizable. Slight distortions exist: hands appear slightly too wide, and legs show minor unnatural curvature. Overall, the core action and alignment are accurately depicted. VERDICT: PASS

Hands 33% vs 67%
Easy 33% vs 33%
fal/ideogram/v3: FAIL
A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The hands in the image are clasped together in front of the body, rather than being "at their sides" as requested. Furthermore, the fingers are interlaced or holding each other, not "relaxed and naturally spread." While counting is difficult due to the clasping, the fingers appear somewhat merged and indistinct, lacking clear anatomical definition. The pose and gesture fundamentally contradict the prompt. VERDICT: FAIL

fal/ideogram/v3: PASS
A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

Upon close examination, the top hand (left hand in image) shows 5 fingers clearly. The bottom hand (right hand in image) also displays 5 fingers, though one finger appears slightly blurred or merged due to motion or depth-of-field effects. No obvious anatomical anomalies like extra or missing fingers are present. Gesture is palms-up and extended forward as described. VERDICT: PASS

fal/ideogram/v3: FAIL
A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The image shows a man with arms crossed.
- **Left Hand (viewer's left):** The hand resting on the right arm has a visible thumb and four fingers wrapped around the arm. The count appears to be 5. The anatomy is slightly stiff but mostly recognizable.
- **Right Hand (viewer's right):** The hand resting on the left arm has a visible thumb. The fingers wrapped around the arm appear fused or "mushy," lacking distinct knuckles and separation. It is difficult to clearly distinguish exactly 5 separat

Medium 0% vs 33%
fal/ideogram/v3: FAIL
A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The right hand gripping the mug shows 5 fingers, but the left hand (partially visible behind the mug) appears to have only 3–4 fingers due to occlusion and possible AI distortion. The grip is mostly anatomical, though the ring finger on the right hand seems slightly fused or misshapen. Not all fingers are clearly countable as exactly 5 per hand. VERDICT: FAIL

fal/ideogram/v3: FAIL
A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

Upon close examination of the hands: Right hand (writing hand): The fingers appear to be 5 in number, but the grip on the pen is slightly unnatural — the thumb and index finger seem to merge or overlap abnormally, and the pen angle is awkward. The ring finger also appears slightly distorted. Left hand (resting on notebook): Appears to have 5 fingers, but they are somewhat blurred and lack clear definition; one finger may be fused or missing a joint. Anatomical anomalies exist in both hands, p

fal/ideogram/v3: FAIL
A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The hands show significant anatomical anomalies typical of AI generation. The strumming hand (left side) has fingers that appear fused and malformed, lacking distinct knuckles and clear separation; it is difficult to count exactly five distinct fingers. The fretting hand (right side) has fingers that appear unnaturally long and thin, with slightly awkward positioning on the fretboard. The grip is not fully anatomically correct. VERDICT: FAIL

Hard 67% vs 100%
fal/ideogram/v3: PASS
A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand making the peace sign has 5 fingers: index and middle extended, ring and pinky curled, thumb tucked — anatomically correct. No anomalies visible. The other hand is not visible. Since only one hand is shown and it meets the prompt’s criteria (two up, three curled), the core requirement is satisfied. VERDICT: PASS

fal/ideogram/v3: PASS
A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image shows a single hand making a thumbs-up gesture.
- **Finger Count:** There is 1 extended thumb and 4 curled fingers (index, middle, ring, pinky). Total = 5 fingers.
- **Gesture:** The gesture is a standard thumbs-up.
- **Anatomy:** The anatomy is generally correct. The thumb is extended, and the other four fingers are curled into a fist.
- **Anomalies:** The skin texture on the thumb is extremely detailed and wrinkled, which is a common artifact in AI generation but doesn't affect the

fal/ideogram/v3: FAIL
A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The hands show 5 fingers each, which is anatomically correct in number. However, the gesture does not match the prompt’s instruction to “count to three” with index, middle, and ring fingers extended — instead, all five fingers are fully open on both hands. The prompt specifies a counting gesture, but the image shows palms facing forward with no selective extension. This is a mismatch between prompt and output. VERDICT: FAIL

Extreme 33% vs 100%
fal/ideogram/v3: FAIL
Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

The handshake shows two hands clasped. The left hand (gray sleeve) has 5 fingers visible and anatomically correct. The right hand (blue sleeve) also shows 5 fingers, but the thumb placement and finger overlap appear slightly unnatural — particularly, the index finger of the right hand seems to merge or press unnaturally against the left hand’s palm. While mostly correct, minor anatomical inconsistencies in grip suggest AI artifacting. VERDICT: FAIL

fal/ideogram/v3: FAIL
Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The image shows two people attempting a high-five, but their palms are not meeting; there is a clear gap between their hands. While both hands appear to have five fingers, the fingers are slightly elongated and distorted, which is a common AI artifact. The primary failure is that the prompt explicitly requested "palms meeting," which is not depicted. VERDICT: FAIL

fal/ideogram/v3: PASS
One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

Upon close examination of the hands: The woman’s left hand (receiving) shows 4 fingers clearly, with the thumb partially obscured — plausible. Her right hand is not fully visible. The man’s right hand (giving) displays 5 fingers correctly gripping the box. His left hand also appears to have 5 fingers, though some are slightly blurred or overlapping. No obvious anatomical anomalies like extra digits or fused fingers. Grip posture is natural and consistent with handing over a gift. VERDICT: PASS

Multi-Subject 83% vs 100%
Easy 67% vs 100%
fal/ideogram/v3: PASS
Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Two women stand side by side. Left: short blonde hair, red leather jacket, dark pants. Right: long black hair, blue dress. Both visually distinct and match prompt descriptions precisely. Lighting, textures, and attire align perfectly with requested features. VERDICT: PASS

fal/ideogram/v3: FAIL
Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Person 1: Bearded man with glasses, wearing a green cable-knit sweater — matches description. Person 2: Clean-shaven? No — has stubble. Wearing black hoodie — matches. Both visually distinct. Minor mismatch in facial hair. Core features otherwise accurate. VERDICT: FAIL

fal/ideogram/v3: PASS
Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls, freckles. Short person: straight brown hair, yellow floral sundress. Both visually distinct and match descriptions perfectly. Composition and details are accurate and coherent. VERDICT: PASS

Medium 100% vs 100%
fal/ideogram/v3: PASS
Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

Bald man in suit: present, left side. Silver-haired woman in red blouse: present, center-left. Young man with dreadlocks in denim jacket: present, center-right. Woman in green hijab and dress: present, right side. All are visually distinct and match descriptions precisely. VERDICT: PASS

fal/ideogram/v3: PASS
Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Tall woman: pink short hair, playing bass. Man: beard, red bandana, on drums. Slim man: glasses, playing keyboard. All three are visually distinct and match descriptions. Lighting and staging enhance clarity. No mismatches or omissions. VERDICT: PASS

fal/ideogram/v3: PASS
Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat, backpack, has beard. Woman: braided hair, uses hiking poles, carries backpack. Teenager: wears baseball cap, holds water bottle, backpack visible. All three are visually distinct and match described features. VERDICT: PASS
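
The 74% shown for fal/ideogram/v3 under Human realism can be cross-checked against the PASS/FAIL badges in its four subcategories. The sketch below assumes the category figure pools every prompt rather than averaging the subcategory percentages. That is an inference from the numbers on this page, not a documented rule, and the dictionary is a hand count of the badges listed above.

```python
# (passes, total) per subcategory for fal/ideogram/v3, counted by hand
# from the PASS/FAIL badges in this listing.
human_realism = {
    "Faces & Expressions": (12, 12),
    "Full Body": (10, 12),
    "Hands": (4, 12),
    "Multi-Subject": (5, 6),
}

passes = sum(p for p, _ in human_realism.values())
total = sum(t for _, t in human_realism.values())

# Pooled rate: every prompt counts equally.
pooled = round(100 * passes / total)

# Unweighted mean of the subcategory rates, shown for comparison.
mean_of_rates = round(
    100 * sum(p / t for p, t in human_realism.values()) / len(human_realism)
)

print(passes, total, pooled, mean_of_rates)
```

Pooling gives 31 of 42, or 74, which agrees with the category header. The unweighted mean gives 75, because Multi-Subject has only six prompts here and so carries more weight per prompt under that scheme.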

Truthfulness 59% vs 82%

Photorealism 100% vs 100%
Medium 100% vs 100%
fal/ideogram/v3: PASS
A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image displays a high-quality still life. The wooden table shows realistic grain and weathering. The glass of water exhibits correct refraction and transparency, with visible water levels and rim reflections. The napkin has a convincing woven linen texture with soft folds. The apple has a glossy, smooth skin with appropriate specular highlights. The spoon, while appearing golden due to the warm lighting, demonstrates correct metallic reflectivity and polish, consistent with a polished meta

fal/ideogram/v3: PASS
A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders all requested materials with high fidelity. The laptop exhibits the smooth, matte sheen of brushed aluminum. The notebook displays a convincing leather texture with realistic lighting on its spine and elastic band. The mug has the opaque, slightly textured appearance of ceramic. The paperweight is a convincing glass sphere, showing correct refraction, transparency, and specular highlights. The wooden desk surface also shows appropriate grain and reflection. VERD

fal/ideogram/v3: PASS
A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The stainless steel pot exhibits sharp, accurate reflections of the surrounding objects (like the lemon and board). The glass bottle shows appropriate translucency and refraction, with the olive oil visible inside. The wooden cutting board has a natural, matte grain texture, and the cotton towel displays soft, diffuse lighting consistent with fabric. The lighting interactions across all surfaces are physically plausible

Physics & Reflections 75%vs83%
Easy 33%vs67%
fal/ideogram/v3: PASS
A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere on a white marble surface. The lighting is consistent with a source from the upper left, evidenced by the bright specular highlight on the sphere's upper-left quadrant. Consequently, the sphere casts a distinct shadow to the right and slightly downward. The shadow's softness and direction align perfectly with the highlight position. The reflections on the sphere's surface are appropriately distorted for a curved object. VERDICT: PASS

fal/ideogram/v3: FAIL
A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The prompt specifies a shadow "to the left". In the image, the light source (the window) is on the left, causing the shadow to fall to the right. While the shadow in the image is physically plausible for the scene's lighting, it directly contradicts the prompt's instruction for the shadow's direction. VERDICT: FAIL

fal/ideogram/v3: FAIL
A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The prompt specifies "midday with a short shadow directly below them." The image displays soft, directional lighting coming from the left (evident on the face and hair), which is characteristic of morning or late afternoon, not midday. Midday sun would be overhead, casting shadows directly downwards and lighting the top of the head/face more evenly. Furthermore, there is no distinct short shadow visible directly beneath the subject's feet; the lighting is too diffuse and the ground is obscured

Medium 67%vs67%
fal/ideogram/v3: FAIL
A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically implausible. The woman is facing the camera with her back to the mirror; therefore, the mirror should reflect her front. Instead, the reflection shows her back, identical to her actual pose, which is impossible. Additionally, the reflection’s orientation is not reversed (e.g., left/right symmetry is broken), and the depth of field inconsistency (sharp subject, sharp reflection) defies optical physics. VERDICT: FAIL

fal/ideogram/v3: PASS
A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly inverted vertically, aligning perfectly with the cat's paws on the polished surface. The orientation matches the perspective of the floor tiles. The colors and details in the reflection (blue eyes, white fur) correspond accurately to the subject above, with appropriate slight blurring consistent with a glossy marble surface. VERDICT: PASS

fal/ideogram/v3: PASS
A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The lighting comes from the right, casting soft shadows to the left, which is consistent. The glass table shows a clear, inverted reflection of the vase and flowers directly beneath them. The reflection's orientation and slight fading with distance are physically accurate. The shadows of the flower stems and petals on the table surface align with the light source direction. VERDICT: PASS

Hard 100%vs100%
fal/ideogram/v3: PASS
A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are physically accurate. They are vertically inverted directly beneath the tree trunks, maintaining correct alignment. The water surface introduces gentle, realistic ripples that distort the vertical lines of the trunks, creating a wavy effect consistent with a calm lake. The reflections are slightly darker and less saturated than the actual trees, which is optically correct. The golden sunset light is also properly reflected across the water's surface. VERDICT: PASS

fal/ideogram/v3: PASS
A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. The city skyline is correctly inverted, with the tops of the skyscrapers appearing further away in the puddle (near the horizon line) and the bases closer to the viewer. The perspective aligns with the street lines. There is a realistic color shift, where the puddle reflects the cool blue of the sky while the surrounding wet asphalt reflects warm orange streetlights. The mild distortion and ripples in the water surface are natural and consistent with a r

fal/ideogram/v3: PASS
A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically accurate and highly realistic. It is perfectly inverted, aligning seamlessly with the shoreline. The color is appropriately muted and slightly darker than the actual mountains, consistent with light absorption in water. The stillness of the lake creates a near-perfect mirror image, with only subtle distortions near the foreground shore, which adds to the realism. VERDICT: PASS

Extreme 100%vs100%
fal/ideogram/v3: PASS
A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image displays a glass sphere with a correctly inverted refraction of the background (trees/sky). The reflection directly beneath the sphere on the glass/mirror surface is accurate, showing the bottom of the sphere. The reflection further down, presumably in the water, exhibits appropriate waviness and distortion consistent with a liquid surface. The lighting and colors are consistent across the refraction and reflections. VERDICT: PASS

fal/ideogram/v3: PASS
A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections on the chrome sculpture accurately mirror the large windows and room geometry, with appropriate distortion based on the object's curvature. The reflection on the glass floor is correctly inverted and slightly darker, consistent with light transmission through glass. Shadows are soft and align with the light source from the windows. The environmental mapping is physically coherent. VERDICT: PASS

fal/ideogram/v3: PASS
A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image displays highly plausible optical physics. The reflection on the polished marble counter is a sharp, inverted mirror image of the glass, correctly positioned directly beneath it. The refraction through the water and crystal glass distorts the background window frame and lake view realistically, magnifying and shifting the light. The lighting is consistent, with bright natural light coming from the window, creating appropriate highlights on the glass rim and water surface. VERDICT: P
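
The per-difficulty and category percentages for Physics & Reflections appear to follow directly from the verdicts on the cards above. The sketch below recomputes them for fal/ideogram/v3, assuming each percentage is simply the share of PASS verdicts rounded to the nearest whole number; the page does not state its rounding rule, and the names `verdicts` and `pass_rate` are illustrative only.

```python
# Verdicts for fal/ideogram/v3, copied from the Physics & Reflections cards.
verdicts = {
    "Easy": ["PASS", "FAIL", "FAIL"],
    "Medium": ["FAIL", "PASS", "PASS"],
    "Hard": ["PASS", "PASS", "PASS"],
    "Extreme": ["PASS", "PASS", "PASS"],
}


def pass_rate(results):
    """Share of PASS verdicts as a whole-number percentage (assumed rounding)."""
    return round(100 * results.count("PASS") / len(results))


# One percentage per difficulty tier, then one across all twelve prompts.
per_tier = {tier: pass_rate(results) for tier, results in verdicts.items()}
overall = pass_rate([v for results in verdicts.values() for v in results])

for tier, rate in per_tier.items():
    print(f"{tier}: {rate}%")
print(f"Physics & Reflections: {overall}%")
```

Under that assumption the tiers come out to 33%, 67%, 100% and 100%, and the category to 75%, which agrees with the left-hand figures shown for this subcategory.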

World Knowledge 33%vs75%
Easy 33%vs100%
fal/ideogram/v3: PASS
The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower with high architectural accuracy. The iconic iron lattice structure is detailed and correct. All three levels are clearly visible, along with the characteristic base arches and the top spire. The perspective and proportions are faithful to the real-world landmark. VERDICT: PASS

fal/ideogram/v3: FAIL
The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal with high visual fidelity and correct symmetry. However, it fails the architectural accuracy test regarding the minarets. The real Taj Mahal features four identical minarets at the corners of the plinth. This image incorrectly displays two tall minarets flanking the structure and two shorter, distinct minarets closer to the central building. Additionally, the subsidiary domes and corner chhatris appear slightly distorted or merged with the added towers. VERDICT

fal/ideogram/v3: FAIL
The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The image depicts the Statue of Liberty, which is recognizable by its green patina, raised torch, crown, and tablet. However, it is not factually or architecturally accurate. The pedestal is a generic, multi-tiered stone structure with columns that does not resemble the actual Richard Morris Hunt design. The base appears as a long rectangular wall rather than the star-shaped Fort Wood. Additionally, the torch appears somewhat indistinct, resembling a golden bird or blob rather than a clear fla

Medium 33%vs67%
fal/ideogram/v3: FAIL
A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a stringed instrument that is clearly intended to be a shamisen, featuring a long neck, frets (fushi), and a bachi (plectrum) lying nearby. However, it is not factually accurate. The body is depicted as a solid, box-like wooden structure with a lacquered top, whereas a traditional shamisen has a frame covered with skin (cat, dog, or synthetic) to act as a drum head. The bridge and string attachment mechanism also appear physically nonsensical. VERDICT: FAIL

fal/ideogram/v3: FAIL
A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass instrument resembling an astrolabe, but it lacks factual accuracy. The "rete" is simplified to a cross shape with generic gold dots, missing the complex star pointers essential to the device. The "tympan" displays a blue grid rather than the specific stereographic projection curves used for celestial navigation. The engraved markings are illegible gibberish. While the lighting is atmospheric, the core mechanical components are hallucinated. VERDICT: FAIL

fal/ideogram/v3: PASS
A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image depicts a ceramic bowl with a dark, textured glaze featuring prominent gold lines that simulate the kintsugi repair technique. The subject is clearly recognizable as a Japanese kintsugi bowl. While the "cracks" follow a somewhat geometric pattern rather than random breakage, and the glaze texture has a slightly digital appearance, the core elements of the prompt are accurately represented. The setting (wooden surface, blurred garden background) supports the theme. VERDICT: PASS

Hard 67%vs33%
fal/ideogram/v3: PASS
A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image depicts a stylized, illustrative version of the water cycle. It clearly shows the ocean at the horizon, large clouds, rain falling onto a green landscape, and a winding river flowing towards the ocean. While it lacks the schematic arrows and labels of a technical "diagram," the subject matter is highly recognizable and factually represents the key stages of the water cycle (evaporation implied by sun/ocean, condensation in clouds, precipitation, and collection in the river). The arti

fal/ideogram/v3: PASS
A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image accurately depicts a cross-section of a convergent plate boundary. It clearly shows an oceanic plate subducting beneath a continental plate, creating a deep oceanic trench. A chain of active volcanoes (a volcanic arc) is visible on the continental crust above the subduction zone. The interior reveals distinct, colorful layers representing the mantle and core. The visualization is scientifically consistent with geological models of subduction zones. VERDICT: PASS

fal/ideogram/v3: FAIL
An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a human heart, but it is not anatomically accurate. The major vessels are incorrectly placed and shaped; for instance, the aorta and pulmonary trunk appear fused or misaligned, and the superior vena cava is depicted with an unusual branching structure. The coronary arteries are stylized rather than anatomically precise. The chambers are not clearly distinguishable, and the overall structure lacks the specific architectural details required for a factual anatomical illustration.

Extreme 0%vs100%
fal/ideogram/v3: FAIL
A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly with brown wings featuring prominent eyespots and a central blue patch. This pattern corresponds to the **ventral** (underside) of a Morpho butterfly. However, the prompt specifically requested "accurate iridescent blue **dorsal** wings." The image fails to show the characteristic full blue upper wings of a *Morpho menelaus*. Instead, it displays the underside pattern. Additionally, the blue patch appears somewhat flat rather than structurally iridescent, and the

fal/ideogram/v3: FAIL
The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image depicts a recognizable version of the Hagia Sophia's interior, capturing the massive central dome, semi-domes, and the general golden aesthetic. However, it fails on factual and architectural accuracy. The large circular medallions contain illegible, AI-generated "gibberish" script rather than actual Islamic calligraphy. Furthermore, there is a strange, golden, dome-like structure in the center foreground that does not exist in the real building (likely a hallucination of the mihrab

fal/ideogram/v3: FAIL
A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image captures the vibrant iridescence of bismuth well, but the crystallographic structure is inaccurate. Real bismuth crystals form solid hopper shapes (pyramids with stepped faces) due to faster growth at the edges. This image depicts a structure resembling stacked, hollow rectangular frames or a fractal geometry, which is not factually accurate for a natural bismuth specimen. VERDICT: FAIL

Professional Studio 70%vs78%

Camera & Lighting 75%vs75%
Easy 67%vs67%
fal/ideogram/v3: FAIL
A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image shows a modern living room with large floor-to-ceiling windows and a view of a cityscape. The perspective is a wide-angle shot that captures the expansive space. However, there is no visible barrel distortion at the edges; the vertical lines of the window frames and the furniture remain straight and undistorted. The lighting is natural and warm, consistent with a high-rise setting. VERDICT: FAIL

fal/ideogram/v3: PASS
A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single dewdrop hanging from a pink rose petal. The depth of field is shallow, with the dewdrop and the immediate petal in sharp focus while the background and foreground elements are softly blurred. The lighting is natural and soft, highlighting the transparency and refraction within the droplet. VERDICT: PASS

fal/ideogram/v3: PASS
An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view looking straight down at a traffic roundabout. The perspective is flat and top-down, showing the circular road, central greenery, and surrounding roads. The lighting is consistent, with clear shadows cast by the trees and cars, indicating a single light source (sunlight) from a specific angle. VERDICT: PASS

Medium 67%vs67%
fal/ideogram/v3: FAIL
A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image fails to meet the specific lighting requirements. Rembrandt lighting is characterized by a small triangle of light on the shadowed cheek of the subject. In this image, the light source is a large, bright, geometric shape behind the subject, creating a silhouette effect rather than the classic facial light pattern. There is no light triangle on the shadowed cheek; instead, the face is largely in shadow. VERDICT: FAIL

fal/ideogram/v3: PASS
A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt's requirements. The lighting is a classic "golden hour" depiction, featuring a vibrant orange sky and a soft, warm glow. The light hits the crest of the hill, creating long, warm shadows and highlighting the texture of the grass. The perspective is a standard landscape shot, and the depth of field is appropriate for a wide-angle landscape view. VERDICT: PASS

fal/ideogram/v3: PASS
A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a high-key portrait with a bright, white, airy background. The lighting is soft and luminous, creating minimal shadows on the subject's face and neck. The depth of field is shallow, keeping the focus on the subject while the background remains a soft, bright white, fulfilling all the requirements for a high-key, airy aesthetic. VERDICT: PASS

Hard 67%vs67%
fal/ideogram/v3: PASS
A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The perspective is a classic portrait shot. The depth of field is shallow, creating a very creamy bokeh in the garden background (lavender and roses) as specified by the 85mm f/1.4 description. Most importantly, there is a clear, warm rim light effect on the subject's hair and shoulders, effectively separating her from the background as requested. VERDICT: PASS

fal/ideogram/v3: PASS
A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The subject (the saxophone player) is sharp and in focus, while the background features a shallow depth of field with soft, circular bokeh from the city lights. The perspective is consistent with a 50mm lens, and the lighting pattern of the street lights and reflections on the wet ground creates the desired atmosphere. VERDICT: PASS

fal/ideogram/v3: FAIL
A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image fails to meet the primary requirement of the prompt. The prompt specifies a shot "from above" (top-down/flat lay perspective), but the image is captured from a low-angle, side-view perspective. While the lighting is warm and comes from the left, and the depth of field is shallow, the fundamental camera angle is incorrect. VERDICT: FAIL

Extreme 100%vs100%
fal/ideogram/v3: PASS
A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a distinct selective focus strip where the red car is in sharp focus, while the foreground and background (the street, pedestrians, and buildings) are heavily blurred. This creates the intended "miniature model" aesthetic. Additionally, the colors are highly saturated, and the lighting is bright and even, matching the prompt's description of a vibrant, toy-like scene. VERDICT: PASS

fal/ideogram/v3: PASS
A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image matches the prompt perfectly. It features a dark, moody scene with a lighthouse under a twilight sky. The lighting pattern includes distinct blue and orange light streaks that originate from the lighthouse lantern and extend horizontally across the frame, accurately simulating the requested anamorphic lens flare effect. VERDICT: PASS

fal/ideogram/v3: PASS
A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image features a portrait with a clear split lighting pattern. A sharp line of light divides the face vertically: the left side (viewer's left) is brightly illuminated, while the right side is cast in deep, high-contrast shadow. This perfectly matches the "dramatic split lighting" and "chiaroscuro" description. The perspective is a standard eye-level portrait, and the lighting creates a strong sense of depth. VERDICT: PASS

Color Precision 75%vs83%
Easy 67%vs100%
fal/ideogram/v3: FAIL
A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a sports car in a deep, dark burgundy or maroon shade. The prompt specifically requested a "bright red" that is "vivid and unambiguous." The actual color of the car is a dark, muted red, which fails to meet the requirement of being bright and vivid. The road is a dark gray asphalt, which matches the description. VERDICT: FAIL

fal/ideogram/v3: PASS
A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The woman is wearing a vibrant, deep blue evening dress that perfectly matches the description of "royal blue." The background consists of draped fabric in a light, muted blue/grey tone, which serves as a neutral backdrop. The colors specified in the prompt are accurately represented in the image. VERDICT: PASS

fal/ideogram/v3: PASS
A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a field of bright yellow sunflowers and a clear sky. The sunflowers are a vibrant, pure yellow, and the sky is a clear, bright blue. The colors match the prompt's description of "pure sunflower yellow flowers" and a "clear blue sky" perfectly. VERDICT: PASS

Medium 100%vs33%
fal/ideogram/v3: PASS
A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The image features a bedroom wall in a deep burnt orange color, which aligns well with the hex code #FF6B35 and the description "burnt orange." The furniture, including the bed and the side table, is white/off-white as requested. The scene is illuminated by natural daylight coming through the window/curtains. All color requirements are met. VERDICT: PASS

fal/ideogram/v3: PASS
A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The image depicts a forest scene with tall, slender tree trunks and dense foliage. The foliage consists of a muted, dark green color that aligns well with the description of #2D5F2D (dark muted green). The tree trunks are a natural brown color as requested. The overall color palette is consistent with the prompt's specific color requirements. VERDICT: PASS

fal/ideogram/v3: PASS
A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The walls in the image are a muted, grayish-blue, which aligns well with the requested medium blue (#4A90D9). The floor is a warm wooden color, and the window trim is white, matching the prompt's description of a warm wooden floor and white trim. All color requirements are met. VERDICT: PASS

Hard 67%vs100%
fal/ideogram/v3: PASS
A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The walls are a warm cream/beige tone, the sofa is a dark navy blue, and the accent pillows are a bright red. These colors align well with the specified hex codes and descriptions provided in the prompt. VERDICT: PASS

fal/ideogram/v3: PASS
A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image accurately reflects the requested colors. The background is a deep, dark blue consistent with #1A1A2E. The watch has a soft gold metallic finish matching #F0C27F. The earbuds are a pure white (#FFFFFF). All three color-coded elements are correctly represented in the final composition. VERDICT: PASS

fal/ideogram/v3: FAIL
A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image contains a central orange-red shape, which aligns with the requested #FF4500. The accent lines are a dark charcoal color, consistent with #333333. However, the background is not a solid #FAFAFA (near-white); it features a significant warm gradient with yellow and orange light leaks, making the background color inconsistent with the specific hex value requested. VERDICT: FAIL

Extreme 67%vs100%
fal/ideogram/v3: PASS
A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image shows a gradient on a textured wall. On the far left, the color is a deep red/maroon, which aligns with #FF0000. On the far right, the color is a deep blue, aligning with #0000FF. The transition through purple in the center is smooth and even, as requested. The colors match the specified hex values and the described gradient transition. VERDICT: PASS

fal/ideogram/v3: FAIL
A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image fails to follow the specific color instructions. The prompt requested a gradient from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top. However, the image shows a light peach/pale orange at the top and a dark blue/teal at the horizon, which is the reverse of the requested vertical order. Additionally, the colors do not match the specific hex values provided. VERDICT: FAIL

fal/ideogram/v3: PASS
A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image shows a horizontal gradient transition. On the left, the color is a vibrant cyan/bright blue, which aligns well with #00C9FF. On the right, the color transitions into a light mint green, which aligns with #92FE9D. The gradient flows horizontally across the image as requested. VERDICT: PASS

Photorealism 33%vs67%
Easy 33%vs67%
fal/ideogram/v3: FAIL
A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a professional product shot. The sneaker lacks realistic textures, appearing more like a smooth, plastic 3D render than a real shoe. There are significant AI artifacts: the laces are nonsensical and blend into the shoe body, and the eyelets are poorly defined. The lighting is flat, and the "shoe" lacks any structural detail or realistic material properties expected in studio photography. VERDICT: FAIL

fal/ideogram/v3: FAIL
A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a professional studio shot. The glass bottle exhibits significant AI artifacts, particularly in the way the light refracts through the glass and the distorted, nonsensical shapes within the bottle's body. The reflection on the black surface is inconsistent with the object's geometry, and the edges of the glass appear warped and unnatural. It does not pass as a real photograph. VERDICT: FAIL

fal/ideogram/v3: PASS
A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully captures a high-quality studio aesthetic. The lighting is soft and realistic, creating a natural highlight on the apple and a gentle, diffused shadow on the surface below. The texture of the apple skin is detailed, and the seamless white background is clean. There are no obvious AI artifacts or distortions; the stem and the contact point with the surface look physically accurate. VERDICT: PASS

Graphical design 71%vs92%

Data Visualisation 33%vs67%
Easy 33%vs67%
fal/ideogram/v3: FAIL
A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart shows only one month (January) despite requesting data for January through June. Bar values (12K, 18K, 15K, 22K, 25K) match, but axes are mislabeled and incomplete. No month labels for other months, no y-axis scale, and no title for the x-axis. Chart type is incorrect — it’s a stacked bar chart, not a standard bar chart. VERDICT: FAIL

fal/ideogram/v3: PASS
A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The pie chart correctly displays four segments: Company A (35%, red), Company B (25%, blue), Company C (20%, yellow), and Others (20%, green). Each slice has a distinct color, clear label, and percentage. The chart type matches the request. All text is readable and properly positioned. The visual elements are clean and accurate. VERDICT: PASS

fal/ideogram/v3: FAIL
A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart shows a line graph titled "Weekly Traffic" with grid lines and axis labels. However, the x-axis labels (1200, 1500, etc.) are misaligned with the data points and do not represent days (Mon-Sun). The y-axis values are correct, but the x-axis lacks day labels. The chart type matches, but data representation is inaccurate. VERDICT: FAIL

Layout & Design 78%vs100%
Easy 100%vs100%
fal/ideogram/v3: PASS
A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The poster features a large popcorn bucket centered, with “MOVIE NIGHT” in bold arched text at the top and “EVERY FRIDAY AT 8 PM” below it — matching the prompt’s layout. Text is readable and correctly placed. Visual hierarchy is clear, with the title dominating and the tagline supporting it. All required elements are present and correctly positioned. VERDICT: PASS

fal/ideogram/v3: PASS
An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The poster features a guitar illustration, the headline "LIVE MUSIC FEST" in large, bold text with clear visual hierarchy, and "TICKETS AVAILABLE NOW" as a tagline below. All text is readable and correctly placed. The layout matches the prompt’s description precisely. VERDICT: PASS

fal/ideogram/v3: PASS
A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The image features an open book illustration and the text "Join the Book Club" prominently displayed above "Meets Every Wednesday," with correct visual hierarchy. Text is readable and correctly placed. All required layout elements are present. VERDICT: PASS

Medium 67%vs100%
fal/ideogram/v3: PASS
A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

Hero header image present with headline “Streamline Your Workflow” and subtitle “Collaborate seamlessly and achieve your goals.” “Sign Up” button is visible in top-right. Footer has three social icons, though mislabeled (“Tweebook,” “Faceitl Fgfook”). Text is readable and correctly placed except for icon labels. Core layout and hierarchy are intact. VERDICT: PASS

fal/ideogram/v3: PASS
A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image includes a banner header, two-column body (text left, image right), a quote block, and footer with unsubscribe link. Text is readable and correctly placed. Layout hierarchy is clear and follows the prompt. All required elements are present and visually distinct. VERDICT: PASS

fal/ideogram/v3: FAIL
A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The menu shows a logo header (“The Cozy Bistro”), three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with address/hours. However, all text is gibberish and unreadable, violating the requirement for readable, correctly placed text. Layout structure is present but text content fails. VERDICT: FAIL

Hard 67%vs100%
fal/ideogram/v3: FAIL
A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

The left page features a large hero photo of the cream jar, matching the prompt. The right page has a two-column article layout, a pull quote, and a sidebar with an author bio and page number. However, all text is gibberish and unreadable, violating the requirement for readable text. Layout structure is correct, but content fails. VERDICT: FAIL

fal/ideogram/v3: PASS
A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI with a top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar — all present. Text is readable and correctly placed. The layout hierarchy is logical and consistent with standard UI design. Minor typos in card labels don't affect core structure or readability. VERDICT: PASS

fal/ideogram/v3: PASS
A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

Left page has full-bleed photo; right page has headline, body copy, inset image, and styled page numbers. Text is readable and correctly placed. Layout matches prompt with clear visual hierarchy. All described elements are present and correctly implemented. VERDICT: PASS
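
Every evaluator response in this listing ends with a "VERDICT: PASS" or "VERDICT: FAIL" marker. The page does not say how that marker is turned into the pass/fail badge, so the following is only a minimal sketch of one way to extract it. The regular expression and the `parse_verdict` helper are assumptions; the sample strings are the closing sentences of the three Hard-tier responses above.

```python
import re

# Matches the trailing marker the evaluators emit, e.g. "VERDICT: PASS".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str):
    """Return 'PASS' or 'FAIL' if the text ends with a verdict marker, else None."""
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


# Closing sentences of the three Hard-tier Layout & Design responses.
hard_tier_responses = [
    "Layout structure is correct, but content fails. VERDICT: FAIL",
    "Minor typos in card labels don't affect core structure or readability. VERDICT: PASS",
    "All described elements are present and correctly implemented. VERDICT: PASS",
]

verdicts = [parse_verdict(text) for text in hard_tier_responses]
print(verdicts)  # ['FAIL', 'PASS', 'PASS']
```

Two passes out of three gives the 67% shown for the Hard tier. Returning `None` for text without a marker keeps a malformed response from being counted silently as a pass or a fail.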

Style Diversity 75%vs92%
Easy 33%vs67%
fal/ideogram/v3: FAIL
A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image depicts a golden retriever in a garden setting. It exhibits visible brushstroke textures and rich color blending characteristic of oil painting. However, the fur and background details appear overly smooth and digitally rendered, lacking the coarse, tactile texture expected in traditional oil paintings. The style is suggestive but not authentically rendered. VERDICT: FAIL

fal/ideogram/v3: FAIL
A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image depicts a golden retriever in a garden with cherry blossoms and a stone lantern, fitting the setting. However, it lacks flat colors and large expressive eyes characteristic of Japanese anime style. The rendering uses soft shading and realistic fur detail, resembling Western illustration or digital painting rather than anime. Key style elements are absent. VERDICT: FAIL

fal/ideogram/v3: PASS
A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image clearly depicts a golden retriever in a garden, rendered in a pixel art style with visible 32x32 grid blocks and a limited, blocky color palette. Key characteristics—blocky forms, low-resolution detail, and distinct color fills—are present. The style is unmistakably pixel art, matching the prompt’s specifications. VERDICT: PASS

Medium 100%vs100%
fal/ideogram/v3: PASS
A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image features Art Deco elements: geometric building facades, symmetrical street layout, ornate lampposts, and a streamlined car with chrome accents. The palette uses warm golds and deep browns, aligning with the requested gold-and-black theme. Visual symmetry and stylized forms are evident throughout. VERDICT: PASS

fal/ideogram/v3: PASS
A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image displays soft, visible brushstrokes, a pastel palette (lavender, blues, yellows), and dappled light effects. The Impressionist style is clearly recognizable through its focus on light, color, and atmospheric rendering over sharp detail. The woman with a parasol fits the prompt’s subject and aesthetic. VERDICT: PASS

fal/ideogram/v3: PASS
A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image clearly exhibits Pop Art style: bold black outlines, Ben-Day dot patterns on skin, and flat, vivid primary colors (teal, yellow, blue, red). The comic-book speech bubble and stylized hair further reinforce the aesthetic. All requested visual characteristics are present and unmistakable. VERDICT: PASS

Hard 67%vs100%
fal/ideogram/v3: FAIL
A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image shows a forest with soft, blended color transitions and visible watercolor bleeds, especially in foliage and sky. Paper texture is subtly implied through grainy washes. Lighting mimics photographic depth with atmospheric haze and directional beams. However, the overall rendering lacks true photorealism in texture and sharpness, leaning more toward painterly realism. Key watercolor traits are present but not dominant. VERDICT: FAIL

fal/ideogram/v3: PASS
A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image blends cyberpunk neon lighting with Art Nouveau’s flowing, organic lines and floral motifs. The woman wears glowing cybernetic elements, surrounded by stylized orchids and curvilinear architecture. Neon pink and cyan hues dominate, merging tech with nature. Key style elements are clearly present and harmoniously integrated. VERDICT: PASS

fal/ideogram/v3: PASS
A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image blends ukiyo-e style (bold outlines, flat colors, woodblock texture) with a photorealistic sky featuring dramatic, swirling clouds and a luminous sun. The castle’s architecture merges Japanese pagoda roofs with European medieval stone walls. Key stylistic elements are present: strong lines, limited palette, and stylized natural elements. The lighting and sky contrast with the flat, illustrative foreground. VERDICT: PASS

Extreme 100%vs100%
fal/ideogram/v3: PASS
An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image shows raw concrete walls and arches, matching soft brutalism. The pastel blue, rounded sectional sofa and warm, diffused light from the lamp and window are clearly present. Textures are visible, and the overall mood aligns with the prompt. All key visual characteristics are recognizable and well-executed. VERDICT: PASS

fal/ideogram/v3: PASS
A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image shows a cracked cup in sparse composition with ample negative space, muted earthy tones, and dramatic lighting creating visual tension. The minimalist aesthetic and sense of unease align with “anxious minimalism.” The crack introduces subtle imperfection and discomfort. All key elements of the requested style are clearly present. VERDICT: PASS

fal/ideogram/v3: PASS
A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image clearly exhibits retro-futurism: sleek, curved skyscrapers, flying cars, and orbital roads evoke 1960s space-age design. The warm, sepia-toned patina and soft lighting create a melancholic atmosphere. Key visual elements like chrome surfaces, geometric forms, and atmospheric haze are present. Style is unmistakably recognizable. VERDICT: PASS
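
The tier and subcategory percentages for Style Diversity can be reproduced from the twelve fal/ideogram/v3 verdicts listed above. The sketch below assumes each percentage is a plain pass count over evaluation count, rounded to the nearest whole number; the page does not state the formula, and the `pass_rate` helper is hypothetical.

```python
# fal/ideogram/v3 verdicts for Style Diversity, transcribed from the listing above.
style_diversity = {
    "Easy": ["FAIL", "FAIL", "PASS"],
    "Medium": ["PASS", "PASS", "PASS"],
    "Hard": ["FAIL", "PASS", "PASS"],
    "Extreme": ["PASS", "PASS", "PASS"],
}


def pass_rate(verdicts: list) -> int:
    """Percentage of PASS verdicts, rounded to the nearest whole number."""
    return round(100 * verdicts.count("PASS") / len(verdicts))


# Per-tier rates, then the subcategory rate over all twelve evaluations.
tier_rates = {tier: pass_rate(v) for tier, v in style_diversity.items()}
all_verdicts = [v for tier in style_diversity.values() for v in tier]

print(tier_rates)               # {'Easy': 33, 'Medium': 100, 'Hard': 67, 'Extreme': 100}
print(pass_rate(all_verdicts))  # 75
```

Under that assumption the results match the figures shown on the page: 33%, 100%, 67%, and 100% per tier, and 9 of 12 passes (75%) for the subcategory.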