192 evaluations across 6 categories

Pass rate: 57% vs 84%

Text Rendering: 53% vs 93%

Typography Style: 100% vs 100%

Easy: 100% vs 100%

local/hidream-i1-full-17b: PASS

The word 'CHAPTER ONE' typed on aged paper with a vintage typewriter font, complete with slightly uneven ink

Evaluator (Qwen 3 VL)

The text "CHAPTER ONE" is spelled correctly and rendered in a bold, blocky font resembling vintage typewriter lettering. The ink appears slightly uneven and textured, matching the "slightly uneven ink" description. The background mimics aged paper. All requested elements are present and convincingly rendered. VERDICT: PASS

local/hidream-i1-full-17b: PASS

The phrase 'TODAY'S SPECIAL' written in white chalk on a dark green chalkboard with chalk dust

Evaluator (Qwen 3 VL)

Text reads "TODAY'S SPECIAL" correctly. Letters show convincing chalk texture and slight smudges on dark green board. A white chalk stick rests nearby. The style matches the prompt’s medium: chalk-on-chalkboard. No obvious artifacts or misrendering. VERDICT: PASS

local/hidream-i1-full-17b: PASS

The word 'OPEN' displayed on a red LED sign with glowing pixel-style letters

Evaluator (Qwen 3 VL)

The word "OPEN" is correctly spelled and rendered in a glowing, pixel-style LED matrix with bright red lights, matching the described medium precisely. The letters are clearly defined with individual dots, mimicking classic LED signage. No artifacts or misrendering are present. VERDICT: PASS

Writing Accuracy: 42% vs 92%

Easy: 100% vs 100%

local/hidream-i1-full-17b: PASS

A single large word 'HELLO' centered on a white background

Evaluator (Qwen 3 VL)

The word "HELLO" is spelled correctly and fully readable. All letters are present and clear against the white background. The text is centered as intended. No misspellings or illegible parts detected. VERDICT: PASS

local/hidream-i1-full-17b: PASS

The word 'ADVENTURE' in big bold letters centered on a plain cream background

Evaluator (Qwen 3 VL)

The word "ADVENTURE" is spelled correctly and fully readable in bold, centered text on a plain cream background. All letters are intact and clearly legible. No misspellings or illegible parts detected. The visual matches the intended description precisely. VERDICT: PASS

local/hidream-i1-full-17b: PASS

The word 'SERENITY' written large and centered on a solid light blue background

Evaluator (Qwen 3 VL)

The word "SERENITY" is spelled correctly and fully readable, centered on a solid light blue background. All letters are distinct, legible, and properly formed. No misspellings or missing parts. The text matches the intended specification exactly. VERDICT: PASS

Medium: 33% vs 100%

local/hidream-i1-full-17b: PASS

The sentence 'The quick brown fox jumps high' in clean black text on a white background

Evaluator (Qwen 3 VL)

All words are spelled correctly: "THE", "QUICK", "BROWN", "FOX", "JUMPS", "HIGH". Text is fully readable, bold, black, on a light background. No misspellings or illegible parts. Matches the intended text exactly. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

'Every sunset brings a new dawn ahead' written neatly on a pale yellow background

Evaluator (Qwen 3 VL)

The text reads "Every sunset brings a new dowad," where "dowad" is a misspelling of "dawn." The word "dawn" is missing the final 'n' and has an extra 'a'. All other words are correctly spelled and readable. The misspelling significantly alters the intended message. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

'Stars shine brightest in the dark night' displayed on a clean off-white background

Evaluator (Qwen 3 VL)

The word "BIGHTEST" is misspelled — it should be "BRIGHTEST". All other words are correctly spelled and fully readable. The text is clear against the off-white background. VERDICT: FAIL

Hard: 33% vs 100%

local/hidream-i1-full-17b: FAIL

A poster with headline 'SUMMER SALE' in large bold text, body text 'Up to 50% off all items this weekend only' in medium size, and fine print 'Terms and conditions apply. See store for details.' at the bottom in small text

Evaluator (Qwen 3 VL)

Text "SUMMER SALE" is correct. Body text "Up to 50% off this ALL ITEMS THIS WEEKEND ONLY." is mostly correct but "50 off off" is redundant and awkward. Fine print has "endelp" (misspelled "end" + "elp") and "detailts" (misspelled "details"). Multiple errors in fine print make it illegible and incorrect. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A flyer with large title 'GRAND OPENING', subtitle 'Join us for food, music, and fun' in medium text, and small disclaimer 'RSVP required. Limited seating available.' at the bottom

Evaluator (Qwen 3 VL)

Text matches intended content exactly: "GRAND OPENING", "JOIN US FOR FOOD, MUSIC, AND FUN", and "RSVP required. Limited seating available." All words are spelled correctly and fully readable. No misspellings or illegible parts detected. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

An event banner with headline 'ANNUAL GALA 2025' at the top in large serif font, description 'An evening of fine dining and live jazz' in medium text, and footnote 'Black tie optional. Doors open at 6 PM.' in small text

Evaluator (Qwen 3 VL)

“ANNUAL” is misspelled as “ANNUIL”. “AT BRD1” is illegible and incorrect; likely meant “AT BRD1” or “AT BROADWAY”. “BRD1” appears garbled. All other text is readable and correct. Core text accuracy fails due to misspellings and illegible parts. VERDICT: FAIL

Extreme: 0% vs 67%

local/hidream-i1-full-17b: FAIL

A vintage movie poster for 'THE LAST VOYAGE' featuring a tagline 'One ship. One chance. No turning back.' with credits block reading 'Directed by James Harlow. Starring Elena Voss, Marcus Chen, and Sofia Reyes. Music by Thomas Grant. A Paramount Pict…

Evaluator (Qwen 3 VL)

Text errors: “DEDPEND.” (misspelled “Thomas Dedpend” — should be “Thomas Grant”), “TIME PATTUANS” (on ship — likely “Titanic” misrendered). “Starring” omitted before Sofia Reyes. “A Paramount Pictures Release” missing. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A magazine cover titled 'NATIONAL EXPLORER' with cover lines 'The Hidden Temples of Cambodia — page 34', 'Interview: Dr. Maria Santos on Climate Action', 'Top 10 Trails You Must Hike This Year', and a barcode area with 'Issue 247 | March 2025 | $6.99…

Evaluator (Qwen 3 VL)

Text errors: "DR. MARIA TRAILS" (should be "SANTOS"), "YOU MUST ACT" (should be "TOP 10 TRAILS"), "March 225 Yer" (should be "March 2025"), and "$8.99 $5.99" (price mismatch). Multiple words misspelled or incorrect. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A product label for 'GOLDEN HARVEST ORGANIC HONEY' with subtitle 'Raw & Unfiltered', weight '12 oz (340g)', description 'Pure wildflower honey harvested from sustainable apiaries in the Pacific Northwest. No additives or preservatives.', and small te…

Evaluator (Qwen 3 VL)

The text "Unfiltered" is misspelled as "Unfilterd". The description reads "Store [illegible], [illegible] in the Pac Northwest." and "Prod. [illegible] of USA:", with multiple words garbled or unreadable. The weight "12 oz (340g)" is missing entirely. VERDICT: FAIL
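The percentages on this page are consistent with pooling PASS verdicts across all prompts in a group rather than averaging sub-rates. Writing Accuracy has 5 PASS verdicts out of 12 prompts (3/3 Easy, 1/3 Medium, 1/3 Hard, 0/3 Extreme), which gives 42%; adding Typography Style's 3/3 yields 8/15, or 53%, for Text Rendering, whereas averaging the two subcategory rates (100% and 42%) would give about 71%. A minimal Python sketch of that arithmetic follows, assuming every complete evaluator response ends in `VERDICT: PASS` or `VERDICT: FAIL` and that rates are rounded to the nearest whole percent. `parse_verdict` and `pooled_rate` are hypothetical helpers, not code taken from the benchmark itself.

```python
import re
from typing import List, Optional, Tuple

# Assumed format: the evaluator text ends with "VERDICT: PASS" or "VERDICT: FAIL".
VERDICT_RE = re.compile(r"VERDICT:\s*(PASS|FAIL)\s*$")


def parse_verdict(evaluator_text: str) -> Optional[str]:
    """Return 'PASS' or 'FAIL' from the trailing verdict, or None if it is missing.

    Truncated responses (e.g. ones cut off before the verdict) return None,
    so their outcome has to come from the separate result line instead.
    """
    match = VERDICT_RE.search(evaluator_text.strip())
    return match.group(1) if match else None


def pooled_rate(groups: List[Tuple[int, int]]) -> int:
    """Pool (passes, total) counts across groups and return a whole-number percent.

    Python's round() rounds halves to even; none of the examples below hit a tie.
    """
    passes = sum(p for p, _ in groups)
    total = sum(n for _, n in groups)
    if total == 0:
        raise ValueError("no evaluations to pool")
    return round(100 * passes / total)


# Per-difficulty (passes, total) counts read off the results on this page.
writing_accuracy = [(3, 3), (1, 3), (1, 3), (0, 3)]  # Easy, Medium, Hard, Extreme
typography_style = [(3, 3)]  # Easy only

print(pooled_rate(writing_accuracy))                     # 42
print(pooled_rate(typography_style + writing_accuracy))  # 53
```

Pooling weights each prompt equally, so a subcategory with more prompts (Writing Accuracy, 12) pulls the category rate further than one with fewer (Typography Style, 3).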

Spatial Reasoning: 49% vs 79%

Attribute Binding: 67% vs 100%

Easy: 100% vs 100%

local/hidream-i1-full-17b: PASS

A red ball and a blue cube sitting side by side on a white surface

Evaluator (Qwen 3.5 122B)

The image displays a red sphere (ball) on the left and a blue cube on the right, resting on a white surface. The attributes are correctly bound: the ball is red and the cube is blue. There are no attribute swaps or missing elements. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A green apple and an orange pumpkin on a wooden cutting board

Evaluator (Qwen 3.5 122B)

The image displays a large orange pumpkin in the background and a smaller green apple in the foreground. Both objects are resting on a light-colored wooden cutting board with visible grain. The attributes are correctly bound: the apple is green and the pumpkin is orange. There are no attribute swaps or missing elements. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A yellow taxi and a black limousine parked next to each other on a street

Evaluator (Qwen 3.5 122B)

The image displays two vehicles parked on a city street. 1. **Yellow Taxi:** A yellow sedan is in the foreground. It has a taxi roof sign, a "T" logo on the door, and is clearly a taxi. The color yellow is correctly bound to this object. 2. **Black Limousine:** A black luxury sedan (resembling a Mercedes-Benz S-Class) is parked behind/next to the taxi. While technically a sedan, it fits the visual archetype of a "black car" or town car service often referred to as a limousine. The color black…

Medium: 100% vs 100%

local/hidream-i1-full-17b: PASS

A large red circle, a small blue triangle, and a medium green star arranged on a white canvas

Evaluator (Qwen 3.5 122B)

The image displays three distinct geometric shapes on a white background. 1. **Red Circle:** It is large and red, matching the prompt "large red circle". 2. **Blue Triangle:** It is blue and triangular. While its size is comparable to the star, it is smaller than the circle, fitting the "small" descriptor relative to the "large" circle. 3. **Green Star:** It is green and star-shaped, fitting the "medium green star" description. All attributes (color, shape, relative size) are correctly bound…

local/hidream-i1-full-17b: PASS

A tall thin white candle, a short fat red candle, and a medium spiral yellow candle on a shelf

Evaluator (Qwen 3.5 122B)

The image displays three candles on a white shelf against a dark background. 1. **Left Object:** A tall, thin, white candle (unlit). This matches "tall thin white candle". 2. **Middle Object:** A short, fat, red candle (lit). This matches "short fat red candle". 3. **Right Object:** A medium-height, spiral-shaped yellow candle (lit). This matches "medium spiral yellow candle". All attributes (color, size, shape) are correctly bound to the specific objects described in the prompt with no swaps…

local/hidream-i1-full-17b: PASS

A big brown horse, a small white rabbit, and a medium-sized black cat standing in a meadow

Evaluator (Qwen 3.5 122B)

The image displays a brown horse, a white rabbit, and a black (with white markings) cat. - **Horse:** Observed as big and brown. Matches "big brown horse". - **Rabbit:** Observed as small and white. Matches "small white rabbit". - **Cat:** Observed as medium-sized (relative to the rabbit) and black/dark. Matches "medium-sized black cat". There are no attribute swaps (e.g., the horse is not white, the rabbit is not brown). While the cat and rabbit are sitting rather than standing, the core attributes…

Hard: 0% vs 100%

local/hidream-i1-full-17b: FAIL

A pink fire truck, a blue banana, and a green sun in a surreal landscape

Evaluator (Qwen 3.5 122B)

The image contains a pink fire truck and a blue banana, which matches the prompt. However, the prompt requested a "green sun," but the sun in the image is yellow. Additionally, there is a green banana present, suggesting a swap of attributes or an extra object not requested. The specific attribute "green" is bound to a banana instead of the sun. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A purple carrot, a red tree with red leaves, and a white watermelon sliced open

Evaluator (Qwen 3.5 122B)

The image contains two purple carrots with green tops, which matches the "purple carrot" request (though there are two instead of one). The background features a tree with vibrant red leaves, matching the "red tree with red leaves" request. However, the watermelon slice has red flesh and black seeds, whereas the prompt specifically requested a "white watermelon." The color attribute for the watermelon is incorrect. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

An orange elephant, a yellow penguin, and a silver strawberry arranged on a table

Evaluator (Qwen 3.5 122B)

The image shows an orange elephant (correct), a large penguin that is black, white, and yellow (not purely yellow), a small yellow chick-like bird (not a penguin), and a red strawberry (not silver). The prompt requested a "silver strawberry" and a "yellow penguin," but the strawberry is red and the penguin is multicolored while the yellow object is a different bird. The attributes are incorrectly bound. VERDICT: FAIL

Compositionality: 78% vs 89%

Easy: 100% vs 67%

local/hidream-i1-full-17b: PASS

A tabby cat wearing oversized red sunglasses

Evaluator (Qwen 3.5 122B)

* **Tabby cat:** Present. The animal has distinct tabby striping and spotting on its fur. * **Oversized red sunglasses:** Present. The cat is wearing large, round glasses with bright red frames that are disproportionately large for its face. * **Wearing:** Present. The glasses are positioned correctly on the cat's face. All described elements are clearly visible and combined coherently. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A golden retriever wearing a tiny cowboy hat

Evaluator (Qwen 3.5 122B)

The image displays a Golden Retriever wearing a cowboy hat. - **Golden Retriever:** Present. The dog has the characteristic golden coat and facial features. - **Wearing:** Present. The hat is on the dog's head. - **Tiny Cowboy Hat:** Present. A cowboy hat is clearly visible. While the hat appears to be a standard size rather than explicitly "tiny" (it looks quite large on the dog), the core object is correct. - **Coherently Combined:** Yes. The elements are combined naturally. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A penguin holding a colorful umbrella

Evaluator (Qwen 3.5 122B)

Penguin: Present. Holding: Present (flipper grips the handle). Colorful umbrella: Present (rainbow segments). All described elements are clearly visible and coherently combined. The penguin stands upright, gripping the umbrella handle, with the colorful canopy open above. The scene matches the prompt perfectly without logical inconsistencies. VERDICT: PASS

Medium: 100% vs 100%

local/hidream-i1-full-17b: PASS

A medieval knight in full armor sitting at a desk typing on a modern laptop

Evaluator (Qwen 3.5 122B)

The image successfully depicts all elements of the prompt: 1. **Medieval knight in full armor:** Present. The figure wears a helmet, plate armor, chainmail, and gauntlets. 2. **Sitting at a desk:** Present. The knight is seated on a wooden chair at a wooden table. 3. **Typing on a modern laptop:** Present. A silver laptop is open on the desk, and the knight's gloved hands are positioned on the keyboard. The concepts are combined coherently with realistic lighting and textures. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A Roman gladiator taking a selfie with a smartphone in the Colosseum

Evaluator (Qwen 3.5 122B)

The image successfully depicts all requested elements: 1. **Roman gladiator:** Present (man in armor, helmet with red crest, red cape). 2. **Taking a selfie:** Present (arm extended, holding phone towards self). 3. **Smartphone:** Present (black modern device). 4. **In the Colosseum:** Present (iconic arches and arena floor in background). The anachronistic combination is executed coherently. VERDICT: PASS

local/hidream-i1-full-17b: PASS

An astronaut in a spacesuit riding a bicycle through a park

Evaluator (Qwen 3.5 122B)

The image successfully depicts all requested elements: 1. **Astronaut in a spacesuit:** Present (white suit, helmet, backpack, flag patch). 2. **Riding a bicycle:** Present (figure is seated on a bike, holding handlebars). 3. **Through a park:** Present (background features trees, grass, and a path). The concepts are combined coherently with realistic lighting and perspective. VERDICT: PASS

Hard: 33% vs 100%

local/hidream-i1-full-17b: FAIL

An octopus wearing a top hat and monocle playing chess on the surface of the moon with Earth visible in the background

Evaluator (Qwen 3.5 122B)

The image contains most elements but misses one specific detail requested in the prompt. - **Octopus:** Present (large orange octopus). - **Top hat:** Present (black top hat on head). - **Monocle:** **Missing** (the octopus has large eyes, but no monocle is visible). - **Playing chess:** Present (chessboard with pieces). - **Surface of the moon:** Present (cratered grey ground). - **Earth in background:** Present (blue planet visible). Because the monocle is missing, not all described elements…

local/hidream-i1-full-17b: PASS

A dinosaur in a business suit giving a PowerPoint presentation in a modern office while drinking coffee

Evaluator (Qwen 3.5 122B)

The image successfully combines all the requested elements: 1. **Dinosaur:** A T-Rex head and tail are clearly visible. 2. **Business suit:** The figure is wearing a full suit and tie. 3. **Modern office:** The background features large windows, a desk, and office plants. 4. **Drinking coffee:** The figure holds a steaming white mug. 5. **Giving a PowerPoint presentation:** While the screen is blank, the open laptop, the charts on the desk, and the gesturing pose strongly imply a presentation…

local/hidream-i1-full-17b: FAIL

A robot dressed as a pirate sailing a paper boat on a lake of lava while juggling three kittens

Evaluator (Qwen 3.5 122B)

The image contains a robot pirate in a paper boat. However, there are discrepancies with the prompt. 1. **Robot dressed as a pirate:** Present (wearing a pirate hat). 2. **Paper boat:** Present. 3. **Lake of lava:** Partially present (fire/lava effects are on the water surface, but the background is a normal forest lake). 4. **Juggling three kittens:** Incorrect. There are **four** kittens visible (two in the air, two in/on the boat). Additionally, the robot is not actively juggling three…

Counting: 22% vs 56%

Easy: 33% vs 100%

local/hidream-i1-full-17b: FAIL

Exactly 3 red apples arranged on a wooden table

Evaluator (Qwen 3.5 122B)

The image displays 6 red apples arranged in two rows of three on a wooden table. The prompt requested exactly 3 red apples. Since the actual count (6) does not match the requested count (3), the image fails to meet the specific constraint. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

Exactly 5 yellow rubber ducks sitting in a row on a bathroom shelf

Evaluator (Qwen 3.5 122B)

The image displays exactly five yellow rubber ducks lined up in a row on a shelf. The count matches the request perfectly. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

Exactly 4 white candles standing on a dark surface

Evaluator (Qwen 3.5 122B)

The image displays five white candles standing in a row on a dark surface, each with a lit flame. The prompt specifically requested "Exactly 4 white candles." Since there are five candles visible instead of four, the count does not match the request. VERDICT: FAIL

Medium: 33% vs 67%

local/hidream-i1-full-17b: FAIL

2 orange cats and 3 black dogs sitting together on a grassy lawn

Evaluator (Qwen 3.5 122B)

The image displays 2 orange cats sitting in the front. Behind them, there are only 2 black dogs visible (one in the center and one on the right). The prompt requested 3 black dogs. Therefore, the count of dogs is incorrect. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

4 red roses and 2 white daisies in a glass vase on a table

Evaluator (Qwen 3.5 122B)

The image displays exactly 4 red roses (one left, one top center, one right center, one far right) and 2 white daisies (one prominent in the front, one in the background). They are arranged in a clear glass vase sitting on a light-colored surface. The count and objects match the prompt description perfectly. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

1 blue butterfly and 3 yellow butterflies resting on a garden fence

Evaluator (Qwen 3.5 122B)

The image shows one large blue butterfly on the left and two yellow butterflies on the right side of the wooden post. The prompt specifically requested "3 yellow butterflies." Since there are only two yellow butterflies visible in the image, the count does not match the request. VERDICT: FAIL

Hard: 0% vs 0%

local/hidream-i1-full-17b: FAIL

Exactly 9 goldfish swimming in a large round fishbowl

Evaluator (Qwen 3.5 122B)

The image contains significantly more than 9 goldfish. Counting the clearly visible fish, there are at least 10 distinct fish in the foreground (top left, far left, middle left, bottom left upper, bottom left lower, top middle, top right, center, bottom right, right side). Additionally, there are several blurry fish visible in the background (top center, middle center, etc.), bringing the total count to around 12 or 13. The count does not match the requested "Exactly 9". VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A fruit bowl containing exactly 3 bananas, 4 oranges, and 2 apples

Evaluator (Qwen 3.5 122B)

The image shows a fruit bowl, but the counts do not match the prompt's request. - **Bananas:** The prompt asked for 3, but there are only 2 visible bananas. - **Oranges:** The prompt asked for 4, but there are only 2 oranges. - **Apples:** The prompt asked for 2, and there are 2 apples. Since the counts for bananas and oranges are incorrect, the image fails to meet the specific requirements. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

Exactly 11 colorful balloons floating against a clear blue sky

Evaluator (Qwen 3.5 122B)

The image contains approximately 12 balloons. Visible balloons include: a top orange, top pink, top left green, center red, far right light blue, bottom left dark blue, far left red (partially hidden), center left light blue, bottom center yellow, bottom right orange, bottom right red, and a green balloon partially visible underneath the yellow one. The count of 12 does not match the requested count of exactly 11. VERDICT: FAIL

Negation: 22% vs 56%

Easy: 67% vs 67%

local/hidream-i1-full-17b: PASS

A tropical beach scene with no people

Evaluator (Qwen 3.5 122B)

The image depicts a pristine tropical beach with white sand, lush palm trees leaning over the shore, and clear turquoise water leading to a distant mountain. I have carefully scanned the beach, the water, and the background. There are no visible human figures on the sand or swimming in the ocean. The small red object in the water appears to be a buoy. The excluded element is absent. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

A cat lying on a couch without a collar

Evaluator (Qwen 3.5 122B)

The image shows a tabby and white cat lying on a beige couch. Crucially, the cat is wearing a collar around its neck, which appears to be black with yellow and green accents. Since the requirement was for the cat to be *without* a collar, and the collar is clearly present, the exclusion criterion has not been met. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A classroom with desks and a chalkboard but no students

Evaluator (Qwen 3.5 122B)

The image displays a classroom setting featuring a large green chalkboard mounted on a white wall and several wooden desks with chairs arranged in rows. The room is completely empty, and there are absolutely no students or people visible anywhere in the frame. The excluded element is successfully absent. VERDICT: PASS

Medium: 0% vs 67%

local/hidream-i1-full-17b: FAIL

A sandy beach with palm trees, seashells, and a beach umbrella but no water or ocean visible

Evaluator (Qwen 3.5 122B)

The image clearly displays a large body of water (the ocean) on the left side, with waves washing onto the sandy shore. This directly contradicts the exclusion criteria "no water or ocean visible." While the other elements like palm trees, seashells, and an umbrella are present, the presence of the ocean means the specific constraint was violated. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

An office workspace with a desk, chair, bookshelf, and lamp but no computers or screens

Evaluator (Qwen 3.5 122B)

The image clearly violates the negative constraint. While it features a desk, chair, shelves, and lamps, it prominently displays a large desktop computer monitor (showing a mountain wallpaper) and a laptop to its left. There is also a keyboard and mouse on the desk. The excluded elements (computers/screens) are definitely present. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A kitchen with pots, pans, a stove, and a cutting board but no food

Evaluator (Qwen 3.5 122B)

The image depicts a kitchen scene with pots on a stove and a cutting board, but it violates the "no food" constraint. There is a bowl of whole citrus fruits (likely oranges or lemons) to the left of the stove, and several slices of yellow fruit are arranged on the wooden cutting board in the foreground. VERDICT: FAIL

Hard: 0% vs 33%

local/hidream-i1-full-17b: FAIL

A pepperoni pizza with tomato sauce, basil, and crust but absolutely no cheese

Evaluator (Qwen 3.5 122B)

The image shows a pepperoni pizza with tomato sauce, basil leaves, and a crust. However, contrary to the exclusion criteria ("absolutely no cheese"), there is clearly a significant layer of melted white cheese (likely mozzarella) covering the pizza surface, visible between the pepperoni slices and under the basil. The excluded element is present. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A car body with windows, doors, and headlights but no wheels

Evaluator (Qwen 3.5 122B)

The image displays a white car that clearly possesses four wheels with tires and rims. The prompt requested a car body with "no wheels," but the generated image includes fully formed wheels on all corners, violating the specific exclusion criteria. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A human face with eyes, nose, and mouth but no eyebrows

Evaluator (Qwen 3.5 122B)

The image clearly shows a human face with distinct, dark eyebrows above both eyes. The prompt explicitly requested "no eyebrows," meaning this element was supposed to be excluded. Since the eyebrows are clearly present in the generated image, the exclusion criteria was not met. VERDICT: FAIL

Relative Position: 67% vs 92%

Easy: 67% vs 100%

local/hidream-i1-full-17b: PASS

A cat sitting on top of a cardboard box

Evaluator (Qwen 3.5 122B)

The image shows a tabby and white cat resting directly on the upper surface of a brown cardboard box. The cat's body is positioned above the box, with its paws hanging slightly over the edge, accurately reflecting the "on top of" spatial relationship described in the prompt. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A red ball underneath a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a wooden chair with a red ball placed on the floor directly beneath its seat, between the front legs. This perfectly matches the specified spatial relationship of the ball being "underneath" the chair. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

A coffee mug next to an open book on a desk

Evaluator (Qwen 3.5 122B)

The image shows a coffee mug resting directly **on top of** the pages of an open book in the foreground. While there is another open book in the background where the mug is technically "next to" it, the primary and most obvious spatial relationship is that the mug is **on** the book, not beside it. The prompt specified "next to," which contradicts the visual evidence of the mug sitting on the book's pages. VERDICT: FAIL

Medium: 67% vs 100%

local/hidream-i1-full-17b: FAIL

A bird perched on a branch, the branch extending from a tree next to a park bench

Evaluator (Qwen 3.5 122B)

The image shows a bird perched on the backrest of a park bench, not on a branch. While there is a large tree next to the bench, the bird is clearly standing on the wooden slats of the bench itself. The prompt specifies the bird is "perched on a branch," which contradicts the visual evidence. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A cat sitting on a pillow, the pillow placed on a wooden chair

Evaluator (Qwen 3.5 122B)

The image shows a tabby cat sitting directly on top of a cream-colored pillow. This pillow is resting on the seat of a wooden chair. The spatial hierarchy matches the description perfectly: the cat is on the pillow, and the pillow is on the chair. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A candle on top of a stack of books, the books next to a potted plant

Evaluator (Qwen 3.5 122B)

The image shows a lit candle sitting directly on top of a stack of two white books. To the right of the books and candle, there is a potted plant with green leaves. The spatial relationships described in the prompt ("candle on top of a stack of books" and "books next to a potted plant") are accurately represented in the image. VERDICT: PASS

Hard: 33% vs 67%

local/hidream-i1-full-17b: FAIL

Five chess pieces on a board: a king in the center, a queen to its left, a bishop to its right, a rook above the king, and a knight below the king

Evaluator (Qwen 3.5 122B)

The image displays five chess pieces: a Queen on the left, a dark King in the center, two Bishops (one smaller, one larger) to the right, and a Knight on the far right. The prompt specifies a Rook "above" the king and a Knight "below" the king. Neither of these positional relationships exists in the image; there is no Rook visible, and the Knight is positioned to the right, not below. The arrangement does not match the prompt's spatial instructions. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A desk scene: laptop in the center, coffee mug to the left, phone to the right, lamp above, and notebook below

Evaluator (Qwen 3.5 122B)

The image displays a laptop centered on a wooden desk. To the left of the laptop is a white coffee mug. To the right is a smartphone. A black desk lamp is positioned on the right side, with its head extending above the workspace. Below the laptop (in the foreground), there are two notebooks (one black, one tan) with pens on them. All described spatial relationships (center, left, right, above, below) are accurately represented in the image. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

Five animals in a field: a horse on the far left, a cow to the right of the horse, a sheep in the center, a goat to the right of the sheep, and a dog on the far right

Evaluator (Qwen 3.5 122B)

The image shows five animals in a field. From left to right: a horse, a cow, a small lamb (not a goat), a larger sheep, and a dog. The prompt specified “a goat to the right of the sheep,” but the animal there is clearly a sheep — not a goat. Also, the “sheep in the center” is actually a lamb, while the larger sheep is to its right. Spatial positions (left/right) are mostly correct, but species misidentification violates the prompt’s description. Core criteria not fully met. VERDICT: FAIL

Extreme: 100% vs 100%

local/hidream-i1-full-17b: PASS

A landscape with a red barn in the foreground at large scale, a windmill in the midground at medium scale, and snow-capped mountains in the background at small scale, all with consistent depth perspective

Evaluator (Qwen 3.5 122B)

The bottom image shows a red barn in the foreground on the right, a windmill in the midground on the left, and snow-capped mountains in the background. The spatial arrangement matches the prompt: barn (foreground, large), windmill (midground, medium), mountains (background, small). Depth perspective is consistent. All positional relationships are correctly represented. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A beach scene with a sandcastle in the foreground, a surfer on the waves in the midground, and a cruise ship on the horizon in the background, sizes decreasing with distance

Evaluator (Qwen 3.5 122B)

The image perfectly matches the described spatial arrangement. The sandcastle is clearly positioned in the immediate foreground on the sand. The surfer is located in the midground, riding a wave further back. The cruise ship is situated in the background on the horizon line. The perspective creates a sense of depth where the foreground object (sandcastle) appears largest, followed by the midground surfer, and the distant ship, adhering to the "sizes decreasing with distance" principle relative…

local/hidream-i1-full-17b: PASS

A city park with a park bench and pigeons in the foreground, a fountain in the midground, and skyscrapers in the background, each layer progressively smaller

Evaluator (Qwen 3.5 122B)

The image perfectly matches the described spatial arrangement. In the immediate foreground, a park bench sits on a path with several pigeons perched on it and walking nearby. Behind this, in the midground, is a large, tiered fountain surrounded by a pool and grass. In the background, a dense cluster of skyscrapers rises up, clearly positioned behind the park elements. The depth layers are distinct and correctly ordered from front to back. VERDICT: PASS

Scale & Proportions 33%vs78%

Easy 33%vs33%

local/hidream-i1-full-17b: PASS

A person standing next to a yellow school bus, showing realistic relative scale

Evaluator (Qwen 3.5 122B)

The image depicts a young girl standing next to a large yellow school bus. The relative scale appears realistic: the girl is significantly shorter than the height of the bus, consistent with a child standing next to a full-sized vehicle. The size of the wheel relative to the girl and the bus body is also proportionate. Despite the graphical overlay (arrow and text), the physical objects maintain correct relative proportions. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

A ladybug sitting on a large green leaf, shown at realistic proportions

Evaluator (Qwen 3.5 122B)

The image shows a ladybug that is disproportionately large relative to the leaf. In a realistic scenario, a ladybug would appear much smaller on a "large green leaf," typically occupying a tiny fraction of the surface area. Here, the insect covers a significant portion of the leaf, making the scale unrealistic. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A child holding a basketball, both at correct relative sizes

Evaluator (Qwen 3.5 122B)

The image shows a young child holding a basketball. However, the relative sizes are inconsistent. The basketball appears disproportionately large compared to the child's body. It is wider than the child's shoulders and significantly larger than their head (roughly 1.5 to 2 times the head width), whereas a standard basketball should be only slightly wider than a child's head and narrower than their shoulders. The hands also appear too small to grip such a large ball naturally. VERDICT: FAIL

Medium 67%vs100%

local/hidream-i1-full-17b: FAIL

A tiny person standing on the rim of a coffee cup, looking down into it

Evaluator (Qwen 3.5 122B)

The image shows a tiny person standing on the wooden table surface next to the saucer, touching it. The prompt explicitly describes the person "standing on the rim of a coffee cup." Since the person is on the table and not on the rim, the spatial relationship and positioning described in the prompt are not met. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A giant tabby cat walking between city skyscrapers like a kaiju

Evaluator (Qwen 3.5 122B)

The image depicts a massive tabby cat striding down a city street, towering over the cars and pedestrians below. The scale is consistent with the "kaiju" description, as the cat's head reaches significant height relative to the surrounding skyscrapers, and its paws dwarf the vehicles. The perspective and relative sizes of the cat versus the urban environment are logically consistent with the prompt. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A miniature house sitting on the palm of someone's hand

Evaluator (Qwen 3.5 122B)

The image clearly depicts a small, detailed model house resting in the center of an open human palm. The scale is consistent with the description of a "miniature house," as it fits comfortably within the hand without looking disproportionately large or tiny. The perspective and depth cues (fingers behind the house) reinforce the correct size relationship between the two objects. VERDICT: PASS

Hard 0%vs100%

local/hidream-i1-full-17b: FAIL

A group photo of an elephant, a horse, a dog, a cat, and a mouse, all standing in a line with correct real-world proportional sizes

Evaluator (Qwen 3.5 122B)

The image fails to meet the prompt's requirements. The prompt requests a mouse, but the image contains two cats (one tabby, one orange/white) instead of a cat and a mouse. Consequently, the size relationship for the mouse is incorrect; the smallest animal present is a cat, which is significantly larger than a real mouse would be relative to the dog. Additionally, the prompt specifies "a cat" (singular), but there are two. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A dinner table scene with a plate, fork, knife, wine glass, salt shaker, and peppercorn, all at realistic relative sizes

Evaluator (Qwen 3.5 122B)

The relative sizes are not consistent with reality. The two small, spiky objects on the plate are labeled as "peppercorn" in the prompt context, but they are far too large (several centimeters in diameter) to be actual peppercorns, which are tiny (approx. 5mm). Additionally, the wine bottle appears somewhat undersized relative to the wine glass and plate, and the salt shaker looks disproportionately large. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A farm scene with a barn, tractor, farmer, cow, chicken, and a fence post, all at correct proportional scale

Evaluator (Qwen 3.5 122B)

The image fails to maintain correct proportional scale. The most significant error is the size of the chickens; they are depicted as being nearly half the height of the standing farmer, whereas they should be much smaller. Additionally, the large wooden cross-post in the foreground is taller than the farmer, while the white fence rails nearby are tiny, showing inconsistent scaling. The bench is also disproportionately small compared to the tractor. VERDICT: FAIL

Human realism 71%vs91%

Faces & Expressions 92%vs100%

Easy 100%vs100%

local/hidream-i1-full-17b: PASS

A close-up portrait of a young woman with a warm natural smile

Evaluator (Qwen 3 VL)

Face is largely symmetric with natural features. Smile appears warm and genuine, clearly conveying happiness. Slight AI artifacts: overly perfect teeth and unnaturally smooth skin. No major distortions affecting emotional expression. Overall, meets the prompt’s intent. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A portrait of a middle-aged man with a calm, serious expression

Evaluator (Qwen 3 VL)

Face is symmetric with natural, correctly proportioned features. Expression conveys calm seriousness with subtle brow lines and neutral mouth. No obvious distortions; skin texture and hair look authentic. Lighting and detail are consistent with a professional portrait. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A headshot of a teenager with a relaxed, neutral face

Evaluator (Qwen 3 VL)

Face is symmetric with natural features. Eyes, nose, and mouth are proportionate and centered. Expression is relaxed and neutral, with subtle lips. No obvious distortions. Skin texture and lighting are realistic. Matches prompt for a teenager with a neutral, relaxed face. VERDICT: PASS

Medium 100%vs100%

local/hidream-i1-full-17b: PASS

A close-up of a person's face showing clear disgust, nose wrinkled and lip curled

Evaluator (Qwen 3 VL)

The face shows clear disgust: nose wrinkled, lips curled, and furrowed brows. Features are mostly symmetric and anatomically plausible. Slight asymmetry in brow and lip tension is present but doesn’t obscure the emotion. The expression is unmistakably disgust. No major distortions. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A portrait capturing genuine surprise — raised eyebrows, wide eyes, slightly open mouth

Evaluator (Qwen 3 VL)

Face is symmetric with correctly rendered features. The expression clearly conveys surprise: wide eyes, raised eyebrows, and an open mouth. Hands frame the face naturally. No distortions or anatomical errors. The emotion is vivid and readable. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A face showing quiet melancholy — downcast eyes, slightly drawn mouth, distant gaze

Evaluator (Qwen 3 VL)

Face is symmetric with correct anatomical features. The expression conveys quiet melancholy: downcast eyes, slightly drawn mouth, and distant gaze. No significant distortions. Emotion is clearly readable and aligns with the prompt. VERDICT: PASS

Hard 100%vs100%

local/hidream-i1-full-17b: PASS

An elderly woman in her 80s with deep wrinkles and white hair, smiling warmly with kind eyes

Evaluator (Qwen 3 VL)

Face is symmetric with natural, aged features and deep wrinkles. White hair is voluminous and realistic. Warm, genuine smile with crinkled eyes clearly conveys kindness. No noticeable distortions. Lighting and texture are photorealistic. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A 5-year-old child with round cheeks and large eyes, crying with tears streaming down their face

Evaluator (Qwen 3 VL)

The child’s face is symmetric with well-defined features: round cheeks, large expressive eyes, and tears streaming down. The crying emotion is clearly conveyed through furrowed brows, downturned mouth, and visible tears. No significant distortions are present. Features align with the prompt’s description. VERDICT: PASS

local/hidream-i1-full-17b: PASS

An elderly man in his 70s with age spots and thinning gray hair, looking skeptically over reading glasses

Evaluator (Qwen 3 VL)

Face is largely symmetric with natural wrinkles and age spots. Thinning gray hair and reading glasses are present. Skeptical expression is clear via furrowed brow and downturned mouth. Slight asymmetry in eye shape and glasses alignment, but overall features are coherent and emotionally readable. VERDICT: PASS

Extreme 67%vs100%

local/hidream-i1-full-17b: PASS

A group of three friends at a table: one laughing with head tilted back, one looking shocked with wide eyes, and one rolling their eyes with a smirk

Evaluator (Qwen 3 VL)

Faces are symmetric with correct anatomical features. Emotions are clearly readable: left person laughs with head tilted, center looks shocked with wide eyes, right person rolls eyes with smirk. No significant distortions. All three expressions match the prompt accurately. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A family photo with four people: a father beaming with pride, a mother smiling gently, a teenager looking bored, and a child grinning excitedly

Evaluator (Qwen 3 VL)

Faces are symmetric with correct anatomical features. Emotions are clearly readable: all smile warmly, matching "beaming," "gently smiling," and "grinning" tones. No distortions or unnatural artifacts. The prompt’s emotional intent is accurately conveyed across all four individuals. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

Three coworkers in a meeting: one looking frustrated with furrowed brows, one nodding with a pleased expression, and one looking confused with a tilted head

Evaluator (Qwen 3 VL)

Faces are symmetric with correct features. The left man shows frustration (furrowed brows). The center woman shows confusion (tilted head, furrowed brow). The right man shows confusion, not pleasure. Emotions are clearly readable. Minor facial distortions exist but don’t obscure expressions. VERDICT: FAIL

Full Body 83%vs100%

Easy 100%vs100%

local/hidream-i1-full-17b: PASS

A woman standing upright in a relaxed pose, arms at her sides, wearing jeans and a white t-shirt

Evaluator (Qwen 3 VL)

The woman stands upright with arms at sides, one hand in pocket — matching the prompt. Limb count and proportions are anatomically correct. No distortions observed. Clothing (jeans, white t-shirt) is accurately rendered. Pose is recognizable and relaxed. All core criteria are met. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A man standing straight facing the camera, hands in pockets, normal proportions

Evaluator (Qwen 3 VL)

The man stands straight, hands in pockets, with normal proportions. Limb count is correct (two arms, two legs). Pose is anatomically plausible and matches the prompt. No visible distortions. Clothing and posture are consistent with the description. All elements align with the requested action and proportions. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A person standing in a neutral pose in a park, full body visible from head to feet

Evaluator (Qwen 3 VL)

The person’s full body is visible from head to feet, standing in a neutral pose. Limb count and proportions are anatomically correct. The pose is recognizable as standing still, facing forward. No significant distortions are present. The scene matches the prompt’s description of a park setting. VERDICT: PASS

Medium 100%vs100%

local/hidream-i1-full-17b: PASS

A woman mid-stride running through a park, hair and clothes showing motion

Evaluator (Qwen 3 VL)

The woman’s limbs and proportions are anatomically correct. Her running pose is dynamic and recognizable, with motion blur in her hair and clothing. No distortions in limb count or structure. The scene matches the prompt: park setting, mid-stride running, motion effects visible. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A man captured mid-jump over a puddle, body in the air with bent knees

Evaluator (Qwen 3 VL)

The man’s limbs are correctly counted and proportioned. His pose—bent knees, arms outstretched mid-air—is anatomically plausible for a jump. The action is clearly recognizable: he’s leaping over a puddle, with water splashing beneath him. No significant distortions are visible. The scene is coherent and realistic. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A dancer in a dynamic spin, dress flowing outward with one leg extended

Evaluator (Qwen 3 VL)

The dancer’s limbs are correctly counted and proportioned. The pose is a recognizable dynamic spin with one leg extended and dress flowing outward. No anatomical distortions are visible. The motion is captured naturally, with believable fabric dynamics and hair movement. All elements align with the prompt. VERDICT: PASS

Hard 67%vs100%

local/hidream-i1-full-17b: FAIL

A person in a yoga tree pose — standing on one leg with the other foot pressed to the inner thigh, arms raised overhead

Evaluator (Qwen 3 VL)

The pose is not Tree Pose (Vrksasana). The person stands on one foot, but the other leg is lifted high, not pressed to the inner thigh. This is more akin to a variation of Natarajasana or a standing split. Limb count and proportions are correct, but the described action is not recognizable as Tree Pose. Anatomical distortions are minimal but present in the leg positioning. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A rock climber clinging to a cliff face with arms and legs extended to different holds

Evaluator (Qwen 3 VL)

The climber has correct limb count and proportions. Both arms and legs are extended to holds, matching the prompt. The pose is recognizable as dynamic rock climbing. No major anatomical distortions are visible. Musculature and gear placement are plausible. The action is clearly identifiable as climbing. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A cellist seated with correct posture, legs around the instrument, bow arm extended

Evaluator (Qwen 3 VL)

The cellist’s posture and limb count are anatomically correct. Her legs are wrapped around the cello, and her bow arm is extended. The instrument’s position and bowing action are accurately depicted. No distortions in proportions or limbs are visible. The pose matches the prompt precisely. VERDICT: PASS

Extreme 67%vs100%

local/hidream-i1-full-17b: FAIL

Two tango dancers in a close embrace, the man dipping the woman backward with intertwined legs

Evaluator (Qwen 3 VL)

The woman’s legs appear fused or unnaturally bent at the hip, and her right leg’s foot placement is distorted. The man’s posture is awkward, with one leg unnaturally angled. Limb counts are correct, but proportions and pose are anatomically implausible. The described tango dip is not recognizable due to severe distortions. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

Two soccer players mid-tackle, one sliding and the other jumping over with the ball at their feet

Evaluator (Qwen 3 VL)

Both players show correct limb count and proportions. The pose is anatomically plausible: one player is sliding, the other lunging, with the ball at their feet. The action matches the prompt’s description. No significant distortions are visible. Musculature and joint angles appear natural for dynamic movement. VERDICT: PASS

local/hidream-i1-full-17b: PASS

Two gymnasts performing a synchronized handstand, side by side with identical body alignment

Evaluator (Qwen 3 VL)

Both gymnasts show correct limb count and proportions. Their handstand poses are anatomically plausible, though slight asymmetry exists—right gymnast’s legs appear slightly more spread. The synchronized action is recognizable. Minor distortions: one gymnast’s head/neck angle looks unnatural, and wrist positioning is slightly off. Overall, the core pose and anatomy are accurate enough. VERDICT: PASS

Hands 25%vs67%

Easy 67%vs33%

local/hidream-i1-full-17b: PASS

A person standing with both hands visible at their sides, fingers relaxed and naturally spread

Evaluator (Qwen 3.5 122B)

The image shows a person with both hands raised, palms facing forward. - **Left Hand (viewer's left):** Has 5 distinct fingers (thumb, index, middle, ring, pinky). The anatomy appears correct with natural joint placement and proportions. - **Right Hand (viewer's right):** Has 5 distinct fingers. The anatomy is also correct. - **Anomalies:** There are no extra or missing fingers. The hands are anatomically sound. Although the prompt mentioned hands "at their sides," the hands are raised, but th

local/hidream-i1-full-17b: PASS

A woman holding her hands out in front of her palms-up, all ten fingers clearly visible

Evaluator (Qwen 3.5 122B)

The image shows two hands with palms facing forward. Each hand has five fingers, totaling ten fingers visible. The thumbs are positioned naturally on the sides, and the other four fingers are extended upward. The gesture appears anatomically correct with no visible anomalies in finger count or positioning. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

A man with arms crossed, both hands resting on opposite upper arms with fingers visible

Evaluator (Qwen 3.5 122B)

The hands show significant anatomical anomalies. The hand on the left (viewer's left) clearly displays only four fingers resting on the arm; the thumb is missing or obscured. The hand on the right (viewer's right) appears to have a thumb and four fingers, but the fingers look elongated, fused, and lack distinct knuckles, giving them an unnatural, "mushy" appearance. The grip is not anatomically correct. VERDICT: FAIL

Medium 0%vs33%

local/hidream-i1-full-17b: FAIL

A person holding a white ceramic coffee mug by the handle with their right hand

Evaluator (Qwen 3.5 122B)

The image shows two hands holding a mug. - **Right hand (holding handle):** This hand (on the left side of the image) appears to have fused fingers. While there seem to be roughly 5 digits (thumb + 4 fingers), they are merged into a single mass with indistinct knuckles. The grip is unnatural; the fingers are wrapped around the handle in a way that suggests the handle is much thicker than it is, or the hand is distorted. - **Left hand (supporting bottom):** This hand (on the right side) has 5 v

local/hidream-i1-full-17b: FAIL

A woman writing in a notebook with a pen gripped correctly in her right hand

Evaluator (Qwen 3.5 122B)

The right hand holding the pen shows 5 fingers, but the grip is anatomically incorrect — the pen is held unnaturally between the thumb and index finger with the middle finger supporting from below in a way that doesn’t match natural writing posture. The left hand resting on the notebook also has 5 fingers, but they appear slightly blurred and less defined. No extra or missing fingers, but the grip anomaly violates anatomical correctness. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

A person playing an acoustic guitar with left hand on the fretboard and right hand strumming

Evaluator (Qwen 3.5 122B)

The image fails to meet the prompt's requirements. There is no hand visible on the fretboard (neck); the neck is bare. The only visible hand is at the bottom left near the bridge, appearing to be a thumb and perhaps one finger, lacking the full 5 fingers. The grip is not anatomically correct for strumming, and the hand itself looks malformed or incomplete. VERDICT: FAIL

Hard 33%vs100%

local/hidream-i1-full-17b: FAIL

A person making a peace sign with their right hand, clearly showing two fingers up and three curled

Evaluator (Qwen 3.5 122B)

The hand making the peace sign shows two fingers extended (index and middle) and three fingers curled (thumb, ring, pinky). However, the thumb is positioned unnaturally — it appears to be tucked behind the index finger rather than wrapping around the palm as in a natural grip. Additionally, the fingernails on the curled fingers are visible and well-defined, but the thumb’s placement suggests an anatomical inconsistency. The gesture is recognizable but not fully anatomically correct. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A hand giving a thumbs-up with four fingers curled and thumb extended upward

Evaluator (Qwen 3.5 122B)

The image displays a single hand performing a thumbs-up gesture. The thumb is extended upward, and the remaining four fingers (index, middle, ring, and pinky) are curled into a fist. Counting the digits reveals exactly five fingers in total. The anatomy of the hand, while stylized, is structurally correct with no extra or missing fingers. The grip is consistent with a standard thumbs-up. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

A person counting to three on their fingers, with index, middle, and ring fingers extended

Evaluator (Qwen 3.5 122B)

The image displays two hands with only the index fingers extended, pointing upwards. The other fingers are curled into a fist. This gesture represents counting to "one" (or "two" total), not "three" as requested in the prompt ("index, middle, and ring fingers extended"). While there appear to be five fingers per hand, the core gesture described in the prompt is completely missing. The image fails to depict the requested action. VERDICT: FAIL

Extreme 0%vs100%

local/hidream-i1-full-17b: FAIL

Two people shaking hands in a professional setting, all fingers of both hands anatomically correct

Evaluator (Qwen 3.5 122B)

The image shows a handshake, but upon close inspection, the anatomy is flawed. 1. **Finger Count:** There appear to be more than 5 fingers total. I see a thumb on top (right hand), a thumb on the side (left hand), and a cluster of at least 4 fingers at the bottom. This suggests polydactyly (extra fingers) or merging of hands. 2. **Anatomy:** The grip is unnatural. The fingers at the bottom seem to belong to one hand but are positioned where the other hand's fingers should be, or they are a f

local/hidream-i1-full-17b: FAIL

Two people doing a high-five, palms meeting with all fingers extended on both hands

Evaluator (Qwen 3.5 122B)

The left hand shows only 4 fingers (thumb + 3 fingers visible; pinky is missing or fused). The right hand has 5 fingers, all extended and anatomically plausible. Since the prompt requires exactly 5 fingers per hand and both hands to be correct, the left hand’s anomaly fails the criteria. VERDICT: FAIL

local/hidream-i1-full-17b: FAIL

One person handing a small gift box to another person, both sets of hands gripping the box correctly

Evaluator (Qwen 3.5 122B)

The image shows three visible hands interacting with the box. 1. **Left Hand (white sleeve):** This hand appears to have fused fingers. It is difficult to distinguish a clear pinky finger; the fingers look somewhat merged and lack distinct knuckles. It likely has fewer than 5 distinct fingers. 2. **Top Right Hand:** This hand has 5 fingers, but they appear slightly long and the grip is a bit stiff. 3. **Bottom Right Hand:** This hand has 5 fingers, but they are unnaturally long, smooth, and

Multi-Subject 100%vs100%

Easy 100%vs100%

local/hidream-i1-full-17b: PASS

Two women standing side by side: one with short blonde hair wearing a red jacket, the other with long black hair wearing a blue dress

Evaluator (Qwen 3 VL)

Left woman: short blonde hair, red blazer, black top. Right woman: long black hair, blue dress. Both visually distinct and match prompt descriptions precisely. No missing or mismatched features. VERDICT: PASS

local/hidream-i1-full-17b: PASS

Two men at a cafe: one with a beard and glasses wearing a green sweater, the other clean-shaven in a black hoodie

Evaluator (Qwen 3 VL)

Left man: beard, glasses, green sweater. Right man: clean-shaven, black hoodie. Both visually distinct and match descriptions perfectly. Cafe setting confirmed. All prompt elements accurately rendered. VERDICT: PASS

local/hidream-i1-full-17b: PASS

Two friends posing: one tall with curly red hair in denim overalls, the other short with straight brown hair in a yellow sundress

Evaluator (Qwen 3 VL)

Tall person: curly red hair, denim overalls over white tee. Short person: straight brown hair, yellow sundress. Both visually distinct and match descriptions. VERDICT: PASS

Medium 100%vs100%

local/hidream-i1-full-17b: PASS

Four people at a dinner table: a bald man in a suit, a woman with silver hair in a red blouse, a young man with dreadlocks in a denim jacket, and a woman with a hijab in a green dress

Evaluator (Qwen 3 VL)

All four individuals are present and visually distinct. Bald man in suit; silver-haired woman in red blouse; young man with dreadlocks in denim jacket; woman in hijab with green dress. Features match descriptions precisely. VERDICT: PASS

local/hidream-i1-full-17b: PASS

Three musicians on stage: a tall woman with short pink hair playing bass, a man with a beard and bandana on drums, and a slim man with glasses playing keyboard

Evaluator (Qwen 3 VL)

Tall woman: pink short hair, playing bass. Man: beard, bandana, on drums. Slim man: glasses, playing keyboard. All three are visually distinct and match described features. VERDICT: PASS

local/hidream-i1-full-17b: PASS

Three hikers on a trail: a man with a backpack and sunhat, a woman with braided hair and hiking poles, and a teenager with a baseball cap and water bottle

Evaluator (Qwen 3 VL)

Man: wears sunhat and backpack. Woman: has braided hair, uses hiking poles, backpack. Teenager: wears baseball cap, holds water bottle, backpack. All three are visually distinct and match described features. VERDICT: PASS

Truthfulness 56%vs82%

Photorealism 67%vs100%

Medium 67%vs100%

local/hidream-i1-full-17b: PASS

A still life on a wooden table featuring a clear glass of water, a polished silver spoon, a folded linen napkin, and a ripe red apple — each material rendered with accurate texture and light behavior

Evaluator (Qwen 3.6 27B)

The image displays a high-quality still life where each material is rendered with distinct and accurate physical properties. The glassware shows correct transparency, refraction, and sharp specular highlights on the rims. The silver spoon exhibits realistic metallic reflections and sheen. The linen napkin has a visible woven texture with soft, natural folds. The apple features a waxy, glossy skin with appropriate subsurface scattering effects, and the wooden table background displays clear gra

local/hidream-i1-full-17b: FAIL

A workspace desk with a brushed aluminum laptop, a leather-bound notebook, a ceramic coffee mug, and a glass paperweight — all materials visually distinct and realistic

Evaluator (Qwen 3.6 27B)

The image successfully renders the wood desk, brushed aluminum laptop, leather notebook, and ceramic mugs with high fidelity and realistic textures. However, the prompt specifically requested a "glass paperweight," which is completely absent from the scene. Instead, there is a computer mouse on a mousepad. Since a key requested object and its corresponding material (glass) are missing, the image fails to meet the prompt's criteria. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A kitchen counter with a stainless steel pot, a wooden cutting board, a glass bottle of olive oil, and a cotton dish towel — each surface reflecting light correctly

Evaluator (Qwen 3.6 27B)

The image successfully renders the requested materials with high fidelity. The stainless steel pot exhibits appropriate metallic sheen and sharp reflections. The wooden cutting board displays natural grain and a matte finish. The glass bottle shows correct translucency and refraction of the yellow oil inside, along with specular highlights. The cotton towel has a soft, textured appearance with realistic folds and diffuse lighting. All surfaces interact with the light source correctly. VERDICT: PASS

Physics & Reflections 83%vs83%

Easy 67%vs67%

local/hidream-i1-full-17b: PASS

A red sphere on a white surface with a single shadow cast to the right, consistent with a light source from the upper left

Evaluator (Qwen 3.6 27B)

The image displays a red sphere with a bright specular highlight on its upper-left side, indicating a light source from that direction. Consequently, a dark, elongated shadow is cast downwards and to the right, which is physically consistent with the lighting setup. The shadow's shape and position align perfectly with the sphere's placement on the surface. The subtle reddish reflection beneath the sphere also adds to the realism. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A coffee mug on a sunny windowsill casting a sharp shadow to the left

Evaluator (Qwen 3.6 27B)

The image shows a coffee mug on a windowsill with sunlight streaming in from the left. The mug casts a shadow to the right, which is physically consistent with the light source direction. There is a reflection of the mug visible in the window glass to the left, which is also physically consistent with the window's position. The reflection is slightly blurred, adding to the realism. The colors and lighting interactions appear natural. VERDICT: PASS

local/hidream-i1-full-17b: FAIL

A person standing outdoors at midday with a short shadow directly below them

Evaluator (Qwen 3.6 27B)

The image shows a person standing on pavement with a very long shadow cast to the left side. The prompt specifically requested a "midday" setting with a "short shadow directly below them." The shadow in the image is long and angled, indicating a low sun angle (early morning or late afternoon), which contradicts the prompt's requirement for midday lighting and a short, central shadow. VERDICT: FAIL

Medium 67%vs67%

local/hidream-i1-full-17b: FAIL

A woman standing in front of a full-length mirror, her reflection accurately showing the back of her outfit and reversed orientation

Evaluator (Qwen 3.6 27B)

The reflection is physically impossible. The woman is facing the mirror, so the reflection should show her front (face and chest). Instead, the mirror shows her back again, with the same dress straps and posture, merely shifted to the right. The orientation is also wrong; the reflection is facing the same direction as the subject rather than being mirrored. VERDICT: FAIL

local/hidream-i1-full-17b: PASS

A cat sitting on a polished marble floor with its reflection visible below

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is positioned directly beneath the cat, inverted vertically as expected on a horizontal surface. The reflection exhibits realistic distortion and blurring, becoming less distinct further down, which accurately simulates the properties of a polished marble floor. The colors and lighting in the reflection align with the subject and the ambient light source. VERDICT: PASS

local/hidream-i1-full-17b: PASS

A vase of flowers on a glass table with a clear reflection visible on the table surface

Evaluator (Qwen 3.6 27B)

The reflection of the vase on the glass table is vertically inverted and positioned directly beneath the object, which is physically correct. It shows appropriate clarity and slight distortion consistent with a glass surface. The shadow falls to the left, aligning with the light source coming from the window on the right. Both elements are rendered realistically. VERDICT: PASS

Hard 100%vs100%

local/hidream-i1-full-17bPASS

A row of trees reflected in a calm lake at sunset, with the reflection slightly darker and gently rippled

Evaluator (Qwen 3.6 27B)

The reflections are physically plausible. They are correctly inverted vertically and align perfectly with the row of trees. The water surface shows gentle ripples that distort the reflections slightly, consistent with a calm lake. The reflections are appropriately darker than the actual trees, and the warm sunset light creates a realistic glare on the water's surface to the right. VERDICT: PASS

local/hidream-i1-full-17bPASS

A city skyline reflected in a rain puddle on a street, with mild distortion and color shift

Evaluator (Qwen 3.6 27B)

The reflection in the puddle is physically plausible. It correctly inverts the city skyline vertically, maintaining the perspective and alignment of the buildings and streetlights. The distortion mimics natural water ripples, slightly warping the reflected structures and elongating the light sources (bokeh). The colors in the reflection accurately match the scene above, with appropriate darkening and saturation shifts typical of water reflections. VERDICT: PASS

local/hidream-i1-full-17bPASS

A mountain range reflected in a still alpine lake, with the reflection inverted and subtly muted in color

Evaluator (Qwen 3.6 27B)

The reflection is physically plausible. It is correctly inverted vertically, mirroring the mountain range perfectly. The water surface is depicted as extremely still, resulting in a sharp, mirror-like image with minimal distortion. The colors in the reflection are slightly darker and muted compared to the actual mountains, which accurately simulates light absorption in water. The lighting and shadows on the peaks are consistent with their mirrored counterparts below. VERDICT: PASS

Extreme 100%vs100%

local/hidream-i1-full-17bPASS

A clear glass sphere sitting on a mirror surface near a pool of water, showing refraction through the sphere, reflection on the mirror, and water reflection simultaneously

Evaluator (Qwen 3.6 27B)

The image displays a glass sphere on a reflective surface. The reflection of the sphere is correctly inverted vertically, matching the object above. The refraction within the sphere inverts the background scene (trees and sky), which is physically accurate for a glass sphere. The lighting highlights and starburst effects are consistent with a bright light source (sun) positioned behind the sphere. The colors and distortions align with optical physics. VERDICT: PASS
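The inversion the evaluator mentions is expected because a solid glass sphere behaves as a strong converging "ball lens". For a sphere of diameter D and refractive index n in air, the paraxial effective focal length measured from the sphere's center is EFL = n*D / (4*(n - 1)), and the back focal length from the rear surface is EFL - D/2. A small sketch, assuming a typical glass index of about 1.5 and a hypothetical 100 mm sphere:

```python
from typing import Tuple


def ball_lens_focal_lengths(n: float, diameter_mm: float) -> Tuple[float, float]:
    """Paraxial focal lengths of a ball lens in air.

    Returns (effective focal length from the center,
             back focal length from the rear surface), both in mm.
    """
    if n <= 1:
        raise ValueError("refractive index must exceed 1 for a converging ball lens")
    efl = n * diameter_mm / (4 * (n - 1))
    bfl = efl - diameter_mm / 2
    return efl, bfl


if __name__ == "__main__":
    efl, bfl = ball_lens_focal_lengths(1.5, 100.0)
    print(f"EFL = {efl:.1f} mm, BFL = {bfl:.1f} mm")
```

A short, positive focal length means a distant background forms a real, inverted image just behind the sphere, which a viewer looking through it sees as an upside-down scene.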

local/hidream-i1-full-17bPASS

A chrome sculpture in a room with a glass floor and large windows, showing multiple reflections and environmental mapping on every surface

Evaluator (Qwen 3.6 27B)

The reflections on the chrome sculpture accurately distort the surrounding environment, including the window frames and ceiling lights, consistent with a highly polished, curved metallic surface. The floor acts as a mirror, displaying a vertically inverted reflection of the sculpture and the painting on the wall. The lighting direction from the windows and track lights is consistent with the highlights and shadows observed. The environmental mapping is coherent. VERDICT: PASS

local/hidream-i1-full-17bPASS

A crystal wine glass on a polished marble counter next to a window overlooking a lake — showing refraction in the glass, reflection on the marble, and the lake's own reflective surface

Evaluator (Qwen 3.6 27B)

The image displays physically plausible optical effects. The wine glass casts a clear, slightly distorted reflection on the polished marble surface directly beneath it, consistent with the material's sheen. The crystal cuts on the glass show realistic refraction, distorting the background view of the lake and trees in a manner that aligns with the glass's geometry. The lake itself acts as a reflective surface for the distant trees, and the lighting direction from the window creates appropriate

World Knowledge 25%vs75%

Easy 33%vs100%

local/hidream-i1-full-17bFAIL

The Eiffel Tower in Paris on a clear day, architecturally accurate with its iron lattice structure and three levels

Evaluator (Qwen 3.6 27B)

The image depicts the Eiffel Tower under a clear sky with warm lighting. While the subject is clearly recognizable and captures the general form, including the iron lattice structure and the distinct levels, it is not strictly architecturally accurate. The railing on the second level appears as a solid black band rather than a lattice structure. Additionally, the intricate ironwork near the top lacks the precise geometric logic of the real tower, appearing somewhat generic and distorted. VERD

local/hidream-i1-full-17bFAIL

The Taj Mahal in Agra seen from the reflecting pool, with accurate dome, minarets, and symmetry

Evaluator (Qwen 3.6 27B)

The image depicts the Taj Mahal, recognizable by its white marble, central dome, and reflecting pool. However, the architectural accuracy is flawed. The real Taj Mahal has four identical minarets at the corners of the main platform. This image displays six tower-like structures: two tall minarets on the far outside and two shorter, thinner towers closer to the main building. This incorrect arrangement and variation in tower height/style violate the prompt's request for "accurate... minarets."

local/hidream-i1-full-17bPASS

The Statue of Liberty on her island pedestal, with correct torch, crown, tablet, and green patina

Evaluator (Qwen 3.6 27B)

The image depicts the Statue of Liberty with high fidelity. The statue features the correct green patina, the raised torch in the right hand, the spiked crown, and the tablet held in the left hand. The pedestal and island base are recognizable, though the fort walls at the very bottom appear slightly simplified compared to the actual hexagonal Fort Wood. The specific elements requested (torch, crown, tablet, patina) are all present and accurate. VERDICT: PASS

Medium 33%vs67%

local/hidream-i1-full-17bFAIL

A traditional Japanese shamisen instrument with accurate three-stringed design, wooden body, and bachi plectrum

Evaluator (Qwen 3.6 27B)

The image depicts a stringed instrument that resembles a hybrid between a banjo and a shamisen, but it fails the specific criteria. Most notably, it has six strings instead of the required three. The neck is fretted, whereas a traditional shamisen is fretless. Additionally, the requested "bachi plectrum" is missing from the image. The body construction also mimics a banjo rim rather than a traditional shamisen body. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A medieval astrolabe with accurate rete, tympan, and engraved degree markings on brass

Evaluator (Qwen 3.6 27B)

The image depicts a brass instrument recognizable as an astrolabe, but it fails the accuracy criteria. While the general shape and material (brass) are correct, the specific details are hallucinated. The "engraved degree markings" and text are nonsensical gibberish rather than legible numbers or Latin/Arabic script. The "accurate rete" is missing; instead of a complex web of star pointers, it shows a simplified cross with strange, non-functional curved arms. The tympan markings are also illegi

local/hidream-i1-full-17bPASS

A Japanese kintsugi bowl — cracked ceramic repaired with visible gold lacquer lines

Evaluator (Qwen 3.6 27B)

The image clearly depicts a blue ceramic bowl held in a hand. The surface features a network of cracks filled with gold, accurately representing the Japanese art of kintsugi (golden joinery). The gold lines are distinct and follow the fracture patterns, mimicking the repair process described in the prompt. The lighting and texture of the ceramic and skin appear realistic. VERDICT: PASS

Hard 0%vs33%

local/hidream-i1-full-17bFAIL

A detailed diagram of the water cycle showing evaporation from the ocean, condensation forming clouds, precipitation as rain, and collection in rivers flowing back to the sea

Evaluator (Qwen 3.6 27B)

The image contains significant factual and textual errors. The cloud labeled "Evaporation" is depicted raining down, which contradicts the scientific process of water vapor rising. The text below it reads "rean from ocean," which is nonsensical. Additionally, "Condensation" is misspelled as "Condesnation." While the general theme is recognizable, the specific depiction of the cycle's mechanics is incorrect and confusing. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A cross-section showing tectonic plates at a convergent boundary with subduction, a trench, volcanic arc, and layered mantle

Evaluator (Qwen 3.6 27B)

The image depicts a stylized geological cross-section featuring a deep ocean trench, colorful stratified earth layers, and a glowing column of magma rising from the depths. A large volcano is visible in the background. However, the prompt specifically requested a "convergent boundary with subduction." The image instead shows magma upwelling from a central fissure in a trench, which is characteristic of a divergent boundary (mid-ocean ridge) or a rift valley, not a subduction zone where a tecto

local/hidream-i1-full-17bFAIL

An anatomical illustration of the human heart showing four chambers, valves, aorta, pulmonary arteries, and veins with correct placement

Evaluator (Qwen 3.6 27B)

The image depicts a heart, but it fails the accuracy criteria significantly. While the general color coding (red for oxygenated, blue for deoxygenated) is roughly correct, the anatomical labels are nonsensical gibberish (e.g., "Aorrrta," "P·lnlary," "Graoty," "Danpan"). An anatomical illustration requires precise terminology to be factually accurate. Furthermore, the internal structures like the four chambers are not clearly distinct or correctly labeled, and the vessel connections appear dist

Extreme 33%vs100%

local/hidream-i1-full-17bFAIL

A Morpho menelaus butterfly with accurate iridescent blue dorsal wings, brown ventral wing pattern with eyespots, and correct body morphology

Evaluator (Qwen 3.6 27B)

The image depicts a butterfly with iridescent blue forewings and brown hindwings featuring eyespots. This is factually inaccurate for a *Morpho menelaus*. In reality, the dorsal (top) side of *both* the forewings and hindwings is brilliant blue. The brown pattern with eyespots is located on the ventral (underside) of the wings. The image incorrectly combines the dorsal forewing appearance with the ventral hindwing appearance. Additionally, the body segmentation appears somewhat unnatural and t

local/hidream-i1-full-17bPASS

The Hagia Sophia interior showing the massive central dome, pendentives, semi-domes, and Islamic calligraphy medallions alongside Byzantine mosaics

Evaluator (Qwen 3.6 27B)

The image depicts a recognizable interior of the Hagia Sophia. It features the massive central dome with a ring of windows, the supporting semi-domes, and the characteristic large circular calligraphy medallions on the piers. While the calligraphy itself is illegible "AI script" rather than authentic Arabic, and the dome's internal ribs are slightly more pronounced/decorative than the historical reality, the overall architectural layout, lighting, and inclusion of the requested elements (dome,

local/hidream-i1-full-17bFAIL

A specimen of bismuth crystal showing accurate iridescent oxide layers, geometric hopper crystal formation, and characteristic staircase structure

Evaluator (Qwen 3.6 27B)

The image captures the characteristic iridescent oxide layers (blues, purples, golds) and the "staircase" stepped growth pattern seen on the smaller, blocky crystals, which are correctly cubic. However, the dominant central crystal is depicted as a hexagonal prism with a pyramidal termination. Bismuth is a cubic mineral and forms hopper crystals that are skeletal cubes; it does not form hexagonal prisms. This significant crystallographic inaccuracy regarding the primary subject's geometry make

Professional Studio 67%vs78%

Camera & Lighting 75%vs75%

Easy 100%vs67%

local/hidream-i1-full-17bPASS

A wide-angle shot of a modern living room interior showing expansive space with slight barrel distortion at the edges

Evaluator (Gemma 4 26B)

The image features a wide-angle perspective with significant barrel distortion, as requested. The circular framing (fisheye effect) creates a strong sense of expansive space. The lighting is bright and natural, coming from the large floor-to-ceiling windows, which illuminates the modern living room interior effectively. The depth of field is deep, keeping the entire room in focus, which matches the requirements for an architectural wide-angle shot. VERDICT: PASS

local/hidream-i1-full-17bPASS

A close-up macro shot of a single dewdrop on a rose petal, shallow depth of field

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a close-up macro shot of a single, prominent dewdrop resting on a petal. The depth of field is very shallow, with the background and foreground petals being heavily blurred, which is characteristic of macro photography. The lighting is soft and natural, highlighting the clarity and refraction within the central dewdrop. VERDICT: PASS

local/hidream-i1-full-17bPASS

An aerial bird's-eye view looking straight down at a traffic roundabout with cars and trees

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. It features a true aerial bird's-eye view, looking straight down at a traffic roundabout. The perspective is flat and top-down, with a deep depth of field that keeps the entire scene—including the cars, road markings, and surrounding trees—in sharp focus. The lighting is consistent and even, typical of a high-angle daylight shot. VERDICT: PASS

Medium 67%vs67%

local/hidream-i1-full-17bFAIL

A portrait lit with classic Rembrandt lighting — triangle of light on the shadowed cheek, moody atmosphere

Evaluator (Gemma 4 26B)

The image fails to meet the specific lighting requirements. While it has a "moody atmosphere," it does not feature Rembrandt lighting. Rembrandt lighting is characterized by a specific light triangle on the shadowed cheek of the subject. In this image, the light is coming from a large, bright source (likely a window) that creates harsh, broad stripes of light on the background, but the subject's face is largely in shadow without the characteristic light triangle. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A landscape photograph bathed in golden hour light — long warm shadows, orange sky, soft glow

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The lighting is a classic golden hour scene, featuring a low-hanging sun that creates a soft, warm glow across the landscape. The sky is filled with orange and golden hues, and the light creates long, soft shadows across the meadow. The perspective is a wide landscape shot with a shallow depth of field, keeping the foreground grass slightly soft while focusing on the sun's glow. VERDICT: PASS

local/hidream-i1-full-17bPASS

A high-key portrait of a person against a bright white background, minimal shadows, airy and luminous

Evaluator (Gemma 4 26B)

The image features a high-key portrait with a bright, white background and an airy, luminous quality. The lighting is soft and bright, though there is a distinct, sharp-edged light patch on the wall behind the subject. While this patch introduces a more defined shadow/light shape than "minimal shadows" might strictly imply, the overall aesthetic remains high-key, bright, and luminous as requested. VERDICT: PASS

Hard 100%vs67%

local/hidream-i1-full-17bPASS

A portrait shot at 85mm f/1.4 with creamy bokeh background, backlit with warm rim light separating the subject from a blurred garden

Evaluator (Gemma 4 26B)

The image perfectly matches the prompt. The portrait features a shallow depth of field consistent with an 85mm f/1.4 lens, creating a creamy, soft-focus bokeh background. The lighting is a clear match: the subject is backlit by a warm light source, creating a distinct golden rim light along her hair and shoulders, effectively separating her from the blurred garden background. VERDICT: PASS
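"Shallow depth of field consistent with an 85mm f/1.4 lens" can be made concrete with the standard thin-lens depth-of-field approximation, built on the hyperfocal distance H = f²/(N·c) + f. The sketch below assumes a full-frame circle of confusion of 0.03 mm and a subject 2 m away; both values are assumptions, since the prompts specify only focal length and aperture.

```python
import math
from typing import Tuple


def depth_of_field(focal_mm: float, f_number: float, subject_mm: float,
                   coc_mm: float = 0.03) -> Tuple[float, float]:
    """Near and far limits of acceptable sharpness, thin-lens approximation.

    All distances are in millimetres, measured from the lens. coc_mm is the
    circle of confusion; 0.03 mm is a common full-frame assumption.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    if subject_mm >= hyperfocal:
        far = math.inf  # focused at or beyond hyperfocal: sharp out to infinity
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far


if __name__ == "__main__":
    # Hypothetical 2 m subject distance for both lens settings named in the prompts.
    for focal, n in ((85, 1.4), (50, 1.8)):
        near, far = depth_of_field(focal, n, 2000)
        print(f"{focal}mm f/{n}: sharp from {near / 1000:.3f} m to {far / 1000:.3f} m "
              f"({(far - near) / 10:.1f} cm deep)")
```

Under these assumptions the 85mm f/1.4 zone of sharpness is only a few centimetres deep, while 50mm f/1.8 at the same distance gives noticeably more, which is why the evaluators look for strongly blurred backgrounds in these prompts.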

local/hidream-i1-full-17bPASS

A street photograph with shallow depth of field as if shot at 50mm f/1.8, subject sharp with soft city lights in the background

Evaluator (Gemma 4 26B)

The image successfully matches the prompt's requirements. The subject is in sharp focus, while the background exhibits a shallow depth of field characteristic of a 50mm f/1.8 lens. The background features soft, out-of-focus city lights (bokeh) that create a pleasing aesthetic. The perspective and lighting patterns are consistent with a street photograph taken during twilight or evening. VERDICT: PASS

local/hidream-i1-full-17bPASS

A food photograph shot from above at 35mm f/2.8 with natural window light from the left, slight vignette, warm color temperature

Evaluator (Gemma 4 26B)

The image matches the prompt's specifications. The perspective is a top-down shot (flat lay). The lighting is clearly coming from the left side (the window area), creating soft shadows on the right. The depth of field is shallow, with the edges of the bowl slightly out of focus, consistent with f/2.8. The color temperature is warm, and a subtle vignette is present around the edges of the bowl. VERDICT: PASS

Extreme 33%vs100%

local/hidream-i1-full-17bPASS

A tilt-shift photograph of a city street making real-life look like a miniature model — selective focus strip, saturated colors

Evaluator (Gemma 4 26B)

The image successfully implements the tilt-shift effect. It features a distinct selective focus strip (a sharp band in the middle ground) with heavy blurring in the foreground and background, which is characteristic of the miniature model look. The colors are highly saturated, and the high-angle perspective enhances the toy-like appearance. The lighting is bright and even, consistent with the described aesthetic. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A cinematic frame with blue and orange anamorphic lens flare streaking horizontally across a dark moody scene

Evaluator (Gemma 4 26B)

The image features a prominent orange square light structure and a central light source with blue and orange streaks. However, the streaks are not "anamorphic lens flares streaking horizontally across a dark moody scene." Instead, the light forms a geometric square shape with radial light rays. The light pattern is a stylized graphic effect rather than a cinematic lens flare, and the composition lacks the specific horizontal anamorphic characteristics requested. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A portrait with dramatic split lighting — half the face brightly lit, the other half in deep shadow, high contrast chiaroscuro

Evaluator (Gemma 4 26B)

The image features a portrait with high-contrast lighting, but it does not match the "split lighting" description. Instead of one side of the face being in deep shadow, the lighting is more centered and uneven, with light patterns (likely from shadows or blinds) cast across the forehead and nose. The face is not divided into a bright half and a dark half; both sides of the face remain relatively illuminated. VERDICT: FAIL

Color Precision 67%vs83%

Easy 100%vs100%

local/hidream-i1-full-17bPASS

A bright red sports car parked on a gray asphalt road — the red is vivid and unambiguous

Evaluator (Gemma 4 26B)

The image features a bright, vivid red sports car parked on a gray asphalt road. The red color of the car is intense, saturated, and unambiguous, perfectly matching the prompt's description. The road surface is a consistent gray color. Both the car's color and the road's color align exactly with the specified requirements. VERDICT: PASS

local/hidream-i1-full-17bPASS

A woman wearing a royal blue evening dress against a neutral background

Evaluator (Gemma 4 26B)

The image shows a woman wearing a vibrant, deep blue evening dress. The color is a true royal blue, matching the prompt's specification. The background is a solid, neutral beige/tan color, which also aligns with the prompt's requirement for a neutral background. All color constraints are accurately met. VERDICT: PASS

local/hidream-i1-full-17bPASS

A field of pure sunflower yellow flowers under a clear blue sky

Evaluator (Gemma 4 26B)

The image features a field of bright, sunflower yellow flowers and a clear, vibrant blue sky. The colors perfectly match the prompt's description of "pure sunflower yellow flowers" and a "clear blue sky." There are no deviations in color or hue from the requested palette. VERDICT: PASS

Medium 67%vs33%

local/hidream-i1-full-17bPASS

A bedroom wall painted in #FF6B35 (burnt orange) with white furniture and natural daylight

Evaluator (Gemma 4 26B)

The wall color is a vibrant, saturated orange that closely aligns with the description of "burnt orange" and the hex code #FF6B35. The furniture (dresser, chest, and nightstand) is clearly white, and the scene is illuminated by natural daylight as requested. All color requirements from the prompt are accurately represented in the image. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A forest scene where the foliage is precisely #2D5F2D (dark muted green) with brown tree trunks

Evaluator (Gemma 4 26B)

The image features foliage in various shades of green, ranging from bright, yellowish-green in the sunlit areas to a much lighter, pale green in the background. The prompt specifically requested a "dark muted green" with the hex code #2D5F2D. The actual colors in the image are much brighter, more vibrant, and lighter than the specified dark, muted green. VERDICT: FAIL
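One way to see why "much brighter" rules out #2D5F2D is to compare relative luminance, the WCAG measure that runs from 0 for black to 1 for white. The sketch converts a hex code to sRGB channels, linearizes them, and takes the standard weighted sum. The comparison color #9ACD32 (CSS `yellowgreen`) is only a stand-in for "bright, yellowish green"; no pixel values were measured from the image.

```python
from typing import Tuple


def hex_to_rgb(code: str) -> Tuple[int, int, int]:
    """Parse '#RRGGBB' (leading '#' optional) into 0-255 integer channels."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return (int(code[0:2], 16), int(code[2:4], 16), int(code[4:6], 16))


def relative_luminance(code: str) -> float:
    """Relative luminance of an sRGB hex color: 0.0 for black, 1.0 for white."""
    def linearize(channel: int) -> float:
        # Undo the sRGB transfer curve to get linear light.
        c = channel / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(ch) for ch in hex_to_rgb(code))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


if __name__ == "__main__":
    for color in ("#2D5F2D", "#9ACD32"):
        print(color, round(relative_luminance(color), 3))
```

The requested #2D5F2D comes out under 0.1, while the bright stand-in lands at roughly 0.5, so foliage that reads as bright and yellowish is far from the specified dark, muted green.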

local/hidream-i1-full-17bPASS

A room with walls in #4A90D9 (medium blue) with warm wooden floor and white trim

Evaluator (Gemma 4 26B)

The image features walls in a medium blue shade that closely aligns with the requested #4A90D9. The floor is a warm wood tone, and the baseboard/trim is white, matching all color specifications provided in the prompt. VERDICT: PASS

Hard 100%vs100%

local/hidream-i1-full-17bPASS

A minimalist room with walls in #F5E6CC (warm cream), a sofa in #2C3E50 (dark navy), and accent pillows in #E74C3C (bright red)

Evaluator (Gemma 4 26B)

The image accurately reflects the specified colors:
- The walls are a warm cream color (#F5E6CC).
- The sofa is a dark navy blue (#2C3E50).
- The accent pillows are a bright red (#E74C3C).
All three color requirements from the prompt are correctly represented in the generated image. VERDICT: PASS

local/hidream-i1-full-17bPASS

A product flat-lay on a #1A1A2E (deep dark blue) background, with a #F0C27F (soft gold) watch and #FFFFFF (pure white) earbuds

Evaluator (Gemma 4 26B)

The image matches the color requirements. The background is a deep dark blue consistent with #1A1A2E. The watch features a soft gold color (the strap is a pale gold/yellow and the metal accents are gold) consistent with #F0C27F. The earbuds and their charging case are pure white, consistent with #FFFFFF. VERDICT: PASS

local/hidream-i1-full-17bPASS

A graphic design composition with background #FAFAFA (near-white), main shape #FF4500 (orange-red), and accent lines #333333 (dark charcoal)

Evaluator (Gemma 4 26B)

The image accurately reflects the requested color palette. The background is a near-white/off-white color consistent with #FAFAFA. The main geometric shapes use a vibrant orange-red color consistent with #FF4500. The dark, overlapping shape/accent is a very dark charcoal/near-black, consistent with #333333. All colors match the specified hex values and descriptions. VERDICT: PASS

Extreme 0%vs100%

local/hidream-i1-full-17bFAIL

A smooth gradient on a wall going from #FF0000 (red) on the left to #0000FF (blue) on the right, transitioning evenly through purple

Evaluator (Gemma 4 26B)

The image fails to follow the prompt's color direction and specific hex values. The prompt requires a gradient from #FF0000 (red) on the left to #0000FF (blue) on the right. However, the image shows a gradient from a light cyan/blue on the left to an orange/red on the right. The colors are essentially reversed and do not match the specified hex values or the left-to-right color order. VERDICT: FAIL
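For reference, an even red-to-blue gradient of the kind this prompt asks for can be produced by interpolating each RGB channel linearly; the midpoint of #FF0000 and #0000FF is then #800080, the CSS `purple`. This is a sketch assuming plain per-channel interpolation in sRGB; perceptual color spaces such as OKLab give a different midpoint.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def hex_to_rgb(code: str) -> RGB:
    """Parse '#RRGGBB' (leading '#' optional) into 0-255 integer channels."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return (int(code[0:2], 16), int(code[2:4], 16), int(code[4:6], 16))


def rgb_to_hex(rgb: RGB) -> str:
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def linear_gradient(start: str, end: str, steps: int) -> List[str]:
    """`steps` evenly spaced colors from start to end (both included), interpolated per sRGB channel."""
    if steps < 2:
        raise ValueError("need at least two steps")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    colors = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the left edge, 1.0 at the right edge
        mixed = tuple(round(x + (y - x) * t) for x, y in zip(a, b))
        colors.append(rgb_to_hex(mixed))
    return colors


if __name__ == "__main__":
    # Left edge to right edge of the requested wall gradient, in five samples.
    print(linear_gradient("#FF0000", "#0000FF", 5))
```

Reading the samples left to right gives the order the prompt specifies; the image described above runs the other way, from blue on the left to red on the right.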

local/hidream-i1-full-17bFAIL

A sunset sky gradient on a flat surface from #FF6B35 (orange) at the bottom to #1B1464 (deep indigo) at the top

Evaluator (Gemma 4 26B)

The image shows a gradient from a bright orange/yellow at the bottom to a bright blue at the top. The prompt specifically requested a top color of #1B1464 (deep indigo/dark navy). Instead, the top of the image is a bright, saturated blue, which is significantly lighter and more vibrant than the requested deep indigo. Therefore, the color at the top does not match the specified hex value or description. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A horizontal banner with a gradient from #00C9FF (cyan) on the left to #92FE9D (mint green) on the right

Evaluator (Gemma 4 26B)

The image features a horizontal banner in the center. However, the banner is a solid, flat light green color. It does not contain a gradient, nor does it transition from the specified cyan (#00C9FF) on the left to the specified mint green (#92FE9D) on the right. The color requirement is not met. VERDICT: FAIL

Photorealism 33%vs67%

Easy 33%vs67%

local/hidream-i1-full-17bFAIL

A single red sneaker on a clean white background with soft studio lighting and gentle shadow — product photography style

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a professional product shot. While the lighting is decent, there are significant AI artifacts. The text/logo on the side of the shoe is illegible, nonsensical, and distorted. Additionally, the lace structure is inconsistent, and the interior branding is garbled. The way the shoe interacts with the background lacks the precision of a real studio photograph, showing typical AI-generated structural errors. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A clear glass perfume bottle on a reflective black surface with studio lighting, no background distractions

Evaluator (Gemma 4 26B)

The image fails to meet the criteria for a real photograph. While the lighting and reflections look high-quality, there are significant AI artifacts. Most notably, the liquid inside the bottle is physically impossible: the straw (dip tube) appears to bend and warp unnaturally through the liquid, and the liquid level does not interact realistically with the glass or the straw. The geometry of the bottle's cap and the way the light refracts through the glass are also inconsistent. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A ripe green apple on a seamless white background with soft side lighting and subtle shadow — catalog style

Evaluator (Gemma 4 26B)

The image successfully meets the prompt requirements. The green apple features realistic textures, including subtle skin imperfections and a natural-looking stem. The lighting is soft and professional, creating a realistic highlight on the left side and a gentle, natural shadow underneath the fruit. The seamless white background is clean, and there are no obvious AI artifacts or distortions. It looks like a high-quality studio product photograph. VERDICT: PASS

Graphical design 42%vs92%

Data Visualisation 33%vs67%

Easy 33%vs67%

local/hidream-i1-full-17bFAIL

A bar chart showing monthly sales for January through June with values 12K, 18K, 15K, 22K, 19K, 25K — labeled axes, color-coded bars, and a title 'Monthly Sales 2025'

Evaluator (Qwen 3 VL)

The chart shows six bars for Jan–June, but values are mislabeled: Jan=12K (correct), Feb=18K (should be 15K), Mar=15K (should be 22K), Apr=22K (missing), May=19K (correct), Jun=25K (correct). Axis labels are incorrect (e.g., “Mer” for Feb, “App” for Mar, “Sure” for Jun). Title reads “Monthly Sales 2025K” (extra “K”). Color legend mismatches bar values. VERDICT: FAIL
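Checking chart labels against the prompt's own values is mechanical, and doing it deterministically avoids slips like the one above, where the evaluator marks Feb = 18K as wrong even though the prompt specifies 18K for February. The sketch below compares one label per month with the specified values. How the labels are read off the image (by a person or an OCR step) is outside its scope, and the readings in the usage lines are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Month -> value label exactly as specified in the prompt.
EXPECTED: Dict[str, str] = {
    "Jan": "12K", "Feb": "18K", "Mar": "15K",
    "Apr": "22K", "May": "19K", "Jun": "25K",
}


def parse_k(label: str) -> Optional[int]:
    """Convert a label such as '18K' to 18000; return None if it is not a clean number."""
    text = label.strip().upper()
    scale = 1000 if text.endswith("K") else 1
    digits = text[:-1] if text.endswith("K") else text
    return int(digits) * scale if digits.isdecimal() else None


def chart_mismatches(
    expected: Dict[str, str], observed: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each month whose observed label is missing or has the wrong value to (expected, observed)."""
    problems: Dict[str, Tuple[str, Optional[str]]] = {}
    for month, want in expected.items():
        got = observed.get(month)
        if got is None or parse_k(got) != parse_k(want):
            problems[month] = (want, got)
    return problems


if __name__ == "__main__":
    # Hypothetical readings: April's label came out garbled and June is missing.
    readings = {**EXPECTED, "Apr": "2ZK"}
    del readings["Jun"]
    print(chart_mismatches(EXPECTED, readings))
```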

local/hidream-i1-full-17bPASS

A pie chart showing market share: Company A 35%, Company B 25%, Company C 20%, Others 20% — each slice a different color with labels and percentages

Evaluator (Qwen 3 VL)

The image shows a pie chart creatively rendered as a pie. Percentages and labels match the prompt: Company A (35%), Company B (25%), Company C (20%), Others (20%). Colors are distinct, labels and percentages are readable. Chart type is correctly a pie chart. All requested elements are present and accurate. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A simple line graph showing daily website visitors over one week (Mon-Sun): 1200, 1500, 1100, 1800, 2000, 1700, 1400 — with grid lines, axis labels, and title 'Weekly Traffic'

Evaluator (Qwen 3 VL)

The chart shows a line graph titled "Weekly Traffic" with axes and grid lines. However, the data labels are incorrect (e.g., "1UE1" instead of "1100", "14000" instead of "1400"), and the x-axis labels are missing for Tue, Thu, Fri, Sat. The legend contains garbled text. The chart type matches, but critical data and labels are wrong. VERDICT: FAIL

Layout & Design 33%vs100%

Easy 33%vs100%

local/hidream-i1-full-17bFAIL

A movie night poster with a large popcorn image in the center, the title 'MOVIE NIGHT' in bold at the top, and the tagline 'Every Friday at 8 PM' at the bottom

Evaluator (Qwen 3 VL)

The poster features a large popcorn bucket centered, with “MOVIE NIGHT” in bold at the top (styled as filmstrip text). The tagline “Every Friday at 8 PM” is present but misspelled (“Evvery”). Text is readable, though the error violates accuracy. Visual hierarchy is correct, but the typo disqualifies it. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

An event poster featuring a guitar illustration, the headline 'LIVE MUSIC FEST' in large text, and 'Tickets available now' as a tagline below

Evaluator (Qwen 3 VL)

The poster includes a guitar illustration and the headline "LIVE MUSIC FEST" with correct visual hierarchy. However, "Tickets available now" is not present — instead, "TICKETS AVAILABLE" appears with garbled text below it. The phrase "REP DDS NOW" is also illegible. Layout elements are partially correct but text is unreadable and misaligned. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A book club poster with an open book illustration, the title 'JOIN THE BOOK CLUB' prominently displayed, and 'Meets every Wednesday' as the tagline

Evaluator (Qwen 3 VL)

The poster features an open book illustration centered below the title "JOIN THE BOOK CLUB" and above the tagline "MEETS EVERY WEDNESDAY". Text is clearly readable, correctly placed, and follows proper visual hierarchy with title largest, then book, then tagline. All required elements are present and correctly formatted. VERDICT: PASS

Medium 33%vs100%

local/hidream-i1-full-17bFAIL

A landing page design with a hero header image at the top, a headline and subtitle in the middle, a 'Sign Up' call-to-action button, and a footer with three social media icons

Evaluator (Qwen 3 VL)

The image shows a landing page with a hero header, headline, and “Sign Up” button, but text is garbled and unreadable. Social media icons are present but mislabeled. The footer contains nonsensical text. Layout zones are structurally present, but critical text elements fail readability and correctness, violating core requirements. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A newsletter layout with a banner image header, a two-column body section with text on the left and an image on the right, a highlighted quote block, and a footer with unsubscribe link

Evaluator (Qwen 3 VL)

The image shows a newsletter layout with a header, two-column body (text left, image right), a quote block, and footer. However, all text is gibberish or unreadable, violating the requirement for readable, correctly placed text. The visual hierarchy is present but meaningless due to illegible content. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A restaurant menu layout with a logo header, three category sections (appetizers, mains, desserts) each with items and prices, and a footer with address and hours

Evaluator (Qwen 3 VL)

The menu has a logo header, three category sections (Appetizers, Mains, Desserts) with items and prices, and a footer with contact info. Text is readable and correctly placed. However, prices are "$.00" (likely errors), and category labels are inconsistently styled. Core layout and hierarchy are present. VERDICT: PASS

Hard 33%vs100%

local/hidream-i1-full-17bPASS

A magazine double-page spread with a left page containing a large hero photo, and a right page with a two-column article, a pull quote, a sidebar with author bio, and a page number

Evaluator (Qwen 3 VL)

Left page has a large hero photo with headline below. Right page shows two columns of text, a pull quote (“Cahyck...”), and a sidebar with author bio and page number. All layout zones are present. Text is readable and correctly placed per prompt. Visual hierarchy is clear. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A mobile app UI mockup showing a dashboard with a top navigation bar, a search field, a 2x2 grid of feature cards with icons, a list of recent activity items, and a bottom tab bar

Evaluator (Qwen 3 VL)

The image shows a mobile app UI with a dashboard, top navigation bar, search field, 2x2 grid of feature cards, recent activity list, and bottom tab bar. However, all text is gibberish and unreadable, violating the requirement for correct text placement and readability. Layout structure is present but text fails critical criteria. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A fashion magazine spread with a full-bleed left-page photo, right page with three-column layout, headline, body copy, a smaller inset image, and styled page numbers

Evaluator (Qwen 3 VL)

The layout lacks a full-bleed left-page photo; instead, it has a large photo on the left and a smaller inset on the right. The right page has no three-column layout, only one column with text. Headline and body copy are present but text is unreadable due to gibberish. Page numbers are present but not styled as requested. VERDICT: FAIL

Style Diversity 50%vs92%

Easy 33%vs67%

local/hidream-i1-full-17bPASS

A golden retriever sitting in a garden, painted in the style of an oil painting with visible brushstrokes and rich texture

Evaluator (Qwen 3 VL)

The image depicts a golden retriever in a garden with visible brushstroke textures and rich color saturation, mimicking oil painting aesthetics. Brushwork is evident in fur, foliage, and background. While not perfectly traditional oil paint texture, the style is clearly recognizable and meets the core request. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A golden retriever sitting in a garden, rendered in Japanese anime style with flat colors and large expressive eyes

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, rendered in a stylized, cartoonish manner. It features large, expressive eyes and simplified forms, aligning with Japanese anime aesthetics. However, the style lacks true flat colors—subtle gradients and shading are present—reducing stylistic fidelity. The core anime traits (large eyes, simplified anatomy) are recognizable but not fully aligned with the prompt’s “flat colors” requirement. VERDICT: FAIL

local/hidream-i1-full-17bFAIL

A golden retriever sitting in a garden, rendered in pixel art style with a 32x32 grid and limited color palette

Evaluator (Qwen 3 VL)

The image shows a golden retriever in a garden, rendered with visible pixelation consistent with a 32x32 grid. It uses a limited, blocky color palette. However, the pixelation is not sharp enough to clearly define individual 32x32 blocks, and the style lacks the crisp, low-resolution aesthetic typical of strict pixel art. The rendering is more stylized than authentic pixel art. VERDICT: FAIL

Medium 67%vs100%

local/hidream-i1-full-17bPASS

A city street scene in the style of Art Deco — geometric shapes, gold and black palette, ornamental symmetry

Evaluator (Qwen 3 VL)

The image clearly embodies Art Deco: bold geometric forms, symmetrical composition, and a dominant gold-black palette. Buildings feature stylized verticals, stepped roofs, and ornamental details. Streetlights and vehicles are simplified, enhancing the graphic style. The overall aesthetic matches the requested Art Deco characteristics precisely. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A woman with a parasol in the style of Impressionism — soft brushstrokes, dappled light, pastel palette

Evaluator (Qwen 3 VL)

The image shows soft, blended brushstrokes and a pastel palette, with dappled light filtering through foliage. However, the subject’s facial features are overly sharp and detailed, lacking the loose, impressionistic rendering characteristic of the style. The overall aesthetic leans toward digital painting or hyperrealism rather than true Impressionism. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A portrait of a man in Pop Art style — bold outlines, Ben-Day dots, flat vivid primary colors

Evaluator (Qwen 3 VL)

The image displays bold outlines, flat vivid primary colors (red, yellow, blue), and Ben-Day dot patterns in hair and beard. Facial features are stylized with sharp color blocks, matching Pop Art conventions. Background uses geometric color fields. All requested style elements are clearly present and recognizable. VERDICT: PASS

Hard 33%vs100%

local/hidream-i1-full-17bFAIL

A forest rendered in photorealistic watercolor — the detail and lighting of a photo combined with visible watercolor bleeds and paper texture

Evaluator (Qwen 3 VL)

The image shows a forest with soft, diffused lighting and atmospheric haze, resembling a photorealistic scene. However, it lacks visible watercolor bleeds or paper texture; instead, it appears digitally rendered with smooth gradients and clean edges. The requested watercolor characteristics are absent. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A woman in a futuristic city rendered in cyberpunk Art Nouveau — neon colors and tech elements combined with flowing organic lines and floral borders

Evaluator (Qwen 3 VL)

The image features neon-lit cyberpunk streets with a woman in futuristic attire. Pink floral borders with organic, flowing Art Nouveau lines frame the scene, blending with glowing tech elements. Neon colors dominate, and the fusion of organic floral motifs with cyberpunk architecture is clearly visible. The style matches the requested fusion. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A medieval castle depicted in Japanese ukiyo-e woodblock print style combined with photorealistic sky and lighting

Evaluator (Qwen 3 VL)

The image depicts a Japanese castle, not a medieval one, so the core subject is incorrect. The ukiyo-e style is evident in the flat colors, bold outlines, and cloud composition. However, the “photorealistic sky and lighting” element is absent—lighting is stylized, not realistic. The requested style is not clearly recognizable due to the mismatched subject and lack of photorealistic elements. VERDICT: FAIL

Extreme 67%vs100%

local/hidream-i1-full-17bPASS

An interior scene rendered in soft brutalism — raw concrete textures combined with rounded pastel furniture and warm diffused light

Evaluator (Qwen 3 VL)

The image features raw concrete walls and floors, matching the soft brutalism request. Rounded pastel pink furniture and warm, diffused lighting are clearly present. The textures are subtle but consistent with the style. All key elements from the prompt are recognizable and well-executed. VERDICT: PASS

local/hidream-i1-full-17bFAIL

A still life painted in anxious minimalism — sparse composition with uncomfortable negative space, muted colors, and subtle visual tension

Evaluator (Qwen 3 VL)

The image shows a still life with sparse composition and strong negative space defined by stark shadows. Muted tones dominate, and subtle visual tension arises from the interplay of light and shadow. However, the vibrant red apples and orange lack the “anxious” mood implied; the scene feels calm, not unsettling. The style is partially present but misses the emotional core. VERDICT: FAIL

local/hidream-i1-full-17bPASS

A cityscape rendered in nostalgic futurism — retro-futuristic 1960s space-age aesthetics with a melancholic warm-toned patina

Evaluator (Qwen 3 VL)

The image clearly exhibits retro-futurism: warm, hazy lighting; sleek, optimistic architecture with domes and spires; and a nostalgic, melancholic patina. Key 1960s space-age elements like the floating sphere and stylized towers are unmistakable. The color palette is warm and atmospheric, enhancing the desired mood. Style is unmistakably present and well-executed. VERDICT: PASS