Why it’s so hard to get image generation models to create pictures of people with eyes aligned vertically, as explained by ChatGPT

(Last updated 3:15 pm CST February 19th, 2026)

A man with eyes aligned vertically instead of horizontally, as generated by ChatGPT 5.2 after three tries.

Me, after trying three times to get ChatGPT to create a picture of a man with eyes aligned vertically:

That’s correct! Why was that so hard for you to generate in the way I described it?

ChatGPT 5.2:

Short answer: because image generation models are strongly biased toward human facial anatomy, and “removing” expected features is harder than “adding” unusual ones.

Here’s the straight talk.

1. Human-face priors are extremely strong

These models are trained on vast numbers of normal human faces. The learned prior is effectively:

Two eyes
Horizontally aligned
Symmetrically placed

When you asked for vertically aligned eyes, the model complied by adding eyes rather than subtracting the canonical pair, because adding violates the prior less than deleting it. Deletion fights the model’s internal consistency checks.

2. Negative instructions are weaker than positive ones

“Remove the normal eyes” is a negative constraint. Image models handle positive constraints (“add,” “place,” “include”) far more reliably than exclusionary ones (“do not include,” “remove,” “only”).

So the model kept reintroducing the default horizontal eye pair, even after multiple attempts, because that configuration is statistically “correct” for a face.

3. Iterative edits weren’t true edits

Despite the conversational framing, each attempt was effectively a fresh generation, not a pixel-level edit with hard masks. Without an explicit mask saying “this area must not contain eyes,” the model kept reconstructing a plausible face.

This is why it took repetition to finally overpower the prior.
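To make the difference between a fresh generation and a masked edit concrete, here is a minimal sketch in plain Python. It is not how ChatGPT's image tool works internally; the tiny character grids, the `masked_edit` helper, and the mask layout are all invented for illustration. The idea is only that a hard mask limits changes to the marked region and copies everything else from the original.

```python
# Toy sketch: a "face" is a grid of characters (E = eye, N = nose, M = mouth).
# masked_edit is a hypothetical helper, not part of any real image API.

def masked_edit(original, regenerated, mask):
    """Take regenerated cells where mask is 1; keep original cells where mask is 0."""
    return [
        [new if m else old for old, new, m in zip(o_row, n_row, m_row)]
        for o_row, n_row, m_row in zip(original, regenerated, mask)
    ]

original = [".....", ".E.E.", "..N..", ".MMM."]  # normal horizontal eye pair
fresh    = ["..E..", "..E..", "..n..", ".mmm."]  # a fresh generation changes every cell
mask     = [[1] * 5, [1] * 5, [0] * 5, [0] * 5]  # only the top two rows may change

edited = ["".join(row) for row in masked_edit(original, fresh, mask)]
print("\n".join(edited))
```

With the mask applied, the eye rows come from the new generation while the nose and mouth rows stay exactly as they were. Without a mask, the whole grid is replaced, which is closer to what the conversation above describes.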

4. Ambiguity in anatomical language

Phrases like:

“eyes in the normal places”
“just above the nose”

are clear to humans but still somewhat fuzzy to a model that reasons probabilistically over pixels, not anatomy diagrams. Until the final attempt, it was still “hedging” by keeping extra eyes.

5. Why the last one worked

By the final prompt, you had:

Repeated the constraint
Narrowed the acceptable outcome
Forced a binary condition: only two vertically aligned eyes

At that point, the probability mass finally collapsed onto the configuration you wanted.
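The "probability mass" wording can be illustrated with a toy calculation. This is only a sketch: the three face configurations, the prior weights, and the compatibility scores are all made-up numbers, and real image models do not expose an explicit table like this. It simply shows how a strong prior can win against one constraint and lose against several stacked ones.

```python
# Toy illustration only. Every number here is invented for the example.
PRIOR = {
    "horizontal pair only": 0.90,
    "horizontal pair plus extra vertical eyes": 0.09,
    "two vertically stacked eyes only": 0.01,
}

# Hypothetical score (0 to 1) for how well each configuration fits a constraint.
CONSTRAINTS = {
    "eyes arranged vertically": {
        "horizontal pair only": 0.05,
        "horizontal pair plus extra vertical eyes": 0.8,
        "two vertically stacked eyes only": 1.0,
    },
    "no horizontally aligned eyes": {
        "horizontal pair only": 0.05,
        "horizontal pair plus extra vertical eyes": 0.05,
        "two vertically stacked eyes only": 1.0,
    },
    "exactly two eyes total": {
        "horizontal pair only": 1.0,
        "horizontal pair plus extra vertical eyes": 0.05,
        "two vertically stacked eyes only": 1.0,
    },
}

def posterior(prior, constraint_names):
    """Multiply the prior by each constraint's scores, then renormalize to sum to 1."""
    weights = dict(prior)
    for name in constraint_names:
        for config in weights:
            weights[config] *= CONSTRAINTS[name][config]
    total = sum(weights.values())
    return {config: w / total for config, w in weights.items()}

first_try = posterior(PRIOR, ["eyes arranged vertically"])
final_try = posterior(PRIOR, list(CONSTRAINTS))

for label, dist in (("first prompt", first_try), ("final prompt", final_try)):
    best = max(dist, key=dist.get)
    print(f"{label}: most likely = {best} ({dist[best]:.2f})")
```

Under these assumed numbers, the single vague constraint leaves "horizontal pair plus extra vertical eyes" as the most likely outcome, which matches the rejects. Only the stacked constraints push most of the weight onto the two vertically stacked eyes.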

Bottom line

This wasn’t user error. It’s a known limitation:

Adding anomalies is easy
Suppressing defaults is hard
Faces are the hardest case

If you wanted this done faster in the future, the most effective phrasing would be something like:

“A man’s face with no horizontally aligned eyes anywhere on the face. Exactly two eyes total, stacked vertically along the facial midline.”

Blunt, redundant, and absolute constraints work best.
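If you find yourself writing prompts like this often, the pattern is simple enough to template. The sketch below is a hypothetical helper (`build_prompt` and its parameter names are my own, not part of any ChatGPT or image API) that states the exclusion first and the exact count second, the same way the suggested prompt does.

```python
def build_prompt(subject, forbidden, count, layout):
    """Assemble a blunt prompt: an absolute exclusion, then an exact count and layout."""
    parts = [
        f"{subject} with no {forbidden} anywhere on the face.",
        f"Exactly {count} eyes total, {layout}.",
    ]
    return " ".join(parts)

prompt = build_prompt(
    subject="A man’s face",
    forbidden="horizontally aligned eyes",
    count="two",
    layout="stacked vertically along the facial midline",
)
print(prompt)
```

This reproduces the suggested prompt word for word. Whether it actually gets the image right on the first try is something you would have to test yourself.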

You weren’t asking for something unreasonable—the model just had to be bludgeoned into compliance.

——-

The initial prompt: Generate an image of a face of a man, in his mid 40s, with eyes arranged vertically on the face instead of horizontally, as they normally are.

The rejects:
