Almost everyone who tries an image generator for the first time writes the same kind of prompt. Something like "a beautiful landscape" or "a cool character." The model dutifully produces an image. The image is fine. It is not, however, the image that was in your head. That gap — the one between the picture you imagined and the picture you got — is where prompt craft lives.
This is a working guide to closing that gap. It's not a list of magic words. It's a way of thinking about what you're asking the model to do, what information actually helps it, and how to iterate when the first try misses. It's the guide we wish we'd had when we started building tools around these models.
What a prompt actually is
It's tempting to think of a prompt as a search query, as if you were looking up the right image in some giant warehouse. That mental model fails almost immediately. The model isn't retrieving anything; it's synthesizing a new image, conditioned on your text. Every word you write is a constraint that narrows the space of plausible images. Words you leave out aren't "unspecified": the model fills them in with its defaults, which lean toward whatever was most common in its training data.
That's why "a beautiful landscape" produces a generic, Instagram-ready photo. You haven't constrained anything except the broadest topic, so the model fills in everything else from its defaults: soft golden-hour light, mountains, a still lake, maybe some clouds. If that's what you wanted, great. If you wanted something specific, you have to say so.
The mental shift that helps most: stop describing the image and start specifying the shoot. A prompt is closer to a brief than a search. You're not finding a picture; you're directing one.
The five-part anatomy of a strong prompt
Almost every prompt that works carries five kinds of information. Most weak prompts are missing two or three of them. Once you internalize this list, you'll see the gaps in your own prompts immediately.
1. Subject
Who or what is in the image. Be concrete. "A young woman" is weaker than "a young woman with curly auburn hair, freckles across her cheeks, wearing a moss-green wool coat." The model can't render specificity you didn't provide. If you don't care about the details, the model picks for you — and it usually picks the most generic version.
2. Action or state
What the subject is doing or how they exist in the frame. "Standing" is different from "mid-stride." "Sitting" is different from "slouched in an armchair, one leg pulled up." Action gives the image energy and a sense of moment, even when the moment is stillness.
3. Environment
Where the subject is. This is the part beginners skip most often, and it's one of the most informative. A single phrase like "in a sunlit kitchen" tells the model about lighting, color palette, props, depth, and tone all at once, often carrying more compositional information than everything else in a short prompt.
4. Style
How the image should look. Photographic? Illustrated? Cinematic, watercolor, isometric, low-poly? This is where most aesthetic decisions live. "Shot on 35mm film" sets a different visual register than "watercolor on cold-press paper" — even if the subject is identical.
5. Technical or framing detail
Camera lens, angle, lighting, color grading. "Shallow depth of field, 85mm portrait lens, soft window light from the left." These are the details that move an image from "looks like AI" to "looks like a photograph someone took on purpose."
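If you build tooling around prompts, the five parts map naturally onto a small record. The sketch below is one way to do it in Python, not an API from any particular generator: `PromptSpec`, `missing()`, and `render()` are hypothetical names, and joining the parts with commas is just one common convention. The useful piece is `missing()`, which lists the parts you left for the model to fill with its defaults.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSpec:
    """Hypothetical record for the five kinds of information a strong prompt carries."""
    subject: str = ""
    action: str = ""
    environment: str = ""
    style: str = ""
    technical: str = ""

    def missing(self) -> list[str]:
        """Names of the parts left empty; the model fills these with its defaults."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Join the non-empty parts, in anatomy order, into one comma-separated prompt."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


weak = PromptSpec(subject="a young woman")
print(weak.missing())  # ['action', 'environment', 'style', 'technical']

strong = PromptSpec(
    subject="a young woman with curly auburn hair and freckles",
    action="slouched in an armchair, one leg pulled up",
    environment="in a sunlit kitchen",
    style="shot on 35mm film",
    technical="shallow depth of field, 85mm portrait lens, soft window light from the left",
)
print(strong.render())
```

A weak prompt shows up immediately as a long `missing()` list, which is exactly the gap-spotting habit described above.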

Why your first prompt usually fails
When a prompt produces something disappointing, it's almost never because the model couldn't do it. It's because you didn't ask for it. The model gave you a reasonable answer to a vague question. The fix is rarely "add more adjectives" — it's almost always "name the constraint you assumed was obvious."
If your character keeps coming out blonde when you wanted brunette, you didn't say brunette. If the lighting is always sunny when you wanted overcast, you didn't say overcast. If the framing is always centered when you wanted off-center, you didn't say where to place them. The model can't read your mind. It can read your words.
Specificity beats verbosity
There's a temptation, when a prompt isn't working, to make it longer. Add more adjectives, pile on more references, throw in a dozen quality words like "masterpiece, hyperdetailed, 8k, trending on artstation." This almost never helps. Most quality words are dead weight — they make the prompt feel more serious without telling the model anything new.
What works better is replacing vague words with specific ones. Compare:
- Weak: "a cool sci-fi scene with a person and futuristic elements, very high quality"
- Strong: "a woman in a battered orange spacesuit standing on a rust-red rocky plain, distant industrial silhouettes on the horizon, low evening sun, warm rim light, anamorphic lens flare"
The second isn't longer because it's padded; it's longer because every word does work.
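A quick way to catch dead weight is a lint that flags quality words before you send a prompt. This is a minimal sketch: `flag_filler` is a hypothetical helper, and the `FILLER` list is our own assumption built from the examples above, so extend it with whatever words your tool tends to ignore.

```python
import re

# Assumed filler list, built from the examples in this section; tune it for your tool.
FILLER = ("masterpiece", "hyperdetailed", "8k", "trending on artstation",
          "high quality", "beautiful", "stunning", "cool")


def flag_filler(prompt: str) -> list[str]:
    """Return the filler phrases found in a prompt (each once), in order of appearance."""
    text = prompt.lower()
    hits = []
    for phrase in FILLER:
        # Word boundaries keep "cool" from matching inside words like "cooler".
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            hits.append((match.start(), phrase))
    return [phrase for _, phrase in sorted(hits)]


weak = "a cool sci-fi scene with a person and futuristic elements, very high quality"
strong = ("a woman in a battered orange spacesuit standing on a rust-red rocky plain, "
          "distant industrial silhouettes on the horizon, low evening sun, warm rim light, "
          "anamorphic lens flare")
print(flag_filler(weak))    # ['cool', 'high quality']
print(flag_filler(strong))  # []
```

A flagged word isn't automatically wrong, but each one is a prompt to ask what concrete detail it was standing in for.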
Negative prompting: what to leave out
Some tools let you specify what you don't want — extra fingers, blurry edges, certain styles. Use this sparingly. The bigger lever is positive specificity. If you tell the model what you do want clearly, you usually don't need to tell it what to avoid. Negative prompts are best for very specific known failure modes (e.g., the model loves to add a watermark; tell it not to).
Iteration is part of the craft
Nobody writes the perfect prompt on the first try. The right mental model is conversation: you describe what you want, the model offers an interpretation, you adjust based on what it gave you. The first run is for discovering what the model assumed. The second run is for correcting those assumptions. By the third or fourth run, you usually have something close.
The trap to avoid is changing too much between iterations. If your first run was almost right but the lighting was off, change only the lighting language. If you rewrite the whole prompt, you've lost the parts that were working. Treat each iteration as a controlled experiment — change one variable, see what happens.
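The one-variable rule is easy to enforce if the prompt is stored as fields rather than as a single string. In this sketch (the `PromptSpec` fields, `vary_one`, and `changed_fields` are hypothetical names, and the bridge scene is made up for illustration), every variant is built from the same base with only one field swapped, and `changed_fields` confirms nothing else moved.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class PromptSpec:
    subject: str
    environment: str
    lighting: str

    def render(self) -> str:
        return ", ".join([self.subject, self.environment, self.lighting])


def vary_one(base: PromptSpec, name: str, options: list[str]) -> list[PromptSpec]:
    """Build variants that differ from `base` in exactly one field."""
    return [replace(base, **{name: value}) for value in options]


def changed_fields(a: PromptSpec, b: PromptSpec) -> list[str]:
    """Names of the fields whose values differ between two specs."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


base = PromptSpec("a woman in a moss-green wool coat", "on a city bridge", "harsh midday sun")
variants = vary_one(base, "lighting", ["overcast diffuse light", "backlit golden hour"])
for v in variants:
    print(changed_fields(base, v), "->", v.render())
```

Because the base is frozen, an iteration can't accidentally edit the version that was already working.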
"Prompts aren't search queries. They're shoot briefs. The model is your crew, and it can only render what you direct it to."
A worked example
Let's go from a vague prompt to a strong one in four passes.
Pass 1: "a cozy cafe interior." Result: a generic stock-photo coffee shop, two people blurry in the background, golden hour light. Fine but soulless.
Pass 2: "a small Parisian cafe interior at dawn, empty tables, soft window light." Better — now we have place and time, and the emptiness gives it mood. Still feels like a stock photo.
Pass 3: same as above, plus "shot on 35mm film, slight grain, warm color grade, shallow depth of field, single croissant on a marble counter in foreground." Now we have a frame, a focal subject, a film register. This is where most prompts stop.
Pass 4: same as above, plus "steam rising from an espresso cup just out of frame, the sound of rain implied by water beading on the window." Neither detail shows its subject directly (the cup is out of frame, the rain is only implied), but both steer the model toward a specific atmosphere. The result feels like a moment, not a photograph of a place.
Four passes. Same model, same tool, same number of credits per run. The difference is entirely in how the prompt was constructed. That's the craft.
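It also helps to keep a log of what each pass added, so that when a result improves you know which phrase did it. The sketch below assumes prompts are written as comma-separated phrases (sentence-style prompts would need a different split) and replays the four cafe passes; `phrases` and `added_phrases` are hypothetical helpers.

```python
def phrases(prompt: str) -> list[str]:
    """Split a comma-separated prompt into trimmed, non-empty phrases."""
    return [p.strip() for p in prompt.split(",") if p.strip()]


def added_phrases(old: str, new: str) -> list[str]:
    """Phrases present in `new` but not in `old`, in the order they appear."""
    seen = set(phrases(old))
    return [p for p in phrases(new) if p not in seen]


passes = [
    "a cozy cafe interior",
    "a small Parisian cafe interior at dawn, empty tables, soft window light",
]
# Passes 3 and 4 keep everything from the previous pass and append to it.
passes.append(passes[-1] + ", shot on 35mm film, slight grain, warm color grade, "
              "shallow depth of field, single croissant on a marble counter in foreground")
passes.append(passes[-1] + ", steam rising from an espresso cup just out of frame, "
              "the sound of rain implied by water beading on the window")

for i in range(1, len(passes)):
    print(f"pass {i + 1} adds: {added_phrases(passes[i - 1], passes[i])}")
```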
Phrasings that consistently work
Some specific phrasings come up again and again in prompts that hit. They're not magic — they work because they carry concrete information the model can act on. Worth keeping in your back pocket:
- Lighting: "soft window light from the left," "backlit golden hour," "overcast diffuse light," "harsh midday sun with strong shadows" — each names both the source and its character.
- Lens and depth: "shallow depth of field, 85mm portrait lens," "wide-angle 24mm with slight distortion," "macro lens, focus on the eyelashes" — gives the model focal-length cues that constrain composition.
- Color register: "warm color grade with teal shadows," "desaturated, muted earth tones," "high-contrast black and white with deep shadows" — names a grading direction instead of saying "colorful" or "moody."
- Material specificity: "hand-knit wool with visible loops," "matte clay surface with fingerprints," "polished obsidian with reflections" — concrete materials produce concrete textures.
- Time and weather: "a cold morning with breath visible," "the last hour before dusk," "pre-thunderstorm green-gray sky" — implies light, color, mood in one phrase.
Notice that none of these are aesthetic adjectives like "beautiful" or "stunning." They're concrete observations of how light behaves, what objects are made of, and what time of day it is. That's the level of specificity that gives the model something to render.
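One way to keep these in your back pocket is a small phrase bank keyed by category. This is a sketch with hypothetical names (`PHRASES`, `compose`); the phrases are the ones listed above, and the teapot subject in the usage line is made up for illustration.

```python
# Hypothetical phrase bank: each entry names a concrete property, not an aesthetic adjective.
PHRASES: dict[str, list[str]] = {
    "lighting": ["soft window light from the left", "backlit golden hour",
                 "overcast diffuse light", "harsh midday sun with strong shadows"],
    "lens": ["shallow depth of field, 85mm portrait lens",
             "wide-angle 24mm with slight distortion",
             "macro lens, focus on the eyelashes"],
    "color": ["warm color grade with teal shadows", "desaturated, muted earth tones",
              "high-contrast black and white with deep shadows"],
    "material": ["hand-knit wool with visible loops", "matte clay surface with fingerprints",
                 "polished obsidian with reflections"],
    "time_weather": ["a cold morning with breath visible", "the last hour before dusk",
                     "pre-thunderstorm green-gray sky"],
}


def compose(subject: str, **choices: int) -> str:
    """Append one phrase per chosen category (selected by index) after the subject."""
    parts = [subject]
    for category, index in choices.items():
        # An unknown category or out-of-range index raises KeyError or IndexError.
        parts.append(PHRASES[category][index])
    return ", ".join(parts)


print(compose("a ceramic teapot on a wooden table", lighting=0, lens=0, color=1))
```

Keyword arguments keep their order, so the phrases land in the prompt in the order you pass them.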
Save your best prompts
Prompts that work are infrastructure. Save them. We added a star button to the gallery for exactly this: once you find a phrasing that consistently produces what you want, treat it as a template you can adapt for similar shots. Most working creators we talk to keep a small library of 10–20 base prompts that they evolve over months.
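If you work outside a tool with a star button, a JSON file of templates does the same job. Here is a minimal sketch using Python's standard `string.Template`; the template names, the `$placeholders`, and the helper functions are hypothetical examples. `substitute()` raises `KeyError` when a placeholder is left unfilled, which catches a half-adapted template before you spend a run on it.

```python
import json
import tempfile
from pathlib import Path
from string import Template

# Hypothetical starred prompts, with $placeholders for the parts you swap per shot.
LIBRARY = {
    "film-portrait": "$subject, $setting, shot on 35mm film, slight grain, "
                     "shallow depth of field, 85mm portrait lens, soft window light from the left",
    "quiet-interior": "a small $place interior at dawn, empty tables, soft window light, "
                      "warm color grade",
}


def save_library(path: Path, library: dict[str, str]) -> None:
    path.write_text(json.dumps(library, indent=2), encoding="utf-8")


def load_library(path: Path) -> dict[str, str]:
    return json.loads(path.read_text(encoding="utf-8"))


def fill(library: dict[str, str], name: str, **values: str) -> str:
    """Fill a saved template; raises KeyError if any placeholder is left unfilled."""
    return Template(library[name]).substitute(**values)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prompts.json"
    save_library(path, LIBRARY)
    lib = load_library(path)

print(fill(lib, "quiet-interior", place="Lisbon bakery"))
```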
Prompt craft isn't about memorizing tricks. It's about learning to see your own assumptions, then naming them. The model meets you halfway when you do.