How to write prompts that reliably generate the photos and videos you imagine

Quick guide

Great AI photos and videos don’t happen by accident. They come from clear direction. Think of a prompt as a creative brief plus a shot list, written in the tightest, most unambiguous language you can manage. The model is your crew, but it’s very literal. It won’t read your mind—only your words and references.

Start with the right expectations about your tool

Different models parse prompts differently. Midjourney favors short, descriptive tags and style cues. SDXL-based tools let you use negative prompts, weights, and reference controls. DALL·E 3 handles natural language well but is less parameter-driven.

Check what your tool supports: negative prompts, aspect ratios, seeds, image references, camera controls, upscalers, ControlNet/IP-Adapter, and for video, prompt scheduling or keyframes.
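One way to make that check concrete is to write down what each tool supports and let a small helper decide where each setting goes. The sketch below is only an illustration: `ToolCapabilities`, `build_request`, the request keys, and the two example tools are hypothetical and do not correspond to any real product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCapabilities:
    """Hypothetical record of what one image tool supports."""
    negative_prompt: bool
    seed: bool
    aspect_ratio: bool


def build_request(caps, prompt, negatives, seed=None, aspect_ratio=None):
    """Place each setting where the tool can use it, and drop what it cannot."""
    request = {"prompt": prompt}
    if negatives:
        if caps.negative_prompt:
            # Tool has a dedicated field: keep negatives out of the main text.
            request["negative_prompt"] = ", ".join(negatives)
        else:
            # No dedicated field: fall back to plain "no X" wording in the prompt.
            request["prompt"] += ", " + ", ".join(f"no {n}" for n in negatives)
    if seed is not None and caps.seed:
        request["seed"] = seed
    if aspect_ratio and caps.aspect_ratio:
        request["aspect_ratio"] = aspect_ratio
    return request


# Two made-up tools: one with full controls, one that only accepts text.
with_controls = ToolCapabilities(negative_prompt=True, seed=True, aspect_ratio=True)
text_only = ToolCapabilities(negative_prompt=False, seed=False, aspect_ratio=False)

prompt = "Foggy pine forest at dawn"
negatives = ["people", "buildings"]
req_controls = build_request(with_controls, prompt, negatives, seed=1234, aspect_ratio="16:9")
req_text = build_request(text_only, prompt, negatives, seed=1234, aspect_ratio="16:9")
print(req_controls)
print(req_text)
```

The same creative intent ends up in a separate negative field for one tool and as plain wording for the other, which is why it pays to check capabilities before you write the prompt.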

The anatomy of a strong prompt

A reliable prompt covers the same core questions a photographer or director would ask:

  • Subject: Who or what is the focus? One main subject works best.

  • Context and environment: Where are we? Era, location, weather, time of day.

  • Composition and framing: Shot type (macro, close-up, portrait, wide), angle (eye-level, low-angle), framing (rule of thirds, centered), depth of field.

  • Camera and optics: Lens focal length (24mm for wide, 50mm natural, 85–135mm portrait), aperture (f/1.8 bokeh vs f/8 sharp), shutter effects (long exposure), film stock or digital sensor vibe.

  • Lighting: Key light quality (softbox, hard sunlight), direction (backlit, rim light), color temperature (warm tungsten, cool daylight), practicals (neon, candlelight), time of day (golden hour, blue hour).

  • Color and mood: Palette (muted, high-contrast black-and-white), tone (melancholic, energetic), grading (teal and orange, bleach bypass).

  • Style references: Genre labels (editorial fashion, documentary, film noir), historical era (1900s silver gelatin), materials (oil painting, Polaroid), tasteful nods to movements instead of living artists.

  • Technical constraints: Aspect ratio, resolution target, render quality, and for videos, duration and frame rate.

  • Negative constraints: What to avoid (text, watermark, extra fingers, motion blur, distortion).

A simple photo prompt formula you can reuse

  • Main subject + key detail

  • Location and time

  • Composition and lens

  • Lighting

  • Mood and palette

  • Style or medium

  • Constraints and negatives
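If you reuse the formula often, it can help to treat it as a fill-in-the-blanks structure. The sketch below assumes nothing about any particular tool: `PhotoPrompt` and its field names are made up for illustration, and it simply joins the slots in the formula's order so the subject always comes first.

```python
from dataclasses import dataclass, field


@dataclass
class PhotoPrompt:
    """One slot per line of the formula; all names here are illustrative."""
    subject: str
    location_time: str = ""
    composition_lens: str = ""
    lighting: str = ""
    mood_palette: str = ""
    style: str = ""
    negatives: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Keep the formula's order so the subject leads the prompt.
        parts = [
            self.subject,
            self.location_time,
            self.composition_lens,
            self.lighting,
            self.mood_palette,
            self.style,
        ]
        # Skip empty slots, then append each negative as a "no X" phrase.
        parts = [p.strip() for p in parts if p.strip()]
        parts += [f"no {n}" for n in self.negatives]
        return ", ".join(parts) + "."


beekeeper = PhotoPrompt(
    subject="Elderly beekeeper, weathered face",
    location_time="rural apiary at golden hour",
    composition_lens="85mm portrait, shallow depth of field",
    lighting="soft rim light",
    mood_palette="warm muted tones",
    style="documentary realism",
    negatives=["text", "watermark"],
)
print(beekeeper.render())
```

Empty slots are skipped, so you can start with only a subject and fill in the rest as you iterate.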

Example mini-templates you can adapt

  • Portrait, photoreal: “Elderly beekeeper, weathered face, rural apiary at golden hour, 85mm portrait, shallow depth of field, soft rim light, warm muted tones, documentary realism, no text or watermark.”

  • Editorial fashion: “Tailored linen suit, rooftop at dusk in Tokyo, low-angle medium shot, 35mm lens, mixed neon and tungsten, cinematic contrast, magazine editorial style, clean background, no logos.”

  • Product macro: “Brushed steel wristwatch on wet basalt stone, studio macro, f/8 sharpness, soft gradient key light, crisp reflections controlled, black-on-black high contrast, e-commerce ready, no fingerprints or dust.”

  • Cinematic landscape: “Foggy pine forest at dawn, wide 24mm, low vantage, volumetric light rays, desaturated greens, moody and quiet, film grain, no people, no buildings.”

  • Surreal collage: “Victorian townhouse floating above a calm sea, long exposure water, matte painting style, limited palette of indigo and bone white, high detail, no text.”

Iterate methodically

Start simple: Subject + setting + lighting. Generate. Then add one variable at a time. Lock a seed when you like the general look; explore variations by changing single terms. Compare A/B: Swap only one word (35mm vs 85mm), keep everything else fixed.
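The one-word A/B swap is easy to get wrong by hand, so here is a minimal sketch of it. The helper `ab_variants` and the seed value are hypothetical; the point is only that every variant differs from the base prompt in exactly one term while the seed stays fixed.

```python
def ab_variants(base_prompt: str, term: str, alternatives: list[str]) -> list[str]:
    """Return the base prompt plus copies that differ only in one term."""
    if base_prompt.count(term) != 1:
        # Refuse ambiguous swaps: the term must appear exactly once.
        raise ValueError(f"expected exactly one occurrence of {term!r}")
    return [base_prompt] + [base_prompt.replace(term, alt) for alt in alternatives]


base = "Tailored linen suit, rooftop at dusk in Tokyo, low-angle medium shot, 35mm lens"
SEED = 1234  # hypothetical locked seed, reused for every variant

variants = ab_variants(base, "35mm", ["85mm"])
for prompt in variants:
    print(SEED, prompt)
```

Because the helper rejects a term that is missing or repeated, you cannot accidentally change two things at once.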

Keep a prompt log: Save the exact text, seed, model, sampler, steps, aspect ratio, and notable settings. This is your reproducibility backbone.

Diagnose misses: If the subject is wrong, move it to the first words and restate it. If composition fails, explicitly add “centered,” “rule of thirds,” or “headroom.” If style overrides realism, remove heavy style cues.
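For the prompt log, one low-effort format is a text file with one JSON object per line. The sketch below is an assumption about how you might store it, not a feature of any tool: `log_generation`, `load_log`, the field names, and the example model and sampler names are all invented for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_generation(log_path, prompt, *, seed, model, sampler, steps, aspect_ratio, notes=""):
    """Append one generation's settings to a JSON Lines file and return the entry."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,  # exact text, unedited
        "seed": seed,
        "model": model,
        "sampler": sampler,
        "steps": steps,
        "aspect_ratio": aspect_ratio,
        "notes": notes,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def load_log(log_path):
    """Read every logged entry back as a list of dicts."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "prompt_log.jsonl"
    log_generation(
        log_file,
        "Foggy pine forest at dawn, wide 24mm",
        seed=1234,
        model="example-model",      # placeholder name
        sampler="example-sampler",  # placeholder name
        steps=30,
        aspect_ratio="16:9",
        notes="first pass, fog too thin",
    )
    entries = load_log(log_file)

print(entries[0]["seed"], entries[0]["prompt"])
```

Appending one line per generation keeps the log cheap to write and easy to search when you want to reproduce a result.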

Common pitfalls and fixes

  • Too many ideas in one prompt: Split into multiple generations or shots; composite later if needed.

  • Style overwhelms subject: Move style terms later in the prompt or reduce their weight.

  • Human anatomy artifacts: Add “hands fully visible, correct fingers” and a strong negative list; use a hand reference pose if possible.

  • Text in frame when you don’t want it: Add “no text, no typography, no watermark, no logo.” If you want a clean background, also avoid words that read as requests for posters or magazine covers.

  • Busy backgrounds: Specify “clean backdrop,” “neutral seamless,” or “shallow depth of field background bokeh.”

  • Unrealistic lighting: Name a real lighting setup: “softbox key 45 degrees, negative fill on camera right, hair light.”

Ethics and practical caution

  • Avoid naming living artists; use genres or historical movements. It’s clearer, fairer, and often more effective.

  • Be mindful with photorealistic people. If realism is vital, state it, but consider adding context that prevents misuse. Label synthetic media when sharing.

  • For branded assets, use your own logos or placeholders, or generate clean plates and add brand elements later.

A quick checklist before you hit generate:

  • One clear subject and one clear action

  • Place, time of day, and weather

  • Composition and lens choice

  • Lighting direction and quality

  • Color palette and mood

  • Style or era reference

  • Aspect ratio and duration (for video)

  • Negative constraints

  • Image/style references attached

  • Seed locked for iterating
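If you keep your prompt settings in a structured form, the checklist can run as a quick pre-flight check. This is a minimal sketch under that assumption: the keys in `CHECKLIST` and the `missing_items` helper are hypothetical names, paired with the checklist lines above.

```python
# Hypothetical keys mapped to the checklist lines they stand for.
CHECKLIST = {
    "subject": "One clear subject and one clear action",
    "setting": "Place, time of day, and weather",
    "composition": "Composition and lens choice",
    "lighting": "Lighting direction and quality",
    "palette": "Color palette and mood",
    "style": "Style or era reference",
    "aspect_ratio": "Aspect ratio and duration (for video)",
    "negatives": "Negative constraints",
    "references": "Image/style references attached",
    "seed": "Seed locked for iterating",
}


def missing_items(spec: dict) -> list[str]:
    """Return the checklist lines whose key is absent or empty in spec."""
    empty_values = (None, "", [], {})
    # A seed of 0 is a valid value, so compare against explicit empties
    # instead of relying on truthiness.
    return [label for key, label in CHECKLIST.items() if spec.get(key) in empty_values]


draft = {
    "subject": "Elderly beekeeper inspecting a hive frame",
    "setting": "rural apiary, golden hour, clear sky",
    "lighting": "soft rim light from behind",
    "seed": 0,
}
still_missing = missing_items(draft)
for line in still_missing:
    print("Missing:", line)
```

Anything the helper reports is a decision you are leaving to the model, which may be fine, but it is better made on purpose.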

Finally, think in sequences, not single prompts.

Good luck creating!
