Mochi 1 Prompt Guide: Getting Sharp Motion From Genmo's Model

You ran a quick prompt through Mochi 1, expected the smooth motion everyone talks about, and got something stiff or melty instead. The model’s reputation for high-fidelity movement is earned — but it shows up only when the prompt describes motion clearly. Vague input gives Mochi vague output, and that’s the most common reason a generation disappoints.

Mochi 1 is Genmo’s open-source text-to-video model, a 10-billion-parameter diffusion model released under a permissive Apache 2.0 license. It produces clips up to about 5.4 seconds at 30fps, optimized for 480p, and it’s known for convincing physics: liquids, fur, and hair move with real weight. This guide covers how to write for Mochi 1 and where its strengths lie, and it gives you example prompts you can use immediately.

What Makes Mochi 1 Worth Prompting Carefully

A few traits shape how you should write for this model:

Strong prompt adherence. Mochi tends to follow what you actually write. That’s good news — it means specific instructions translate into specific output — but it also means a sloppy prompt produces a sloppy clip.
High-fidelity motion and physics. This is the headline feature. Mochi renders fluid, physically believable movement, including tricky things like splashing liquid and flowing hair. Prompts that describe motion explicitly play directly to this.
Single text encoder. Mochi encodes prompts with one T5-XXL language model rather than stacking several. In practice it responds well to natural, descriptive language — write like you’re briefing a cinematographer, not assembling keywords.
Short clips, 480p. Generations are brief and not high-resolution by default. Plan shots that read well in five seconds at 480p: clear subject, one motion, controlled framing.

The pattern: Mochi rewards motion-forward, naturally written prompts and punishes vagueness.

How to Structure a Mochi 1 Prompt

Because Mochi uses a single language model and responds to natural description, you don’t need rigid labeled sections. A flowing description that still covers the core elements works best:

Subject → Motion → Camera → Lighting → Style

The non-negotiable element is motion. If you describe a scene without describing how things move, you waste Mochi’s best capability. Effective Mochi prompts name specific camera moves like “dolly zoom” or “panning shot,” describe the subject’s emotion or physical action, and set the lighting and style.

A Worked Example

Start with a subject:

A surfer in a black wetsuit paddling out past the break

Add the motion — this is where Mochi earns its reputation:

…a wave rises and curls over them, water spraying off the lip and droplets scattering in the air as they duck under

Add camera, lighting, and style:

Tracking shot following the surfer low to the water, late afternoon golden light, cinematic, realistic textures

Put together:

A surfer in a black wetsuit paddles out past the break as a wave rises and curls over them, water spraying off the lip and droplets scattering in the air while they duck under. Tracking shot following low to the water. Late afternoon golden light, cinematic, realistic textures.

The water motion is what makes this a Mochi prompt rather than a generic one. For more on building this kind of structure across any model, see our complete prompt engineering guide.
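
If you write a lot of prompts, the Subject → Motion → Camera → Lighting → Style order can be captured in a small helper. The sketch below is one way to do it, not part of Mochi or any Genmo tooling: the `ShotBrief` class and its field names are made up for illustration, and joining subject and motion with "as" is simply a choice that happens to reproduce the surfer prompt above.

```python
from dataclasses import dataclass


@dataclass
class ShotBrief:
    """Hypothetical container for the five prompt elements."""

    subject: str   # who or what is on screen, phrased as a clause
    motion: str    # how things move; treated as required
    camera: str    # named camera move
    lighting: str
    style: str

    def to_prompt(self) -> str:
        # Refuse to build a prompt that describes a scene with no movement.
        if not self.motion.strip():
            raise ValueError("motion is required")
        # Subject and motion share one sentence so the movement stays
        # attached to the thing that is moving.
        scene = f"{self.subject} as {self.motion}."
        look = f"{self.lighting}, {self.style}."
        return " ".join([scene, f"{self.camera}.", look])


brief = ShotBrief(
    subject="A surfer in a black wetsuit paddles out past the break",
    motion=(
        "a wave rises and curls over them, water spraying off the lip "
        "and droplets scattering in the air while they duck under"
    ),
    camera="Tracking shot following low to the water",
    lighting="Late afternoon golden light",
    style="cinematic, realistic textures",
)
prompt = brief.to_prompt()
print(prompt)
```

Making motion the only field that raises an error mirrors the point above: every other element can be thin, but a prompt with no described movement is the one you want to catch before you spend a generation on it.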

Mochi 1 Prompt Examples

1. Liquid and Physics

A glass of red wine sits on a dark wooden table. A hand reaches in and tilts the glass slowly, the wine swirling and clinging to the inside of the bowl, a single drop running down the stem. Slow close-up push-in, shallow depth of field. Warm candlelight from the side, deep shadows, moody cinematic style.

Why it works: Mochi handles liquid physics well, and this prompt gives it a clear physical event — tilt, swirl, cling, drip — to render.

2. Hair and Fabric in Motion

A woman in a flowing red dress stands on a windy cliff, her long dark hair and the fabric of her dress whipping sideways in strong gusts. She turns her head slowly to look out at the sea. Medium wide shot, slow tracking from the side. Overcast diffused light, cool desaturated tones, dramatic and atmospheric.

Why it works: Flowing hair and fabric are exactly the kind of soft-body motion Mochi simulates convincingly.

3. Animal Movement

A golden retriever shakes off water after climbing out of a lake, droplets flying outward in an arc, fur fluffing as it dries. The dog then trots toward the camera across the grass. Tracking shot at the dog’s eye level, slight motion blur. Bright midday sun, vivid natural colors, sharp realistic detail.

Why it works: Fur dynamics and water spray combine two of Mochi’s documented strengths in one shot.

4. Subtle Human Emotion

A man in his forties sits alone at a kitchen table in the early morning, a cup of coffee cooling in front of him. He stares out the window, then slowly closes his eyes and exhales. Medium close-up with a very slow push-in. Soft gray dawn light from the window, muted tones, quiet and contemplative mood.

Why it works: Mochi follows prompts closely, so naming the emotion and the small physical action — closing eyes, exhaling — gives it something specific to perform rather than a generic expression.

5. Dynamic Camera and Action

A motorcyclist leans hard into a curve on a wet mountain road, water spraying off the rear tire, the bike tilting low to the asphalt. Camera tracks alongside at speed, slight handheld shake, low angle. Overcast stormy light, reflections on the wet road, desaturated cinematic color, high contrast.

Why it works: The prompt names the camera behavior and one clear action, and the spray off the rear tire gives Mochi a physical event to render.

Tips for Better Mochi 1 Prompts

Always describe motion explicitly. This is the single most important habit. Replace “she dances” with “she spins on one foot, her skirt flaring outward as her arms extend.” Mochi turns described motion into fluid output; it fills in vague verbs with generic results.

Write naturally. Because Mochi uses a single language model, full sentences read better to it than comma-separated keyword soup. Describe the shot the way you’d brief a camera operator.

Plan for five seconds. Clips are short. Don’t pack a multi-step narrative into one generation — pick one moment and one motion, and let it breathe.

Name the camera move. “Dolly zoom,” “slow pan,” “tracking shot,” and “push-in” all translate into specific behavior. A prompt with no camera direction tends to come back as a flat, static shot.

Lean into physics. If your idea involves water, fur, hair, fabric, or weight, foreground it. That’s where Mochi’s output stands out most visibly.
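
Some of these habits can be checked mechanically before you spend a generation. The sketch below is a rough heuristic, not anything Mochi itself does: the function name `lint_prompt`, the camera-term pattern, and the thresholds are all assumptions chosen for illustration, and you would want to tune them to your own vocabulary.

```python
import re

# Hypothetical pattern for illustration; extend it with the moves you use.
CAMERA_MOVE = re.compile(
    r"\b(dolly zoom|pan|panning|tracking|tracks|push-in)\b", re.IGNORECASE
)


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means no check fired."""
    warnings = []

    # Tip: name the camera move.
    if not CAMERA_MOVE.search(prompt):
        warnings.append("no camera move named")

    # Tip: write naturally. Split on commas and sentence punctuation;
    # very short chunks on average suggest a keyword list.
    chunks = [c.split() for c in re.split(r"[,.;!?]", prompt) if c.strip()]
    if chunks:
        avg_words = sum(len(c) for c in chunks) / len(chunks)
        if avg_words < 3:
            warnings.append("reads like a keyword list")

    # Tip: plan for five seconds. More than one "then" hints at a
    # multi-step sequence.
    if len(re.findall(r"\bthen\b", prompt, re.IGNORECASE)) > 1:
        warnings.append("multiple sequential actions")

    return warnings


keyword_soup = "woman, red dress, cliff, wind, cinematic, 4k"
shot_brief = (
    "A hand tilts a glass of red wine slowly, the wine swirling and clinging "
    "to the inside of the bowl. Slow close-up push-in, shallow depth of field."
)
print(lint_prompt(keyword_soup))  # ['no camera move named', 'reads like a keyword list']
print(lint_prompt(shot_brief))    # []
```

A check like this cannot tell whether the motion description is any good. It only catches the cases where an element is missing entirely, which is usually the cheaper mistake to fix.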

Generate Motion-Forward Prompts Faster

The hard part of prompting Mochi 1 is consistently describing motion in enough detail; it’s easy to slip back into static scene descriptions. LzyPrompt takes your shot idea in plain language and returns a structured, motion-forward prompt with the camera, lighting, and physical action spelled out in the form Mochi responds to best. Generate your first prompt free and see how much sharper the motion descriptions come back.

FAQ

What is Mochi 1 best at?

High-fidelity, physically realistic motion. It renders liquids, fur, hair, and fabric with convincing weight and dynamics, and it follows prompts closely. Build shots around movement to get the most out of it.

How long are Mochi 1 video clips?

Mochi 1 generates clips up to about 5.4 seconds at 30fps, optimized for 480p output. Plan single-moment shots that read clearly in that window rather than multi-step sequences.
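
To make that window concrete, here is the arithmetic as a small sketch. The helper names are made up for illustration, and the numbers are simply the 5.4 seconds and 30fps quoted above; the exact frame count a given runtime produces may differ slightly.

```python
def frame_budget(seconds: float = 5.4, fps: int = 30) -> int:
    """Approximate number of frames in a clip of the given length."""
    return round(seconds * fps)


def seconds_per_beat(beats: int, seconds: float = 5.4) -> float:
    """Screen time each described action gets if the clip is split evenly."""
    if beats < 1:
        raise ValueError("beats must be at least 1")
    return seconds / beats


print(frame_budget())                 # 162
print(round(seconds_per_beat(3), 1))  # 1.8
```

Three actions in one prompt leaves each under two seconds of screen time, which is why a single moment with a single motion tends to read better.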

Should I write Mochi 1 prompts as keywords or full sentences?

Full sentences. Mochi encodes prompts with a single T5-XXL language model and responds best to natural, descriptive language — write it like a shot brief, not a keyword list.

Is Mochi 1 free to use?

The model is open-source under a permissive Apache 2.0 license, so you can run it yourself and iterate without per-generation costs, given suitable hardware. Hosted versions may charge for compute.

How is Mochi 1 different from other open-source video models?

Its standout trait is motion fidelity, especially soft-body physics like hair, fur, and liquids, paired with strong prompt adherence. Other open models may offer longer clips, higher resolution, or built-in audio, but Mochi is a strong choice when believable movement is the priority.