Text to Audio
Text-to-Audio generates sound from a text prompt, letting you create voiceovers, sound effects, and music tracks directly on the canvas
You can start an audio workflow from a text node. Write a script, describe a sound, or sketch the music you want, then connect it to an audio node and pick a model. What comes back is playable audio on the canvas. Ready to export, layer under a video, or drive a lipsynced performance in a video node.
Speech
Text-to-speech turns a written script into a voiceover. Type what you want said, pick a voice from the preset library, and generate. Good for narration on explainer videos, scratch VO for concept pitches, or any moment where a voice needs to land on the canvas without a recording session.
You can interact with it in a number of ways:
Type or paste your script into the prompt field
Select a voice from the dropdown, or use a recorded custom voice asset
Use expressive prompt tags (e.g. whisper, shout, happy) on models that support them to shape delivery
Connect the output to a video node with a lipsync model to drive a lipsynced performance
SFX
Text-to-SFX generates individual sound effects from a plain-language description. Describe the sound — "heavy wooden door slamming shut in a stone hallway" — and generate. Useful for ambient texture, hit sounds, foley, and anywhere you'd otherwise be digging through a stock library.
Best practices:
Be specific about the source and environment (material, space, distance)
Generate a few variations with different seeds and pick the strongest
Layer multiple SFX audio nodes for richer atmospheres
Music
Text-to-music generates short music tracks from a text prompt describing genre, mood, instrumentation, or reference. Useful for background beds on social clips, pitch videos, and mood-setting on ad concepts where you don't want to license a track.
You can interact with it in a number of ways:
Describe the track — genre, tempo, instruments, mood, references
Generate variations across seeds to find the right feel
Pair music output with a video node for a scored clip
How to use:
Here are some example workflows using Text to Audio:
Write a script → generate speech → connect to a video node with a lipsync model → export a lipsynced clip
Generate a product-demo voiceover → layer a music bed from text-to-music → export a finished promo
Generate three SFX variations for a scene → pick the strongest → layer onto a generated video
Models
Visit our Audio Models section to learn about the speech, SFX, and music models available in the Audio Node.
Last updated
Was this helpful?