# Audio Node

The Audio Node brings audio generation and transcription to the FLORA canvas. You can generate speech from text, create sound effects, and transcribe audio to text — all integrated into your node-based workflows.

## Overview

Audio nodes support two directions of generation:

* **Text to Audio** — Generate speech or sound effects from a text prompt
* **Audio to Text** — Transcribe or analyze audio content into text

These modes are determined automatically based on how you connect the audio node to other nodes on your canvas.
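To make the rule concrete, here is a minimal sketch of how a mode could be inferred from a single connection. It is an illustrative model only, not FLORA's implementation: the `Connection` type, the `infer_mode` function, and the node-kind strings `"text"` and `"audio"` are hypothetical names, and the sketch assumes the mode depends only on the kinds of the two connected nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    """A hypothetical canvas edge: source node kind -> target node kind."""
    source_kind: str
    target_kind: str


def infer_mode(conn: Connection) -> Optional[str]:
    """Return the mode implied by a connection, or None if neither applies.

    Assumption: only the kinds of the connected nodes matter.
    """
    if conn.source_kind == "text" and conn.target_kind == "audio":
        return "Text to Audio"
    if conn.source_kind == "audio" and conn.target_kind == "text":
        return "Audio to Text"
    return None
```

In this sketch, a text node feeding an audio node yields `Text to Audio`, and the reverse direction yields `Audio to Text`.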

***

## Text to Audio

### How It Works

Connect a text prompt or text node output to an audio node to generate speech or sound effects. The audio node will produce an audio file based on your prompt and selected model.

### Voice Selection

Audio nodes include an inline **voice selector**, shown in the node's floating controls and in its pre-output state. You can:

* **Browse voices** — Open the voice dropdown to see all available voices for the selected model
* **Preview voices** — Click the play button next to any voice to hear a sample before selecting
* **Switch voices** — Change the voice at any time without leaving the node

The voice selector reads options directly from the model's available parameters, so the voice list updates automatically when you switch models.
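As a rough illustration of that behavior, the sketch below derives the voice list from per-model parameter data, so switching models changes the list with no extra configuration. The `MODEL_PARAMETERS` table, the model names, the voice names, and the `voices_for` function are all hypothetical placeholders, not real FLORA or provider identifiers.

```python
from typing import Dict, List

# Hypothetical parameter data; real model names and voices will differ.
MODEL_PARAMETERS: Dict[str, Dict[str, List[str]]] = {
    "example-tts-a": {"voice": ["Voice One", "Voice Two"]},
    "example-tts-b": {"voice": ["Voice Three"]},
    "example-sfx": {},  # assumed: a model that exposes no voice parameter
}


def voices_for(model: str) -> List[str]:
    """Return the voices listed in the model's parameters (empty if none)."""
    return list(MODEL_PARAMETERS.get(model, {}).get("voice", []))
```

Because the list is read from the model's parameters each time, there is no separate voice catalog to keep in sync.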

### Supported Models

Audio generation is available through providers including **ElevenLabs** for text-to-speech and sound effects. Available voices and capabilities vary by model.

***

## Audio to Text

### How It Works

Connect an audio node's output to a text node to enable audio-to-text transcription. FLORA automatically detects the `Audio to Text` mode and routes the audio through a speech-to-text model.

### Supported Models

Audio-to-text transcription is supported by:

* **ElevenLabs** — Speech-to-Text via the ElevenLabs STT endpoint
* **Gemini** — Audio input is sent as inline data alongside your text prompt, enabling audio analysis and transcription

### Connecting Audio to Text Nodes

1. Add an audio node to your canvas with generated or uploaded audio
2. Drag a connection from the audio node's right output handle to a text node's input
3. The text node automatically switches to `Audio to Text` mode
4. Run the text node to transcribe or analyze the audio

***

## Credit Costs

Audio generation costs vary by model and duration. Before generating, hover over the **generate button** to see a credit cost tooltip showing:

* **Estimated credit cost** for the generation
* **Your available credits**

Credits are charged when the generation starts. If a generation fails, credits are automatically refunded.
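The charge-then-refund behavior can be sketched as follows. This is a simplified model under stated assumptions, not FLORA's billing code: `run_generation` and its parameters are hypothetical, credits are treated as whole numbers, and a failure is modeled as the generation callable raising an exception.

```python
from typing import Callable, Optional, Tuple


def run_generation(
    balance: int, cost: int, generate: Callable[[], str]
) -> Tuple[int, Optional[str]]:
    """Charge credits up front and refund them if the generation fails.

    Returns the resulting balance and the output (None on failure).
    """
    balance -= cost  # credits are charged when the generation starts
    try:
        output = generate()
    except Exception:
        # Failed generation: the charged credits are returned in full.
        return balance + cost, None
    return balance, output
```

With a starting balance of 100 and a cost of 12, a successful run leaves 88 credits, while a failed run leaves the balance at 100.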

***

## Tips

* **Preview voices before committing** — Use the play buttons in the voice selector to find the right voice for your project before spending credits
* **Chain audio into text** — Connect audio outputs to text nodes for transcription, then feed that text into image or video prompts for end-to-end multimedia workflows
* **Check model capabilities** — Different audio models support different voices, languages, and output qualities. Check the model details for specifics.

***

*Last updated: April 2026*

The audio node brings voice, sound, and sonic atmosphere into your FLORA canvas. It closes the loop between what you see and what you hear, letting you generate voiceovers, sound effects, and transcriptions alongside the image and video work already happening on your canvas. No jumping out to a separate tool, no booking scratch VO, no silent concepts. Audio nodes turn sound into another first-class node you can pipe into the rest of your workflow.

Here is a quick introduction to getting started with audio nodes:

{% embed url="<https://youtu.be/sP1yhgYfOnw?si=pGA3quOEqMjRreTN>" %}

## Capabilities

* Text-to-speech. Type a script, choose a voice, and get a voiceover back. Play it on the canvas and export it as MP3 or WAV.
* Text-to-SFX. Describe a sound effect in plain language and generate it in place.
* Text-to-Music. Describe a sonic environment in plain language and receive a matching track.
* Audio-to-text. Transcribe an audio clip into a text node for editing, captioning, or downstream prompting.
* Lipsync. Pair an audio node with a video node to drive a lipsynced performance.
* FAUNA-aware. FAUNA can create and chain audio nodes for you as part of a multi-step workflow.

## Models

Visit our [Audio Models](/models/audio-models.md) section to learn about the audio models and capabilities available in the Audio Node.

