# Video to Text

## Summary

Video to text is performed by connecting a Video Node to a Text Node. As with Image to text, it can be used to extract information from any video input.

This combination of nodes can be used for tasks such as **describing the video** or **producing a numbered list of frames with a detailed description of each frame's visual content**.

<figure><img src="https://526296967-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWthH0GpcCHwVdtbvahZS%2Fuploads%2FNHoC1pglZd6Ar0arGMbT%2FScreenRecording2025-02-23at11.38.45AM-ezgif.com-optimize.gif?alt=media&#x26;token=858deba5-7397-403b-81bb-e84b9f9c0df2" alt="" width="563"><figcaption></figcaption></figure>

## Prompt

* **"**&#x44;escribe this video in a few sentenc&#x65;**"**
* "Describe the composition and focal points of each frame, focusing on how elements are arranged and how they guide the viewer's attention."
* "Illustrate the atmosphere of each scene, focusing on sensory details such as lighting, visceral color sensation, and spatial depth."
* "Describe how recurring motifs and visual patterns contribute to thematic development in the video."
* "Give me a list of frames in this video and describe the visual composition of each."
