Is it possible to specify the length of assets that depends on other assets?

Diana · September 10, 2024, 2:49pm

Hello,

I am looking for assistance with a video generation feature that involves dynamic text-to-voice audio and corresponding visual assets. The goal is to ensure that the length of the audio and the display time of visual assets (images or text) are synchronized. Here are the details of my use case:

Dynamic Text-to-Voice Audio:

The audio is generated via a text-to-voice system.
The length of this audio can vary depending on different business scenarios.
The audio can be either a single continuous file or split into multiple parts.

Segmented Audio and Visuals:

The audio consists of several distinct parts.
Each part of the audio needs to have a corresponding visual element (such as background images or text).
These visual elements should match the length of each specific audio segment.

Synchronization Needs:

Is it possible to adjust the display time of text and image assets based on the length of the corresponding audio segments?
If the audio is split into multiple files, how can we ensure the visual assets are properly synchronized with the appropriate audio segments?

Your guidance on how to achieve this synchronization between audio and visual assets in the video generation process would be greatly appreciated.

Thank you!

dazzatron · September 23, 2024, 12:33am

Hi Diana,

Currently this isn’t possible via the template itself. You would need to programmatically manipulate the JSON payload you send via the API.
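As a rough illustration of what manipulating the payload could look like, here is a minimal Python sketch. It assumes you already know each audio segment's duration in seconds (for example, by measuring the generated files yourself) and that each segment has one matching image. The `Segment` type, the `build_edit` helper, and the example URLs are hypothetical names made up for this sketch. The `timeline` / `tracks` / `clips` layout with `start` and `length` on each clip follows the Edit API structure as I understand it, so please check the field names against the API reference before relying on it.

```python
import json
from typing import TypedDict


class Segment(TypedDict):
    audio_url: str   # placeholder URL of the rendered audio for this segment
    image_url: str   # placeholder URL of the visual shown during this segment
    duration: float  # audio length in seconds, measured before building the payload


def build_edit(segments: list[Segment]) -> dict:
    """Lay segments out back to back; each image gets its audio's start and length."""
    image_clips = []
    audio_clips = []
    start = 0.0
    for seg in segments:
        # Both clips of a segment share the same timing, which keeps them in sync.
        timing = {"start": round(start, 3), "length": seg["duration"]}
        image_clips.append({"asset": {"type": "image", "src": seg["image_url"]}, **timing})
        audio_clips.append({"asset": {"type": "audio", "src": seg["audio_url"]}, **timing})
        start += seg["duration"]
    return {
        "timeline": {"tracks": [{"clips": image_clips}, {"clips": audio_clips}]},
        "output": {"format": "mp4", "resolution": "hd"},
    }


if __name__ == "__main__":
    example: list[Segment] = [
        {"audio_url": "https://example.com/part1.mp3",
         "image_url": "https://example.com/bg1.jpg", "duration": 2.5},
        {"audio_url": "https://example.com/part2.mp3",
         "image_url": "https://example.com/bg2.jpg", "duration": 4.0},
    ]
    # Print the JSON body that would be posted to the render endpoint.
    print(json.dumps(build_edit(example), indent=2))
```

The sketch only builds the JSON body and does not call the API. If the audio is a single continuous file instead of several parts, the same loop would still apply to the image clips, with one audio clip spanning the total duration.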

We are experimenting with the concept of an alias, which you can use to reference a value in another clip. This would allow you to reference the length of your text-to-voice asset in an image clip. Is this what you are after?

Diana · September 23, 2024, 1:59pm

Hi Dazzatron,
Thanks for jumping on it!
Yes, you spotted it correctly. If it’s possible to set the lengths of static assets (like images) linked to audio, that would be great!
Also, do you plan to include avatars in the future?

spike_wood · December 15, 2024, 1:26pm

I am also looking for this feature. The concept is a chunk of content where audio defines the length of the clip and the images/video and text align to that length. Multiple chunks of content would be stitched together sequentially to create the video.

I am new to ShotStack; how much of this can be done today with the current capabilities?

dazzatron · December 31, 2024, 2:50am

Alright. We’ll put this on the backlog. It seems like a useful feature.

dazzatron · December 31, 2024, 10:13am

We got something working on our dev environment. We’ll look to deploy soon.