Combine Two Videos in Speaker Mode

Yes, it’s the audio from the videos.

Is there a way to use the JSON to merge the audio within this single request?

Or would we need three separate requests: one with volume=0, a second to merge the audio, and a third to add the merged audio onto the video?