Yes, this is possible. The JSON below combines three voice-overs (created using our text-to-speech API) with a soundtrack that plays for the full duration of the output:
{
  "timeline": {
    "soundtrack": {
      "src": "https://shotstack-assets.s3-ap-southeast-2.amazonaws.com/music/unminus/berlin.mp3",
      "volume": 0.35,
      "effect": "fadeOut"
    },
    "tracks": [
      {
        "clips": [
          {
            "asset": {
              "type": "audio",
              "src": "https://shotstack-create-api-v1-assets.s3.amazonaws.com/msgtwx8iw6/01h94-by9x2-fdxjj-7jwvd-4rgktq.mp3"
            },
            "start": 0,
            "length": 3.38
          },
          {
            "asset": {
              "type": "audio",
              "src": "https://shotstack-create-api-v1-assets.s3.amazonaws.com/msgtwx8iw6/01h94-c59ac-8dsej-84nw7-t8bzxb.mp3"
            },
            "start": 3.42,
            "length": 3.21
          },
          {
            "asset": {
              "type": "audio",
              "src": "https://shotstack-create-api-v1-assets.s3.amazonaws.com/msgtwx8iw6/01h94-cb8kq-245tj-t0qkr-bhhgzg.mp3"
            },
            "start": 6.70,
            "length": 2.66
          }
        ]
      }
    ]
  },
  "output": {
    "format": "mp3",
    "resolution": "hd"
  }
}
To play clips in sequence, use the audio asset type and set each clip's start and length; together these place the clip at any point on the timeline. The soundtrack adds background music. The output here is an mp3 file, but it could just as well be an mp4 video file.
This is the final audio file:
Full details are in the API reference: Shotstack v1 API Reference Documentation.
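If you have many voice-overs, working out each start value by hand gets tedious. Here is a minimal Python sketch that lays the clips end to end and builds the same JSON structure as above. The function name `build_audio_edit`, the `gap` parameter, and the `example.com` URLs are made up for illustration and are not part of the Shotstack API; the sketch only builds the payload and does not send it anywhere.

```python
import json


def build_audio_edit(voiceovers, soundtrack_src, gap=0.0):
    """Build an edit payload that plays voice-overs one after another.

    voiceovers: list of (src, length_in_seconds) tuples in playback order.
    gap: optional pause in seconds between clips (hypothetical helper option).
    """
    clips = []
    cursor = 0.0  # running position on the timeline, in seconds
    for src, length in voiceovers:
        clips.append({
            "asset": {"type": "audio", "src": src},
            "start": round(cursor, 2),
            "length": length,
        })
        cursor += length + gap

    return {
        "timeline": {
            "soundtrack": {
                "src": soundtrack_src,
                "volume": 0.35,
                "effect": "fadeOut",
            },
            "tracks": [{"clips": clips}],
        },
        "output": {"format": "mp3", "resolution": "hd"},
    }


# Placeholder URLs; substitute the files returned by the text-to-speech API.
edit = build_audio_edit(
    [
        ("https://example.com/voice-1.mp3", 3.38),
        ("https://example.com/voice-2.mp3", 3.21),
        ("https://example.com/voice-3.mp3", 2.66),
    ],
    "https://example.com/music.mp3",
)
print(json.dumps(edit, indent=2))
```

With `gap=0.0` the clips play back to back; a small positive gap leaves a short pause between voice-overs, similar to the spacing in the JSON above.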